pts2024

Gabriel Loiseau

PhD Student at Hornet Security


Sessions

07-05
09:00
20min
Fighting phishing by introducing WikiPhish: a new public dataset based on Wikipedia for legit URLs
Gabriel Loiseau

Over the last decades, the proliferation of phishing websites has emerged as a significant cybersecurity threat, necessitating greater attention and research. These deceptive websites, designed to mimic legitimate ones, aim to trick unsuspecting users into divulging sensitive information such as usernames, passwords, and financial details. Understanding the mechanics and prevalence of these malicious sites is crucial for developing effective countermeasures and safeguarding users' online security. Supervised machine learning models have become the standard for phishing detection, giving security systems predictive capabilities. These models rely heavily on annotated data for their training, evaluation, and ongoing maintenance. There is therefore a need for efficient ways to gather such annotated data to improve phishing detection methods.

In this talk, we will introduce WikiPhish, a novel, renewable, and open-access dataset for phishing website classification. WikiPhish consists of 110,606 webpages: legitimate pages sourced from URLs drawn from Wikipedia's references, alongside phishing pages from the well-known databases OpenPhish and PhishTank. The dataset is designed to address the challenges of phishing detection by leveraging Wikipedia's contribution verification and wide-ranging content. This allows phishing detection models to be developed on a strong baseline that can evolve over time.
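
As a rough illustration of how a dataset of this kind might be assembled, the sketch below pulls URLs out of the reference tags in raw Wikipedia wikitext and merges them with a list of known phishing URLs into labeled rows. It is not the WikiPhish pipeline: the function names, the regular expressions, and the rule that a URL present in both sources is labeled as phishing are all assumptions made for this example.

```python
import re

# Assumption: a simple pattern for http(s) URLs inside wikitext; real
# wikitext has more edge cases than this handles.
URL_PATTERN = re.compile(r'https?://[^\s\]|}<>"]+')

# Assumption: only paired <ref>...</ref> tags are considered; self-closing
# tags such as <ref name="a" /> carry no URL and are skipped.
REF_PATTERN = re.compile(r"<ref[^>/]*>(.*?)</ref>", flags=re.DOTALL)


def extract_reference_urls(wikitext):
    """Return the URLs found inside <ref>...</ref> tags, in order of appearance."""
    urls = []
    for ref_body in REF_PATTERN.findall(wikitext):
        urls.extend(URL_PATTERN.findall(ref_body))
    return urls


def build_labeled_rows(legit_urls, phishing_urls):
    """Merge both sources into (url, label) rows without duplicates.

    Hypothetical conflict rule: a URL listed in both sources is kept
    once, labeled as phishing.
    """
    phishing = set(phishing_urls)
    rows = [(url, "phishing") for url in sorted(phishing)]
    rows += [(url, "legitimate") for url in sorted(set(legit_urls) - phishing)]
    return rows


sample_wikitext = (
    "Paris is the capital of France."
    "<ref>{{cite web |url=https://example.org/paris |title=Paris}}</ref> "
    'It hosts many museums.<ref name="m">[https://example.com/museums Museums]</ref>'
)
legit = extract_reference_urls(sample_wikitext)
rows = build_labeled_rows(legit, ["https://bad.example/login"])
```

In practice the phishing list would come from feeds such as OpenPhish or PhishTank, and each URL would still need to be fetched to obtain the webpage itself; both steps are left out of this sketch.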

Phishing
Amphitheater