PTIT Knowledge Portal

International paper

DistilBERT for Efficient and Accurate Email Phishing Detection: A Benchmark Against Machine and Deep Learning Models

Đàm Minh Lịnh

Email phishing remains a persistent cybersecurity threat that exploits human vulnerabilities and often evades technical safeguards. While machine learning (ML) and deep learning (DL) have been widely applied to phishing detection, systematic benchmarks comparing lightweight transformer models with traditional approaches remain limited. This study addresses this gap by evaluating six models (Naïve Bayes, Random Forest, XGBoost, LSTM, BiLSTM, and a fine-tuned DistilBERT) on a real-world dataset of 17,538 emails using three train-test splits (60:40, 70:30, 80:20). DistilBERT outperforms all baselines across all splits. Under the 80:20 split, it achieves the highest accuracy (98.77%), precision (99.10%), recall (98.97%), F1-score (99.02%), and AUC (99.91%). It also keeps computational overhead low, with a training time of 342 seconds, giving a favorable trade-off between detection accuracy and efficiency. In contrast, BiLSTM, the best-performing recurrent model, reaches 97.43% accuracy but produces more false negatives, which are a more critical security risk than false positives in phishing detection. Additional experiments show that DistilBERT's performance is stable across data splits, with AUC values consistently above 0.998. The confusion matrix analysis shows that DistilBERT misclassifies only 25 legitimate emails as phishing (false positives) and misses only 23 phishing emails (false negatives), outperforming all baseline models. These findings indicate that lightweight transformer models such as DistilBERT offer a practical, scalable, and cost-effective solution for real-time phishing email detection, bridging the gap between high accuracy and production-ready deployability.
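To make the evaluation terms in the abstract concrete, the following is a minimal Python sketch of how the three split ratios translate into partition sizes for 17,538 emails, and how accuracy, precision, recall, and F1-score follow from confusion-matrix counts. It is not the authors' code: the names `split_sizes`, `ConfusionCounts`, and `metrics` are hypothetical, rounding the test partition up is an assumption, and the counts in the usage lines are toy values rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for the positive class 'phishing' (hypothetical helper type)."""
    tp: int  # phishing emails correctly flagged
    fp: int  # legitimate emails flagged as phishing (false positives)
    fn: int  # phishing emails missed (false negatives)
    tn: int  # legitimate emails correctly passed


def split_sizes(n_samples: int, test_percent: int) -> tuple[int, int]:
    """Return (train, test) sizes; the test size is rounded up (assumption)."""
    test = -(-n_samples * test_percent // 100)  # integer ceiling division
    return n_samples - test, test


def metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


if __name__ == "__main__":
    # Partition sizes for the three train-test ratios named in the abstract.
    for test_percent in (40, 30, 20):
        print(test_percent, split_sizes(17_538, test_percent))

    # Toy counts only, chosen for illustration; not taken from the paper.
    toy = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)
    print(metrics(toy))
```

Recall is the metric that a missed phishing email (false negative) lowers, which is why the abstract treats false negatives as the costlier error.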

Published in:

DistilBERT for Efficient and Accurate Email Phishing Detection: A Benchmark Against Machine and Deep Learning Models


Publisher:

Ingénierie des Systèmes d’Information

Keywords:

phishing email detection, DistilBERT, lightweight transformer, comparative benchmarking, computational efficiency, real-time cybersecurity