PTIT Knowledge Portal

International papers

DistilBERT for Efficient and Accurate Email Phishing Detection: A Benchmark Against Machine and Deep Learning Models

Đàm Minh Lịnh

Email phishing remains a persistent cybersecurity threat that exploits human vulnerabilities, often evading technical safeguards. While machine learning (ML) and deep learning (DL) have been widely applied for phishing detection, systematic benchmarks comparing lightweight transformer models with traditional approaches remain limited. This study addresses this gap by evaluating six models—Naïve Bayes, Random Forest, XGBoost, LSTM, BiLSTM, and a fine-tuned DistilBERT—on a real-world dataset of 17,538 emails using three train-test splits (60:40, 70:30, 80:20). DistilBERT consistently outperforms all baselines across all splits. Under the 80:20 split, it achieves the highest accuracy (98.77%), precision (99.10%), recall (98.97%), F1-score (99.02%), and AUC (99.91%). Remarkably, it maintains low computational overhead with a training time of 342 seconds, demonstrating an optimal trade-off between detection accuracy and efficiency. In contrast, BiLSTM, the best-performing recurrent model, reaches 97.43% accuracy but produces more false negatives—a more critical security risk than false positives in phishing detection. Additional experiments reveal that DistilBERT maintains stable performance across different data splits, with AUC values consistently above 0.998. The confusion matrix analysis shows that DistilBERT misclassifies only 25 legitimate emails as phishing (false positives) and misses only 23 phishing emails (false negatives), significantly outperforming all baseline models. These findings demonstrate that lightweight transformer models like DistilBERT offer a practical, scalable, and cost-effective solution for real-time phishing email detection, effectively bridging the gap between high accuracy and production-ready deployability.
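As a quick consistency check on the figures above, the F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the function name `f1_score` and the variable names are assumptions for this example, not code from the paper, and the inputs are the rounded percentages quoted in the abstract for the 80:20 split.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both inputs must use the same unit (fractions or percentages);
    the result is expressed in that same unit.
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded percentages quoted in the abstract for DistilBERT (80:20 split).
reported_precision = 99.10
reported_recall = 98.97
reported_f1 = 99.02

# Recompute F1 from the two rounded inputs and compare with the reported value.
recomputed_f1 = f1_score(reported_precision, reported_recall)
gap = abs(recomputed_f1 - reported_f1)

print(f"recomputed F1: {recomputed_f1:.2f}%")
print(f"gap to reported F1: {gap:.2f} percentage points")
```

The recomputed value comes out at roughly 99.03%, close to the reported 99.02%. The abstract does not say how the metrics were rounded or averaged, so a small gap of this size should not be read as an error in the paper.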

Published in:

DistilBERT for Efficient and Accurate Email Phishing Detection: A Benchmark Against Machine and Deep Learning Models

Publication date:

2026

DOI:

https://doi.org/10.18280/isi.310321

Publisher:

Ingénierie des Systèmes d’Information

Location:

Keywords:

phishing email detection, DistilBERT, lightweight transformer, comparative benchmarking, computational efficiency, real-time cybersecurity

Related papers

From Public Benchmarks to a Low-Resource Target Domain: A Comparative Study of Wood Surface Defect Detection

Nguyễn Trọng Khánh

WiT: Wood Species Identification via a Hybrid CNN–Transformer With Query-Guided Cross-Attention

Ma Công Thành

On Deploying Bilinear Neural Network Method to Various Solutions of (3+1)-dimensional Potential Yu-Toda-Sasa-Fukuyama Equation

Nguyễn Minh Tuấn

Adaptive Cloud–Edge Coordination for Real-Time Phishing URL Detection With Distributed Caching and ONNX-Based Inference

Đàm Minh Lịnh

A novel entropy Autoencoder-Synchronized Hashing Semi-supervised network for robust Android malware identification

Nguyễn Huy Trung

Toward Robust Malware Detection: A Survey of Datasets, Techniques, and Practical Challenges

Huỳnh Trọng Thưa

Transfer Learning with Particle Swarm Optimization for Durian Leaf Disease Image Classification

Trần Nguyễn Phi Hùng