PTIT Knowledge Portal


Real-time phishing uniform resource locator detection based on hybrid embedding transformer and retraining-free inferencing

Đàm Minh Lịnh

Phishing attacks that evade traditional detection mechanisms by exploiting deceptive uniform resource locators (URLs) remain a significant cybersecurity threat. This study proposes an adaptive phishing URL detection framework that integrates a dynamically updated local blacklist, Levenshtein distance-based string similarity, and a server-side verification mechanism based on a hybrid embedding transformer (HET) encoder. First, a rapid local lookup identifies known phishing URLs. If the input URL is absent from the blacklist, the Levenshtein distance algorithm detects subtle character-level variations, which effectively identifies typosquatting and obfuscation. For ambiguous cases, the HET-based module uses a lightweight post-hoc inference method that classifies URL embeddings via k-nearest neighbor voting based on Euclidean similarity in the latent space, thereby avoiding retraining and enabling real-time adaptation to emerging phishing threats. Confirmed phishing URLs are added iteratively to the local repository, continuously improving future classification accuracy. Experimental evaluation on a large-scale dataset comprising 235,795 URLs showed that the proposed method outperforms state-of-the-art approaches, achieving a detection accuracy of 99.8%, a false-positive rate of 0.441%, and a false-negative rate of 0.0617%. Additionally, real-time validation using a Chrome browser extension confirmed rapid processing, with an average processing time of 4.43–6.84 ms per URL on a dataset comprising 5,000 URLs. These results highlight the efficiency of the proposed framework in real-world cybersecurity contexts: high detection accuracy, fast response times, and adaptability to evolving phishing strategies. They also underscore the importance of proactive threat intelligence and real-time phishing mitigation in developing scalable, high-performance security infrastructures.
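
The three-stage flow described in the abstract (blacklist lookup, Levenshtein check, k-nearest neighbor voting over embeddings) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the `trusted` reference list, the `max_edits` threshold, and the `embed` callable (a stand-in for the HET encoder) are all hypothetical, and the abstract does not say what the Levenshtein stage compares against.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def knn_vote(query, labelled, k=3):
    """Majority label among the k labelled embeddings closest in Euclidean distance."""
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(url, blacklist, trusted, embed, labelled, max_edits=2, k=3):
    """Hypothetical cascade: blacklist -> edit-distance check -> k-NN on embeddings."""
    # Stage 1: fast local lookup of known phishing URLs.
    if url in blacklist:
        return "phishing"

    # Stage 2 (assumption): a URL within a few edits of a trusted one,
    # but not identical to it, is treated as a typosquatting attempt.
    for reference in trusted:
        distance = levenshtein(url, reference)
        if distance == 0:
            return "legitimate"
        if distance <= max_edits:
            blacklist.add(url)
            return "phishing"

    # Stage 3: classify the embedding by neighbor voting; no retraining needed,
    # since new labelled embeddings can simply be appended to `labelled`.
    verdict = knn_vote(embed(url), labelled, k)
    if verdict == "phishing":
        blacklist.add(url)  # confirmed URLs feed back into the local blacklist
    return verdict
```

In this sketch, adaptation to new threats happens in two places: the blacklist grows as URLs are confirmed, and the k-NN stage picks up new examples as soon as their embeddings are added to `labelled`, without touching the encoder's weights.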

Published in:

Real-time phishing uniform resource locator detection based on hybrid embedding transformer and retraining-free inferencing


Publisher:

Computers and Electrical Engineering

Location:


Keywords:

Artificial intelligence, Cybersecurity, Hybrid embedding Transformer, Levenshtein distance, Retraining-free inferencing, URL data classification