Temporal Degradation in Machine Learning-Based Malware Detection: A Multi-Dataset, Multi-Year Empirical Study
Huỳnh Trọng Thưa
Machine-learning malware detectors achieve near-perfect deployment accuracy yet silently degrade as threats evolve. We present a multi-dataset temporal study of this concept drift on 1.68 million Portable-Executable samples from EMBER 2017, EMBER 2018, and BODMAS (2019–2020), unified in the EMBER v2 feature space and analyzed with three classifier families (LightGBM, Random Forest, MLP) across nine experimental dimensions: in-era baselines, cross-era transfer, monthly drift tracking, incremental retraining, family-level false-negative decomposition, feature-group sensitivity, cumulative Area-Under-Time (AUT) analysis, drift-triggered retraining (ADWIN, DDM), and active-learning sample selection, with 10-seed statistical validation. Six findings emerge. (1) Forward degradation is asymmetric: under a strict appeared-year split, training on 2017 data loses 8.47 percentage points (pp) F1 on 2018 data (LightGBM, 10 seeds), whereas the reverse direction shows no degradation. (2) Unseen malware families dominate failures, with false-negative rates up to 23.92% and same-month ratios exceeding 30× relative to known families in the strongest case. (3) Cross-era robustness is feature-group dependent: SectionInfo and ImportsInfo dominate transfer (+0.85 and +0.37 pp respectively when retained), while HeaderFileInfo and StringExtractor act as temporal artifacts—zeroing them improves cross-era F1 by 0.67 and 0.47 pp respectively. (4) Incremental retraining with only 1% newly labeled data gains +0.56 pp cumulative AUT over a static baseline. (5) ADWIN/DDM-triggered retraining matches that AUT within 0.07–0.13 pp on LightGBM while issuing ∼33–35% fewer retrains, exposing a label-budget vs. accuracy trade-off. (6) Uncertainty sampling delivers a +0.76 pp AUT improvement over random sampling at identical labeling cost (p = 0.0020, Wilcoxon). 
Together the results form a five-way mitigation ladder—static, fixed 1%/month, ADWIN-triggered, DDM-triggered, and uncertainty-sampled—that practitioners can position along their labeling-budget and AUT requirements.
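The two quantities the abstract leans on most, cumulative AUT and uncertainty-based sample selection, can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes AUT is the trapezoidal area under a per-month metric curve, normalized by the number of intervals (a common definition in the temporal-evaluation literature), and that uncertainty is measured as closeness of the predicted malware probability to 0.5. The function names `area_under_time` and `select_most_uncertain` and all input values are hypothetical.

```python
from typing import Sequence

import numpy as np


def area_under_time(scores: Sequence[float]) -> float:
    """Normalized trapezoidal area under a per-period metric curve.

    Assumption: `scores` holds one metric value (e.g. F1) per test month,
    in chronological order. The result stays in the metric's own range.
    """
    s = np.asarray(scores, dtype=float)
    if s.size < 2:
        raise ValueError("need at least two time periods")
    # Average each pair of neighboring months, then average over intervals.
    return float(np.sum((s[1:] + s[:-1]) / 2.0) / (s.size - 1))


def select_most_uncertain(malware_prob: Sequence[float], budget: int) -> np.ndarray:
    """Return indices of the `budget` samples whose score is closest to 0.5.

    Assumption: binary classifier, uncertainty = -|p - 0.5|. Other measures
    (entropy, margin) are possible and may be what the paper uses.
    """
    p = np.asarray(malware_prob, dtype=float)
    distance = np.abs(p - 0.5)
    # Stable sort keeps the original order among ties.
    return np.argsort(distance, kind="stable")[:budget]


if __name__ == "__main__":
    # Hypothetical monthly F1 values for a static model that degrades.
    monthly_f1 = [0.98, 0.95, 0.93, 0.90]
    print("AUT:", area_under_time(monthly_f1))

    # Hypothetical predicted probabilities for five unlabeled samples.
    probs = [0.01, 0.48, 0.95, 0.60, 0.30]
    print("label these first:", select_most_uncertain(probs, budget=2))
```

Under these assumptions, comparing mitigation strategies amounts to computing `area_under_time` on each strategy's monthly scores; the labeling budget is the `budget` argument multiplied by the number of retraining rounds.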
Publication date:
2026
Publisher:
IEEE Access
Keywords:
Concept drift, malware detection, machine learning, temporal analysis, PE malware, intrusion detection systems