International article
When datasets deceive: Exposing overlap in smart contract vulnerability detection
Trần Tiến Công
Existing smart contract vulnerability datasets exhibit over 34% train–test overlap due to repeated function-level code, causing models to favor structural memorization over semantic generalization. To mitigate this issue, we construct a benchmark dataset with zero function overlap between the training and test partitions. Furthermore, we introduce GraphFusionDetect (GFD), a novel approach that integrates fine-tuned CodeBERT embeddings with Graph Neural Networks (GNNs) to capture inter-function dependencies. GFD achieves F1-scores of 80% for detecting reentrancy vulnerabilities and 89% for timestamp dependency vulnerabilities, surpassing baseline methods and enabling more robust and generalizable vulnerability detection.
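As a rough illustration of the overlap problem described in the abstract, the sketch below measures how many test functions also appear in the training set and then filters them out. It is not the authors' pipeline: the normalization rule (strip comments, collapse whitespace), the SHA-256 fingerprint, and the helper names `normalize`, `fingerprint`, `overlap_rate`, and `drop_overlap` are all assumptions made for this example.

```python
import hashlib
import re


def normalize(fn_source: str) -> str:
    """Strip comments and collapse whitespace so reformatted copies compare equal.

    Assumption: a crude regex pass is good enough for illustration; it would
    also strip '//' sequences that occur inside string literals.
    """
    no_block = re.sub(r"/\*.*?\*/", "", fn_source, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    return " ".join(no_line.split())


def fingerprint(fn_source: str) -> str:
    """Hash of the normalized function body (hypothetical duplicate key)."""
    return hashlib.sha256(normalize(fn_source).encode("utf-8")).hexdigest()


def overlap_rate(train: list[str], test: list[str]) -> float:
    """Fraction of test functions whose fingerprint also occurs in train."""
    if not test:
        return 0.0
    train_fps = {fingerprint(f) for f in train}
    hits = sum(fingerprint(f) in train_fps for f in test)
    return hits / len(test)


def drop_overlap(train: list[str], test: list[str]) -> list[str]:
    """Keep only test functions that never occur in train."""
    train_fps = {fingerprint(f) for f in train}
    return [f for f in test if fingerprint(f) not in train_fps]


# Toy data: the first test function is a reformatted copy of a training function.
train_fns = [
    "function a() { x = 1; }",
    "function b() { y = 2; } // note",
]
test_fns = [
    "function  a()  {\n x = 1; /* copied */ }",
    "function c() { z = 3; }",
]

rate_before = overlap_rate(train_fns, test_fns)
clean_test = drop_overlap(train_fns, test_fns)
rate_after = overlap_rate(train_fns, clean_test)
print(rate_before, rate_after)
```

Exact matching after normalization only catches verbatim or reformatted copies. Near-duplicates with renamed variables would need a fuzzier comparison, which this sketch does not attempt.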
Published in:
When datasets deceive: Exposing overlap in smart contract vulnerability detection
Publication date:
2026
Publisher:
ICT Express
Location:
Keywords:
Smart contracts; Vulnerability detection; Graph Neural Networks; CodeBERT; Dataset curation
Related articles
MIST: A Multilingual Dataset and Benchmark for Fine-Grained Audio Inpainting Tampering Localization
Vũ Sơn Tùng
NF-DCL: Enhancing video anomaly detection with synthetic normal features and Debiased Contrastive Learning
Nguyễn Thu Nga
A Workflow-Oriented Architecture Integrating Large Language Models for Automated Multi-Platform Content Management
Nguyễn Tất Thắng