PTIT Knowledge Portal

International article

When datasets deceive: Exposing overlap in smart contract vulnerability detection

Trần Tiến Công

Existing smart contract vulnerability datasets exhibit over 34% train–test overlap due to repeated function-level code, causing models to favor structural memorization over semantic generalization. To mitigate this issue, we construct a benchmark dataset with zero function overlap between the training and test partitions. Furthermore, we introduce GraphFusionDetect (GFD), a novel approach that integrates fine-tuned CodeBERT embeddings with Graph Neural Networks (GNNs) to capture inter-function dependencies. GFD achieves F1-scores of 80% for detecting reentrancy vulnerabilities and 89% for timestamp dependency vulnerabilities, surpassing baseline methods and enabling more robust and generalizable vulnerability detection.
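The function-level train-test overlap described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact matching criterion, so this example assumes two functions count as the same when their source text is identical after comments are stripped and whitespace is collapsed. The helper names (`normalize`, `fingerprint`, `overlap_rate`, `drop_overlap`) are hypothetical and are not taken from the paper.

```python
import hashlib
import re


def normalize(function_source: str) -> str:
    """Strip comments and collapse whitespace so reformatted copies compare equal.

    Assumption: naive regex stripping is good enough for a sketch; it would
    also hit '//' inside string literals, which a real parser would avoid.
    """
    without_block = re.sub(r"/\*.*?\*/", " ", function_source, flags=re.DOTALL)
    without_line = re.sub(r"//[^\n]*", " ", without_block)
    return " ".join(without_line.split())


def fingerprint(function_source: str) -> str:
    """Hash the normalized source so functions can be compared via a set."""
    return hashlib.sha256(normalize(function_source).encode("utf-8")).hexdigest()


def overlap_rate(train_functions, test_functions) -> float:
    """Fraction of test functions whose fingerprint also occurs in training."""
    if not test_functions:
        return 0.0
    train_prints = {fingerprint(f) for f in train_functions}
    shared = sum(1 for f in test_functions if fingerprint(f) in train_prints)
    return shared / len(test_functions)


def drop_overlap(train_functions, test_functions):
    """Keep only test functions that never appear in the training partition."""
    train_prints = {fingerprint(f) for f in train_functions}
    return [f for f in test_functions if fingerprint(f) not in train_prints]
```

Under this assumption, a test function that differs from a training function only in layout or comments is counted as overlapping, and `drop_overlap` leaves a test partition that shares no function with the training partition. Near-duplicate detection (for example, renamed identifiers) would need a stronger normalization than the one sketched here.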

Published in:

When datasets deceive: Exposing overlap in smart contract vulnerability detection


Publisher:

ICT Express

Keywords:

Smart contracts; Vulnerability detection; Graph Neural Networks; CodeBERT; Dataset curation