AgriDetectVL: Emphasizes the Agriculture-Focused Application Combined With Visual–Language Integration
Vũ Hoài Nam
Counterfeit-product monitoring in agriculture demands models that exploit temporal context, accept operator feedback, and run under tight compute budgets. We introduce AgriDetectVL, an interactive, resource-efficient vision–language model that fuses time-series imagery with human inputs. AgriDetectVL couples an efficient visual backbone with a lightweight Sequence Prompt Transformer that summarizes recent observations and feedback into compact prompts. Class names and domain phrases are encoded as text prototypes, and images are mapped into a shared, L2-normalized space; decisions are made by temperature-scaled cosine scoring, enabling single-pass, low-latency inference and straightforward zero/few-shot extension. Evaluated on TLU-Fruit (fine-grained varieties) and TLU-States (state/ripeness), AgriDetectVL consistently surpasses strong Convolutional Neural Network (CNN), transformer, and Vision-Language Model (VLM) baselines across F1-Score (F1), accuracy, Area Under the ROC Curve (AUC), and Matthews Correlation Coefficient (MCC), while meeting edge-device constraints. Ablations confirm that sequence-aware prompting and prototype guidance are the primary sources of gain. In longitudinal tests, human-in-the-loop operation reduces manual corrections over time, indicating practical readiness for field deployment. Code is available at: https://github.com/NguyenAnhDucIT/AgriDetectVL.
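The scoring step described in the abstract (text prototypes and image embeddings in a shared, L2-normalized space, compared by temperature-scaled cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-D embeddings, the class names, and the temperature value of 0.07 are all assumptions chosen for illustration, and the encoders that would produce real embeddings are omitted.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def cosine_scores(image_emb, text_protos, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities.

    image_emb:   one image embedding (list of floats)
    text_protos: one embedding per class (list of lists of floats)
    temperature: hypothetical value; smaller values sharpen the distribution
    """
    img = l2_normalize(image_emb)
    logits = []
    for proto in text_protos:
        p = l2_normalize(proto)
        cosine = sum(a * b for a, b in zip(img, p))
        logits.append(cosine / temperature)
    # Subtract the maximum logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image_emb, class_names, text_protos, temperature=0.07):
    """Return the best-scoring class name and the full probability list."""
    probs = cosine_scores(image_emb, text_protos, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


# Toy usage with made-up 2-D embeddings and hypothetical class names.
label, probs = predict([0.1, 0.9], ["ripe", "unripe"], [[1.0, 0.0], [0.0, 1.0]])
```

Under this formulation, extending to a new class in a zero-shot manner amounts to appending one more text prototype to `text_protos` and its name to `class_names`; no other part of the scoring changes, and a single pass over the prototypes yields the decision.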
Published in:
AgriDetectVL: Emphasizes the Agriculture-Focused Application Combined With Visual–Language Integration
Publication date:
2025
Publisher:
IEEE Access
Keywords:
Counterfeit agricultural detection, computer vision, image processing, visual-language model.
