PTIT Knowledge Portal


AgriDetectVL: Emphasizes the Agriculture-Focused Application Combined With Visual–Language Integration


Vũ Hoài Nam

Counterfeit-product monitoring in agriculture demands models that exploit temporal context, accept operator feedback, and run under tight compute budgets. We introduce AgriDetectVL, an interactive, resource-efficient vision–language model that fuses time-series imagery with human inputs. AgriDetectVL couples an efficient visual backbone with a lightweight Sequence Prompt Transformer that summarizes recent observations and feedback into compact prompts. Class names and domain phrases are encoded as text prototypes, and images are mapped into a shared, L2-normalized space; decisions are made by temperature-scaled cosine scoring, enabling single-pass, low-latency inference and straightforward zero/few-shot extension. Evaluated on TLU-Fruit (fine-grained varieties) and TLU-States (state/ripeness), AgriDetectVL consistently surpasses strong Convolutional Neural Network (CNN), transformer, and Vision-Language Model (VLM) baselines across F1-Score (F1), accuracy, Area Under the ROC Curve (AUC), and Matthews Correlation Coefficient (MCC), while meeting edge-device constraints. Ablations confirm that sequence-aware prompting and prototype guidance are the primary sources of gain. In longitudinal tests, human-in-the-loop operation reduces manual corrections over time, indicating practical readiness for field deployment. Code is available at: https://github.com/NguyenAnhDucIT/AgriDetectVL.
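The abstract states that decisions come from temperature-scaled cosine scoring between L2-normalized image embeddings and text prototypes, but it does not give the exact formula or settings. Below is a minimal sketch of that general technique, not the authors' implementation. It assumes toy 3-d embeddings, hypothetical class names ("ripe", "unripe", "rotten"), hypothetical helper names, and a temperature of 0.07 (a common CLIP-style default that is not stated for AgriDetectVL). Each class score is cos(z, t_c) / τ, followed by a softmax.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Scale a vector to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine_logits(image_emb: Vector, prototypes: Dict[str, Vector],
                  temperature: float = 0.07) -> Dict[str, float]:
    """Temperature-scaled cosine similarity between one image embedding
    and each class's text prototype (temperature value is an assumption)."""
    z = l2_normalize(image_emb)
    logits = {}
    for name, proto in prototypes.items():
        t = l2_normalize(proto)
        cos = sum(a * b for a, b in zip(z, t))  # both unit-norm, so dot product = cosine
        logits[name] = cos / temperature
    return logits


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over class logits."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def classify(image_emb: Vector, prototypes: Dict[str, Vector],
             temperature: float = 0.07) -> Tuple[str, Dict[str, float]]:
    """Single-pass prediction: highest-probability class plus all probabilities."""
    probs = softmax(cosine_logits(image_emb, prototypes, temperature))
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    # Hypothetical toy prototypes; in practice these would come from a text encoder.
    prototypes = {
        "ripe": [1.0, 0.2, 0.0],
        "unripe": [0.1, 1.0, 0.0],
        "rotten": [0.0, 0.1, 1.0],
    }
    label, probs = classify([0.9, 0.3, 0.1], prototypes)
    print(label)  # prints: ripe
```

Because classes are represented only by prototype vectors, extending to a new class in this sketch means adding one more entry to `prototypes`, with no retraining of the scoring step. This mirrors the zero/few-shot extension the abstract describes.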


Publisher:

IEEE Access


Keywords:

Counterfeit agricultural detection, computer vision, image processing, visual-language model.