An Object Detection Framework Based on Relationship Between Objects in an Open Vocabulary Using Owl-VIT And RelTransformer
Nguyễn Thị Nguyệt
Object detection has been widely adopted across various applications, but traditional methods mainly provide isolated object locations without capturing their relationships. To address this limitation, we propose a system that detects both objects and their relationships within images based on natural language queries. The approach integrates OWL-ViT for open-vocabulary object detection, RelTR for relationship inference, and Large Language Models (LLMs) for query processing and language understanding. Unlike fixed-vocabulary models, OWL-ViT enables detection from free-text descriptions, improving generalization and supporting flexible user queries. Experimental results show that the proposed framework can localize objects and infer their relations with a recognition accuracy of 27%, demonstrating its potential for intelligent systems such as query-driven surveillance and human–machine interaction. This work is not merely a combination of existing models, but a deliberate integration designed to address the novel challenge of detecting object relationships from natural language queries.
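The abstract does not state how the outputs of the three components are combined, so the following is only a minimal sketch of one plausible way to do it, not the authors' implementation. It assumes the LLM step has already turned the user's query into a (subject, predicate, object) triplet, that `Detection` objects stand in for labeled boxes from an open-vocabulary detector such as OWL-ViT, and that `Relation` objects stand in for box-level relationship predictions from a model such as RelTR. All class and function names, and the IoU threshold of 0.5, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Box format assumed here: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """Stand-in for one open-vocabulary detection (label from the text query)."""
    label: str
    box: Box


@dataclass
class Relation:
    """Stand-in for one predicted relationship between two boxes."""
    subject_box: Box
    predicate: str
    object_box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_label(box: Box, detections: List[Detection],
                threshold: float = 0.5) -> Optional[str]:
    """Return the label of the detection that best overlaps `box`,
    or None if no detection reaches the IoU threshold."""
    best = max(detections, key=lambda d: iou(box, d.box), default=None)
    if best is None or iou(box, best.box) < threshold:
        return None
    return best.label


def answer_query(query: Tuple[str, str, str],
                 detections: List[Detection],
                 relations: List[Relation],
                 threshold: float = 0.5) -> List[Relation]:
    """Keep only the relations whose predicate matches the query and whose
    subject and object boxes overlap detections carrying the queried labels."""
    subject, predicate, obj = query
    hits = []
    for rel in relations:
        if rel.predicate != predicate:
            continue
        if (match_label(rel.subject_box, detections, threshold) == subject
                and match_label(rel.object_box, detections, threshold) == obj):
            hits.append(rel)
    return hits
```

Under these assumptions, a query such as "a person riding a horse" would be parsed upstream into `("person", "riding", "horse")`, the detector would be prompted with the two noun phrases, and `answer_query` would return the relationship predictions that agree with both the detections and the requested predicate.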
Related articles
FA-Net: A Dual-Branch Attention Architecture for Extracting Fine-Grained Anatomical Features of Wood
Ma Công Thành
TinyCDAE: Lightweight Convolutional Denoising Autoencoders for Real-Time Image Denoising on Resource-Constrained IoT Devices
Nguyễn Trọng Huân
Estimation of External Government Debt Thresholds: The Case of Vietnam
Đặng Thị Huyền Anh
A Survey on Methods of Applying Transformers to Non-NLP Applications
Nguyễn Trung Hiếu