PTIT Knowledge Portal

International article

An Object Detection Framework Based on Relationship Between Objects in an Open Vocabulary Using Owl-VIT And RelTransformer

Nguyễn Thị Nguyệt

Object detection has been widely adopted across various applications, but traditional methods mainly provide isolated object locations without capturing their relationships. To address this limitation, we propose a system that detects both objects and their relationships within images based on natural language queries. The approach integrates OWL-ViT for open-vocabulary object detection, RelTR for relationship inference, and Large Language Models (LLMs) for query processing and language understanding. Unlike fixed-vocabulary models, OWL-ViT enables detection from free-text descriptions, improving generalization and supporting flexible user queries. Experimental results show that the proposed framework can localize objects and infer their relations with a recognition accuracy of 27%, demonstrating its potential for intelligent systems such as query-driven surveillance and human–machine interaction. This work is not merely a combination of existing models, but a deliberate integration designed to address the novel challenge of detecting object relationships from natural language queries.
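The abstract does not say how the detector output and the relationship output are combined. As a minimal sketch of one possible integration step, the example below assumes that both stages return bounding boxes and links each relationship triplet to the open-vocabulary detections by box overlap (IoU). The names `Detection`, `Triplet`, `best_match`, and `link_relations`, the 0.5 threshold, and the IoU matching rule itself are hypothetical and are not taken from the paper; no real OWL-ViT or RelTR API is called.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    """Hypothetical output of an open-vocabulary detector for one text query."""
    label: str
    box: Box
    score: float


@dataclass
class Triplet:
    """Hypothetical output of a relationship model: subject box, predicate, object box."""
    subject_box: Box
    predicate: str
    object_box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_match(box: Box, detections: List[Detection], threshold: float) -> Optional[Detection]:
    """Return the detection that overlaps `box` most, or None if below the threshold."""
    best = max(detections, key=lambda d: iou(box, d.box), default=None)
    if best is None or iou(box, best.box) < threshold:
        return None
    return best


def link_relations(detections: List[Detection], triplets: List[Triplet],
                   threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Keep only triplets whose subject and object both match a queried detection."""
    linked = []
    for t in triplets:
        subj = best_match(t.subject_box, detections, threshold)
        obj = best_match(t.object_box, detections, threshold)
        if subj is not None and obj is not None and subj is not obj:
            linked.append((subj.label, t.predicate, obj.label))
    return linked


# Illustrative, made-up inputs (not data from the paper).
detections = [
    Detection("person", (10, 10, 50, 100), 0.9),
    Detection("red bicycle", (40, 60, 120, 110), 0.8),
]
triplets = [
    Triplet((12, 12, 52, 98), "riding", (42, 58, 118, 112)),
    Triplet((200, 200, 240, 240), "near", (40, 60, 120, 110)),
]
print(link_relations(detections, triplets))
```

In this sketch the second triplet is dropped because its subject box overlaps none of the queried detections, which is one simple way to restrict the inferred relations to the objects a user actually asked about.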

Published in:

An Object Detection Framework Based on Relationship Between Objects in an Open Vocabulary Using Owl-VIT And RelTransformer

Publication date:

DOI:


Publisher:

Location:


Keywords:

Object Detection, Relationship Detection, Open Vocabulary, OWLViT, Rel-Transformer.