International paper

ViT-RBTF: Integrating Vision Transformers with Randomized Binary Tree Forest for Real-Time Image Search

Châu Văn Vân

Content-based image retrieval (CBIR) at web and enterprise scale requires features that capture global context while meeting strict latency targets. Vision Transformers (ViTs) provide strong global representations, yet their computational footprint can hinder real-time deployment [1]. Meanwhile, randomized tree ensembles remain attractive for low-latency indexing and robust generalization [3]. This paper presents ViT-RBTF, a hybrid framework that couples a lightweight ViT extractor with deep hashing and a randomized binary tree forest index. The extractor produces compact binary codes, and the forest prunes the candidate set in logarithmic time before the surviving candidates are re-ranked by Hamming distance. Across CIFAR-10, Corel-1K, and ImageNet100 subsets, ViT-RBTF improves mean Average Precision (mAP) by 5.8–7.6% over strong ViT-hashing baselines while reducing median query latency by 35–48% under identical hardware budgets. Ablation studies show that perception-mixing attention and token-aware modules enhance global–local fusion; that forest depth and tree count control the accuracy–speed trade-off; and that feature-importance filtering stabilizes cross-domain retrieval. These results indicate that ViT-RBTF is an effective method for real-time image search, especially where both high speed and high accuracy are required, and that it offers a simple, scalable path toward privacy-aware cloud environments and multimodal retrieval.
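The two-stage search described in the abstract (forest pruning followed by Hamming re-ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for ViT embeddings, sign thresholding stands in for the learned deep hashing, the trees split on random hyperplanes at the median, and the names `build_tree`, `query_tree`, `binarize`, and `search` are hypothetical.

```python
import numpy as np

def build_tree(feats, ids, rng, leaf_size=8):
    """Recursively split ids with a random hyperplane placed at the median
    projection, so the tree stays roughly balanced (depth ~ log2(n / leaf_size))."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    direction = rng.standard_normal(feats.shape[1])
    proj = feats[ids] @ direction
    thresh = np.median(proj)
    left, right = ids[proj <= thresh], ids[proj > thresh]
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
        return {"leaf": ids}
    return {
        "dir": direction,
        "thresh": thresh,
        "left": build_tree(feats, left, rng, leaf_size),
        "right": build_tree(feats, right, rng, leaf_size),
    }

def query_tree(node, q):
    """Follow one root-to-leaf path and return the ids stored in that leaf."""
    while "leaf" not in node:
        side = "left" if q @ node["dir"] <= node["thresh"] else "right"
        node = node[side]
    return node["leaf"]

def binarize(feats):
    """Assumed stand-in for deep hashing: one bit per dimension by sign."""
    return (feats > 0).astype(np.uint8)

def search(forest, codes, q_feat, top_k=5):
    """Union the leaf candidates from every tree, then re-rank by Hamming distance."""
    cand = np.unique(np.concatenate([query_tree(t, q_feat) for t in forest]))
    q_code = binarize(q_feat)
    dists = np.count_nonzero(codes[cand] != q_code, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return cand[order], dists[order]

rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 32))  # placeholder for ViT embeddings
codes = binarize(feats)                  # 32-bit binary codes
ids = np.arange(len(feats))
forest = [build_tree(feats, ids, rng) for _ in range(8)]

# Query with a slightly perturbed copy of a database item.
query = feats[42] + 0.01 * rng.standard_normal(32)
hits, dists = search(forest, codes, query)
```

In this sketch the tree count (8) and `leaf_size` play the role the abstract assigns to tree count and forest depth: more trees or larger leaves widen the candidate set, which costs more Hamming comparisons but makes it less likely that a true neighbor is pruned.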

Published in:

ViT-RBTF: Integrating Vision Transformers with Randomized Binary Tree Forest for Real-Time Image Search


Publisher:

Location:


Keywords:

ViT, Randomized Binary Tree Forest, Content-Based Image Retrieval, Deep Hashing, Real-Time Image Search.