International paper

VIVID: A Culturally Grounded Benchmark Exposing the Figurative Language Gap in Vietnamese NLP

Đỗ Trần Tú

We present VIVID (Vietnamese Idioms for Validation and Interpretation Depth), the first systematic benchmark for evaluating culturally grounded figurative language understanding in Vietnamese. VIVID comprises 1,636 idioms and proverbs annotated with five complexity traits (literal expressions, pragmatic nuances, Sino-Vietnamese terms, uncommon vocabulary, and folk knowledge) and seven semantic themes. We establish an evaluation framework that combines generative and discriminative tasks, and we propose an LLM-as-a-Judge approach with aspect-based prompting, validated against human judgment (Cohen’s κ = 0.792). Evaluating eight state-of-the-art models reveals critical gaps: Vietnamese-specialized models drastically underperform multilingual systems (VinaLLaMA-7B: 0.13 vs. GPT-4o: 2.46), and even the top models achieve less than 50% correctness on average. Notably, few-shot prompting does not universally improve performance: GPT-4o degrades because of stylistic overfitting. Our analysis exposes systematic failures, including literal over-interpretation, lexical gaps, and pragmatic flattening, demonstrating that current models lack the cultural competence needed for nuanced figurative interpretation. VIVID provides an essential tool for advancing figurative language understanding in culturally rich contexts. We release code and datasets at https://github.com/ReML-AI/VIVID.
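To make the agreement figure above concrete, the following is a minimal sketch of how Cohen's κ between an LLM judge and a human annotator could be computed. It assumes plain unweighted κ over categorical labels; the abstract does not state which variant was used, and the function name `cohen_kappa` and the example labels are hypothetical, not taken from the VIVID repository.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    # Both raters used a single identical label: agreement is trivially perfect.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Made-up labels (1 = interpretation judged correct, 0 = incorrect).
human_labels = [1, 1, 0, 0]
judge_labels = [1, 1, 0, 1]
kappa = cohen_kappa(human_labels, judge_labels)
```

With these made-up labels, observed agreement is 0.75 and chance agreement is 0.5, so κ is 0.5. Values near 0.8, such as the one reported in the abstract, are conventionally read as strong agreement.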

Published in:

VIVID: A Culturally Grounded Benchmark Exposing the Figurative Language Gap in Vietnamese NLP

Publication date:

DOI:


Publisher:

Location:


Keywords:

Vietnamese Benchmark, Figurative Language Evaluation, Idioms and Proverbs