PTIT Knowledge Portal

Exploring Linguistic Patterns through Machine Learning: Evidence from Logistic Regression Analysis

Nguyễn Minh Tuấn

This study examines how machine learning techniques can detect and interpret linguistic patterns in Vietnamese text, with logistic regression used as a core baseline model. The proposed framework integrates linguistic theory with computational analysis to uncover phonological, morphological, syntactic, and semantic structures within a multi-domain Vietnamese text classification corpus. After data preprocessing, tokenization, and stopword removal, several feature extraction strategies, including TF-IDF, n-grams, and linguistically enriched features such as part-of-speech and morphological cues, were applied to represent both surface-level and deep linguistic regularities. Multiple models, including Logistic Regression, CNN, Bi-LSTM with Attention, and a fine-tuned PhoBERT transformer, were trained and evaluated using standard classification metrics. Experimental results reveal that the Bi-LSTM with Attention model achieved the highest F1-score (0.80), outperforming both the baseline and CNN models, while PhoBERT suffered from overfitting and limited generalization. Analysis of feature weights and attention distributions further highlights meaningful dependencies across linguistic levels, demonstrating the value of machine learning in uncovering structured linguistic insights. The findings contribute to computational linguistics research by providing a scalable, data-driven approach for studying linguistic patterns in low-resource languages such as Vietnamese.
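As a rough illustration of the baseline the abstract describes (TF-IDF over word n-grams, a logistic regression classifier, F1 evaluation, and inspection of feature weights), the following is a minimal standard-library sketch. It is not the authors' implementation: the six-sentence corpus, the binary sports/finance labels, the whitespace tokenization, and the learning-rate, epoch, and regularization values are all assumptions made for the example. A real pipeline would presumably use a proper Vietnamese word segmenter, a stopword list, and a held-out test set.

```python
import math
from collections import Counter, defaultdict

def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs, n_max=2):
    """Build L2-normalised TF-IDF vectors (as dicts) over word n-grams."""
    grams = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    df = Counter(g for doc in grams for g in doc)  # document frequency
    n_docs = len(docs)
    vectors = []
    for doc in grams:
        # Smoothed IDF: log((1 + N) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
               for g, tf in doc.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({g: v / norm for g, v in vec.items()})
    return vectors

def train_logreg(vectors, labels, lr=0.5, epochs=300, l2=0.001):
    """Binary logistic regression fitted by per-example gradient descent."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights[g] * v for g, v in vec.items())
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob - label
            for g, v in vec.items():
                weights[g] -= lr * (err * v + l2 * weights[g])
            bias -= lr * err
    return weights, bias

def predict(vec, weights, bias):
    """Return 1 if the linear score is positive, else 0."""
    z = bias + sum(weights.get(g, 0.0) * v for g, v in vec.items())
    return 1 if z > 0 else 0

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy corpus: 1 = sports, 0 = finance (illustrative only).
docs = [
    "đội bóng thắng trận đấu",
    "cầu thủ ghi bàn trong trận đấu",
    "đội bóng thua trận chung kết",
    "ngân hàng tăng lãi suất",
    "thị trường chứng khoán giảm điểm",
    "ngân hàng giảm lãi suất cho vay",
]
labels = [1, 1, 1, 0, 0, 0]

vectors = tfidf_vectors(docs)
weights, bias = train_logreg(vectors, labels)
preds = [predict(v, weights, bias) for v in vectors]
score = f1_score(labels, preds)  # training-set F1, not a generalisation estimate

# Features with the largest positive weights push towards the "sports" class.
top_sports = sorted(weights, key=weights.get, reverse=True)[:3]
print("training F1:", score)
print("top sports features:", top_sports)
```

Because every feature maps to a single weight, sorting the weights gives a direct reading of which n-grams the model associates with each class. This is the kind of feature-weight analysis the abstract refers to, although the paper's own features also include part-of-speech and morphological cues that this sketch omits.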

Published in:

Exploring Linguistic Patterns through Machine Learning: Evidence from Logistic Regression Analysis

Publication date:

2025

DOI:

https://zenodo.org/records/17722187

Publisher:

oeil

Location:

Keywords:

machine learning, logistic regression, linguistic patterns, computational linguistics, data-driven linguistics, predictive modeling, corpus analysis

Related papers

Optimizing Resource Allocation for Dynamic IoT Requests Using Network Function Virtualization

Phạm Tuấn Minh

Effective Multi-Stage Training Model For Edge Computing Devices In Intrusion Detection

Huỳnh Trọng Thưa

Real-time phishing detection using deep learning methods by extensions

Đàm Minh Lịnh

On the Combination of Multi-Input and Self-Attention for Sign Language Recognition

Vũ Hoài Nam

Octagonal and hexadecagonal cut algorithms for finding the convex hull of finite sets with linear time complexity

Hoàng Nam Dũng

Hybrid Deep Learning and Distrust Model for Fault Detection in IoT Networks

Nguyễn Mạnh Hùng

Classification of Vietnamese Reviews on E-Commerce Platforms

Phan Thị Hà