Exploring Linguistic Patterns through Machine Learning: Evidence from Logistic Regression Analysis
Nguyễn Minh Tuấn
This study examines how machine learning techniques can detect and interpret linguistic patterns in Vietnamese text, using logistic regression as the core baseline model. The proposed framework integrates linguistic theory with computational analysis to uncover phonological, morphological, syntactic, and semantic structures in a multi-domain Vietnamese text classification corpus. After preprocessing (tokenization and stopword removal), several feature extraction strategies were applied to represent both surface-level and deeper linguistic regularities: TF-IDF, n-grams, and linguistically enriched features such as part-of-speech tags and morphological cues. Multiple models, including Logistic Regression, a CNN, a Bi-LSTM with attention, and a fine-tuned PhoBERT transformer, were trained and evaluated with standard classification metrics. The Bi-LSTM with attention achieved the highest F1-score (0.80), outperforming both the logistic regression baseline and the CNN, while PhoBERT suffered from overfitting and generalized poorly. Analysis of feature weights and attention distributions further reveals meaningful dependencies across linguistic levels, demonstrating the value of machine learning for uncovering structured linguistic insights. The findings contribute to computational linguistics research by providing a scalable, data-driven approach to studying linguistic patterns in low-resource languages such as Vietnamese.
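The baseline described in the abstract follows a classic text-classification pipeline: tokenize, drop stopwords, build TF-IDF-weighted n-gram features, fit logistic regression, report F1, and inspect the learned feature weights. Below is a minimal, dependency-free Python sketch of that pipeline under stated assumptions; it is not the study's code. Whitespace tokenization stands in for proper Vietnamese word segmentation, the stopword list and the six-sentence corpus are invented for illustration, the task is reduced to two classes (a multi-domain corpus would need a one-vs-rest or softmax model), and the TF-IDF weighting mirrors scikit-learn's defaults (smoothed IDF, L2 normalization). All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list (illustration only, not the study's list).
STOPWORDS = {"là", "của", "và", "rất"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords.
    Assumption: a real Vietnamese pipeline would run a word segmenter first."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def ngrams(tokens, n_max=2):
    """Unigrams plus n-grams up to n_max, joined with '_' (e.g. 'đội_bóng')."""
    feats = list(tokens)
    for n in range(2, n_max + 1):
        feats += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def fit_idf(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(f for d in docs for f in set(d))
    return {f: math.log((1 + n) / (1 + c)) + 1 for f, c in df.items()}


def tfidf(doc, idf):
    """Raw counts times IDF, L2-normalized; features unseen in training are dropped."""
    vec = {f: c * idf[f] for f, c in Counter(doc).items() if f in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {f: v / norm for f, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, w, b):
    """Linear score b + w.x over a sparse feature dict."""
    return b + sum(w.get(f, 0.0) * v for f, v in x.items())


def train_logreg(X, y, lr=0.5, epochs=200, l2=0.01):
    """Binary logistic regression: full-batch gradient descent, L2 penalty on w only."""
    w, b, m = defaultdict(float), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for x, target in zip(X, y):
            err = sigmoid(score(x, w, b)) - target  # d(log-loss)/d(score)
            for f, v in x.items():
                grad_w[f] += err * v
            grad_b += err
        for f in set(w) | set(grad_w):
            w[f] -= lr * (grad_w[f] / m + l2 * w[f])
        b -= lr * grad_b / m
    return w, b


def f1(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical toy corpus: 1 = sports, 0 = economy.
train = [("đội bóng thắng trận", 1), ("cầu thủ ghi bàn", 1), ("huấn luyện viên mới", 1),
         ("giá vàng tăng mạnh", 0), ("ngân hàng giảm lãi", 0), ("thị trường chứng khoán", 0)]
docs = [ngrams(tokenize(t)) for t, _ in train]
idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
w, b = train_logreg(X, [lbl for _, lbl in train])

test = [("đội bóng ghi bàn", 1), ("giá vàng giảm", 0)]
preds = [int(sigmoid(score(tfidf(ngrams(tokenize(t)), idf), w, b)) >= 0.5) for t, _ in test]
print(preds, f1([lbl for _, lbl in test], preds))  # [1, 0] 1.0

# Feature-weight analysis: the largest positive weights point to class 1.
top = sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top)
```

Because many Vietnamese words span several syllables (đội bóng, "football team"), syllable bigrams such as đội_bóng partly recover word-level units that whitespace tokenization splits apart. The sign and magnitude of each entry in `w` is what a feature-weight analysis would inspect; in this deliberately symmetric toy corpus every feature of a class ends up with the same weight, so the ranking printed by `top` is only illustrative.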
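The abstract also mentions analyzing attention distributions from the Bi-LSTM with attention but does not specify the attention mechanism. The sketch below assumes one common choice, additive attention pooling: each hidden state h_t is scored as e_t = v · tanh(W h_t + b), the scores are softmax-normalized into weights α_t, and the sentence vector is Σ α_t h_t. The hidden states and parameters are random stand-ins for a trained Bi-LSTM's outputs and learned weights, and the names `attention_pool`, `W`, `b`, and `v` are hypothetical, so the printed weights are illustrative only. The per-token α values are what an attention-distribution analysis would inspect.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H, W, b, v):
    """Additive attention pooling over hidden states H (T x d).

    e_t = v . tanh(W h_t + b),  alpha = softmax(e),  c = sum_t alpha_t * h_t
    W has shape (k x d); b and v have length k.
    """
    scores = []
    for h in H:
        u = [math.tanh(sum(W[i][j] * h[j] for j in range(len(h))) + b[i])
             for i in range(len(b))]
        scores.append(sum(vi * ui for vi, ui in zip(v, u)))
    alpha = softmax(scores)
    d = len(H[0])
    context = [sum(alpha[t] * H[t][j] for t in range(len(H))) for j in range(d)]
    return alpha, context


random.seed(0)
tokens = ["đội", "bóng", "thắng", "trận"]  # hypothetical input sentence
d, k = 6, 4  # d = 2 x LSTM hidden size (forward + backward); k = attention size
# Random stand-ins for Bi-LSTM outputs and learned attention parameters.
H = [[random.uniform(-1, 1) for _ in range(d)] for _ in tokens]
W = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(k)]
b = [0.0] * k
v = [random.uniform(-1, 1) for _ in range(k)]

alpha, context = attention_pool(H, W, b, v)
# Inspect the attention distribution: tokens sorted by weight.
for tok, a in sorted(zip(tokens, alpha), key=lambda p: -p[1]):
    print(f"{tok}\t{a:.3f}")
```

The pooled `context` vector is a convex combination of the hidden states, which is why the α_t can be read as how much each token contributes to the sentence representation passed to the classifier.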
Published in: Exploring Linguistic Patterns through Machine Learning: Evidence from Logistic Regression Analysis
Publication date: 2025
Publisher: oeil
Keywords: machine learning, logistic regression, linguistic patterns, computational linguistics, data-driven linguistics, predictive modeling, corpus analysis