Towards Universal Segmentation for Log Parsing

Lê Văn Hoàng

Log parsing is a crucial step in log analysis: it transforms unstructured log messages into the structured data required by downstream analysis tasks. The sheer volume of log data generated by modern software systems has motivated the development of numerous log parsing techniques. However, existing log parsers still suffer from unsatisfactory accuracy, which can significantly degrade follow-up analyses such as log-based anomaly detection. We identify two main limitations that hinder the effectiveness of existing log parsing methods: (1) under-segmentation: most log parsers rely on a fixed, predefined set of delimiters to split a log message into tokens, which may fail to split messages correctly given the heterogeneity of logging formats; (2) over-segmentation: using too many delimiters fragments meaningful units in log messages and makes it difficult to identify templates and parameters accurately. To address these limitations, we propose SCLog, a novel syntax- and context-aware segmentation approach for log parsing. SCLog first applies a comprehensive set of syntax-based heuristics to segment log messages into coarse-grained tokens. To further split these into fine-grained tokens, SCLog mines the structural patterns of tokens based on their surrounding contexts and dynamically identifies the optimal delimiters for each token. We evaluate SCLog on the widely used, large-scale Loghub-2.0 datasets. The results demonstrate that SCLog significantly outperforms state-of-the-art log parsers in parsing accuracy and robustness across diverse datasets.
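The two limitations named in the abstract can be illustrated with a minimal sketch. This is not SCLog's algorithm, which the abstract does not specify in detail; it only shows how a fixed delimiter set behaves. The `segment` helper, the sample log line, and both delimiter sets are assumptions made up for illustration and are not taken from the paper or from Loghub-2.0.

```python
import re
from typing import List


def segment(message: str, delimiters: str) -> List[str]:
    """Split a log message on any character in `delimiters` (hypothetical helper).

    Runs of consecutive delimiters are treated as one separator,
    and empty tokens are dropped.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, message) if token]


# Hypothetical log line, invented for this example.
line = "Connection from 10.0.0.5:8080 closed,status=ok"

# Too few delimiters: "closed,status=ok" stays fused (under-segmentation).
coarse = segment(line, " ")

# Too many delimiters: the IP address is fragmented (over-segmentation).
fine = segment(line, " :.,=")

print(coarse)
print(fine)
```

With whitespace only, the event word and the key-value pair remain one token; with the larger set, the IP address `10.0.0.5` is broken into four meaningless pieces. Neither fixed set suits every token in the line, which is the motivation the abstract gives for choosing delimiters per token.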

Published in:

Towards Universal Segmentation for Log Parsing


Publisher:

Location:


Keywords:

Log Parsing, Segmentation, Syntactic Analysis, Structural Patterns