LogMerge: improved log parsing based on two-step clustering combined with low-level token processing
Logs are a crucial source of data, containing a vast amount of information that reflects the real-time operational status of systems, and are widely used in cybersecurity, system monitoring, and fault diagnosis. Log parsing is the first and most essential step in automated log analysis, aiming to transform semi-structured log data into a structured format. However, due to the large volume and heterogeneous structure of log data, existing parsing methods face significant challenges in accurately identifying log structures and extracting parameters, often leading to over-generalization or over-specialization during the parsing process. To address these limitations, this study proposes a log parsing approach named LogMerge, which integrates heuristic techniques with an efficient two-step clustering strategy. First, LogMerge leverages sets of special delimiters to further split tokens into subtokens, enabling the transformation and handling of variations within tokens. Then, LogMerge applies a two-step clustering process: in the first step, logs are grouped into smaller clusters based on token counts and common substrings; in the second step, a fixed-depth parsing tree is employed to merge these small clusters into larger ones, from which appropriate log templates are extracted. Experimental results on 14 datasets demonstrate that the proposed method achieves superior parsing accuracy, with evaluation metrics exceeding 0.8 on 10 datasets. The extracted log templates are closer to the ground-truth templates while mitigating both over-specialization and over-generalization compared to existing log parsers.
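The subtoken splitting and the first grouping step described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the delimiter set `DELIMITERS`, the helper names, and the `<*>` wildcard are hypothetical, the grouping uses token count only (the common-substring criterion is omitted), and the second step with the fixed-depth parsing tree is not shown.

```python
import re
from collections import defaultdict

# Hypothetical delimiter set; the delimiters LogMerge actually uses are not
# listed in the abstract.
DELIMITERS = "=:,()[]"
_SPLIT_RE = re.compile("([" + re.escape(DELIMITERS) + "])")


def split_subtokens(line):
    """Split a log line on whitespace, then split each token on the delimiters.

    The delimiters themselves are kept as separate subtokens.
    """
    subtokens = []
    for token in line.split():
        subtokens.extend(part for part in _SPLIT_RE.split(token) if part)
    return subtokens


def group_by_length(lines):
    """Simplified first clustering step: group logs by subtoken count only."""
    groups = defaultdict(list)
    for line in lines:
        subtokens = split_subtokens(line)
        groups[len(subtokens)].append(subtokens)
    return groups


def extract_template(token_lists):
    """Keep positions that agree across all logs; replace the rest with <*>."""
    template = []
    for column in zip(*token_lists):
        template.append(column[0] if len(set(column)) == 1 else "<*>")
    return " ".join(template)


logs = [
    "user=alice port=22",
    "user=bob port=80",
    "connection closed",
]
for length, members in group_by_length(logs).items():
    print(length, extract_template(members))
```

Splitting `user=alice` into `user`, `=`, and `alice` lets the constant key stay in the template while only the value becomes a parameter, which is the kind of within-token variation the abstract says the subtoken step is meant to handle.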
Published in:
LogMerge: improved log parsing based on two-step clustering combined with low-level token processing
Publisher:
International Journal of Information Technology
Keywords:
Clustering algorithm, Heuristic algorithm, Log analysis, Log parsing, Low-level token