Abstract

Auditing is an increasingly essential tool for the defense of computing systems, but the unwieldy nature of log data imposes tremendous burdens on administrators and analysts. To address this issue, a variety of techniques have been proposed for approximating the contents of raw audit logs, facilitating efficient storage and analysis. However, the security value of these approximated logs is difficult to measure: relative to the original log, it is unclear whether these techniques retain the forensic evidence needed to investigate threats effectively. Unfortunately, prior work has only been able to investigate this issue anecdotally, demonstrating that sufficient evidence is retained for specific attack scenarios.

In this work, we address this gap in the literature by formalizing metrics for quantifying the forensic validity of an approximated audit log under differing threat models. In addition to providing quantifiable security arguments for prior work, we also identify a novel point in the approximation design space: log events describing typical (benign) system activity can be aggressively approximated, while events that encode anomalous behavior should be preserved with lossless fidelity. We instantiate this notion of Attack-Preserving forensic validity in LogApprox, a new approximation technique that eliminates the redundancy of voluminous file I/O associated with benign process activities. We systematically evaluate LogApprox alongside a corpus of exemplar approximation techniques from prior work. We demonstrate that, while LogApprox achieves comparable log reduction rates, it retains 100% of attack-associated log events; in contrast, we make the surprising discovery that prior approaches to log approximation retain as little as 7.3% of forensic evidence in certain attack scenarios. This work thus establishes trustworthy foundations for the design of the next generation of efficient auditing frameworks.