Abstract
When performing automatic provenance collection within the operating system, inevitable storage overheads are made worse by the fact that much of the generated lineage is uninteresting, describing noise and background activities that lie outside the scope the system’s intended use. In this work, we propose a novel approach to policy-based provenance pruning - leverage the confinement properties provided by Mandatory Access Control (MAC) systems in order to identify subdomains of system activity for which to collect provenance. We consider the assurances of completeness that such a system could provide by sketching algorithms that reconcile provenance graphs with the information flows permitted by the MAC policy. We go on to identify the design challenges in implementing such a mechanism. In a simplified experiment, we demonstrate that adding a policy component to the Hi-Fi provenance monitor could reduce storage overhead by as much as 82%. To our knowledge, this is the first practical policy-based provenance monitor to be proposed in the literature.