Abstract

The European Union (EU) General Data Protection Regulation (GDPR) has expanded data privacy regulations regarding personal data for over half a billion EU citizens. Given the regulation’s effectively global scope and its significant penalties for non-compliance, systems that store or process personal data in increasingly complex workflows will need to demonstrate how data were generated and used. In this paper, we analyze the GDPR text to explicitly identify a set of central challenges for GDPR compliance for which data provenance is applicable; we introduce a data provenance model for representing GDPR workflows; and we present design patterns that demonstrate how data provenance can be used realistically to help in verifying GDPR compliance. We also discuss open questions about what will be practically necessary for a provenance-driven system to be suitable under the GDPR.