Equivocal URLs: Understanding the Fragmented Space of URL Parser Implementations

Joshua Reynolds, Adam Bates and Michael Bailey.
27th European Symposium on Research in Computer Security (ESORICS'22).
Copenhagen, Denmark. September 26, 2022.
(acceptance rate=18.5%)
Best Paper Award Recipient.

Abstract

Uniform Resource Locators (URLs) are integral to the Web and have existed for nearly three decades. Yet URL parsing differs subtly among parser implementations, leading to ambiguity that can be abused by attackers. We measure agreement between widely used URL parsers and find that each has made design decisions that deviate from parsing standards, creating a fractured implementation space where assumptions of uniform interpretation are unreliable. In some cases, deviations are severe enough that clients using different parsers will make requests to different hosts based on a single, “equivocal” URL. We systematize the thousands of differences we observed into seven pitfalls in URL parsing that application developers should beware of. Finally, we demonstrate that this ambiguity can be weaponized through misdirection attacks that evade the Google Safe Browsing and VirusTotal URL classifiers. URL parsing libraries have traded strict standards adherence for permissiveness. We hope this work will help motivate systemic adoption of a more unified URL parsing standard, enabling a more secure Web.
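
To make the idea of an "equivocal" URL concrete, the following minimal sketch shows how two parsers can extract different hosts from the same string. It is an illustration under stated assumptions, not the paper's methodology: both functions are hypothetical, deliberately simplified host extractors (neither is a real library's parser), and the backslash-before-`@` input is an example chosen here rather than one taken from the abstract. The only difference between them is whether a backslash is treated as ordinary data or as equivalent to a forward slash.

```python
# Illustrative sketch only: neither function is a real library's parser.
# Both are hypothetical, deliberately simplified host extractors that
# assume the input contains "://".


def _authority(url: str, delimiters: str) -> str:
    """Return the text between '://' and the first delimiter character."""
    rest = url.split("://", 1)[1]
    end = len(rest)
    for ch in delimiters:
        pos = rest.find(ch)
        if pos != -1:
            end = min(end, pos)
    return rest[:end]


def host_backslash_is_data(url: str) -> str:
    """Hypothetical parser A: only '/', '?' and '#' end the authority."""
    authority = _authority(url, "/?#")
    # Everything before the last '@' is treated as userinfo and dropped.
    return authority.rpartition("@")[2].lower()


def host_backslash_is_slash(url: str) -> str:
    """Hypothetical parser B: a backslash also ends the authority."""
    authority = _authority(url, "/?#\\")
    return authority.rpartition("@")[2].lower()


# A single URL string (the backslash is escaped in the Python literal).
EQUIVOCAL = "http://good.example\\@evil.example/"

print(host_backslash_is_data(EQUIVOCAL))   # evil.example
print(host_backslash_is_slash(EQUIVOCAL))  # good.example
```

If a URL classifier behaved like one of these functions and a client like the other, the classifier would vet one host while the client connects to a different one, which is the kind of misdirection the abstract describes. On an unambiguous input such as `http://example.com/path`, both functions agree.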