From Regexes to Parsing Expression Grammars
Most scripting languages now-a-days use regex pattern-matching libraries. These regex libraries borrow the syntax of regular expressions, but have an in-formal semantics that is different from the semantics of regular expressions, removing the commutativity of alternation and adding ad-hoc extensions that cannot be expressed by formalisms for efficient recognition of regular languages, such as deterministic finite automata. Parsing expression grammars are a formalism that can describe all deterministic context-free languages and has a simple computational model. In this paper, the authors present a formalization of regexes via transformation to parsing expression grammars. The proposed transformation easily accommodates several of the common regex extensions, giving a formal meaning to them.