RWTH Aachen University
Automata processing words and trees over infinite alphabets are attracting significant interest from the database and verification communities, since they can be often used as low-level formalisms for representing and reasoning about data streams, program traces, and serializations of structured documents. The authors provide a Myhill-Nerode-like theorem that characterizes the class of data languages recognized by Deterministic finite-Memory Automata (DMA). As a byproduct of this characterization result, they obtain a canonical representation for any DMA-recognizable language. They then show that this canonical automaton is minimal in a strong sense: it has the minimal number of control states and also the minimal amount of internal storage.