Editing Deterministic finite automaton (section)

==DFA identification from labeled words==
{{main|Induction of regular languages}}
Given a set of ''positive'' words <math>S^+ \subset \Sigma^*</math> and a set of ''negative'' words <math>S^- \subset \Sigma^*</math> one can construct a DFA that accepts all words from <math>S^+</math> and rejects all words from <math>S^-</math>: this problem is called ''DFA identification'' (synthesis, learning).
While ''some'' DFA can be constructed in linear time, the problem of identifying a DFA with the minimal number of states is NP-complete.<ref name="Complexity of Automaton Identificat">{{cite journal|last1=Gold|first1=E. M.|author1-link=E. Mark Gold|title=Complexity of Automaton Identification from Given Data|journal=Information and Control|volume=37|issue=3|pages=302–320|year=1978|doi=10.1016/S0019-9958(78)90562-4|ref=Gold78|doi-access=}}</ref>
The first algorithm for minimal DFA identification has been proposed by Trakhtenbrot and Barzdin<ref>{{Cite book | url=https://books.google.com/books?id=h5XOBQAAQBAJ&pg=PP1 |title = Finite Automata: Behavior and Synthesis|isbn = 9781483297293|last1 = De Vries|first1 = A.|date = 28 June 2014| publisher=Elsevier }}</ref> and is called the ''TB-algorithm''.
However, the TB-algorithm assumes that all words from <math>\Sigma</math> up to a given length are contained in either <math>S^+ \cup S^-</math>.

Later, K. Lang proposed an extension of the TB-algorithm that does not use any assumptions about <math>S^+</math> and <math>S^-</math>, the ''Traxbar'' algorithm.<ref>{{Cite book |doi = 10.1145/130385.130390|chapter = Random DFA's can be approximately learned from sparse uniform examples|title = Proceedings of the fifth annual workshop on Computational learning theory - COLT '92|pages = 45–52|year = 1992|last1 = Lang|first1 = Kevin J.|isbn = 089791497X|s2cid = 7480497}}</ref>
However, Traxbar does not guarantee the minimality of the constructed DFA.
In his work<ref name="Complexity of Automaton Identificat"/> E.M. Gold also proposed a heuristic algorithm for minimal DFA identification.
Gold's algorithm assumes that <math>S^+</math> and <math>S^-</math> contain a ''[[Characteristic Samples|characteristic set]]'' of the regular language; otherwise, the constructed DFA will be inconsistent either with <math>S^+</math> or <math>S^-</math>.
Other notable DFA identification algorithms include the RPNI algorithm,<ref>{{Cite book | doi=10.1142/9789812797902_0004| chapter=Inferring Regular Languages in Polynomial Updated Time| title=Pattern Recognition and Image Analysis| volume=1| pages=49–61| series=Series in Machine Perception and Artificial Intelligence| year=1992| last1=Oncina| first1=J.| last2=García| first2=P.| isbn=978-981-02-0881-3}}</ref> the Blue-Fringe evidence-driven state-merging algorithm,<ref>{{Cite book |doi = 10.1007/BFb0054059|chapter = Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm|title = Grammatical Inference|volume = 1433|pages = 1–12|series = Lecture Notes in Computer Science|year = 1998|last1 = Lang|first1 = Kevin J.|last2 = Pearlmutter|first2 = Barak A.|last3 = Price|first3 = Rodney A.|isbn = 978-3-540-64776-8|url = http://eprints.maynoothuniversity.ie/10250/1/BP-Results-1998.pdf}}</ref> 
and Windowed-EDSM.<ref>{{Cite book | url=https://dl.acm.org/doi/abs/10.5555/645519.655966 | title=Beyond EDSM {{pipe}} Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications| date=23 September 2002| pages=37–48| isbn=9783540442394| last1=Adriaans| first1=Pieter| last2=Fernau| first2=Henning| last3=Zaanen| first3=Menno van| publisher=Springer}}</ref>
Another research direction is the application of [[evolutionary algorithm]]s: the smart state labeling evolutionary algorithm<ref>{{Cite journal |doi = 10.1109/TPAMI.2005.143|pmid = 16013754|title = Learning deterministic finite automata with a smart state labeling evolutionary algorithm|journal = IEEE Transactions on Pattern Analysis and Machine Intelligence|volume = 27|issue = 7|pages = 1063–1074|year = 2005|last1 = Lucas|first1 = S.M.|last2 = Reynolds|first2 = T.J.|s2cid = 14062047}}</ref> allowed to solve a modified DFA identification problem in which the training data (sets <math>S^+</math> and <math>S^-</math>) is ''noisy'' in the sense that some words are attributed to wrong classes.

Yet another step forward is due to application of [[Boolean satisfiability problem|SAT]] solvers by [[Marijn Heule|Marjin J. H. Heule]] and S. Verwer: the minimal DFA identification problem is reduced to deciding the satisfiability of a Boolean formula.<ref name=HW1>{{cite conference|last1=Heule|first1=M. J. H.|title=Grammatical Inference: Theoretical Results and Applications |author-link=Marijn Heule|chapter=Exact DFA Identification Using SAT Solvers|series=Lecture Notes in Computer Science |conference=Grammatical Inference: Theoretical Results and Applications. ICGI 2010. Lecture Notes in Computer Science|date=2010|volume=6339|pages=66–79|doi=10.1007/978-3-642-15488-1_7|isbn=978-3-642-15487-4 |ref=Heule2010}}</ref> The main idea is to build an augmented prefix-tree acceptor (a [[trie]] containing all input words with corresponding labels) based on the input sets and reduce the problem of finding a DFA with <math>C</math> states to ''coloring'' the tree vertices with <math>C</math> states in such a way that when vertices with one color are merged to one state, the generated automaton is deterministic and complies with <math>S^+</math> and <math>S^-</math>.
Though this approach allows finding the minimal DFA, it suffers from exponential blow-up of execution time when the size of input data increases.
Therefore, Heule and Verwer's initial algorithm has later been augmented with making several steps of the EDSM algorithm prior to SAT solver execution: the DFASAT algorithm.<ref>{{Cite journal |doi = 10.1007/s10664-012-9222-z|title = Software model synthesis using satisfiability solvers|journal = Empirical Software Engineering|volume = 18|issue = 4|pages = 825–856|year = 2013|last1 = Heule|first1 = Marijn J. H.|author-link=Marijn Heule|last2 = Verwer|first2 = Sicco|hdl = 2066/103766|s2cid = 17865020|url = https://lirias.kuleuven.be/handle/123456789/370182|hdl-access = free}}</ref>
This allows reducing the search space of the problem, but leads to loss of the minimality guarantee.
Another way of reducing the search space has been proposed by Ulyantsev et al.<ref>{{Cite book |doi = 10.1007/978-3-319-15579-1_48|chapter = BFS-Based Symmetry Breaking Predicates for DFA Identification|title = Language and Automata Theory and Applications|volume = 8977|pages = 611–622|series = Lecture Notes in Computer Science|year = 2015|last1 = Ulyantsev|first1 = Vladimir|last2 = Zakirzyanov|first2 = Ilya|last3 = Shalyto|first3 = Anatoly|isbn = 978-3-319-15578-4}}</ref> by means of new symmetry breaking predicates based on the [[breadth-first search]] algorithm:
the sought DFA's states are constrained to be numbered according to the BFS algorithm launched from the initial state. This approach reduces the search space by <math>C!</math> by eliminating isomorphic automata.