Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Deterministic finite automaton
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==DFA identification from labeled words== {{main|Induction of regular languages}} Given a set of ''positive'' words <math>S^+ \subset \Sigma^*</math> and a set of ''negative'' words <math>S^- \subset \Sigma^*</math> one can construct a DFA that accepts all words from <math>S^+</math> and rejects all words from <math>S^-</math>: this problem is called ''DFA identification'' (synthesis, learning). While ''some'' DFA can be constructed in linear time, the problem of identifying a DFA with the minimal number of states is NP-complete.<ref name="Complexity of Automaton Identificat">{{cite journal|last1=Gold|first1=E. M.|author1-link=E. Mark Gold|title=Complexity of Automaton Identification from Given Data|journal=Information and Control|volume=37|issue=3|pages=302β320|year=1978|doi=10.1016/S0019-9958(78)90562-4|ref=Gold78|doi-access=}}</ref> The first algorithm for minimal DFA identification has been proposed by Trakhtenbrot and Barzdin<ref>{{Cite book | url=https://books.google.com/books?id=h5XOBQAAQBAJ&pg=PP1 |title = Finite Automata: Behavior and Synthesis|isbn = 9781483297293|last1 = De Vries|first1 = A.|date = 28 June 2014| publisher=Elsevier }}</ref> and is called the ''TB-algorithm''. However, the TB-algorithm assumes that all words from <math>\Sigma</math> up to a given length are contained in either <math>S^+ \cup S^-</math>. Later, K. Lang proposed an extension of the TB-algorithm that does not use any assumptions about <math>S^+</math> and <math>S^-</math>, the ''Traxbar'' algorithm.<ref>{{Cite book |doi = 10.1145/130385.130390|chapter = Random DFA's can be approximately learned from sparse uniform examples|title = Proceedings of the fifth annual workshop on Computational learning theory - COLT '92|pages = 45β52|year = 1992|last1 = Lang|first1 = Kevin J.|isbn = 089791497X|s2cid = 7480497}}</ref> However, Traxbar does not guarantee the minimality of the constructed DFA. In his work<ref name="Complexity of Automaton Identificat"/> E.M. Gold also proposed a heuristic algorithm for minimal DFA identification. Gold's algorithm assumes that <math>S^+</math> and <math>S^-</math> contain a ''[[Characteristic Samples|characteristic set]]'' of the regular language; otherwise, the constructed DFA will be inconsistent either with <math>S^+</math> or <math>S^-</math>. Other notable DFA identification algorithms include the RPNI algorithm,<ref>{{Cite book | doi=10.1142/9789812797902_0004| chapter=Inferring Regular Languages in Polynomial Updated Time| title=Pattern Recognition and Image Analysis| volume=1| pages=49β61| series=Series in Machine Perception and Artificial Intelligence| year=1992| last1=Oncina| first1=J.| last2=GarcΓa| first2=P.| isbn=978-981-02-0881-3}}</ref> the Blue-Fringe evidence-driven state-merging algorithm,<ref>{{Cite book |doi = 10.1007/BFb0054059|chapter = Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm|title = Grammatical Inference|volume = 1433|pages = 1β12|series = Lecture Notes in Computer Science|year = 1998|last1 = Lang|first1 = Kevin J.|last2 = Pearlmutter|first2 = Barak A.|last3 = Price|first3 = Rodney A.|isbn = 978-3-540-64776-8|url = http://eprints.maynoothuniversity.ie/10250/1/BP-Results-1998.pdf}}</ref> and Windowed-EDSM.<ref>{{Cite book | url=https://dl.acm.org/doi/abs/10.5555/645519.655966 | title=Beyond EDSM {{pipe}} Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications| date=23 September 2002| pages=37β48| isbn=9783540442394| last1=Adriaans| first1=Pieter| last2=Fernau| first2=Henning| last3=Zaanen| first3=Menno van| publisher=Springer}}</ref> Another research direction is the application of [[evolutionary algorithm]]s: the smart state labeling evolutionary algorithm<ref>{{Cite journal |doi = 10.1109/TPAMI.2005.143|pmid = 16013754|title = Learning deterministic finite automata with a smart state labeling evolutionary algorithm|journal = IEEE Transactions on Pattern Analysis and Machine Intelligence|volume = 27|issue = 7|pages = 1063β1074|year = 2005|last1 = Lucas|first1 = S.M.|last2 = Reynolds|first2 = T.J.|s2cid = 14062047}}</ref> allowed to solve a modified DFA identification problem in which the training data (sets <math>S^+</math> and <math>S^-</math>) is ''noisy'' in the sense that some words are attributed to wrong classes. Yet another step forward is due to application of [[Boolean satisfiability problem|SAT]] solvers by [[Marijn Heule|Marjin J. H. Heule]] and S. Verwer: the minimal DFA identification problem is reduced to deciding the satisfiability of a Boolean formula.<ref name=HW1>{{cite conference|last1=Heule|first1=M. J. H.|title=Grammatical Inference: Theoretical Results and Applications |author-link=Marijn Heule|chapter=Exact DFA Identification Using SAT Solvers|series=Lecture Notes in Computer Science |conference=Grammatical Inference: Theoretical Results and Applications. ICGI 2010. Lecture Notes in Computer Science|date=2010|volume=6339|pages=66β79|doi=10.1007/978-3-642-15488-1_7|isbn=978-3-642-15487-4 |ref=Heule2010}}</ref> The main idea is to build an augmented prefix-tree acceptor (a [[trie]] containing all input words with corresponding labels) based on the input sets and reduce the problem of finding a DFA with <math>C</math> states to ''coloring'' the tree vertices with <math>C</math> states in such a way that when vertices with one color are merged to one state, the generated automaton is deterministic and complies with <math>S^+</math> and <math>S^-</math>. Though this approach allows finding the minimal DFA, it suffers from exponential blow-up of execution time when the size of input data increases. Therefore, Heule and Verwer's initial algorithm has later been augmented with making several steps of the EDSM algorithm prior to SAT solver execution: the DFASAT algorithm.<ref>{{Cite journal |doi = 10.1007/s10664-012-9222-z|title = Software model synthesis using satisfiability solvers|journal = Empirical Software Engineering|volume = 18|issue = 4|pages = 825β856|year = 2013|last1 = Heule|first1 = Marijn J. H.|author-link=Marijn Heule|last2 = Verwer|first2 = Sicco|hdl = 2066/103766|s2cid = 17865020|url = https://lirias.kuleuven.be/handle/123456789/370182|hdl-access = free}}</ref> This allows reducing the search space of the problem, but leads to loss of the minimality guarantee. Another way of reducing the search space has been proposed by Ulyantsev et al.<ref>{{Cite book |doi = 10.1007/978-3-319-15579-1_48|chapter = BFS-Based Symmetry Breaking Predicates for DFA Identification|title = Language and Automata Theory and Applications|volume = 8977|pages = 611β622|series = Lecture Notes in Computer Science|year = 2015|last1 = Ulyantsev|first1 = Vladimir|last2 = Zakirzyanov|first2 = Ilya|last3 = Shalyto|first3 = Anatoly|isbn = 978-3-319-15578-4}}</ref> by means of new symmetry breaking predicates based on the [[breadth-first search]] algorithm: the sought DFA's states are constrained to be numbered according to the BFS algorithm launched from the initial state. This approach reduces the search space by <math>C!</math> by eliminating isomorphic automata.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)