Editing Byzantine fault (section)

== Solutions ==

Several early solutions were described by Lamport, Shostak, and Pease in 1982.<ref name=BGP_Paper/> They began by noting that the Generals' Problem can be reduced to solving a "Commander and Lieutenants" problem where loyal Lieutenants must all act in unison and that their action must correspond to what the Commander ordered in the case that the Commander is loyal:

* One solution considers scenarios in which messages may be forged, but which will be ''Byzantine-fault-tolerant'' as long as the number of disloyal generals is less than one third of the generals. The impossibility of dealing with one-third or more traitors ultimately reduces to proving that the one Commander and two Lieutenants problem cannot be solved, if the Commander is traitorous. To see this, suppose we have a traitorous Commander A, and two Lieutenants, B and C: when A tells B to attack and C to retreat, and B and C send messages to each other, forwarding A's message, neither B nor C can figure out who is the traitor, since it is not necessarily A—the other Lieutenant could have forged the message purportedly from A. It can be shown that if ''n'' is the number of generals in total, and ''t'' is the number of traitors in that ''n'', then there are solutions to the problem only when ''n'' &gt; 3''t'' and the communication is synchronous (bounded delay).<ref>{{cite journal | first1 = P. | last1 = Feldman | first2 = S. | last2 = Micali | url = http://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Distributed%20Computation/An%20Optimal%20Probabilistic%20Algorithm%20for%20Byzantine%20Agreement.pdf | title = An optimal probabilistic protocol for synchronous Byzantine agreement | journal = SIAM J. Comput. | volume = 26 | issue = 4 | pages = 873–933 | date = 1997 | doi = 10.1137/s0097539790187084 | access-date = 2012-06-14 | archive-url = https://web.archive.org/web/20160305012505/http://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Distributed%20Computation/An%20Optimal%20Probabilistic%20Algorithm%20for%20Byzantine%20Agreement.pdf | archive-date = 2016-03-05 | url-status=live }}</ref>  The full set of BFT requirements are:  For F number of Byzantine failures, there needs to be at least 3F+1 players (fault containment zones), 2F+1 independent communication paths, and F+1 rounds of communication.  There can be hybrid fault models in which benign (non-Byzantine) faults as well as Byzantine faults may exist simultaneously.  For each additional benign fault that must be tolerated, the above numbers need to be incremented by one.  If the BFT rounds of communication don't exist, Byzantine failures can occur even with no faulty hardware.
* A second solution requires unforgeable message signatures. For [[Critical system#Security critical|security-critical systems]], [[digital signature]]s (in modern computer systems, this may be achieved, in practice, by using [[public-key cryptography]]) can provide Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals. However, for [[safety-critical system]]s (where "security" addresses intelligent threats while "safety" addresses the inherent dangers of an activity or mission, i.e., faults due to natural phenomena), error detecting codes, such as [[Cyclic redundancy check|CRCs]], provide stronger coverage at a much lower cost. (Note that CRCs can provide guaranteed coverage for errors that cryptography cannot.  If an encryption scheme could provide some guarantee of coverage for message errors, it would have a structure that would make it insecure.<ref>Koopman, Philip; Driscoll, Kevin; Hall, Brendan (March 2015). "Cyclic Redundancy Code and Checksum Algorithms to Ensure Critical Data Integrity" (PDF). Federal Aviation Administration. DOT/FAA/TC-14/49. Archived (PDF) from the original on 18 May 2015. Retrieved 9 May 2015.</ref>) But neither [[digital signature]]s nor error detecting codes such as CRCs provide a known level of protection against Byzantine errors from natural causes.  And more generally, security measures can weaken safety and vice versa. Thus, cryptographic digital signature methods are not a good choice for safety-critical systems, unless there is also a specific security threat as well.<ref name="PaulitschMorris2005">{{cite book | last1=Paulitsch | first1=M. | title=2005 International Conference on Dependable Systems and Networks (DSN'05) | last2=Morris | first2=J. | last3=Hall | first3=B. | last4=Driscoll | first4=K. | last5=Latronico | first5=E. | last6=Koopman | first6=P. | chapter=Coverage and the Use of Cyclic Redundancy Codes in Ultra-Dependable Systems | year=2005 | pages=346–355 | doi=10.1109/DSN.2005.31| isbn=978-0-7695-2282-1 | s2cid=14096385 | url=https://figshare.com/articles/journal_contribution/6621788 }}</ref> While error detecting codes, such as CRCs, are better than cryptographic techniques, neither provide adequate coverage for active electronics in safety-critical systems. This is illustrated by the ''Schrödinger CRC'' scenario where a CRC-protected message with a single Byzantine faulty bit presents different data to different observers and each observer sees a valid CRC.<ref name="DriscollHall2004" /><ref name="DriscollHall2003" />
* Also presented is a variation on the first two solutions allowing Byzantine-fault-tolerant behavior in some situations where not all generals can communicate directly with each other.<ref name=":1" />

There are many systems that claim BFT without meeting the above minimum requirements (e.g., blockchain).  Given that there is mathematical proof that this is impossible, these claims need to include a caveat that their definition of BFT strays from the original.  That is, systems such as blockchain don't guarantee agreement. They use resource-intensive mechanisms that make disagreements impractical to maintain.

Several system architectures were designed c. 1980 that implemented Byzantine fault tolerance. These include: Draper's FTMP,<ref name="HopkinsLala1987">{{ Cite book | last1=Hopkins | first1=Albert L. | title=The Evolution of Fault-Tolerant Computing | last2=Lala | first2=Jaynarayan H. | last3=Smith | first3=T. Basil | chapter=The Evolution of Fault Tolerant Computing at the Charles Stark Draper Laboratory, 1955–85 | volume=1 | year=1987 | pages=121–140 | issn=0932-5581 | doi=10.1007/978-3-7091-8871-2_6| series=Dependable Computing and Fault-Tolerant Systems | isbn=978-3-7091-8873-6 }}</ref> Honeywell's MMFCS,<ref name="MMFCS">{{ citation | last1 = Driscoll | first1 = Kevin | last2 = Papadopoulos | first2 = Gregory | last3 = Nelson | first3 = Scott | last4 = Hartmann | first4 = Gary | last5 = Ramohalli | first5 = Gautham | title = Multi-Microprocessor Flight Control System | publisher = AFWAL/FIGL U.S. Air Force Systems Command | date = 1984 | type = Technical Report | location = Wright-Patterson Air Force Base, OH | id = AFWAL-TR-84-3076 }}</ref> and SRI's SIFT.<ref name=":0">{{cite journal | title=SIFT: design and analysis of a fault-tolerant computer for aircraft control|journal=Microelectronics Reliability|volume=19|issue=3|year=1979|page=190|issn=0026-2714|doi=10.1016/0026-2714(79)90211-7}}</ref>

In 1999, Miguel Castro and [[Barbara Liskov]] introduced the "Practical Byzantine Fault Tolerance" (PBFT) algorithm,<ref>{{cite journal | first1 = M. | last1 = Castro | first2 = B. | last2 = Liskov | citeseerx = 10.1.1.127.6130 | title = Practical Byzantine Fault Tolerance and Proactive Recovery | publisher = [[Association for Computing Machinery]] | journal = ACM Transactions on Computer Systems | volume = 20 | issue = 4 | pages = 398–461 | date = 2002 | doi = 10.1145/571637.571640| s2cid = 18793794 }}</ref> which provides high-performance Byzantine state machine replication, processing thousands of requests per second with sub-millisecond increases in latency.

After PBFT, several BFT protocols were introduced to improve its robustness and performance. For instance, Q/U,<ref>{{cite journal | first1 = M. | last1 = Abd-El-Malek | first2 = G. | last2 = Ganger | first3 = G. | last3 = Goodson | first4 = M. | last4 = Reiter | first5 = J. | last5 = Wylie | doi = 10.1145/1095809.1095817 | title = Fault-scalable Byzantine Fault-Tolerant Services | journal = ACM SIGOPS Operating Systems Review | volume = 39 | issue = 5 | pages = 59 | publisher = [[Association for Computing Machinery]] | date = 2005 }}</ref> HQ,<ref>{{cite conference | first1 = James | last1 = Cowling | first2 = Daniel | last2 = Myers | author-link3 = Barbara Liskov | first3 = Barbara | last3 = Liskov | first4 = Rodrigo | last4 = Rodrigues | first5 = Liuba | last5 = Shrira | url = http://portal.acm.org/citation.cfm?id=1298455.1298473 | title = HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance | conference = Proceedings of the 7th [[USENIX]] Symposium on Operating Systems Design and Implementation | date = 2006 | isbn = 1-931971-47-1 | pages = 177–190}}</ref> Zyzzyva,<ref>{{cite journal | first1 = Ramakrishna | last1 = Kotla | first2 = Lorenzo | last2 = Alvisi | first3 = Mike | last3 = Dahlin | first4 = Allen | last4 = Clement | first5 = Edmund | last5 = Wong | doi = 10.1145/1658357.1658358 | title = Zyzzyva: Speculative Byzantine Fault Tolerance | publisher = [[Association for Computing Machinery]] | journal = ACM Transactions on Computer Systems | volume = 27 | issue = 4 | pages = 1–39 | date = December 2009 }}</ref> and ABsTRACTs,<ref>{{cite conference | first1 = Rachid | last1 = Guerraoui | first2 = Nikola | last2 = Kneževic | first3 = Marko | last3 = Vukolic | first4 = Vivien | last4 = Quéma | url = http://infoscience.epfl.ch/record/144158 | title = The Next 700 BFT Protocols | conference = Proceedings of the 5th European conference on Computer systems | publisher = EuroSys | date = 2010 | access-date = 2011-10-04 | archive-url = https://web.archive.org/web/20111002225957/http://infoscience.epfl.ch/record/144158 | archive-date = 2011-10-02 | url-status=live }}</ref> addressed the performance and cost issues; whereas other protocols, like Aardvark<ref>{{cite conference|first1=A.|last1=Clement|first2=E.|last2=Wong|first3=L.|last3=Alvisi|first4=M.|last4=Dahlin|first5=M.|last5=Marchetti|url=http://www.usenix.org/events/nsdi09/tech/full_papers/clement/clement.pdf|title=Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults|publisher=[[USENIX]]|conference=Symposium on Networked Systems Design and Implementation|date=April 22–24, 2009|access-date=2010-02-17|archive-url=https://web.archive.org/web/20101225155052/https://www.usenix.org/events/nsdi09/tech/full_papers/clement/clement.pdf|archive-date=2010-12-25|url-status=live}}</ref> and RBFT,<ref>{{cite conference|first1=P.-L.|last1=Aublin|first2=S.|last2=Ben Mokhtar|first3=V.|last3=Quéma|url=http://www.temple.edu/cis/icdcs2013/program.html|title=RBFT: Redundant Byzantine Fault Tolerance|publisher=[[International Conference on Distributed Computing Systems]]|conference=33rd IEEE International Conference on Distributed Computing Systems|date=July 8–11, 2013|url-status=dead|archive-url=https://web.archive.org/web/20130805115252/http://www.temple.edu/cis/icdcs2013/program.html|archive-date=August 5, 2013}}</ref> addressed its robustness issues. Furthermore, Adapt<ref>{{Cite book|last1=Bahsoun|first1=J. P.|last2=Guerraoui|first2=R.|last3=Shoker|first3=A.|title=2015 IEEE International Parallel and Distributed Processing Symposium |chapter=Making BFT Protocols Really Adaptive |date=2015-05-01|pages=904–913|doi=10.1109/IPDPS.2015.21|isbn=978-1-4799-8649-1|s2cid=16310807|chapter-url=http://repositorio.inesctec.pt/handle/123456789/4107}}</ref> tried to make use of existing BFT protocols, through switching between them in an adaptive way, to improve system robustness and performance as the underlying conditions change. Furthermore, BFT protocols were introduced that leverage trusted components to reduce the number of replicas, e.g., A2M-PBFT-EA<ref>{{Cite book|last1=Chun|first1=Byung-Gon|last2=Maniatis|first2=Petros|last3=Shenker|first3=Scott|last4=Kubiatowicz|first4=John|title=Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles |chapter=Attested append-only memory |date=2007-01-01|series=SOSP '07|location=New York, NY, USA|publisher=ACM|pages=189–204|doi=10.1145/1294261.1294280|isbn=9781595935915|s2cid=6685352}}</ref> and MinBFT.<ref>{{Cite journal|last1=Veronese|first1=G. S.|last2=Correia|first2=M.|last3=Bessani|first3=A. N.|last4=Lung|first4=L. C.|last5=Verissimo|first5=P.|date=2013-01-01|title=Efficient Byzantine Fault-Tolerance|journal=IEEE Transactions on Computers|volume=62|issue=1|pages=16–30|doi=10.1109/TC.2011.221|issn=0018-9340|citeseerx=10.1.1.408.9972|s2cid=8157723}}</ref>