Editing Byzantine fault

{{short description|Fault in a computer system that presents different symptoms to different observers}}
{{Other uses|Byzantine (disambiguation)}}
A '''Byzantine fault''' is a condition of a system, particularly a [[distributed computing]] system, where a fault occurs such that different symptoms are presented to different observers, including imperfect information on whether a system component has failed. The term takes its name from an [[allegory]], the "Byzantine generals problem",<ref name=":1">{{Cite journal | last1 = Lamport | first1 = L. | author-link1 = Leslie Lamport| last2 = Shostak | first2 = R. | last3 = Pease | first3 = M. | doi = 10.1145/357172.357176 | title = The Byzantine Generals Problem | journal = ACM Transactions on Programming Languages and Systems | volume = 4 | issue = 3 | pages = 382–401 | year = 1982 | url = https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fpeople%2Flamport%2Fpubs%2Fbyz.pdf | url-status=live | archive-url = https://web.archive.org/web/20180613015025/https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fpeople%2Flamport%2Fpubs%2Fbyz.pdf | archive-date = 13 June 2018| citeseerx = 10.1.1.64.2312 | s2cid = 55899582 }}</ref> developed to describe a situation in which, to avoid catastrophic failure of a system, the system's actors must agree on a strategy, but some of these actors are unreliable in such a way as to cause other (good) actors to disagree on the strategy and they may be unaware of the disagreement.

A Byzantine fault is also known as a '''Byzantine generals problem''', a '''Byzantine agreement problem''', or a '''Byzantine failure'''.

'''Byzantine fault tolerance''' ('''BFT''') is the resilience of a [[fault-tolerant computer system]] or similar system to such conditions.

== Definition ==

A Byzantine fault is any fault presenting different symptoms to different observers.<ref name="DriscollHall2004">{{cite book| last1=Driscoll| first1=K.| title=The 23rd Digital Avionics Systems Conference (IEEE Cat. No.04CH37576)| last2=Hall| first2=B.| last3=Paulitsch| first3=M.| last4=Zumsteg| first4=P. |last5=Sivencrona| first5=H.| chapter=The Real Byzantine Generals| year=2004| pages=6.D.4–61–11| doi=10.1109/DASC.2004.1390734| isbn=978-0-7803-8539-9| s2cid=15549497}}</ref> A Byzantine failure is the loss of a system service due to a Byzantine fault in systems that require [[Consensus (computer science)|consensus]] among multiple components.<ref name="DriscollHall2003">{{cite book| last1=Driscoll| first1=Kevin| title=Computer Safety, Reliability, and Security| last2=Hall| first2=Brendan| last3=Sivencrona| first3=Håkan| last4=Zumsteg| first4=Phil | s2cid=12690337| chapter=Byzantine Fault Tolerance, from Theory to Reality| volume=2788| year=2003| pages=235–248| issn=0302-9743| doi=10.1007/978-3-540-39878-3_19| series=Lecture Notes in Computer Science| isbn=978-3-540-20126-7}}</ref>

[[File:Byzantine Generals.png|thumb|upright=1.3|If all generals attack in coordination, the battle is won (left). If two generals falsely declare that they intend to attack, but instead retreat, the battle is lost (right).]]
The Byzantine allegory considers a number of generals who are attacking a fortress. The generals must decide as a group whether to attack or retreat; some may prefer to attack, while others prefer to retreat. The important thing is that all generals agree on a common decision, for a halfhearted attack by a few generals would become a [[rout]], and would be worse than either a coordinated attack or a coordinated retreat.

The problem is complicated by the presence of treacherous generals who may not only cast a vote for a suboptimal strategy; they may do so selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to those generals in favor of retreat, and a vote of attack to the rest. Those who received a retreat vote from the ninth general will retreat, while the rest will attack (which may not go well for the attackers). The problem is complicated further by the generals being physically separated and having to send their votes via messengers who may fail to deliver votes or may forge false votes.

Without message signing, Byzantine fault tolerance can only be achieved if the total number of generals is greater than three times the number of disloyal (faulty) generals. There can be a default vote value given to missing messages. For example, missing messages can be given a [[Uninitialized variable|"null" value]]. Further, if the agreement is that the null votes are in the majority, a pre-assigned default strategy can be used (e.g., retreat).<ref name="BGP_Paper">{{Cite journal|last1=Lamport|first1=L.|author-link1=Leslie Lamport|last2=Shostak|first2=R.|last3=Pease|first3=M.|year=1982|title=The Byzantine Generals Problem|url=http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf|journal=ACM Transactions on Programming Languages and Systems|volume=4|issue=3|pages=387–389|citeseerx=10.1.1.64.2312|doi=10.1145/357172.357176|s2cid=55899582 |archive-url=https://web.archive.org/web/20170207104645/http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf|archive-date=7 February 2017}}</ref>

The typical mapping of this allegory onto computer systems is that the computers are the generals and their digital communication system links are the messengers. Although the problem is formulated in the allegory as a decision-making and security problem, in electronics, it cannot be solved by [[cryptographic]] [[digital signature]]s alone, because failures such as incorrect voltages can propagate through the encryption process. Thus, a faulty message could be sent such that some recipients detect the message as faulty (bad signature), others see it is having a good signature, and a third group also sees a good signature but with different message contents than the second group.<ref name="DriscollHall2004">{{cite book| last1=Driscoll| first1=K.| title=The 23rd Digital Avionics Systems Conference (IEEE Cat. No.04CH37576)| last2=Hall| first2=B.| last3=Paulitsch| first3=M.| last4=Zumsteg| first4=P. |last5=Sivencrona| first5=H.| chapter=The Real Byzantine Generals| year=2004| pages=6.D.4–61–11| doi=10.1109/DASC.2004.1390734| isbn=978-0-7803-8539-9| s2cid=15549497}}</ref>

== History ==

The problem of obtaining Byzantine consensus was conceived and formalized by [[Robert Shostak]], who dubbed it the ''interactive consistency'' problem. This work was done in 1978 in the context of the NASA-sponsored SIFT<ref name=":0" /> project in the Computer Science Lab at [[SRI International]]. SIFT (for Software Implemented Fault Tolerance) was the brainchild of John Wensley, and was based on the idea of using multiple general-purpose computers that would communicate through pairwise messaging in order to reach a consensus, even if some of the computers were faulty.

At the beginning of the project, it was not clear how many computers in total were needed to guarantee that a conspiracy of ''n'' faulty computers could not "thwart" the efforts of the correctly-operating ones to reach consensus. Shostak showed that a minimum of 3''n+''1 are needed, and devised a two-round 3''n+1'' messaging protocol that would work for ''n''=1. His colleague Marshall Pease generalized the algorithm for any n > 0, proving that 3''n''+1 is both necessary and sufficient. These results, together with a later proof by [[Leslie Lamport]] of the sufficiency of 3''n'' using digital signatures, were published in the seminal paper, ''Reaching Agreement in the Presence of Faults.''<ref>{{Cite journal|last1=Pease|first1=Marshall|last2=Shostak|first2=Robert|last3=Lamport|first3=Leslie|date=April 1980|title=Reaching Agreement in the Presence of Faults|journal=Journal of the Association for Computing Machinery|volume=27|issue=2|pages=228–234|doi=10.1145/322186.322188|citeseerx=10.1.1.68.4044|s2cid=6429068}}</ref> The authors were awarded the 2005 [[Edsger W. Dijkstra Prize]] for this paper.

To make the interactive consistency problem easier to understand, Lamport devised a colorful allegory in which a group of army generals formulate a plan for attacking a city. In its original version, the story cast the generals as commanders of the [[Albania]]n army. The name was changed, eventually settling on "[[Byzantine Empire|Byzantine]]", at the suggestion of Jack Goldberg to future-proof any potential offense-giving.<ref>{{cite journal |last1=Lamport |first1=Leslie |title=The Byzantine Generals Problem |url=https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/ |journal=ACM Transactions on Programming Languages and Systems |publisher=SRI International |access-date=18 March 2019|date=2016-12-19 }}</ref> This formulation of the problem, together with some additional results, were presented by the same authors in their 1982 paper, "The Byzantine Generals Problem".<ref name=BGP_Paper />

== Mitigation ==

The objective of Byzantine fault tolerance is to be able to defend against failures of system components with or without symptoms that prevent other components of the system from reaching an agreement among themselves, where such an agreement is needed for the correct operation of the system.

The remaining operationally correct components of a Byzantine fault tolerant system will be able to continue providing the system's service as originally intended, assuming there are a sufficient number of accurately-operating components to maintain the service.

When considering failure propagation only via errors, Byzantine failures are considered the most general and most difficult class of failures among the [[Failure cause|failure modes]]. The so-called fail-stop failure mode occupies the simplest end of the spectrum. Whereas the fail-stop failure mode simply means that the only way to fail is a [[Node (computer science)|node]] crash, detected by other nodes, Byzantine failures imply no restrictions on what errors can be created, which means that a failed node can generate arbitrary data, including data that makes it appear like a functioning node to a subset of other nodes. Thus, Byzantine failures can confuse failure detection systems, which makes fault tolerance difficult. Despite the allegory, a Byzantine failure is not necessarily a [[security]] problem involving hostile human interference: it can arise purely from physical or software faults.

The terms fault and failure are used here according to the standard definitions<ref name="AvizienisLaprie2004">{{cite journal| last1=Avizienis | first1=A.| last2=Laprie| first2=J.-C.| last3=Randell| first3=Brian| author-link3=Brian Randell| last4=Landwehr| first4=C.| title=Basic concepts and taxonomy of dependable and secure computing| journal=IEEE Transactions on Dependable and Secure Computing| volume=1| issue=1| year=2004| pages=11–33| issn=1545-5971 | doi=10.1109/TDSC.2004.2| hdl=1903/6459| s2cid=215753451| hdl-access=free}}</ref> originally created by a joint committee on "Fundamental Concepts and Terminology" formed by the [[IEEE]] Computer Society's Technical Committee on Dependable Computing and Fault-Tolerance and [[IFIP]] Working Group 10.4 on Dependable Computing and Fault Tolerance.<ref>{{cite web| title = Dependable Computing and Fault Tolerance| url = http://www.dependability.org| access-date = 2015-03-02| archive-url = https://web.archive.org/web/20150402141319/http://www.dependability.org/| archive-date = 2015-04-02| url-status=live}}</ref> See also [[dependability]].

Byzantine fault tolerance is only concerned with broadcast consistency, that is, the property that when a component broadcasts a value to all the other components, they all receive exactly this same value, or in the case that the broadcaster is not consistent, the other components agree on a common value themselves. This kind of fault tolerance does not encompass the correctness of the value itself; for example, an adversarial component that deliberately sends an incorrect value, but sends that same value consistently to all components, will not be caught in the Byzantine fault tolerance scheme.

== Solutions ==

Several early solutions were described by Lamport, Shostak, and Pease in 1982.<ref name=BGP_Paper/> They began by noting that the Generals' Problem can be reduced to solving a "Commander and Lieutenants" problem where loyal Lieutenants must all act in unison and that their action must correspond to what the Commander ordered in the case that the Commander is loyal:

* One solution considers scenarios in which messages may be forged, but which will be ''Byzantine-fault-tolerant'' as long as the number of disloyal generals is less than one third of the generals. The impossibility of dealing with one-third or more traitors ultimately reduces to proving that the one Commander and two Lieutenants problem cannot be solved, if the Commander is traitorous. To see this, suppose we have a traitorous Commander A, and two Lieutenants, B and C: when A tells B to attack and C to retreat, and B and C send messages to each other, forwarding A's message, neither B nor C can figure out who is the traitor, since it is not necessarily A—the other Lieutenant could have forged the message purportedly from A. It can be shown that if ''n'' is the number of generals in total, and ''t'' is the number of traitors in that ''n'', then there are solutions to the problem only when ''n'' &gt; 3''t'' and the communication is synchronous (bounded delay).<ref>{{cite journal | first1 = P. | last1 = Feldman | first2 = S. | last2 = Micali | url = http://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Distributed%20Computation/An%20Optimal%20Probabilistic%20Algorithm%20for%20Byzantine%20Agreement.pdf | title = An optimal probabilistic protocol for synchronous Byzantine agreement | journal = SIAM J. Comput. | volume = 26 | issue = 4 | pages = 873–933 | date = 1997 | doi = 10.1137/s0097539790187084 | access-date = 2012-06-14 | archive-url = https://web.archive.org/web/20160305012505/http://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Distributed%20Computation/An%20Optimal%20Probabilistic%20Algorithm%20for%20Byzantine%20Agreement.pdf | archive-date = 2016-03-05 | url-status=live }}</ref>  The full set of BFT requirements are:  For F number of Byzantine failures, there needs to be at least 3F+1 players (fault containment zones), 2F+1 independent communication paths, and F+1 rounds of communication.  There can be hybrid fault models in which benign (non-Byzantine) faults as well as Byzantine faults may exist simultaneously.  For each additional benign fault that must be tolerated, the above numbers need to be incremented by one.  If the BFT rounds of communication don't exist, Byzantine failures can occur even with no faulty hardware.
* A second solution requires unforgeable message signatures. For [[Critical system#Security critical|security-critical systems]], [[digital signature]]s (in modern computer systems, this may be achieved, in practice, by using [[public-key cryptography]]) can provide Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals. However, for [[safety-critical system]]s (where "security" addresses intelligent threats while "safety" addresses the inherent dangers of an activity or mission, i.e., faults due to natural phenomena), error detecting codes, such as [[Cyclic redundancy check|CRCs]], provide stronger coverage at a much lower cost. (Note that CRCs can provide guaranteed coverage for errors that cryptography cannot.  If an encryption scheme could provide some guarantee of coverage for message errors, it would have a structure that would make it insecure.<ref>Koopman, Philip; Driscoll, Kevin; Hall, Brendan (March 2015). "Cyclic Redundancy Code and Checksum Algorithms to Ensure Critical Data Integrity" (PDF). Federal Aviation Administration. DOT/FAA/TC-14/49. Archived (PDF) from the original on 18 May 2015. Retrieved 9 May 2015.</ref>) But neither [[digital signature]]s nor error detecting codes such as CRCs provide a known level of protection against Byzantine errors from natural causes.  And more generally, security measures can weaken safety and vice versa. Thus, cryptographic digital signature methods are not a good choice for safety-critical systems, unless there is also a specific security threat as well.<ref name="PaulitschMorris2005">{{cite book | last1=Paulitsch | first1=M. | title=2005 International Conference on Dependable Systems and Networks (DSN'05) | last2=Morris | first2=J. | last3=Hall | first3=B. | last4=Driscoll | first4=K. | last5=Latronico | first5=E. | last6=Koopman | first6=P. | chapter=Coverage and the Use of Cyclic Redundancy Codes in Ultra-Dependable Systems | year=2005 | pages=346–355 | doi=10.1109/DSN.2005.31| isbn=978-0-7695-2282-1 | s2cid=14096385 | url=https://figshare.com/articles/journal_contribution/6621788 }}</ref> While error detecting codes, such as CRCs, are better than cryptographic techniques, neither provide adequate coverage for active electronics in safety-critical systems. This is illustrated by the ''Schrödinger CRC'' scenario where a CRC-protected message with a single Byzantine faulty bit presents different data to different observers and each observer sees a valid CRC.<ref name="DriscollHall2004" /><ref name="DriscollHall2003" />
* Also presented is a variation on the first two solutions allowing Byzantine-fault-tolerant behavior in some situations where not all generals can communicate directly with each other.<ref name=":1" />

There are many systems that claim BFT without meeting the above minimum requirements (e.g., blockchain).  Given that there is mathematical proof that this is impossible, these claims need to include a caveat that their definition of BFT strays from the original.  That is, systems such as blockchain don't guarantee agreement. They use resource-intensive mechanisms that make disagreements impractical to maintain.

Several system architectures were designed c. 1980 that implemented Byzantine fault tolerance. These include: Draper's FTMP,<ref name="HopkinsLala1987">{{ Cite book | last1=Hopkins | first1=Albert L. | title=The Evolution of Fault-Tolerant Computing | last2=Lala | first2=Jaynarayan H. | last3=Smith | first3=T. Basil | chapter=The Evolution of Fault Tolerant Computing at the Charles Stark Draper Laboratory, 1955–85 | volume=1 | year=1987 | pages=121–140 | issn=0932-5581 | doi=10.1007/978-3-7091-8871-2_6| series=Dependable Computing and Fault-Tolerant Systems | isbn=978-3-7091-8873-6 }}</ref> Honeywell's MMFCS,<ref name="MMFCS">{{ citation | last1 = Driscoll | first1 = Kevin | last2 = Papadopoulos | first2 = Gregory | last3 = Nelson | first3 = Scott | last4 = Hartmann | first4 = Gary | last5 = Ramohalli | first5 = Gautham | title = Multi-Microprocessor Flight Control System | publisher = AFWAL/FIGL U.S. Air Force Systems Command | date = 1984 | type = Technical Report | location = Wright-Patterson Air Force Base, OH | id = AFWAL-TR-84-3076 }}</ref> and SRI's SIFT.<ref name=":0">{{cite journal | title=SIFT: design and analysis of a fault-tolerant computer for aircraft control|journal=Microelectronics Reliability|volume=19|issue=3|year=1979|page=190|issn=0026-2714|doi=10.1016/0026-2714(79)90211-7}}</ref>

In 1999, Miguel Castro and [[Barbara Liskov]] introduced the "Practical Byzantine Fault Tolerance" (PBFT) algorithm,<ref>{{cite journal | first1 = M. | last1 = Castro | first2 = B. | last2 = Liskov | citeseerx = 10.1.1.127.6130 | title = Practical Byzantine Fault Tolerance and Proactive Recovery | publisher = [[Association for Computing Machinery]] | journal = ACM Transactions on Computer Systems | volume = 20 | issue = 4 | pages = 398–461 | date = 2002 | doi = 10.1145/571637.571640| s2cid = 18793794 }}</ref> which provides high-performance Byzantine state machine replication, processing thousands of requests per second with sub-millisecond increases in latency.

After PBFT, several BFT protocols were introduced to improve its robustness and performance. For instance, Q/U,<ref>{{cite journal | first1 = M. | last1 = Abd-El-Malek | first2 = G. | last2 = Ganger | first3 = G. | last3 = Goodson | first4 = M. | last4 = Reiter | first5 = J. | last5 = Wylie | doi = 10.1145/1095809.1095817 | title = Fault-scalable Byzantine Fault-Tolerant Services | journal = ACM SIGOPS Operating Systems Review | volume = 39 | issue = 5 | pages = 59 | publisher = [[Association for Computing Machinery]] | date = 2005 }}</ref> HQ,<ref>{{cite conference | first1 = James | last1 = Cowling | first2 = Daniel | last2 = Myers | author-link3 = Barbara Liskov | first3 = Barbara | last3 = Liskov | first4 = Rodrigo | last4 = Rodrigues | first5 = Liuba | last5 = Shrira | url = http://portal.acm.org/citation.cfm?id=1298455.1298473 | title = HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance | conference = Proceedings of the 7th [[USENIX]] Symposium on Operating Systems Design and Implementation | date = 2006 | isbn = 1-931971-47-1 | pages = 177–190}}</ref> Zyzzyva,<ref>{{cite journal | first1 = Ramakrishna | last1 = Kotla | first2 = Lorenzo | last2 = Alvisi | first3 = Mike | last3 = Dahlin | first4 = Allen | last4 = Clement | first5 = Edmund | last5 = Wong | doi = 10.1145/1658357.1658358 | title = Zyzzyva: Speculative Byzantine Fault Tolerance | publisher = [[Association for Computing Machinery]] | journal = ACM Transactions on Computer Systems | volume = 27 | issue = 4 | pages = 1–39 | date = December 2009 }}</ref> and ABsTRACTs,<ref>{{cite conference | first1 = Rachid | last1 = Guerraoui | first2 = Nikola | last2 = Kneževic | first3 = Marko | last3 = Vukolic | first4 = Vivien | last4 = Quéma | url = http://infoscience.epfl.ch/record/144158 | title = The Next 700 BFT Protocols | conference = Proceedings of the 5th European conference on Computer systems | publisher = EuroSys | date = 2010 | access-date = 2011-10-04 | archive-url = https://web.archive.org/web/20111002225957/http://infoscience.epfl.ch/record/144158 | archive-date = 2011-10-02 | url-status=live }}</ref> addressed the performance and cost issues; whereas other protocols, like Aardvark<ref>{{cite conference|first1=A.|last1=Clement|first2=E.|last2=Wong|first3=L.|last3=Alvisi|first4=M.|last4=Dahlin|first5=M.|last5=Marchetti|url=http://www.usenix.org/events/nsdi09/tech/full_papers/clement/clement.pdf|title=Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults|publisher=[[USENIX]]|conference=Symposium on Networked Systems Design and Implementation|date=April 22–24, 2009|access-date=2010-02-17|archive-url=https://web.archive.org/web/20101225155052/https://www.usenix.org/events/nsdi09/tech/full_papers/clement/clement.pdf|archive-date=2010-12-25|url-status=live}}</ref> and RBFT,<ref>{{cite conference|first1=P.-L.|last1=Aublin|first2=S.|last2=Ben Mokhtar|first3=V.|last3=Quéma|url=http://www.temple.edu/cis/icdcs2013/program.html|title=RBFT: Redundant Byzantine Fault Tolerance|publisher=[[International Conference on Distributed Computing Systems]]|conference=33rd IEEE International Conference on Distributed Computing Systems|date=July 8–11, 2013|url-status=dead|archive-url=https://web.archive.org/web/20130805115252/http://www.temple.edu/cis/icdcs2013/program.html|archive-date=August 5, 2013}}</ref> addressed its robustness issues. Furthermore, Adapt<ref>{{Cite book|last1=Bahsoun|first1=J. P.|last2=Guerraoui|first2=R.|last3=Shoker|first3=A.|title=2015 IEEE International Parallel and Distributed Processing Symposium |chapter=Making BFT Protocols Really Adaptive |date=2015-05-01|pages=904–913|doi=10.1109/IPDPS.2015.21|isbn=978-1-4799-8649-1|s2cid=16310807|chapter-url=http://repositorio.inesctec.pt/handle/123456789/4107}}</ref> tried to make use of existing BFT protocols, through switching between them in an adaptive way, to improve system robustness and performance as the underlying conditions change. Furthermore, BFT protocols were introduced that leverage trusted components to reduce the number of replicas, e.g., A2M-PBFT-EA<ref>{{Cite book|last1=Chun|first1=Byung-Gon|last2=Maniatis|first2=Petros|last3=Shenker|first3=Scott|last4=Kubiatowicz|first4=John|title=Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles |chapter=Attested append-only memory |date=2007-01-01|series=SOSP '07|location=New York, NY, USA|publisher=ACM|pages=189–204|doi=10.1145/1294261.1294280|isbn=9781595935915|s2cid=6685352}}</ref> and MinBFT.<ref>{{Cite journal|last1=Veronese|first1=G. S.|last2=Correia|first2=M.|last3=Bessani|first3=A. N.|last4=Lung|first4=L. C.|last5=Verissimo|first5=P.|date=2013-01-01|title=Efficient Byzantine Fault-Tolerance|journal=IEEE Transactions on Computers|volume=62|issue=1|pages=16–30|doi=10.1109/TC.2011.221|issn=0018-9340|citeseerx=10.1.1.408.9972|s2cid=8157723}}</ref>

== Applications ==

Several examples of Byzantine failures that have occurred are given in two equivalent journal papers.<ref name="DriscollHall2004" /><ref name="DriscollHall2003" /> These and other examples are described on the [[NASA]] DASHlink web pages.<ref>{{cite web |last=Driscoll |first=Kevin |date=2012-12-11 |title=Real System Failures |url=https://c3.nasa.gov/dashlink/resources/624/ |url-status=live |archive-url=https://web.archive.org/web/20150402190610/https://c3.nasa.gov/dashlink/resources/624/ |archive-date=2015-04-02 |access-date=2015-03-02 |website=DASHlink |publisher=[[NASA]]}}</ref>

=== Applications in computing ===
Byzantine fault tolerance mechanisms use components that repeat an incoming message (or just its signature, which can be reduced to just a single bit of information if self-checking pairs are used for nodes) to other recipients of that incoming message. All these mechanisms make the assumption that the act of repeating a message blocks the propagation of Byzantine symptoms. For systems that have a high degree of safety or security criticality, these assumptions must be proven to be true to an acceptable level of [[fault coverage]]. When providing proof through testing, one difficulty is creating a sufficiently wide range of signals with Byzantine symptoms.<ref name="NanyaGoosen1989">{{cite journal |last1=Nanya |first1=T. |last2=Goosen |first2=H.A. |year=1989 |title=The Byzantine hardware fault model |journal=IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |volume=8 |issue=11 |pages=1226–1231 |doi=10.1109/43.41508 |issn=0278-0070}}</ref> Such testing will likely require specialized [[Fault injection|fault injectors]].<ref name="MartinsGandhi2013">{{cite book |last1=Martins |first1=Rolando |title=Middleware 2013 |last2=Gandhi |first2=Rajeev |last3=Narasimhan |first3=Priya |last4=Pertet |first4=Soila |last5=Casimiro |first5=António |last6=Kreutz |first6=Diego |last7=Veríssimo |first7=Paulo |year=2013 |isbn=978-3-642-45064-8 |series=Lecture Notes in Computer Science |volume=8275 |pages=41–61 |chapter=Experiences with Fault-Injection in a Byzantine Fault-Tolerant Protocol |doi=10.1007/978-3-642-45065-5_3 |issn=0302-9743 |s2cid=31337539}}</ref><ref>{{cite patent|country=US|number=7475318|title=Method for testing the sensitive input range of Byzantine filters|status=patent|fdate=2006-01-27|pridate=2005-01-28|gdate=2009-01-06|inventor=Kevin R. Driscoll|assign1=Honeywell International Inc.}}</ref>

=== Military applications ===
Byzantine errors were observed infrequently and at irregular points during endurance testing for the newly constructed [[Virginia-class submarine|''Virginia'' class submarines]], at least through 2005 (when the issues were publicly reported).<ref name="WalterEllis2005">{{cite book|last1=Walter|first1=C.|title=Ninth IEEE International Symposium on High-Assurance Systems Engineering (HASE'05)|last2=Ellis|first2=P.|last3=LaValley|first3=B.|chapter=The Reliable Platform Service: A Property-Based Fault Tolerant Service Architecture|year=2005|pages=34–43|doi=10.1109/HASE.2005.23|isbn=978-0-7695-2377-4|s2cid=21468069}}</ref>

=== Cryptocurrency applications ===
The [[Bitcoin network]] works in parallel to generate a [[blockchain]] with [[proof-of-work]] allowing the system to overcome Byzantine failures and reach a coherent global view of the system's state.<ref>{{Cite web |last=Rubby |first=Matt |date=20 January 2024 |title=How Byzantine Generals Problem Relates to You in 2024 |url=https://www.swanbitcoin.com/byzantine-generals-problem/ |access-date=2024-01-27 |website=Swan Bitcoin |language=en}}</ref><ref>{{Citation |last1=Tholoniat |first1=Pierre |title=Formal Verification of Blockchain Byzantine Fault Tolerance |date=2022 |work=Handbook on Blockchain |pages=389–412 |editor-last=Tran |editor-first=Duc A. |url=https://doi.org/10.1007/978-3-031-07535-3_12 |access-date=2024-01-27 |series=Springer Optimization and Its Applications |place=Cham |publisher=Springer International Publishing |language=en |doi=10.1007/978-3-031-07535-3_12 |isbn=978-3-031-07535-3 |last2=Gramoli |first2=Vincent |editor2-last=Thai |editor2-first=My T. |editor3-last=Krishnamachari |editor3-first=Bhaskar|arxiv=1909.07453 }}</ref> Some [[proof of stake]] blockchains also use BFT algorithms.{{sfn|Deirmentzoglou|Papakyriakopoulos|Patsakis|2019|p=28716}}

=== Blockchain Technology ===
Byzantine Fault Tolerance (BFT) is a crucial concept in [[Blockchain|blockchain technology]], ensuring that a network can continue to function even when some nodes<ref>{{Cite web |title=Node Operations |url=https://docs.klever.org/node-operations}}</ref> (participants) fail or act maliciously. This tolerance is necessary because blockchains are decentralized systems with no central authority, making it essential to achieve consensus among nodes, even if some try to disrupt the process.

==== Applications and Examples of Byzantine Fault Tolerance in Blockchain ====
'''Safety Mechanisms:''' Different blockchains use various BFT-based consensus mechanisms like Practical Byzantine Fault Tolerance (PBFT), Tendermint, and [[Delegated Proof of Stake (DPoS)]] to handle Byzantine faults. These protocols ensure that the majority of honest nodes can agree on the next block in the chain, securing the network against attacks and preventing [[double-spending]] and other types of fraud. Practical examples of networks include [[Hyperledger|Hyperledger Fabric]], [[Cosmos Blockchain|Cosmos]] and [[Klever blockchain|Klever]] in this sequence.  

'''51% Attack Mitigation:''' While traditional blockchains like Bitcoin use Proof of Work (PoW), which is susceptible to a [[51% attack]], BFT-based systems are designed to tolerate up to one-third of faulty or malicious nodes without compromising the network's integrity.

'''Decentralized Trust:''' Byzantine Fault Tolerance underpins the trust model in [[Decentralization|decentralized]] networks. Instead of relying on a central authority, the network's security depends on the ability of honest nodes to outnumber and outmaneuver malicious ones.

'''Private and Permissioned Blockchains:''' BFT is especially important in private or permissioned blockchains, where a limited number of known participants need to reach a consensus quickly and securely. These networks often use BFT protocols to enhance performance and security.

=== Applications in aviation ===
Some aircraft systems, such as the Boeing 777 [[Aircraft Information Management System]] (via its [[ARINC]] 659 SAFEbus network), the Boeing 777 flight control system, and the Boeing 787 flight control systems, use Byzantine fault tolerance; because these are real-time systems, their Byzantine fault tolerance solutions must have very low latency. For example, SAFEbus can achieve Byzantine fault tolerance within the order of a microsecond of added latency.<ref name="Zurawski2015">{{cite book | last1=M. | first1=Paulitsch | last2=Driscoll | first2=K. | editor-first=Richard | editor-last=Zurawski | title=Industrial Communication Technology Handbook, Second Edition | chapter-url=https://books.google.com/books?id=ppzNBQAAQBAJ | date=9 January 2015 | chapter=Chapter 48:SAFEbus | pages=48–1–48–26 | publisher=CRC Press | isbn=978-1-4822-0733-0}}</ref><ref name="HenzingerKirsch2001">{{cite book | author1 = Thomas A. Henzinger | author2 = Christoph M. Kirsch | title = Embedded Software: First International Workshop, EMSOFT 2001, Tahoe City, CA, USA, October 8-10, 2001. Proceedings | url = http://www.csl.sri.com/papers/emsoft01/emsoft01.pdf | date = 26 September 2001 | publisher = Springer Science & Business Media | isbn = 978-3-540-42673-8 | pages = 307– | access-date = 2015-03-05 | archive-url = https://web.archive.org/web/20150922114036/http://www.csl.sri.com/papers/emsoft01/emsoft01.pdf | archive-date = 2015-09-22 | url-status=live }}</ref><ref name="Yeh2001">{{cite book | last1=Yeh | first1=Y.C. | title=20th DASC. 20th Digital Avionics Systems Conference (Cat. No.01CH37219) | chapter=Safety critical avionics for the 777 primary flight controls system | volume=1 | year=2001 | pages=1C2/1–1C2/11 | doi=10.1109/DASC.2001.963311| isbn=978-0-7803-7034-0 | s2cid=61489128 }}</ref> The [[SpaceX Dragon]] considers Byzantine fault tolerance in its design.<ref>{{Cite web |url=https://lwn.net/Articles/540368/ |title=ELC: SpaceX lessons learned [LWN.net]<!-- Bot generated title --> |access-date=2016-07-21 |archive-url=https://web.archive.org/web/20160805064218/http://lwn.net/Articles/540368/ |archive-date=2016-08-05 |url-status=live }}</ref>

== See also ==

* {{annotated link|Atomic commit}}
* {{annotated link|Brooks–Iyengar algorithm}}
* {{annotated link|List of terms relating to algorithms and data structures}}
* {{annotated link|Paxos (computer science)}}
* {{annotated link|Quantum Byzantine agreement}}
* {{annotated link|Two Generals' Problem}}
* {{annotated link|Conflict-free replicated data type}}

== References ==

{{refs}}

== Sources ==

* {{cite journal | last1 = Deirmentzoglou | first1 = Evangelos | last2 = Papakyriakopoulos | first2 = Georgios | last3 = Patsakis | first3 = Constantinos | title = A Survey on Long-Range Attacks for Proof of Stake Protocols | journal = IEEE Access | date = 2019 | volume = 7 | pages = 28712–28725 | eissn = 2169-3536 | doi = 10.1109/ACCESS.2019.2901858 | pmid = | s2cid = 84185792 | doi-access = free | bibcode = 2019IEEEA...728712D }}
* Bashir, Imran. "Blockchain Consensus." ''Blockchain Consensus - An Introduction to Classical, Blockchain, and Quantum Consensus Protocols''. {{ISBN|978-1-4842-8178-9}} Apress, Berkeley, CA, 2022. {{doi|10.1007/978-1-4842-8179-6}}

== External links ==

* [https://web.archive.org/web/20080828060019/http://www.rkbexplorer.com/explorer/#display=mechanism%2D{http://resex.rkbexplorer.com/id/resilience-mechanism-65b5cef4} Byzantine Fault Tolerance in the RKBExplorer]

[[Category:Public-key cryptography]]
[[Category:Distributed computing problems]]
[[Category:Fault-tolerant computer systems]]
[[Category:Theory of computation]]