==Overview==
Dempster–Shafer theory is a generalization of the [[Bayesian probability|Bayesian theory of subjective probability]]. Belief functions base degrees of belief (or confidence, or trust) for one question on the subjective probabilities for a related question. The degrees of belief themselves may or may not have the mathematical properties of probabilities; how much they differ depends on how closely the two questions are related.<ref name="SH02">Shafer, Glenn; [http://www.glennshafer.com/assets/downloads/articles/article48.pdf ''Dempster–Shafer theory''], 2002</ref> Put another way, it is a way of representing [[epistemology|epistemic]] plausibilities, but it can yield answers that contradict those arrived at using [[probability theory]].

Often used as a method of [[sensor fusion]], Dempster–Shafer theory is based on two ideas: obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule<ref name="DE68">Dempster, Arthur P.; ''[https://web.archive.org/web/20190728221641/https://apps.dtic.mil/dtic/tr/fulltext/u2/664659.pdf A generalization of Bayesian inference]'', Journal of the Royal Statistical Society, Series B, Vol. 30, pp. 205–247, 1968</ref> for combining such degrees of belief when they are based on independent items of evidence. In essence, the degree of belief in a proposition depends primarily upon the number of answers (to the related questions) containing the proposition, and the subjective probability of each answer. Also contributing are the rules of combination that reflect general assumptions about the data.

In this formalism a '''degree of belief''' (also referred to as a '''mass''') is represented as a '''belief function''' rather than a [[Bayesianism|Bayesian]] [[probability distribution]]. Probability values are assigned to ''sets'' of possibilities rather than single events: their appeal rests on the fact that they naturally encode evidence in favor of propositions.

Dempster–Shafer theory assigns its masses to all of the subsets of the set of states of a system (in [[Set Theory|set-theoretic]] terms, the [[power set]] of the states). For instance, assume a situation where there are two possible states of a system. For this system, any belief function assigns mass to each of the four subsets: the first state alone, the second state alone, both states, and neither (the empty set).

===Belief and plausibility===
Shafer's formalism starts from a set of ''possibilities'' under consideration, for instance numerical values of a variable, or pairs of linguistic variables like "date and place of origin of a relic" (asking whether it is antique or a recent fake). A hypothesis is represented by a subset of this ''frame of discernment'', like "(Ming dynasty, China)", or "(19th century, Germany)".<ref name="SH76"/>{{rp|p.35f.}}

Shafer's framework allows for belief about such propositions to be represented as intervals, bounded by two values, ''belief'' (or ''support'') and ''plausibility'':
:''belief'' ≤ ''plausibility''.

In a first step, subjective probabilities (''masses'') are assigned to all subsets of the frame; usually, only a restricted number of sets will have non-zero mass (''focal elements'').<ref name="SH76"/>{{rp|39f.}} ''Belief'' in a hypothesis is constituted by the sum of the masses of all subsets of the hypothesis-set. It is the amount of belief that directly supports either the given hypothesis or a more specific one, thus forming a lower bound on its probability. Belief (usually denoted ''Bel'') measures the strength of the evidence in favor of a proposition ''p''. It ranges from 0 (indicating no evidence) to 1 (denoting certainty).

''Plausibility'' is 1 minus the sum of the masses of all sets whose intersection with the hypothesis is empty. Equivalently, it can be obtained as the sum of the masses of all sets whose intersection with the hypothesis is not empty. It is an upper bound on the possibility that the hypothesis could be true, because there is only so much evidence that contradicts that hypothesis. Plausibility (denoted by ''Pl'') is thus related to ''Bel'' by Pl(''p'') = 1 − Bel(~''p''). It also ranges from 0 to 1 and measures the extent to which evidence in favor of ~''p'' leaves room for belief in ''p''.
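In symbols, writing ''m'' for the mass assignment over subsets of the frame of discernment ''X'', these verbal definitions amount to
:<math>
\operatorname{Bel}(A) = \sum_{B \,:\, B \subseteq A} m(B),
\qquad
\operatorname{Pl}(A) = \sum_{B \,:\, B \cap A \neq \emptyset} m(B) = 1 - \operatorname{Bel}(\overline{A}),
</math>
where <math>A, B \subseteq X</math> and <math>\overline{A}</math> denotes the complement of <math>A</math> in <math>X</math> (the hypothesis written ~''p'' above).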
For example, suppose we have a belief of 0.5 for a proposition, say "the cat in the box is dead." This means that we have evidence that allows us to state strongly that the proposition is true with a confidence of 0.5. However, the evidence contrary to that hypothesis (i.e. "the cat is alive") only has a confidence of 0.2. The remaining mass of 0.3 (the gap between the 0.5 supporting evidence on the one hand, and the 0.2 contrary evidence on the other) is "indeterminate," meaning that the cat could either be dead or alive. This interval represents the level of uncertainty based on the evidence in the system.

{| class="wikitable"
! Hypothesis !! Mass !! Belief !! Plausibility
|-
| Neither (alive nor dead) || 0 || 0 || 0
|-
| Alive || 0.2 || 0.2 || 0.5
|-
| Dead || 0.5 || 0.5 || 0.8
|-
| Either (alive or dead) || 0.3 || 1.0 || 1.0
|}

The "neither" hypothesis is set to zero by definition (it corresponds to "no solution"). The orthogonal hypotheses "Alive" and "Dead" have probabilities of 0.2 and 0.5, respectively. This could correspond to "Live/Dead Cat Detector" signals, which have respective reliabilities of 0.2 and 0.5. Finally, the all-encompassing "Either" hypothesis (which simply acknowledges there is a cat in the box) picks up the slack so that the sum of the masses is 1. The belief for the "Alive" and "Dead" hypotheses matches their corresponding masses because they have no subsets; belief for "Either" consists of the sum of all three masses (Either, Alive, and Dead) because "Alive" and "Dead" are each subsets of "Either". The "Alive" plausibility is 1 − ''m''(Dead) = 0.5 and the "Dead" plausibility is 1 − ''m''(Alive) = 0.8. Equivalently, the "Alive" plausibility is ''m''(Alive) + ''m''(Either) and the "Dead" plausibility is ''m''(Dead) + ''m''(Either). Finally, the "Either" plausibility is the sum ''m''(Alive) + ''m''(Dead) + ''m''(Either). The universal hypothesis ("Either") will always have 100% belief and plausibility; it acts as a [[checksum]] of sorts.

Here is a somewhat more elaborate example where the behavior of belief and plausibility begins to emerge. We are looking through a variety of detector systems at a single faraway signal light, which can only be colored in one of three colors (red, yellow, or green):

{| class="wikitable"
! Hypothesis !! Mass !! Belief !! Plausibility
|-
| None || 0 || 0 || 0
|-
| Red || 0.35 || 0.35 || 0.56
|-
| Yellow || 0.25 || 0.25 || 0.45
|-
| Green || 0.15 || 0.15 || 0.34
|-
| Red or Yellow || 0.06 || 0.66 || 0.85
|-
| Red or Green || 0.05 || 0.55 || 0.75
|-
| Yellow or Green || 0.04 || 0.44 || 0.65
|-
| Any || 0.1 || 1.0 || 1.0
|}
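The Belief and Plausibility columns in these tables follow mechanically from the Mass column. As a purely illustrative sketch (the representation and function names below are chosen for this example only, not taken from the cited sources), the signal-light table can be recomputed as follows:

<syntaxhighlight lang="python">
# Mass assignment for the signal-light example; frozensets are the subsets of the frame.
mass = {
    frozenset(): 0.00,                               # "None"
    frozenset({"red"}): 0.35,
    frozenset({"yellow"}): 0.25,
    frozenset({"green"}): 0.15,
    frozenset({"red", "yellow"}): 0.06,
    frozenset({"red", "green"}): 0.05,
    frozenset({"yellow", "green"}): 0.04,
    frozenset({"red", "yellow", "green"}): 0.10,     # "Any"
}

def belief(hypothesis):
    """Sum of the masses of all non-empty subsets of the hypothesis."""
    h = frozenset(hypothesis)
    return sum(m for s, m in mass.items() if s and s <= h)

def plausibility(hypothesis):
    """Sum of the masses of all sets whose intersection with the hypothesis is non-empty."""
    h = frozenset(hypothesis)
    return sum(m for s, m in mass.items() if s & h)

for h in ({"red"}, {"yellow"}, {"green"}, {"red", "yellow"}, {"red", "green"}, {"yellow", "green"}):
    print(sorted(h), round(belief(h), 2), round(plausibility(h), 2))
# Output matches the table above, e.g. ['red'] 0.35 0.56 and ['red', 'yellow'] 0.66 0.85
</syntaxhighlight>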
Events of this kind would not be modeled as distinct entities in probability space as they are here in mass assignment space. Rather, the event "Red or Yellow" would be considered as the union of the events "Red" and "Yellow", and (see [[probability axioms]]) ''P''(Red or Yellow) ≥ ''P''(Yellow), and ''P''(Any) = 1, where ''Any'' refers to ''Red'' or ''Yellow'' or ''Green''. In DST the mass assigned to ''Any'' refers to the proportion of evidence that cannot be assigned to any of the other states, which here means evidence that says there is a light but does not say anything about what color it is. In this example, the proportion of evidence that shows the light is either ''Red'' or ''Green'' is given a mass of 0.05. Such evidence might, for example, be obtained from an R/G color-blind person. DST lets us extract the value of this sensor's evidence. Also, in DST the empty set is considered to have zero mass, meaning here that the signal light system exists and we are examining its possible states, not speculating as to whether it exists at all.

===Combining beliefs===
Beliefs from different sources can be combined with various fusion operators to model specific situations of belief fusion, e.g. with [[#Dempster's rule of combination|Dempster's rule of combination]], which combines belief constraints<ref name="Jos12">{{cite journal|author1=Jøsang, A. |author2=Simon, P.|title=Dempster's Rule as Seen by Little Colored Balls|journal=Computational Intelligence|year=2012|volume=28|issue=4|pages=453–474|doi=10.1111/j.1467-8640.2012.00421.x|s2cid=5143692}}</ref> that are dictated by independent belief sources, such as in the case of combining hints<ref name="KM95">Kohlas, J., and Monney, P.A., 1995. ''[https://books.google.com/books?id=dqnwCAAAQBAJ&dq=%22A+Mathematical+Theory+of+Hints.+An+Approach+to+the+Dempster%E2%80%93Shafer+Theory+of+Evidence%22&pg=PA3 A Mathematical Theory of Hints. An Approach to the Dempster–Shafer Theory of Evidence]''. Vol. 425 in Lecture Notes in Economics and Mathematical Systems. Springer Verlag.</ref> or combining preferences.<ref name="JH12">Jøsang, A., and Hankin, R., 2012. ''[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6289948 Interpretation and Fusion of Hyper Opinions in Subjective Logic]''. 15th International Conference on Information Fusion (FUSION) 2012. E-{{ISBN|978-0-9824438-4-2}}, IEEE.</ref> Note that the probability masses from propositions that contradict each other can be used to obtain a measure of conflict between the independent belief sources. Other situations can be modeled with different fusion operators; for example, cumulative fusion of beliefs from independent sources can be modeled with the cumulative fusion operator.<ref name="JDR10">{{cite journal|author1=Jøsang, A. |author2=Diaz, J. |author3=Rifqi, M. |name-list-style=amp|title=Cumulative and averaging fusion of beliefs|journal=Information Fusion|year=2010|volume=11|issue=2|pages=192–200|doi=10.1016/j.inffus.2009.05.005|citeseerx=10.1.1.615.2200 |s2cid=205432025 }}</ref>
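For two mass assignments ''m''<sub>1</sub> and ''m''<sub>2</sub> from independent sources, Dempster's rule (stated and discussed in full in the dedicated section below) assigns to every non-empty hypothesis ''A''
:<math>
m_{1,2}(A) = \frac{1}{1-K} \sum_{B \cap C = A} m_1(B)\, m_2(C),
\qquad
K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C),
</math>
where the normalization term ''K'' is precisely the measure of conflict mentioned above: the total mass that the two sources jointly place on contradictory (non-overlapping) propositions.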
Dempster's rule of combination is sometimes interpreted as an approximate generalization of [[Bayes' rule]]. In this interpretation the priors and conditionals need not be specified, unlike traditional Bayesian methods, which often use a symmetry (minimax error) argument to assign prior probabilities to random variables (''e.g.'' assigning 0.5 to binary values for which no information is available about which is more likely). However, any information contained in the missing priors and conditionals is not used in Dempster's rule of combination unless it can be obtained indirectly, and arguably it is then available for calculation using Bayes' equations.

Dempster–Shafer theory allows one to specify a degree of ignorance in this situation instead of being forced to supply prior probabilities that add to unity (see the illustration below). This sort of situation, and whether there is a real distinction between ''[[risk]]'' and ''[[ignorance]]'', has been extensively discussed by statisticians and economists. See, for example, the contrasting views of [[Ellsberg's paradox|Daniel Ellsberg]], [[Howard Raiffa]], [[Arrovian uncertainty|Kenneth Arrow]] and [[Knightian uncertainty|Frank Knight]].{{Citation needed|reason=The contrast between these views, or some summary of the discussion, should be referenced in the citation. The wiki article links themselves do a poor job of pointing to where the disagreement is.|date=June 2014}}
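As an illustration of this point (a standard example, not drawn from the sources cited above): for a binary frame {''a'', ''b''} about which nothing is known, the Bayesian convention described above assigns 0.5 to each outcome, whereas Dempster–Shafer theory can place the entire mass on the whole frame,
:<math>
m(\{a,b\}) = 1 \quad\Rightarrow\quad \operatorname{Bel}(\{a\}) = 0,\ \operatorname{Pl}(\{a\}) = 1,
</math>
so that total ignorance (the vacuous belief function) is represented differently from an even balance of evidence.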