
===With mathematical guarantees===

====Dead-end elimination====
{{main|Dead-end elimination}}

The dead-end elimination (DEE) algorithm iteratively reduces the search space by removing rotamers that can be proven not to be part of the global minimum energy conformation (GMEC). On each iteration, the algorithm compares all pairs of rotamers at each residue position and removes each rotamer <var>r&prime;<sub>i</sub></var> that can be shown to always have higher energy than another rotamer <var>r<sub>i</sub></var> and therefore cannot be part of the GMEC:

: <math> E(r^\prime_i) + \sum_{j\ne i} \min_{r_j} E(r^\prime_i,r_j) > E(r_i) + \sum_{j\ne i} \max_{r_j} E(r_i,r_j) </math>

More powerful extensions of the dead-end elimination algorithm include the [[Dead-end elimination#Pairs elimination criterion|pairs elimination criterion]] and the [[Dead-end elimination#Generalization|generalized dead-end elimination criterion]]. The algorithm has also been extended to handle continuous rotamers with provable guarantees.

Although each iteration of the dead-end elimination algorithm runs in polynomial time, the algorithm is not guaranteed to converge to a single conformation. If, after some number of iterations, it prunes no more rotamers, then either rotamers must be merged or another search algorithm must be used to search the remaining space. In such cases, dead-end elimination acts as a pre-filtering step that reduces the search space, while other algorithms, such as A*, Monte Carlo, linear programming, or FASTER, are used to search what remains.<ref name="donald10" />
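
The pruning loop described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names <code>pair_energy</code> and <code>dee_prune</code>, the layout of the energy tables <code>E1</code> and <code>E2</code>, and the toy energy values are all assumptions made for the example.

```python
def pair_energy(E2, i, r, j, s):
    """Pairwise energy between rotamer r at residue i and rotamer s at residue j.

    Assumption: E2[(i, j)][r][s] is stored only for i < j.
    """
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]


def dee_prune(E1, E2):
    """Apply the original DEE criterion until an iteration prunes nothing.

    E1[i][r] is the self energy of rotamer r at residue i.
    Returns the set of surviving rotamer indices at each residue.
    """
    n = len(E1)
    alive = [set(range(len(E1[i]))) for i in range(n)]
    pruned = True
    while pruned:
        pruned = False
        for i in range(n):
            for cand in sorted(alive[i]):
                # Best case for the candidate: lowest pairwise energy at every other residue.
                best = E1[i][cand] + sum(
                    min(pair_energy(E2, i, cand, j, s) for s in alive[j])
                    for j in range(n) if j != i
                )
                for ref in sorted(alive[i]):
                    if ref == cand:
                        continue
                    # Worst case for the competitor: highest pairwise energy everywhere.
                    worst = E1[i][ref] + sum(
                        max(pair_energy(E2, i, ref, j, s) for s in alive[j])
                        for j in range(n) if j != i
                    )
                    if best > worst:
                        alive[i].discard(cand)
                        pruned = True
                        break
    return alive


# Hypothetical toy problem: 3 residues with 2 rotamers each.
E1 = [[0.0, 5.0], [1.0, 0.0], [0.0, 0.5]]
E2 = {
    (0, 1): [[0.0, 1.0], [0.5, 0.0]],
    (0, 2): [[0.0, 0.5], [1.0, 0.0]],
    (1, 2): [[0.0, 2.0], [2.0, 0.0]],
}
print(dee_prune(E1, E2))
```

On this toy input only the second rotamer of the first residue is eliminated. Two rotamers remain at each of the other residues, so a further search algorithm is still needed to find the GMEC among the four remaining conformations.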

====Branch and bound====
{{main|Branch and bound}}

The protein design conformational space can be represented as a [[Tree (data structure)|tree]] in which the protein residues are ordered arbitrarily and each level of the tree branches over the rotamers of one residue. [[Branch and bound]] algorithms use this representation to explore the conformation tree efficiently: at each ''branching'', they ''bound'' the energy of the conformations below the branch and explore only the branches that could still contain the optimal conformation.<ref name="donald10" /><ref name="gordon99">{{cite journal|last=Gordon|first=DB|author2=Mayo, SL |title=Branch-and-terminate: a combinatorial optimization algorithm for protein design.|journal=Structure|date=September 15, 1999|volume=7|issue=9|pages=1089–98|pmid=10508778|doi=10.1016/s0969-2126(99)80176-2|doi-access=free}}</ref><ref name="leach98" >{{cite journal|last=Leach|first=AR|author2=Lemon, AP |title=Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm.|journal=Proteins|date=November 1, 1998|volume=33|issue=2|pages=227–39|pmid=9779790|doi=10.1002/(sici)1097-0134(19981101)33:2<227::aid-prot7>3.0.co;2-f|citeseerx=10.1.1.133.7986|s2cid=12872539 }}</ref>
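
A depth-first branch and bound search over the conformation tree can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of any cited paper: the names <code>branch_and_bound</code> and <code>pair_energy</code>, the energy-table layout, the particular lower bound, and the toy energies are all hypothetical.

```python
import math


def pair_energy(E2, i, r, j, s):
    """Pairwise energy; assumes E2[(i, j)][r][s] is stored only for i < j."""
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]


def branch_and_bound(E1, E2):
    """Depth-first search over the tree, skipping branches whose bound is not better."""
    n = len(E1)
    best = {"energy": math.inf, "conf": None}

    def lower_bound(partial, g):
        # g is the exact energy of the assigned residues. Every unassigned residue
        # contributes its cheapest rotamer, and every unassigned pair its cheapest entry.
        d = len(partial)
        bound = g
        for j in range(d, n):
            bound += min(
                E1[j][s] + sum(pair_energy(E2, i, partial[i], j, s) for i in range(d))
                for s in range(len(E1[j]))
            )
            for k in range(j + 1, n):
                bound += min(min(row) for row in E2[(j, k)])
        return bound

    def visit(partial, g):
        d = len(partial)
        if d == n:
            if g < best["energy"]:
                best["energy"], best["conf"] = g, tuple(partial)
            return
        for r in range(len(E1[d])):  # branch over the rotamers of residue d
            g_child = g + E1[d][r] + sum(
                pair_energy(E2, i, partial[i], d, r) for i in range(d)
            )
            child = partial + [r]
            # Explore the branch only if it could still beat the best conformation so far.
            if lower_bound(child, g_child) < best["energy"]:
                visit(child, g_child)

    visit([], 0.0)
    return best["conf"], best["energy"]


# Hypothetical toy problem: 3 residues with 2 rotamers each.
E1 = [[0.0, 5.0], [1.0, 0.0], [0.0, 0.5]]
E2 = {
    (0, 1): [[0.0, 1.0], [0.5, 0.0]],
    (0, 2): [[0.0, 0.5], [1.0, 0.0]],
    (1, 2): [[0.0, 2.0], [2.0, 0.0]],
}
print(branch_and_bound(E1, E2))
```

Because the bound never exceeds the true energy of any completion of a partial conformation, a pruned branch cannot contain a conformation better than the one already found.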

A popular search algorithm for protein design is the [[A* search algorithm]].<ref name="donald10" /><ref name="leach98" /> A* computes, for each partial tree path, a score that is a guaranteed lower bound on the energy of every full conformation that extends that path. Each partial conformation is added to a priority queue, and at each iteration the partial path with the lowest lower bound is popped from the queue and expanded. The algorithm stops once the lowest-scoring entry in the queue is a full conformation, which is then guaranteed to be optimal.

The A* score <var>f</var> in protein design consists of two parts, <var>f=g+h</var>. <var>g</var> is the exact energy of the rotamers that have already been assigned in the partial conformation. <var>h</var> is a lower bound on the energy of the rotamers that have not yet been assigned. The two terms are defined as follows, where <var>d</var> is the index of the last assigned residue in the partial conformation and <var>n</var> is the number of residues:

: <math>g=\sum_{i=1}^d (E(r_i ) + \sum_{j=i+1}^d E(r_i,r_j) )</math>

: <math>h = \sum_{j=d+1}^n [\min_{r_j}(E(r_j) + \sum_{i=1}^d E(r_i,r_j) + \sum_{k=j+1}^n \min_{r_k} E(r_j,r_k))]</math>
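
The two scores above translate directly into a priority-queue search. The following Python sketch is a minimal illustration that uses the standard-library <code>heapq</code> module. The names <code>astar_gmec</code>, <code>g_score</code>, <code>h_score</code> and <code>pair_energy</code>, the energy-table layout, and the toy energies are assumptions made for the example.

```python
import heapq


def pair_energy(E2, i, r, j, s):
    """Pairwise energy; assumes E2[(i, j)][r][s] is stored only for i < j."""
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]


def astar_gmec(E1, E2):
    """A* over partial conformations, scored by f = g + h as defined above."""
    n = len(E1)

    def g_score(partial):
        # Exact energy of the d residues that are already assigned.
        d = len(partial)
        return sum(
            E1[i][partial[i]]
            + sum(pair_energy(E2, i, partial[i], j, partial[j]) for j in range(i + 1, d))
            for i in range(d)
        )

    def h_score(partial):
        # Lower bound on the energy contributed by the unassigned residues d+1..n.
        d = len(partial)
        total = 0.0
        for j in range(d, n):
            total += min(
                E1[j][s]
                + sum(pair_energy(E2, i, partial[i], j, s) for i in range(d))
                + sum(
                    min(pair_energy(E2, j, s, k, t) for t in range(len(E1[k])))
                    for k in range(j + 1, n)
                )
                for s in range(len(E1[j]))
            )
        return total

    queue = [(h_score(()), ())]  # entries are (f, partial conformation)
    while queue:
        f, partial = heapq.heappop(queue)
        if len(partial) == n:
            return partial, f  # h is zero here, so f is the exact energy
        d = len(partial)
        for r in range(len(E1[d])):  # expand over the rotamers of the next residue
            child = partial + (r,)
            heapq.heappush(queue, (g_score(child) + h_score(child), child))
    return None


# Hypothetical toy problem: 3 residues with 2 rotamers each.
E1 = [[0.0, 5.0], [1.0, 0.0], [0.0, 0.5]]
E2 = {
    (0, 1): [[0.0, 1.0], [0.5, 0.0]],
    (0, 2): [[0.0, 0.5], [1.0, 0.0]],
    (1, 2): [[0.0, 2.0], [2.0, 0.0]],
}
print(astar_gmec(E1, E2))
```

The sketch recomputes <var>g</var> from scratch for every child to stay close to the formula. A practical implementation would update it incrementally from the parent's score.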

====Integer linear programming====
{{Further|Linear programming#Integer unknowns|Integer programming}}

The problem of optimizing <var>E<sub>T</sub></var> (Equation ({{EquationNote|1}})) can be formulated as an [[integer linear program]] (ILP).<ref name="kingsford05" /> One of the most powerful formulations uses binary variables to represent the presence of each rotamer and each edge (pair of rotamers) in the final solution, and constrains the solution to have exactly one rotamer for each residue and one pairwise interaction for each pair of residues:

: <math>\min \sum_{i}\sum_{r_i} \left[ E_i(r_i)q_{i}(r_i) + \sum_{j\ne i}\sum_{r_j} E_{ij}(r_i, r_j)q_{ij}(r_i, r_j) \right]</math>

s.t.

: <math>\sum_{r_i} q_{i}(r_i) = 1, \ \forall i</math>

: <math>\sum_{r_j} q_{ij}(r_i,r_j) = q_{i}(r_i), \ \forall i, r_i, j \ne i</math>

: <math>q_i, q_{ij} \in \{0,1\}</math>

ILP solvers, such as [[CPLEX]], can compute the exact optimal solution for large instances of protein design problems. These solvers use a [[linear programming relaxation]] of the problem, where <var>q<sub>i</sub></var> and <var>q<sub>ij</sub></var> are allowed to take continuous values, in combination with a [[branch and cut]] algorithm to search only a small portion of the conformation space for the optimal solution. ILP solvers have been shown to solve many instances of the side-chain placement problem.<ref name="kingsford05">{{cite journal|last=Kingsford|first=CL|author2=Chazelle, B |author3=Singh, M |title=Solving and analyzing side-chain positioning problems using linear and integer programming.|journal=Bioinformatics|date=April 1, 2005|volume=21|issue=7|pages=1028–36|pmid=15546935|doi=10.1093/bioinformatics/bti144|doi-access=free}}</ref>
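
To make the formulation concrete, the following Python sketch encodes a conformation as the binary variables <var>q<sub>i</sub></var> and <var>q<sub>ij</sub></var>, checks the two families of constraints, and evaluates the objective. It does not call an ILP solver. Instead, it enumerates all conformations of a toy problem to illustrate that feasible binary assignments correspond to conformations. All function names, the energy-table layout, and the toy energies are hypothetical, and the sketch counts each unordered residue pair once (<var>i</var> &lt; <var>j</var>), which is an assumption about how the pairwise table is stored.

```python
from itertools import product


def encode(conf, E1):
    """One-hot binary variables q_i(r_i) and q_ij(r_i, r_j) for one conformation."""
    n = len(E1)
    q1 = {(i, r): int(conf[i] == r) for i in range(n) for r in range(len(E1[i]))}
    q2 = {
        (i, r, j, s): int(conf[i] == r and conf[j] == s)
        for i in range(n) for j in range(n) if i != j
        for r in range(len(E1[i])) for s in range(len(E1[j]))
    }
    return q1, q2


def is_feasible(q1, q2, E1):
    """Check the one-rotamer-per-residue and edge-consistency constraints."""
    n = len(E1)
    for i in range(n):
        # Exactly one rotamer is selected at residue i.
        if sum(q1[(i, r)] for r in range(len(E1[i]))) != 1:
            return False
        for r in range(len(E1[i])):
            for j in range(n):
                if j == i:
                    continue
                # The edges leaving rotamer r towards residue j sum to q_i(r).
                if sum(q2[(i, r, j, s)] for s in range(len(E1[j]))) != q1[(i, r)]:
                    return False
    return True


def objective(q1, q2, E1, E2):
    """Linear objective; assumes E2[(i, j)] is stored once per pair with i < j."""
    self_term = sum(E1[i][r] * q for (i, r), q in q1.items())
    pair_term = sum(E2[(i, j)][r][s] * q for (i, r, j, s), q in q2.items() if i < j)
    return self_term + pair_term


def solve_by_enumeration(E1, E2):
    """Brute-force stand-in for an ILP solver; only usable on tiny problems."""
    best = None
    for conf in product(*(range(len(rots)) for rots in E1)):
        q1, q2 = encode(conf, E1)
        energy = objective(q1, q2, E1, E2)
        if is_feasible(q1, q2, E1) and (best is None or energy < best[1]):
            best = (conf, energy)
    return best


# Hypothetical toy problem: 3 residues with 2 rotamers each.
E1 = [[0.0, 5.0], [1.0, 0.0], [0.0, 0.5]]
E2 = {
    (0, 1): [[0.0, 1.0], [0.5, 0.0]],
    (0, 2): [[0.0, 0.5], [1.0, 0.0]],
    (1, 2): [[0.0, 2.0], [2.0, 0.0]],
}
print(solve_by_enumeration(E1, E2))
```

A real solver replaces the enumeration with the LP relaxation and branch and cut search described above, but it operates on the same variables, constraints, and objective.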

====Message-passing based approximations to the linear programming dual====
ILP solvers depend on linear programming (LP) algorithms, such as the [[Simplex algorithm|simplex]] or [[barrier function|barrier]]-based methods, to solve the LP relaxation at each branch. These LP algorithms were developed as general-purpose optimization methods and are not optimized for the protein design problem (Equation ({{EquationNote|1}})). As a consequence, the LP relaxation becomes the bottleneck of ILP solvers when the problem size is large.<ref name=yanover06>{{cite journal|last=Yanover|first=Chen|author2=Talya Meltzer |author3=Yair Weiss |title=Linear Programming Relaxations and Belief Propagation – An Empirical Study|journal=Journal of Machine Learning Research|year=2006|volume=7|pages=1887–1907}}</ref> Several alternatives based on [[belief propagation|message-passing algorithms]] have been designed specifically for the optimization of the LP relaxation of the protein design problem. These algorithms can approximate either the [[Duality (optimization)|dual]] or the [[Duality (optimization)|primal]] instance of the integer program, but to maintain guarantees on optimality they are most useful when used to approximate the dual of the protein design problem, because approximating the dual guarantees that no solutions are missed. Message-passing based approximations include the ''tree reweighted max-product message passing'' algorithm,<ref>{{cite journal|last=Wainwright|first=Martin J |author2=Tommi S. Jaakkola |author3=Alan S. Willsky|title=MAP estimation via agreement on trees: message-passing and linear programming.|journal=IEEE Transactions on Information Theory|year=2005|pages=3697–3717|doi=10.1109/tit.2005.856938|volume=51|issue=11 |citeseerx=10.1.1.71.9565 |s2cid=10007532 }}</ref><ref>{{cite journal|last=Kolmogorov|first=Vladimir|title=Convergent tree-reweighted message passing for energy minimization.|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|date=October 28, 2006|volume=28|issue=10|pages=1568–1583|doi=10.1109/TPAMI.2006.200|pmid=16986540|s2cid=8616813}}</ref> and the ''message passing linear programming'' algorithm.<ref>{{cite journal|last=Globerson|first=Amir|author2=Tommi S. Jaakkola |title=Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations.|journal=Advances in Neural Information Processing Systems|year=2007}}</ref>