
==As an optimization problem==
[[File:ProteinDesignSearch.gif|200px|thumb|This animation illustrates the complexity of a protein design search, which typically compares all the rotamer conformations from all possible mutations at all residues. In this example, the residues Phe36 and His106 are allowed to mutate to the amino acids Tyr and Asn, respectively. Phe and Tyr have 4 rotamers each in the rotamer library, while Asn and His have 7 and 8 rotamers, respectively (from the Richardsons' penultimate rotamer library<ref name="lovell2000" />). The animation loops through all (4 + 4) × (7 + 8) = 120 possibilities. The structure shown is that of myoglobin, PDB ID 1mbn.]]

The goal of protein design is to find a protein sequence that will fold to a target structure. A protein design algorithm must therefore search the conformations of each sequence with respect to the target fold and rank the sequences by the lowest-energy conformation of each one, as determined by the protein design energy function. A typical input to the protein design algorithm thus consists of the target fold, the sequence space, the structural flexibility, and the energy function, while the output is one or more sequences that are predicted to fold stably to the target structure.

The number of candidate protein sequences, however, grows exponentially with the number of protein residues; for example, there are 20<sup>100</sup> protein sequences of length 100. Furthermore, even if amino acid side-chain conformations are limited to a few rotamers (see [[Structural flexibility]]), each sequence still has an exponential number of conformations. In a 100-residue protein in which each amino acid has exactly 10 rotamers, every position offers 20 × 10 = 200 choices, so a search algorithm has to search over 200<sup>100</sup> protein conformations.
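The counts quoted above are simple products and can be checked directly. The following is a minimal sketch, not part of any protein design package: the helper <code>search_space_size</code> and its parameter names are hypothetical, and it assumes, as in the text, that every amino acid has the same number of rotamers.

```python
def search_space_size(n_residues, n_amino_acids=20, rotamers_per_amino_acid=1):
    """Number of assignments when every position may take any rotamer of
    any amino acid (hypothetical helper, for illustration only)."""
    choices_per_position = n_amino_acids * rotamers_per_amino_acid
    return choices_per_position ** n_residues

# Sequences of length 100: one choice of amino acid per position.
n_sequences = search_space_size(100)

# Conformations: 20 amino acids x 10 rotamers = 200 choices per position.
n_conformations = search_space_size(100, rotamers_per_amino_acid=10)

# Python integers are arbitrary precision, so the digit counts are exact.
print("digits in 20^100: ", len(str(n_sequences)))
print("digits in 200^100:", len(str(n_conformations)))
```

The digit counts give a sense of scale: both spaces are far too large to enumerate, which motivates the structured formulation that follows.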

The most common energy functions can be decomposed into pairwise terms between rotamers and amino acid types, which casts the problem as a combinatorial one to which powerful optimization algorithms can be applied. In those cases, the total energy of each conformation belonging to each sequence can be formulated as a sum of individual and pairwise terms between residue positions. If a designer is interested only in the best sequence, the protein design algorithm requires only the lowest-energy conformation of the lowest-energy sequence. The amino acid identity of each rotamer can then be ignored, and all rotamers belonging to different amino acids can be treated the same. Let <var>r</var><sub><var>i</var></sub> be a rotamer at residue position <var>i</var> in the protein chain, and <var>E</var><sub><var>i</var></sub>(<var>r</var><sub><var>i</var></sub>) the potential energy between the internal atoms of the rotamer. Let <var>E</var><sub><var>ij</var></sub>(<var>r</var><sub><var>i</var></sub>, <var>r</var><sub><var>j</var></sub>) be the potential energy between <var>r</var><sub><var>i</var></sub> and rotamer <var>r</var><sub><var>j</var></sub> at residue position <var>j</var>. Then, we define the optimization problem as one of finding the conformation of minimum energy (<var>E</var><sub><var>T</var></sub><!--Moreover, the target structure could potentially be a very high energy state for many sequences; other sequences could preferentially fold into alternate, competing states. -->):

{{NumBlk|:|<math>\min E_{T} = \sum_{i}\Big[ E_i(r_i) + \sum_{j > i} E_{ij}(r_i, r_j)\Big] \, </math>|{{EquationRef|1}}}}
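To make the objective concrete, the sketch below evaluates the total energy of a rotamer assignment and finds the minimum by exhaustive enumeration. It is an illustration under stated assumptions rather than a practical design algorithm: the function names, the data layout (nested lists and a dictionary keyed by position pairs), and the toy energy values are all invented here, and each pairwise term is counted once per pair of positions.

```python
from itertools import product

def total_energy(assignment, self_energy, pair_energy):
    """One-body terms plus each pairwise term counted once (j > i)."""
    n = len(assignment)
    energy = sum(self_energy[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return energy

def brute_force_minimum(self_energy, pair_energy):
    """Enumerate every assignment; feasible only for very small problems."""
    rotamer_ranges = [range(len(e)) for e in self_energy]
    return min(
        (total_energy(a, self_energy, pair_energy), a)
        for a in product(*rotamer_ranges)
    )

# Made-up toy data: 3 positions with 2, 3 and 2 rotamers respectively.
# self_energy[i][r] plays the role of E_i(r_i);
# pair_energy[(i, j)][r][s] plays the role of E_ij(r_i, r_j).
self_energy = [[0.0, 1.0], [0.5, 0.0, 2.0], [1.0, 0.0]]
pair_energy = {
    (0, 1): [[0.0, 2.0, 0.0], [-1.0, -1.5, 0.0]],
    (0, 2): [[0.0, 0.5], [0.0, -0.5]],
    (1, 2): [[0.0, 1.0], [3.0, 0.0], [0.0, -2.0]],
}

best_energy, best_assignment = brute_force_minimum(self_energy, pair_energy)
print(best_energy, best_assignment)  # -1.0 (1, 1, 1)
```

In this toy instance, picking the rotamer with the lowest one-body energy at each position independently gives the assignment (0, 1, 1) with energy 2.5, whereas the true minimum (1, 1, 1) has energy −1.0: the pairwise terms couple the positions, so they cannot be optimized one at a time.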

The problem of minimizing <var>E<sub>T</sub></var> is an [[NP-hard]] problem.<ref name="donald10">{{cite book |last1=Donald |first1=Bruce R. |author-link1=Bruce Donald |title=Algorithms in Structural Molecular Biology| year=2011|publisher=MIT Press |location=Cambridge, MA}}</ref><ref>{{cite journal|last=Pierce|first=NA|author2=Winfree, E |title=Protein design is NP-hard.|journal=Protein Engineering|date=October 2002|volume=15|issue=10|pages=779–82|pmid=12468711|doi=10.1093/protein/15.10.779|doi-access=free}}</ref><ref name="voigt00">{{cite journal|last=Voigt|first=CA|author2=Gordon, DB |author3=Mayo, SL |title=Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design.|journal=Journal of Molecular Biology|date=June 9, 2000|volume=299|issue=3|pages=789–803|pmid=10835284|doi=10.1006/jmbi.2000.3758|citeseerx=10.1.1.138.2023}}</ref> Even though the class of problems is NP-hard, in practice many instances of protein design can be solved exactly or optimized satisfactorily through heuristic methods.
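As a rough illustration of what a heuristic method can look like, the sketch below uses a greedy, one-position-at-a-time descent. This is a generic local search chosen here for brevity, not a method taken from the cited references; the function names, data layout, and toy energies are assumptions made for the example.

```python
def total_energy(assignment, self_energy, pair_energy):
    """One-body terms plus each pairwise term counted once (j > i)."""
    n = len(assignment)
    energy = sum(self_energy[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return energy

def greedy_descent(self_energy, pair_energy):
    """Repeatedly change one position's rotamer while the energy drops."""
    # Start from the rotamer with the lowest one-body energy at each position.
    current = [min(range(len(e)), key=e.__getitem__) for e in self_energy]
    current_energy = total_energy(current, self_energy, pair_energy)
    improved = True
    while improved:
        improved = False
        for i, energies in enumerate(self_energy):
            for r in range(len(energies)):
                trial = current[:i] + [r] + current[i + 1:]
                trial_energy = total_energy(trial, self_energy, pair_energy)
                # Accept strict improvements only, so the loop terminates.
                if trial_energy < current_energy:
                    current, current_energy = trial, trial_energy
                    improved = True
    return current_energy, tuple(current)

# Made-up toy data: 3 positions with 2, 3 and 2 rotamers respectively.
self_energy = [[0.0, 1.0], [0.5, 0.0, 2.0], [1.0, 0.0]]
pair_energy = {
    (0, 1): [[0.0, 2.0, 0.0], [-1.0, -1.5, 0.0]],
    (0, 2): [[0.0, 0.5], [0.0, -0.5]],
    (1, 2): [[0.0, 1.0], [3.0, 0.0], [0.0, -2.0]],
}

print(greedy_descent(self_energy, pair_energy))  # (-1.0, (1, 1, 1))
```

On this small instance the descent happens to reach the global minimum, but a search of this kind offers no such guarantee in general: it stops at any assignment that no single-position change can improve, which may be only a local minimum.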