==As an optimization problem==
[[File:ProteinDesignSearch.gif|200px|thumb|This animation illustrates the complexity of a protein design search, which typically compares all the rotamer conformations from all possible mutations at all residues. In this example, the residues Phe36 and His106 are allowed to mutate to the amino acids Tyr and Asn, respectively. Phe and Tyr have 4 rotamers each in the rotamer library, while Asn and His have 7 and 8 rotamers, respectively (from the Richardsons' penultimate rotamer library<ref name="lovell2000" />). The animation loops through all (4 + 4) × (7 + 8) = 120 possibilities. The structure shown is that of myoglobin, PDB id: 1mbn.]]
The goal of protein design is to find a protein sequence that will fold to a target structure. A protein design algorithm must therefore search all the conformations of each sequence with respect to the target fold, and rank sequences according to the lowest-energy conformation of each one, as determined by the protein design energy function. A typical input to the protein design algorithm is thus the target fold, the sequence space, the structural flexibility, and the energy function, while the output is one or more sequences that are predicted to fold stably to the target structure.

The number of candidate protein sequences, however, grows exponentially with the number of protein residues; for example, there are 20<sup>100</sup> protein sequences of length 100. Furthermore, even if amino acid side-chain conformations are limited to a few rotamers (see [[Structural flexibility]]), each sequence still has an exponential number of conformations. For a 100-residue protein in which each of the 20 amino acids has exactly 10 rotamers, every position has 20 × 10 = 200 possible rotamers, so a search algorithm must consider 200<sup>100</sup> protein conformations.
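The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the rotamer counts are the ones quoted in the figure caption, and the names <code>rotamers</code>, <code>allowed</code>, and <code>count_conformations</code> are hypothetical helpers introduced here, not part of any protein design package.

```python
from math import prod

# Rotamer counts per amino acid, as quoted in the figure caption.
rotamers = {"Phe": 4, "Tyr": 4, "His": 8, "Asn": 7}

# Amino acids allowed at each designable position in the figure's example.
allowed = {36: ["Phe", "Tyr"], 106: ["His", "Asn"]}


def count_conformations(allowed, rotamers):
    """Multiply, over positions, the total rotamer count of the allowed amino acids."""
    return prod(sum(rotamers[aa] for aa in aas) for aas in allowed.values())


# Two-position example from the figure: (4 + 4) * (8 + 7).
small = count_conformations(allowed, rotamers)

# Idealized 100-residue case from the text: 20 amino acids, 10 rotamers each.
n_residues, n_amino_acids, rotamers_per_aa = 100, 20, 10
n_sequences = n_amino_acids ** n_residues
n_conformations = (n_amino_acids * rotamers_per_aa) ** n_residues

print(small)                      # 120
print(len(str(n_sequences)))      # number of decimal digits in 20**100
print(len(str(n_conformations)))  # number of decimal digits in 200**100
```

Python integers have arbitrary precision, so both exponentials are computed exactly; the digit counts simply make the size of the search space concrete.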
The most common energy functions can be decomposed into pairwise terms between rotamers and amino acid types. This casts protein design as a combinatorial optimization problem, to which powerful optimization algorithms can be applied. In those cases, the total energy of each conformation belonging to each sequence can be formulated as a sum of individual and pairwise terms between residue positions. If a designer is interested only in the best sequence, the protein design algorithm requires only the lowest-energy conformation of the lowest-energy sequence. The amino acid identity of each rotamer can then be ignored, and all rotamers belonging to different amino acids can be treated the same. Let <var>r</var><sub><var>i</var></sub> be a rotamer at residue position <var>i</var> in the protein chain, and <var>E</var><sub><var>i</var></sub>(<var>r</var><sub><var>i</var></sub>) the potential energy between the internal atoms of the rotamer. Let <var>E</var><sub><var>ij</var></sub>(<var>r</var><sub><var>i</var></sub>, <var>r</var><sub><var>j</var></sub>) be the potential energy between <var>r</var><sub><var>i</var></sub> and rotamer <var>r</var><sub><var>j</var></sub> at residue position <var>j</var>. The optimization problem is then to find the conformation of minimum total energy (<var>E</var><sub><var>T</var></sub>):
<!--Moreover, the target structure could potentially be a very high energy state for many sequences; other sequences could preferentially fold into alternate, competing states. -->
{{NumBlk|:|<math>\min E_{T} = \sum_{i}\Big[ E_i(r_i) + \sum_{j\ne i} E_{ij}(r_i, r_j)\Big] \, </math>|{{EquationRef|1}}}}
Minimizing <var>E<sub>T</sub></var> is an [[NP-hard]] problem.<ref name="donald10">{{cite book |last1=Donald |first1=Bruce R. 
|author-link1=Bruce Donald |title=Algorithms in Structural Molecular Biology| year=2011|publisher=MIT Press |location=Cambridge, MA}}</ref><ref>{{cite journal|last=Pierce|first=NA|author2=Winfree, E |title=Protein design is NP-hard.|journal=Protein Engineering|date=October 2002|volume=15|issue=10|pages=779–82|pmid=12468711|doi=10.1093/protein/15.10.779|doi-access=free}}</ref><ref name="voigt00">{{cite journal|last=Voigt|first=CA|author2=Gordon, DB |author3=Mayo, SL |title=Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design.|journal=Journal of Molecular Biology|date=June 9, 2000|volume=299|issue=3|pages=789–803|pmid=10835284|doi=10.1006/jmbi.2000.3758|citeseerx=10.1.1.138.2023}}</ref> Even though the class of problems is NP-hard, in practice many instances of protein design can be solved exactly or optimized satisfactorily through heuristic methods.
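For very small instances, the minimization in Equation 1 can be carried out by exhaustive enumeration. The sketch below is a minimal illustration, not a practical design algorithm: the energy values are invented toy numbers in arbitrary units, the pairwise energies are assumed to be symmetric, and the names <code>self_energy</code>, <code>pair_energy</code>, <code>total_energy</code>, and <code>brute_force_minimum</code> are hypothetical. It follows Equation 1 as written, so each pair of positions contributes once for every ordering of <var>i</var> and <var>j</var>.

```python
from itertools import product

# Toy energies in arbitrary units, invented for illustration only.
# self_energy[i][r] plays the role of E_i(r_i).
self_energy = [
    {"a": 1.0, "b": 0.5},
    {"a": 0.2, "b": 0.9},
    {"a": 0.7, "b": 0.3},
]

# pair_energy[(i, j)][(r_i, r_j)] plays the role of E_ij(r_i, r_j), stored for i < j.
pair_energy = {
    (0, 1): {("a", "a"): -1.0, ("a", "b"): 0.4, ("b", "a"): 0.3, ("b", "b"): -0.2},
    (0, 2): {("a", "a"): 0.1, ("a", "b"): -0.6, ("b", "a"): 0.5, ("b", "b"): 0.0},
    (1, 2): {("a", "a"): 0.2, ("a", "b"): -0.3, ("b", "a"): -0.1, ("b", "b"): 0.6},
}


def pair(i, ri, j, rj):
    """Look up E_ij(r_i, r_j), assuming E_ij(r_i, r_j) == E_ji(r_j, r_i)."""
    if i < j:
        return pair_energy[(i, j)][(ri, rj)]
    return pair_energy[(j, i)][(rj, ri)]


def total_energy(conf):
    """Evaluate Equation 1: sum over i of E_i(r_i) plus the sum over j != i of E_ij."""
    n = len(conf)
    total = 0.0
    for i in range(n):
        total += self_energy[i][conf[i]]
        total += sum(pair(i, conf[i], j, conf[j]) for j in range(n) if j != i)
    return total


def brute_force_minimum():
    """Enumerate every rotamer assignment and return the lowest-energy one."""
    choices = [sorted(pos) for pos in self_energy]
    best = min(product(*choices), key=total_energy)
    return best, total_energy(best)


best_conf, best_energy = brute_force_minimum()
print(best_conf, round(best_energy, 6))
```

The enumeration visits every combination of rotamers, so its cost grows exponentially with the number of positions. That is consistent with the NP-hardness noted above and is why exact or heuristic search methods are needed for realistic problem sizes.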