{{Short description|Rational design of new protein molecules}} {{about|rational protein design|the broader engineering of proteins|Protein engineering}} {{Use mdy dates|date=April 2016}} '''Protein design''' is the [[rational design]] of new [[protein]] molecules to design novel activity, behavior, or purpose, and to advance basic understanding of protein function.<ref>{{cite news|last1=Korendovych|first1=Ivan|title=Minimalist design of peptide and protein catalysts|url=https://plan.core-apps.com/acsnola2018/abstract/3b00ff75f22454219cca274e14edadf8|access-date=22 March 2018|publisher=American Chemical Society|date=19 March 2018}}</ref> Proteins can be designed from scratch (''de novo'' design) or by making calculated variants of a known protein structure and its sequence (termed ''protein redesign''). '''Rational protein design''' approaches make protein-sequence predictions that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as [[peptide synthesis]], [[site-directed mutagenesis]], or [[artificial gene synthesis]]. Rational protein design dates back to the mid-1970s.<ref name="richardson1989">{{cite journal|last=Richardson|first=JS|author2=Richardson, DC |title=The de novo design of protein structures.|journal=Trends in Biochemical Sciences|date=July 1989|volume=14|issue=7|pages=304–9|pmid=2672455|doi=10.1016/0968-0004(89)90070-4}}</ref> Recently, however, there were numerous examples of successful rational design of water-soluble and even transmembrane peptides and proteins, in part due to a better understanding of different factors contributing to [[protein folding|protein structure stability]] and development of better computational methods. ==Overview and history== The goal in rational protein design is to predict [[amino acid]] [[Protein primary structure|sequences]] that will [[protein folding|fold]] to a specific protein structure. 
Although the number of possible protein sequences is vast, growing exponentially with the size of the protein chain, only a subset of them will fold reliably and quickly to one [[native state]]. Protein design involves identifying novel sequences within this subset. The native state of a protein is the conformational [[Thermodynamic free energy|free energy]] minimum for the chain. Thus, protein design is the search for sequences that have the chosen structure as a free energy minimum. In a sense, it is the reverse of [[protein structure prediction]]. In design, a [[Protein tertiary structure|tertiary structure]] is specified, and a sequence that will fold to it is identified. Hence, it is also termed ''inverse folding''. Protein design is then an optimization problem: using some scoring criteria, an optimized sequence that will fold to the desired structure is chosen. When the first proteins were rationally designed during the 1970s and 1980s, the sequence for these was optimized manually based on analyses of other known proteins, the sequence composition, amino acid charges, and the geometry of the desired structure.<ref name="richardson1989" /> The first designed proteins are attributed to Bernd Gutte, who designed a reduced version of a known catalyst, bovine ribonuclease, and tertiary structures consisting of beta-sheets and alpha-helices, including a binder of [[DDT]]. Urry and colleagues later designed [[elastin]]-like [[fibrous protein|fibrous]] peptides based on rules on sequence composition. Richardson and coworkers designed a 79-residue protein with no sequence homology to a known protein.<ref name="richardson1989" /> In the 1990s, the advent of powerful computers, [[Conformational isomerism#Protein rotamer libraries|libraries of amino acid conformations]], and force fields developed mainly for [[molecular dynamics]] simulations enabled the development of structure-based computational protein design tools. 
Following the development of these computational tools, protein design has advanced considerably over the last 30 years. The first protein designed completely ''de novo'' was produced by [[Stephen Mayo]] and coworkers in 1997,<ref name="dahiyat1997" /> and, shortly after, in 1998 [[Peter S. Kim]] and coworkers designed dimers, trimers, and tetramers of unnatural right-handed [[coiled coil]]s.<ref name="gordon99review">{{cite journal|last=Gordon|first=DB|author2=Marshall, SA |author3=Mayo, SL |title=Energy functions for protein design.|journal=Current Opinion in Structural Biology|date=August 1999|volume=9|issue=4|pages=509–13|pmid=10449371|doi=10.1016/s0959-440x(99)80072-4}}</ref><ref name="harbury99">{{cite journal|last=Harbury|first=PB|author2=Plecs, JJ |author3=Tidor, B |author4=Alber, T |author5= Kim, PS |title=High-resolution protein design with backbone freedom.|journal=Science|date=November 20, 1998|volume=282|issue=5393|pages=1462–7|pmid=9822371|doi=10.1126/science.282.5393.1462}}</ref> In 2003, [[David Baker (biochemist)|David Baker]]'s laboratory designed a full protein to a fold never seen before in nature.<ref name="kuhlman03" /> Later, in 2008, Baker's group computationally designed enzymes for two different reactions.<ref>{{cite journal|last=Sterner|first=R|author2=Merkl, R |author3=Raushel, FM |title=Computational design of enzymes.|journal=Chemistry & Biology|date=May 2008|volume=15|issue=5|pages=421–3|pmid=18482694|doi=10.1016/j.chembiol.2008.04.007|doi-access=free}}</ref> In 2010, one of the most powerful broadly neutralizing antibodies was isolated from patient serum using a computationally designed protein probe.<ref name="wu2010a">{{cite journal|last1=Wu|first1=X|author2=Yang, ZY|author3=Li, Y|author4=Hogerkorp, CM|author5=Schief, WR|author6=Seaman, MS|author7=Zhou, T|author8=Schmidt, SD|author9=Wu, L|author10=Xu, L|author11=Longo, NS|author12=McKee, K|author13=O'Dell, S|author14=Louder, MK|author15=Wycuff, DL|author16=Feng, 
Y|author17=Nason, M|author18=Doria-Rose, N|author19=Connors, M|author20=Kwong, PD|author21=Roederer, M|author22=Wyatt, RT|author23=Nabel, GJ|author23-link=Gary Nabel|author24=Mascola, JR|author24-link=John R. Mascola |title=Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1.|journal=Science|date=August 13, 2010|volume=329|issue=5993|pages=856–61|pmid=20616233|bibcode= 2010Sci...329..856W |doi= 10.1126/science.1187659|pmc=2965066}}</ref> In 2024, Baker received one half of the [[Nobel Prize in Chemistry]] for his advancement of computational protein design, with the other half being shared by [[Demis Hassabis]] and [[John M. Jumper|John Jumper]] of [[Google DeepMind|Deepmind]] for protein structure prediction.<ref>{{Cite web |date=2024-10-09 |title=Press Release: The Nobel Prize in Chemistry 2024 |url=https://www.nobelprize.org/prizes/chemistry/2024/press-release/ |url-status=live |access-date=2025-03-31 |website=Nobel Prize}}</ref> Due to these and other successes (e.g., see [[#Applications and examples of designed proteins|examples]] below), protein design has become one of the most important tools available for [[protein engineering]]. There is great hope that the design of new proteins, small and large, will have uses in [[biomedicine]] and [[bioengineering]]. <!--[[Prion]] diseases like [[bovine spongiform encephalopathy]] (mad-cow disease) illustrate how important it is that designer proteins possess only one stable conformation. In mad-cow disease, there exists a healthy protein with a fatal weakness: There is another conformation that it can "comfortably" take; the abnormally folded shape has very little free energy and is thus very stable. 
For reasons that are not yet fully understood, this [[Protein misfolding|mis-folded]] prion protein can [[Catalysis|catalyze]] other proteins of its type to also adopt the mis-folded shape, causing a disease-generating cascade of formerly functional proteins to quickly mis-fold. They lose the ability to perform their intended function in the new conformation, and have a tendency to form aggregates called [[senile plaques]]. The buildup of these aggregates in the brain leads to progressive neuronal death, and eventually death of the entire organism. Thus, it is easy to see the importance both that a designer protein have only one possible stable tertiary structure and that researchers exercise extreme diligence to ensure that this remain the case in all environments, especially ''[[in vivo]]''.--> ==Underlying models of protein structure and function == Protein design programs use [[bioinformatics|computer models]] of the molecular forces that drive proteins in ''[[in vivo]]'' environments. In order to make the problem tractable, these forces are simplified by protein design models. Although protein design programs vary greatly, they have to address four main modeling questions: What is the target structure of the design, what flexibility is allowed on the target structure, which sequences are included in the search, and which force field will be used to score sequences and structures. 
===Target structure=== [[File:Top7.png|thumb|left|The [[Top7]] protein was one of the first proteins designed for a fold that had never been seen before in nature<ref name="kuhlman03">{{cite journal|last=Kuhlman|first=B|author2=Dantas, G |author3=Ireton, GC |author4=Varani, G |author5=Stoddard, BL |author6= Baker, D |title=Design of a novel globular protein fold with atomic-level accuracy.|journal=Science|date=November 21, 2003|volume=302|issue=5649|pages=1364–8|pmid=14631033|bibcode= 2003Sci...302.1364K |doi= 10.1126/science.1089427|s2cid=1939390}}</ref>]] Protein function is heavily dependent on protein structure, and rational protein design uses this relationship to design function by designing proteins that have a target structure or fold. Thus, by definition, in rational protein design the target structure or ensemble of structures must be known beforehand. This contrasts with other forms of protein engineering, such as [[directed evolution]], where a variety of methods are used to find proteins that achieve a specific function, and with [[protein structure prediction]] where the sequence is known, but the structure is unknown. Most often, the target structure is based on a known structure of another protein. However, novel folds not seen in nature have been made increasingly possible. Peter S. Kim and coworkers designed trimers and tetramers of unnatural coiled coils, which had not been seen before in nature.<ref name="gordon99review" /><ref name="harbury99" /> The protein Top7, developed in [[David Baker (biochemist)|David Baker]]'s lab, was designed completely using protein design algorithms, to a completely novel fold.<ref name="kuhlman03" /> More recently, Baker and coworkers developed a series of principles to design ideal [[globular protein|globular-protein]] structures based on [[folding funnel|protein folding funnels]] that bridge between secondary structure prediction and tertiary structures. 
These principles, which build on both protein structure prediction and protein design, were used to design five different novel protein topologies.<ref>{{cite journal|last=Höcker|first=B|title=Structural biology: A toolbox for protein design.|journal=Nature|date=November 8, 2012|volume=491|issue=7423|pages=204–5|pmid=23135466|bibcode= 2012Natur.491..204H |doi= 10.1038/491204a|s2cid=4426247|doi-access=free}}</ref> ===Sequence space=== [[File:1FSVblue-1ZAAred.png|thumb|FSD-1 (shown in blue, PDB id: 1FSV) was the first ''de novo'' computational design of a full protein.<ref name="dahiyat1997">{{cite journal|last=Dahiyat|first=BI|author2=Mayo, SL |title=De novo protein design: fully automated sequence selection.|journal=Science|date=October 3, 1997|volume=278|issue=5335|pages=82–7|pmid=9311930|doi=10.1126/science.278.5335.82|citeseerx=10.1.1.72.7304}}</ref> The target fold was that of the zinc finger in residues 33–60 of the structure of protein Zif268 (shown in red, PDB id: 1ZAA). The designed sequence had very little sequence identity with any known protein sequence.]] In rational protein design, proteins can be redesigned from the sequence and structure of a known protein, or completely from scratch in ''de novo'' protein design. In protein redesign, most of the residues in the sequence are maintained as their wild-type amino-acid while a few are allowed to mutate. In ''de novo'' design, the entire sequence is designed anew, based on no prior sequence. Both ''de novo'' designs and protein redesigns can establish rules on the [[Sequence space (evolution)|sequence space]]: the specific amino acids that are allowed at each mutable residue position. For example, the composition of the surface of the [[#Protein resurfacing|RSC3 probe]] to select HIV-broadly neutralizing antibodies was restricted based on evolutionary data and charge balancing. 
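
The idea of a restricted sequence space can be made concrete with a small data structure. The following is only an illustrative sketch, not the input format of any particular design program: the position numbers, the allowed residues, and the helper <code>count_sequences</code> are hypothetical.

```python
from math import prod

ALL_20 = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids

# Hypothetical protein redesign: only three positions may mutate, each to a
# restricted set of amino acids. All other positions keep their wild-type residue.
redesign_space = {
    36: "FY",      # Phe or Tyr
    57: "AVILM",   # restricted to hydrophobic residues (illustrative rule)
    106: "HN",     # His or Asn
}

# Hypothetical de novo design: every one of 28 positions may be any amino acid.
de_novo_space = {position: ALL_20 for position in range(1, 29)}


def count_sequences(space):
    """Number of distinct sequences allowed by the per-position restrictions."""
    return prod(len(set(allowed)) for allowed in space.values())


print(count_sequences(redesign_space))             # 2 * 5 * 2 = 20
print(count_sequences(de_novo_space) == 20 ** 28)  # True
```

Restricting the allowed amino acids at each position shrinks the space that the search algorithm has to cover, which is one reason rule-based restrictions remain useful even in computational design.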
Many of the earliest attempts at protein design were heavily based on empirical ''rules'' on the sequence space.<ref name="richardson1989" /> Moreover, the [[#Design of fibrous proteins|design of fibrous proteins]] usually follows strict rules on the sequence space. [[Collagen]]-based designed proteins, for example, are often composed of Gly-Pro-X repeating patterns.<ref name="richardson1989" /> The advent of computational techniques allows designing proteins with no human intervention in sequence selection.<ref name="dahiyat1997" />

===Structural flexibility===
[[File:ileRotamers.gif|thumb|left|200px|Common protein design programs use rotamer libraries to simplify the conformational space of protein side chains. This animation loops through all the rotamers of the isoleucine amino acid based on the Penultimate Rotamer Library (total of 7 rotamers).<ref name="lovell2000" />]]

In protein design, the target structure (or structures) of the protein is known. However, a rational protein design approach must model some ''flexibility'' on the target structure in order to increase the number of sequences that can be designed for that structure and to minimize the chance of a sequence folding to a different structure. For example, in a protein redesign of one small amino acid (such as alanine) in the tightly packed core of a protein, very few mutants would be predicted by a rational design approach to fold to the target structure if the surrounding side-chains are not allowed to be repacked. Thus, an essential parameter of any design process is the amount of flexibility allowed for both the side-chains and the backbone. In the simplest models, the protein backbone is kept rigid while some of the protein side-chains are allowed to change conformations. However, side-chains can have many degrees of freedom in their bond lengths, bond angles, and [[Dihedral angle#Dihedral angles of biological molecules|<var>χ</var> dihedral angles]]. 
To simplify this space, protein design methods use rotamer libraries that assume ideal values for bond lengths and bond angles, while restricting <var>χ</var> dihedral angles to a few frequently observed low-energy conformations termed [[Conformational isomerism|rotamers]]. Rotamer libraries are derived from the statistical analysis of many protein structures. Backbone-independent rotamer libraries describe all rotamers.<ref name="lovell2000">{{cite journal|last=Lovell|first=SC|author2=Word, JM |author3=Richardson, JS |author4= Richardson, DC |title=The penultimate rotamer library.|journal=Proteins|date=August 15, 2000|volume=40|issue=3|pages=389–408|pmid=10861930|doi=10.1002/1097-0134(20000815)40:3<389::AID-PROT50>3.0.CO;2-2|citeseerx=10.1.1.555.4071|s2cid=3055173 }}</ref> [[Backbone-dependent rotamer library|Backbone-dependent rotamer libraries]], in contrast, describe how likely each rotamer is to appear given the protein backbone arrangement around the side chain.<ref>{{cite journal|last=Shapovalov|first=MV|author2=Dunbrack RL, Jr|title=A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions.|journal=Structure|date=June 8, 2011|volume=19|issue=6|pages=844–58|pmid=21645855|doi=10.1016/j.str.2011.03.019|pmc=3118414}}</ref> Most protein design programs use one conformation (e.g., the modal value for rotamer dihedrals in space) or several points in the region described by the rotamer; the OSPREY protein design program, in contrast, models the entire continuous region.<ref name="samish11"/>

Although rational protein design must preserve the general backbone fold of a protein, allowing some backbone flexibility can significantly increase the number of sequences that fold to the structure while maintaining the general fold of the protein.<ref name="kortemme09">{{cite journal|last=Mandell|first=DJ|author2=Kortemme, T |author-link2=Tanja Kortemme |title=Backbone flexibility in computational 
protein design.|journal=Current Opinion in Biotechnology|date=August 2009|volume=20|issue=4|pages=420–8|pmid=19709874|doi=10.1016/j.copbio.2009.07.006|url=https://escholarship.org/content/qt89b8n09b/qt89b8n09b.pdf?t=pqrxq4}}</ref> Backbone flexibility is especially important in protein redesign because sequence mutations often result in small changes to the backbone structure. Moreover, backbone flexibility can be essential for more advanced applications of protein design, such as binding prediction and enzyme design. Some models of protein design backbone flexibility include small and continuous global backbone movements, discrete backbone samples around the target fold, backrub motions, and protein loop flexibility.<ref name="kortemme09" /><ref name="donald10" />

===Energy function===
[[File:PEF comparison.png|thumb|400px|right|Comparison of various potential energy functions. The most accurate energy functions are those that use quantum mechanical calculations, but these are too slow for protein design. On the other extreme, heuristic energy functions are based on statistical terms and are very fast. In the middle are molecular mechanics energy functions that are physically based but are not as computationally expensive as quantum mechanical simulations.<ref name="Boas"/>]]

Rational protein design techniques must be able to discriminate sequences that will be stable in the target fold from those that would prefer other low-energy competing states. Thus, protein design requires accurate [[force field (chemistry)|energy functions]] that can rank and score sequences by how well they fold to the target structure. At the same time, however, these energy functions must consider the computational [[#As an optimization problem|challenges]] behind protein design. One of the most challenging requirements for successful design is an energy function that is both accurate and simple enough to compute efficiently. 
The most accurate energy functions are those based on quantum mechanical simulations. However, such simulations are too slow and typically impractical for protein design. Instead, many protein design algorithms use either physics-based energy functions adapted from [[molecular mechanics]] simulation programs, [[statistical potential|knowledge-based energy functions]], or a hybrid mix of both. The trend has been toward using more physics-based potential energy functions.<ref name="Boas">{{cite journal |last1=Boas |first1=F. E. |last2=Harbury |first2=P. B. |name-list-style=amp |year=2007 |title=Potential energy functions for protein design |journal=Current Opinion in Structural Biology |volume=17 |issue=2 |pages=199–204 |doi=10.1016/j.sbi.2007.03.006 |pmid=17387014}}</ref>

Physics-based energy functions, such as [[AMBER]] and [[CHARMM]], are typically derived from quantum mechanical simulations and from experimental data from thermodynamics, crystallography, and spectroscopy.<ref name="boas2007">{{cite journal|last=Boas|first=FE|author2=Harbury, PB |title=Potential energy functions for protein design.|journal=Current Opinion in Structural Biology|date=April 2007|volume=17|issue=2|pages=199–204|pmid=17387014|doi=10.1016/j.sbi.2007.03.006}}</ref> These energy functions typically simplify the physical energy function and make it pairwise decomposable, meaning that the total energy of a protein conformation can be calculated by adding the pairwise energy between each atom pair, which makes them attractive for optimization algorithms. Physics-based energy functions typically model an attractive-repulsive [[Lennard-Jones]] term between atoms and a pairwise [[electrostatics]] coulombic term<ref>{{cite journal|last=Vizcarra|first=CL|author2=Mayo, SL |title=Electrostatics in computational protein design.|journal=Current Opinion in Chemical Biology|date=December 2005|volume=9|issue=6|pages=622–6|pmid=16257567|doi=10.1016/j.cbpa.2005.10.014}}</ref> between non-bonded atoms. 
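
A pairwise-decomposable energy of this kind can be sketched in a few lines. The example below is a simplified illustration rather than the functional form of AMBER, CHARMM, or any other package: it uses a single hypothetical well depth <code>epsilon</code> and radius <code>sigma</code> for every atom pair, treats all listed atoms as non-bonded, and the atom coordinates and charges are made up.

```python
import math
from itertools import combinations

COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2), a common molecular-mechanics unit convention


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term: repulsive at short range, attractive at longer range."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Pairwise electrostatic term between two point charges (in units of e)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def pairwise_energy(atoms, epsilon=0.1, sigma=3.4):
    """Total energy as a plain sum over atom pairs (pairwise decomposable).

    Real force fields use per-atom-type parameters and skip bonded pairs;
    here one (epsilon, sigma) is assumed for all pairs to keep the sketch short.
    """
    total = 0.0
    for a, b in combinations(atoms, 2):
        r = math.dist(a["xyz"], b["xyz"])
        total += lennard_jones(r, epsilon, sigma) + coulomb(r, a["q"], b["q"])
    return total


# Made-up coordinates (Angstrom) and partial charges (e) for three atoms.
toy_atoms = [
    {"xyz": (0.0, 0.0, 0.0), "q": -0.5},
    {"xyz": (3.8, 0.0, 0.0), "q": 0.5},
    {"xyz": (0.0, 4.2, 0.0), "q": 0.0},
]
print(pairwise_energy(toy_atoms))
```

Because the total is a sum of independent pair terms, the contribution of each pair of side-chain conformations can be precomputed once and reused across the many sequences and conformations that a design search visits.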
[[File:Water-hbond-vrc01-gp120.png|thumb|left|Water-mediated hydrogen bonds play a key role in protein–protein binding. One such interaction is shown between residues D457 and S365 in the heavy chain of the HIV-broadly-neutralizing antibody VRC01 (green) and residues N58 and Y59 in the HIV envelope protein GP120 (purple).<ref name="wu2010">{{cite journal|last=Zhou|first=T|author2=Georgiev, I|author3=Wu, X|author4=Yang, ZY|author5=Dai, K|author6=Finzi, A|author7=Kwon, YD|author8=Scheid, JF|author9=Shi, W|author10=Xu, L|author11=Yang, Y|author12=Zhu, J|author13=Nussenzweig, MC|author14=Sodroski, J|author15=Shapiro, L|author16=Nabel, GJ|author17=Mascola, JR|author18=Kwong, PD|title=Structural basis for broad and potent neutralization of HIV-1 by antibody VRC01.|journal=Science|date=August 13, 2010|volume=329|issue=5993|pages=811–7|pmid=20616231|bibcode= 2010Sci...329..811Z |doi= 10.1126/science.1192819|pmc=2981354}}</ref>]]

Statistical potentials, in contrast to physics-based potentials, have the advantage of being fast to compute, of accounting implicitly for complex effects, and of being less sensitive to small changes in the protein structure.<ref>{{cite journal|last=Mendes|first=J|author2=Guerois, R |author3=Serrano, L |title=Energy estimation in protein design.|journal=Current Opinion in Structural Biology|date=August 2002|volume=12|issue=4|pages=441–6|pmid=12163065|doi=10.1016/s0959-440x(02)00345-7}}</ref> These energy functions are [[:File:knowledge based potential.png|based on deriving energy values]] from the frequency of appearance in a structural database.

Protein design, however, has requirements that molecular mechanics force fields do not always meet. Molecular mechanics force fields, which have been used mostly in molecular dynamics simulations, are optimized for the simulation of single sequences, but protein design searches through many conformations of many sequences. Thus, molecular mechanics force fields must be tailored for protein design. 
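
The way a knowledge-based term turns database frequencies into energies can be illustrated with the inverse Boltzmann relation, <code>E = -kT ln(p_observed / p_reference)</code>. The sketch below assumes that relation as the conversion rule. The counts are invented, and the state names and the function <code>statistical_energy</code> are hypothetical. Real statistical potentials differ in how they choose the reference state and how they handle sparse data.

```python
import math

KT = 0.592  # kcal/mol, approximately k_B * T at 298 K


def statistical_energy(observed_counts, reference_counts):
    """Convert observed vs. reference frequencies into pseudo-energies.

    States seen more often than expected get negative (favorable) energies.
    Assumes every count is nonzero; real potentials need pseudocounts or smoothing.
    """
    observed_total = sum(observed_counts.values())
    reference_total = sum(reference_counts.values())
    energies = {}
    for state, n_observed in observed_counts.items():
        p_observed = n_observed / observed_total
        p_reference = reference_counts[state] / reference_total
        energies[state] = -KT * math.log(p_observed / p_reference)
    return energies


# Invented counts: how often a residue pair is found in each distance bin in a
# structural database, versus a reference distribution with no preference.
observed = {"close": 200, "medium": 100, "far": 100}
reference = {"close": 100, "medium": 100, "far": 100}

energies = statistical_energy(observed, reference)
for state, energy in energies.items():
    print(f"{state}: {energy:+.3f} kcal/mol")
```

In this toy case the over-represented "close" bin receives a negative energy and the other two bins receive equal positive energies, which is the qualitative behavior expected of a frequency-derived term.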
In practice, protein design energy functions often incorporate both statistical terms and physics-based terms. For example, the Rosetta energy function, one of the most-used energy functions, incorporates physics-based energy terms originating in the CHARMM energy function, and statistical energy terms, such as rotamer probability and knowledge-based electrostatics. Typically, energy functions are highly customized between laboratories, and specifically tailored for every design.<ref name="boas2007" />

====Challenges for effective design energy functions====
Water makes up most of the molecules surrounding proteins and is the main driver of protein structure. Thus, modeling the interaction between water and protein is vital in protein design. The number of water molecules that interact with a protein at any given time is huge, and each one has a large number of degrees of freedom and interaction partners. Because modeling each of them explicitly is impractical, protein design programs model most such water molecules as a continuum, modeling both the hydrophobic effect and solvation polarization.<ref name="boas2007" />

Individual water molecules can sometimes have a crucial structural role in the core of proteins, and in protein–protein or protein–ligand interactions. Failing to model such waters can result in mispredictions of the optimal sequence of a protein–protein interface. As an alternative, water molecules can be added to rotamers.<ref name="boas2007" /> <!-- ====Lennard-Jones potentials==== ====Electrostatics==== ====Entropy==== To be done. ====Non-pairwise terms==== Polarizability ... to be done. 
====Knowledge-based energy functions==== -->

==As an optimization problem==
[[File:ProteinDesignSearch.gif|200px|thumb|This animation illustrates the complexity of a protein design search, which typically compares all the rotamer-conformations from all possible mutations at all residues. In this example, the residues Phe36 and His106 are allowed to mutate to, respectively, the amino acids Tyr and Asn. Phe and Tyr have 4 rotamers each in the rotamer library, while Asn and His have 7 and 8 rotamers, respectively, in the rotamer library (from the Richardsons' Penultimate Rotamer Library<ref name="lovell2000" />). The animation loops through all (4 + 4) × (7 + 8) = 120 possibilities. The structure shown is that of myoglobin, PDB id: 1mbn.]]

The goal of protein design is to find a protein sequence that will fold to a target structure. A protein design algorithm must, thus, search all the conformations of each sequence, with respect to the target fold, and rank sequences according to the lowest-energy conformation of each one, as determined by the protein design energy function. Thus, a typical input to the protein design algorithm is the target fold, the sequence space, the structural flexibility, and the energy function, while the output is one or more sequences that are predicted to fold stably to the target structure.

The number of candidate protein sequences, however, grows exponentially with the number of protein residues; for example, there are 20<sup>100</sup> protein sequences of length 100. Furthermore, even if amino acid side-chain conformations are limited to a few rotamers (see [[#Structural flexibility|Structural flexibility]]), this results in an exponential number of conformations for each sequence. Thus, for a 100-residue protein, and assuming that each amino acid has exactly 10 rotamers, a search algorithm must consider 200<sup>100</sup> protein conformations. 
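
The counts quoted above follow from simple arithmetic: each of the 100 positions independently takes one of 20 amino acids, and, under the stated assumption of 10 rotamers per amino acid, one of 20 × 10 = 200 rotamers. A short check is shown below. The variable names are illustrative.

```python
from math import log10

n_residues = 100
n_amino_acids = 20
rotamers_per_amino_acid = 10  # the simplifying assumption made in the text

# Each position independently takes one of 20 amino acids.
n_sequences = n_amino_acids ** n_residues

# Each position independently takes one of 20 * 10 = 200 rotamers.
n_conformations = (n_amino_acids * rotamers_per_amino_acid) ** n_residues

print(f"sequences:     about 10^{log10(n_sequences):.0f}")      # about 10^130
print(f"conformations: about 10^{log10(n_conformations):.0f}")  # about 10^230

# The two-residue example in the figure caption: Phe/Tyr at one position
# (4 + 4 rotamers) and Asn/His at the other (7 + 8 rotamers).
figure_example = (4 + 4) * (7 + 8)
print(figure_example)  # 120
```

Even the smaller of these numbers is far beyond exhaustive enumeration, which is why the formulation and algorithms that follow matter.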
The most common energy functions can be decomposed into pairwise terms between rotamers and amino acid types, which casts the problem as a combinatorial one, and powerful optimization algorithms can be used to solve it. In those cases, the total energy of each conformation belonging to each sequence can be formulated as a sum of individual and pairwise terms between residue positions. If a designer is interested only in the best sequence, the protein design algorithm only requires the lowest-energy conformation of the lowest-energy sequence. In these cases, the amino acid identity of each rotamer can be ignored and all rotamers belonging to different amino acids can be treated the same. Let <var>r</var><sub><var>i</var></sub> be a rotamer at residue position <var>i</var> in the protein chain, and <var>E</var>(<var>r</var><sub><var>i</var></sub>) the potential energy between the internal atoms of the rotamer. Let <var>E</var>(<var>r</var><sub><var>i</var></sub>, <var>r</var><sub><var>j</var></sub>) be the potential energy between <var>r</var><sub><var>i</var></sub> and rotamer <var>r</var><sub><var>j</var></sub> at residue position <var>j</var>. Then, we define the optimization problem as one of finding the conformation of minimum energy (<var>E</var><sub><var>T</var></sub>), counting each pair of residue positions once: <!--Moreover, the target structure could potentially be a very high energy state for many sequences; other sequences could preferentially fold into alternate, competing states. -->

{{NumBlk|:|<math>\min E_{T} = \sum_{i}\Big[ E(r_i) + \sum_{j > i} E(r_i, r_j)\Big] \, </math>|{{EquationRef|1}}}}

The problem of minimizing <var>E<sub>T</sub></var> is [[NP-hard]].<ref name="donald10">{{cite book |last1=Donald |first1=Bruce R. 
|author-link1=Bruce Donald |title=Algorithms in Structural Molecular Biology| year=2011|publisher=MIT Press |location=Cambridge, MA}}</ref><ref>{{cite journal|last=Pierce|first=NA|author2=Winfree, E |title=Protein design is NP-hard.|journal=Protein Engineering|date=October 2002|volume=15|issue=10|pages=779–82|pmid=12468711|doi=10.1093/protein/15.10.779|doi-access=free}}</ref><ref name="voigt00">{{cite journal|last=Voigt|first=CA|author2=Gordon, DB |author3=Mayo, SL |title=Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design.|journal=Journal of Molecular Biology|date=June 9, 2000|volume=299|issue=3|pages=789–803|pmid=10835284|doi=10.1006/jmbi.2000.3758|citeseerx=10.1.1.138.2023}}</ref> Even though the class of problems is NP-hard, in practice many instances of protein design can be solved exactly or optimized satisfactorily through heuristic methods.

==Algorithms==
Several algorithms have been developed specifically for the protein design problem. These algorithms can be divided into two broad classes: exact algorithms, such as [[dead-end elimination]], that lack [[Run time (program lifecycle phase)|runtime]] guarantees but guarantee the quality of the solution; and [[Heuristic (computer science)|heuristic]] algorithms, such as Monte Carlo, that are faster than exact algorithms but have no guarantees on the optimality of the results. Exact algorithms guarantee that the optimization process produces the optimal solution according to the protein design model. 
Thus, if the predictions of exact algorithms fail when these are experimentally validated, then the source of error can be attributed to the energy function, the allowed flexibility, the sequence space or the target structure (e.g., if it cannot be designed for).<ref>{{cite journal|last=Hong|first=EJ|author2=Lippow, SM |author3=Tidor, B |author4= Lozano-Pérez, T |title=Rotamer optimization for protein design through MAP estimation and problem-size reduction.|journal=Journal of Computational Chemistry|date=September 2009|volume=30|issue=12|pages=1923–45|pmid=19123203|doi=10.1002/jcc.21188 |pmc=3495010}}</ref>

Some protein design algorithms are listed below. These algorithms address only the most basic formulation of the protein design problem, Equation ({{EquationNote|1}}). Designers often change the optimization goal by extending the protein design model, for example by allowing more structural flexibility (e.g., protein backbone flexibility) or by including sophisticated energy terms, but many of these extensions are built atop the algorithms listed here. For example, Rosetta Design incorporates sophisticated energy terms and backbone flexibility, using Monte Carlo as the underlying optimization algorithm. OSPREY's algorithms build on the dead-end elimination algorithm and A* to incorporate continuous backbone and side-chain movements. Thus, these algorithms provide a good perspective on the different kinds of algorithms available for protein design.

In 2020, scientists reported the development of an AI-based process that uses [[List of biological databases|genome databases]] for [[Evolutionary algorithm|evolution-based]] design of novel proteins. 
They used [[deep learning]] to identify design-rules.<ref>{{cite news |title=Machine learning reveals recipe for building artificial proteins |url=https://phys.org/news/2020-07-machine-reveals-recipe-artificial-proteins.html |access-date=17 August 2020 |work=phys.org |language=en}}</ref><ref>{{cite journal |title=An evolution-based model for designing chorismatemutase enzymes |journal=Science |doi=10.1126/science.aba3304 |bibcode=2020Sci...369..440R |last1=Russ |first1=William P. |last2=Figliuzzi |first2=Matteo |last3=Stocker |first3=Christian |last4=Barrat-Charlaix |first4=Pierre |last5=Socolich |first5=Michael |last6=Kast |first6=Peter |last7=Hilvert |first7=Donald |last8=Monasson |first8=Remi |last9=Cocco |first9=Simona |last10=Weigt |first10=Martin |last11=Ranganathan |first11=Rama |year=2020 |volume=369 |issue=6502 |pages=440–445 |pmid=32703877 |s2cid=220714458 }}</ref> In 2022, a study reported deep learning software that can design proteins that contain prespecified functional sites.<ref>{{cite news |title=Biologists train AI to generate medicines and vaccines |url=https://medicalxpress.com/news/2022-07-biologists-ai-medicines-vaccines.html |work=University of Washington-Harborview Medical Center |language=en}}</ref><ref>{{cite journal |last1=Wang |first1=Jue |last2=Lisanza |first2=Sidney |last3=Juergens |first3=David |last4=Tischer |first4=Doug |last5=Watson |first5=Joseph L. |last6=Castro |first6=Karla M. |last7=Ragotte |first7=Robert |last8=Saragovi |first8=Amijai |last9=Milles |first9=Lukas F. |last10=Baek |first10=Minkyung |last11=Anishchenko |first11=Ivan |last12=Yang |first12=Wei |last13=Hicks |first13=Derrick R. |last14=Expòsit |first14=Marc |last15=Schlichthaerle |first15=Thomas |last16=Chun |first16=Jung-Ho |last17=Dauparas |first17=Justas |last18=Bennett |first18=Nathaniel |last19=Wicky |first19=Basile I. M. 
|last20=Muenks |first20=Andrew |last21=DiMaio |first21=Frank |last22=Correia |first22=Bruno |last23=Ovchinnikov |first23=Sergey |last24=Baker |first24=David |title=Scaffolding protein functional sites using deep learning |journal=Science |date=22 July 2022 |volume=377 |issue=6604 |pages=387–394 |doi=10.1126/science.abn2100 |pmid=35862514 |pmc=9621694 |bibcode=2022Sci...377..387W |url=https://www.ipd.uw.edu/wp-content/uploads/2022/07/science.abn2100.pdf |language=en |issn=0036-8075}}</ref>

===With mathematical guarantees===
====Dead-end elimination====
{{main|Dead-end elimination}}
The dead-end elimination (DEE) algorithm reduces the search space of the problem iteratively by removing rotamers that can be proven not to be part of the global minimum energy conformation (GMEC). On each iteration, the dead-end elimination algorithm compares all possible pairs of rotamers at each residue position, and removes each rotamer <var>r′<sub>i</sub></var> that can be shown to always be of higher energy than another rotamer <var>r<sub>i</sub></var> and is thus not part of the GMEC:
: <math> E(r^\prime_i) + \sum_{j\ne i} \min_{r_j} E(r^\prime_i,r_j) > E(r_i) + \sum_{j\ne i} \max_{r_j} E(r_i,r_j) </math>
Powerful extensions to the dead-end elimination algorithm include the [[Dead-end elimination#Pairs elimination criterion|pairs elimination criterion]] and the [[Dead-end elimination#Generalization|generalized dead-end elimination criterion]]. This algorithm has also been extended to handle continuous rotamers with provable guarantees.

Although the dead-end elimination algorithm runs in polynomial time on each iteration, it cannot guarantee convergence. If, after a certain number of iterations, the dead-end elimination algorithm does not prune any more rotamers, then either rotamers have to be merged or another search algorithm must be used to search the remaining search space.
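The single-rotamer elimination criterion above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by any particular design package: the function names and the energy-table layout (<code>E1[i][r]</code> for self energies, <code>E2[(i, j)][r][s]</code> with <code>i &lt; j</code> for pair energies, one table for every residue pair) are assumptions made for this example.

```python
def pair_energy(E2, i, r, j, s):
    """Look up E(r_i, s_j); each residue pair is stored once under (low, high)."""
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]


def dead_end_elimination(E1, E2):
    """Return, per residue, the rotamer indices that survive pruning.

    E1[i][r]         -- self energy of rotamer r at residue i (toy layout)
    E2[(i, j)][r][s] -- pair energy for i < j; every pair must be present
    """
    n = len(E1)
    alive = [list(range(len(e))) for e in E1]
    pruned = True
    while pruned:  # repeat until a full pass removes nothing
        pruned = False
        for i in range(n):
            for cand in list(alive[i]):
                # Best case for the candidate: self energy plus the minimum
                # pair energy with every other residue.
                best_case = E1[i][cand] + sum(
                    min(pair_energy(E2, i, cand, j, s) for s in alive[j])
                    for j in range(n) if j != i)
                for other in alive[i]:
                    if other == cand:
                        continue
                    # Worst case for the competing rotamer.
                    worst_case = E1[i][other] + sum(
                        max(pair_energy(E2, i, other, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if best_case > worst_case:
                        alive[i].remove(cand)  # cand cannot be in the GMEC
                        pruned = True
                        break
    return alive


# Toy problem: two residues with two rotamers each.
E1 = [[0.0, 5.0], [0.0, 1.0]]
E2 = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
print(dead_end_elimination(E1, E2))
```

The loop stops as soon as a full pass prunes nothing. If more than one rotamer is then left at some residue, the remaining conformations still have to be searched by other means.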
In such cases, dead-end elimination acts as a pre-filtering algorithm to reduce the search space, while other algorithms, such as A*, Monte Carlo, linear programming, or FASTER, are used to search the remaining search space.<ref name="donald10" />

====Branch and bound====
{{main|Branch and bound}}
The protein design conformational space can be represented as a [[Tree (data structure)|tree]], where the protein residues are ordered in an arbitrary way, and the tree branches at each of the rotamers in a residue. [[Branch and bound]] algorithms use this representation to efficiently explore the conformation tree: at each ''branching'', branch and bound algorithms ''bound'' the conformation space and explore only the promising branches.<ref name="donald10" /><ref name="gordon99">{{cite journal|last=Gordon|first=DB|author2=Mayo, SL |title=Branch-and-terminate: a combinatorial optimization algorithm for protein design.|journal=Structure|date=September 15, 1999|volume=7|issue=9|pages=1089–98|pmid=10508778|doi=10.1016/s0969-2126(99)80176-2|doi-access=free}}</ref><ref name="leach98" >{{cite journal|last=Leach|first=AR|author2=Lemon, AP |title=Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm.|journal=Proteins|date=November 1, 1998|volume=33|issue=2|pages=227–39|pmid=9779790|doi=10.1002/(sici)1097-0134(19981101)33:2<227::aid-prot7>3.0.co;2-f|citeseerx=10.1.1.133.7986|s2cid=12872539 }}</ref>

A popular search algorithm for protein design is the [[A* search algorithm]].<ref name="donald10" /><ref name="leach98" /> For each partial tree path, A* computes a score that is a guaranteed lower bound on the energy of any full conformation reachable from that path. Each partial conformation is added to a priority queue, and at each iteration the partial path with the lowest lower bound is popped from the queue and expanded.
The algorithm stops once a full conformation has been enumerated and guarantees that this conformation is optimal.

The A* score <var>f</var> in protein design consists of two parts, <var>f=g+h</var>. <var>g</var> is the exact energy of the rotamers that have already been assigned in the partial conformation. <var>h</var> is a lower bound on the energy of the rotamers that have not yet been assigned. Each is defined as follows, where <var>d</var> is the index of the last assigned residue in the partial conformation.
: <math>g=\sum_{i=1}^d (E(r_i ) + \sum_{j=i+1}^d E(r_i,r_j) )</math>
: <math>h = \sum_{j=d+1}^n [\min_{r_j}(E(r_j) + \sum_{i=1}^d E(r_i,r_j) + \sum_{k=j+1}^n \min_{r_k} E(r_j,r_k))]</math>

====Integer linear programming====
{{Further|Linear programming#Integer unknowns|Integer programming}}
The problem of optimizing <var>E<sub>T</sub></var> (Equation ({{EquationNote|1}})) can be easily formulated as an [[integer linear program]] (ILP).<ref name="kingsford05" /> One of the most powerful formulations uses binary variables to represent the presence of a rotamer and edges in the final solution, and constrains the solution to have exactly one rotamer for each residue and one pairwise interaction for each pair of residues:
: <math>\min \sum_{i}\sum_{r_i} \Big[ E_i(r_i)q_{i}(r_i) + \sum_{j\ne i}\sum_{r_j} E_{ij}(r_i, r_j)q_{ij}(r_i, r_j) \Big]</math>
s.t.
: <math>\sum_{r_i} q_{i}(r_i) = 1, \ \forall i</math>
: <math>\sum_{r_j} q_{ij}(r_i,r_j) = q_{i}(r_i), \forall i, r_i, j </math>
: <math>q_i, q_{ij} \in \{0,1\}</math>
ILP solvers, such as [[CPLEX]], can compute the exact optimal solution for large instances of protein design problems. These solvers use a [[linear programming relaxation]] of the problem, where <var>q<sub>i</sub></var> and <var>q<sub>ij</sub></var> are allowed to take continuous values, in combination with a [[branch and cut]] algorithm to search only a small portion of the conformation space for the optimal solution.
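To make the role of the binary variables concrete, the following Python sketch solves a toy instance of the formulation above by enumerating every integer-feasible assignment. It is not how CPLEX or any branch-and-cut solver works, and it only scales to tiny problems. The function name and the energy-table layout are assumptions made for this example, and each residue pair is assumed to be stored (and counted) once, under <code>(i, j)</code> with <code>i &lt; j</code>.

```python
from itertools import product


def solve_ilp_by_enumeration(E1, E2):
    """Return (energy, choice) minimizing the ILP objective over feasible q.

    E1[i][r]         -- self energy E_i(r) (toy layout)
    E2[(i, j)][r][s] -- pair energy E_ij(r, s) for i < j, each pair stored once
    """
    best_energy, best_choice = float("inf"), None
    for choice in product(*(range(len(e)) for e in E1)):
        # One-hot q_i: exactly one rotamer per residue, so sum_r q_i(r) = 1.
        q = [[int(r == chosen) for r in range(len(E1[i]))]
             for i, chosen in enumerate(choice)]
        # For binary variables, the constraint sum_s q_ij(r, s) = q_i(r),
        # together with its counterpart for residue j, forces
        # q_ij(r, s) = q_i(r) * q_j(s).
        energy = sum(E1[i][r] * q[i][r]
                     for i in range(len(E1)) for r in range(len(E1[i])))
        energy += sum(table[r][s] * q[i][r] * q[j][s]
                      for (i, j), table in E2.items()
                      for r in range(len(table))
                      for s in range(len(table[r])))
        if energy < best_energy:
            best_energy, best_choice = energy, choice
    return best_energy, best_choice


# Toy problem: two residues with two rotamers each.
E1 = [[0.0, 5.0], [0.0, 1.0]]
E2 = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
print(solve_ilp_by_enumeration(E1, E2))
```

A real ILP solver avoids this exhaustive loop by relaxing the variables to continuous values and branching only where the relaxed solution is fractional.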
ILP solvers have been shown to solve many instances of the side-chain placement problem.<ref name="kingsford05">{{cite journal|last=Kingsford|first=CL|author2=Chazelle, B |author3=Singh, M |title=Solving and analyzing side-chain positioning problems using linear and integer programming.|journal=Bioinformatics|date=April 1, 2005|volume=21|issue=7|pages=1028–36|pmid=15546935|doi=10.1093/bioinformatics/bti144|doi-access=free}}</ref>

====Message-passing based approximations to the linear programming dual====
ILP solvers depend on linear programming (LP) algorithms, such as the [[Simplex algorithm|Simplex]] or [[barrier function|barrier]]-based methods, to perform the LP relaxation at each branch. These LP algorithms were developed as general-purpose optimization methods and are not optimized for the protein design problem (Equation ({{EquationNote|1}})). In consequence, the LP relaxation becomes the bottleneck of ILP solvers when the problem size is large.<ref name=yanover06>{{cite journal|last=Yanover|first=Chen|author2=Talya Meltzer |author3=Yair Weiss |title=Linear Programming Relaxations and Belief Propagation – An Empirical Study|journal=Journal of Machine Learning Research|year=2006|volume=7|pages=1887–1907}}</ref> Recently, several alternatives based on [[belief propagation|message-passing algorithms]] have been designed specifically for the optimization of the LP relaxation of the protein design problem. These algorithms can approximate either the [[Duality (optimization)|dual]] or the [[Duality (optimization)|primal]] instance of the integer program, but to maintain guarantees on optimality they are most useful when used to approximate the dual of the protein design problem, because approximating the dual guarantees that no solutions are missed. Message-passing based approximations include the ''tree reweighted max-product message passing'' algorithm,<ref>{{cite journal|last=Wainwright|first=Martin J |author2=Tommi S. Jaakkola |author3=Alan S. 
Willsky|title=MAP estimation via agreement on trees: message-passing and linear programming.|journal=IEEE Transactions on Information Theory|year=2005|pages=3697–3717|doi=10.1109/tit.2005.856938|volume=51|issue=11 |citeseerx=10.1.1.71.9565 |s2cid=10007532 }}</ref><ref>{{cite journal|last=Kolmogorov|first=Vladimir|title=Convergent tree-reweighted message passing for energy minimization.|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|date=October 28, 2006|volume=28|issue=10|pages=1568–1583|doi=10.1109/TPAMI.2006.200|pmid=16986540|s2cid=8616813}}</ref> and the ''message passing linear programming'' algorithm.<ref>{{cite journal|last=Globerson|first=Amir|author2=Tommi S. Jaakkola |title=Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations.|journal=Advances in Neural Information Processing Systems|year=2007}}</ref>

===Optimization algorithms without guarantees===
====Monte Carlo and simulated annealing====
Monte Carlo is one of the most widely used algorithms for protein design.
In its simplest form, a Monte Carlo algorithm selects a residue at random, and in that residue a randomly chosen rotamer (of any amino acid) is evaluated.<ref name="voigt00" /> The new energy of the protein, <var>E</var><sub>new</sub>, is compared against the old energy, <var>E</var><sub>old</sub>, and the new rotamer is ''accepted'' with a probability of:
: <math> p=\min\left(1, e^{-\beta(E_{\text{new}}-E_{\text{old}})}\right),</math>
where <var>β</var> = 1/(<var>k</var><sub>B</sub><var>T</var>), <var>k</var><sub>B</sub> is the [[Boltzmann constant]], and the temperature <var>T</var> can be chosen to be high in the initial rounds and then slowly [[simulated annealing|annealed]] to overcome local minima.<ref name="samish11">{{cite journal|last=Samish|first=I|author2=MacDermaid, CM |author3=Perez-Aguilar, JM |author4= Saven, JG |title=Theoretical and computational protein design.|journal=Annual Review of Physical Chemistry|year=2011|volume=62|pages=129–49|pmid=21128762|bibcode= 2011ARPC...62..129S |doi= 10.1146/annurev-physchem-032210-103509}}</ref>

====FASTER====
The FASTER algorithm uses a combination of deterministic and stochastic criteria to optimize amino acid sequences. FASTER first uses DEE to eliminate rotamers that are not part of the optimal solution.
Then, a series of iterative steps optimizes the rotamer assignment.<ref>{{cite journal|last=Allen|first=BD|author2=Mayo, SL |title=Dramatic performance enhancements for the FASTER optimization algorithm.|journal=Journal of Computational Chemistry|date=July 30, 2006|volume=27|issue=10|pages=1071–5|pmid=16685715|doi=10.1002/jcc.20420|citeseerx=10.1.1.425.5418|s2cid=769053}}</ref><ref>{{cite journal|last=Desmet|first=J|author2=Spriet, J |author3=Lasters, I |title=Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization.|journal=Proteins|date=July 1, 2002|volume=48|issue=1|pages=31–43|pmid=12012335|doi=10.1002/prot.10131|s2cid=21524437}}</ref>

====Belief propagation====
In [[belief propagation]] for protein design, the algorithm exchanges messages that describe the ''belief'' that each residue has about the probability of each rotamer in neighboring residues. The algorithm updates messages on every iteration and iterates until convergence or until a fixed number of iterations. Convergence is not guaranteed in protein design. The message <var>m</var><sub><var>i→j</var></sub>(<var>r<sub>j</sub></var>) that a residue <var>i</var> sends to every rotamer <var>r<sub>j</sub></var> at neighboring residue <var>j</var> is defined as:
: <math>m_{i\to j}(r_j) = \max_{r_i} \Big(e^{\frac{-E_i(r_i)-E_{ij}(r_i,r_j)}{T}} \prod_{k \in N(i)\backslash j} m_{k\to i}(r_i)\Big)</math>
Both max-product and sum-product belief propagation have been used to optimize protein design.

==Applications and examples of designed proteins==
===Enzyme design===
The design of new [[enzyme]]s is an application of protein design with major bioengineering and biomedical uses. In general, designing a protein structure can be different from designing an enzyme, because the design of enzymes must consider many states involved in the [[enzyme catalysis|catalytic mechanism]].
However protein design is a prerequisite of ''de novo'' enzyme design because, at the very least, the design of catalysts requires a scaffold in which the catalytic mechanism can be inserted.<ref name="baker10">{{cite journal|last=Baker|first=D|title=An exciting but challenging road ahead for computational enzyme design.|journal=Protein Science|date=October 2010|volume=19|issue=10|pages=1817–9|pmid=20717908|doi=10.1002/pro.481|pmc=2998717}}</ref> Great progress in ''de novo'' enzyme design, and redesign, was made in the first decade of the 21st century. In three major studies, David Baker and coworkers ''de novo'' designed enzymes for the retro-[[aldol reaction]],<ref name="jiang08">{{cite journal |doi=10.1126/science.1152692 |title=De Novo Computational Design of Retro-Aldol Enzymes |year=2008 |last1=Jiang |first1=Lin |last2=Althoff |first2=Eric A. |last3=Clemente |first3=Fernando R. |last4=Doyle |first4=Lindsey |last5=Rothlisberger |first5=Daniela |last6=Zanghellini |first6=Alexandre |last7=Gallaher |first7=Jasmine L. |last8=Betker |first8=Jamie L. |last9=Tanaka |first9=Fujie |journal=Science |volume=319 |pages=1387–91 |pmid=18323453 |issue=5868|bibcode= 2008Sci...319.1387J |pmc=3431203}}</ref> a Kemp-elimination reaction,<ref name="roth08">{{cite journal |doi=10.1038/nature06879 |title=Kemp elimination catalysts by computational enzyme design |year=2008 |last1=Röthlisberger |first1=Daniela |last2=Khersonsky |first2=Olga |last3=Wollacott |first3=Andrew M. |last4=Jiang |first4=Lin |last5=Dechancie |first5=Jason |last6=Betker |first6=Jamie |last7=Gallaher |first7=Jasmine L. |last8=Althoff |first8=Eric A. 
|last9=Zanghellini |first9=Alexandre |journal=Nature |volume=453 |pages=190–5 |pmid=18354394 |issue=7192|bibcode= 2008Natur.453..190R|doi-access=free }}</ref> and for the [[Diels-Alder reaction]].<ref>{{cite journal|last=Siegel|first=JB|author2=Zanghellini, A; Lovick, HM; Kiss, G; Lambert, AR; St Clair, JL; Gallaher, JL; Hilvert, D; Gelb, MH; Stoddard, BL; Houk, KN; Michael, FE; Baker, D|title=Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction.|journal=Science|date=July 16, 2010|volume=329|issue=5989|pages=309–13|pmid=20647463|bibcode= 2010Sci...329..309S |doi= 10.1126/science.1190239|pmc=3241958}}</ref> Furthermore, Stephen Mayo and coworkers developed an iterative method to design the most efficient known enzyme for the Kemp-elimination reaction.<ref>{{cite journal|last=Privett|first=HK|author2=Kiss, G |author3=Lee, TM |author4=Blomberg, R |author5=Chica, RA |author6=Thomas, LM |author7=Hilvert, D |author8=Houk, KN |author9= Mayo, SL |title=Iterative approach to computational enzyme design.|journal=Proceedings of the National Academy of Sciences of the United States of America|date=March 6, 2012|volume=109|issue=10|pages=3790–5|pmid=22357762|bibcode= 2012PNAS..109.3790P |doi= 10.1073/pnas.1118082108 |pmc=3309769|doi-access=free}}</ref> Also, in the laboratory of [[Bruce Donald]], computational protein design was used to switch the specificity of one of the [[protein domain]]s of the [[nonribosomal peptide|nonribosomal peptide synthetase]] that produces [[Gramicidin S]], from its natural substrate [[Phenylalanine|phe]]nylalanine to other noncognate substrates including charged amino acids; the redesigned enzymes had activities close to those of the wild-type.<ref name="chen09">{{cite journal|last=Chen|first=CY|author2=Georgiev, I |author3=Anderson, AC |author4= Donald, BR |title=Computational structure-based redesign of enzyme activity.|journal=Proceedings of the National Academy of Sciences of the United States of 
America|date=March 10, 2009|volume=106|issue=10|pages=3764–9|pmid=19228942|bibcode= 2009PNAS..106.3764C |doi= 10.1073/pnas.0900266106 |pmc=2645347|doi-access=free}}</ref>

=== Semi-rational design ===
Semi-rational design is a purposeful modification method based on partial knowledge of the sequence, structure, and catalytic mechanism of an enzyme. It lies between non-rational (random) design and rational design: known information is used to guide the evolutionary modification of specific functions of the target enzyme. Semi-rational design does not rely solely on random mutation and screening, but combines rational guidance with the concept of directed evolution. It creates a library of mutants with diverse sequences through [[mutagenesis]], [[Mutagenesis (molecular biology technique)|error-prone PCR]], [[Recombinant DNA|DNA recombination]], and [[Saturation mutagenesis|site-saturation mutagenesis]], while using knowledge of the enzyme and of design principles to purposefully screen for mutants with desired characteristics.

Because known information guides the evolutionary process, semi-rational design can improve efficiency and success rate. It plays an important role in protein function modification because it combines the advantages of non-rational and rational design: it can explore unknown sequence space while using existing knowledge for targeted modification. Its applications include enzyme optimization, modification of drug targets, and evolution of biocatalysts. Through this method, researchers can more effectively improve the functional properties of proteins to meet specific biotechnological or medical needs.
Although this method demands substantial information and technology and is relatively difficult to implement, advances in computing and bioinformatics are broadening the prospects for semi-rational design in protein engineering.<ref>{{Cite book |last=Korendovych |first=Ivan V. |title=Protein Engineering |date=2018 |chapter=Rational and Semirational Protein Design |series=Methods in Molecular Biology (Clifton, N.J.) |volume=1685 |pages=15–23 |doi=10.1007/978-1-4939-7366-8_2 |issn=1064-3745 |pmc=5912912 |pmid=29086301|isbn=978-1-4939-7364-4 }}</ref>

===Design for affinity===
[[Protein–protein interaction]]s are involved in most biotic processes. Many of the hardest-to-treat diseases, such as [[Alzheimer]]'s, many forms of [[cancer]] (e.g., [[TP53]]), and human immunodeficiency virus ([[HIV]]) infection involve protein–protein interactions. Thus, to treat such diseases, it is desirable to design protein or protein-like therapeutics that bind one of the partners of the interaction and, thus, disrupt the disease-causing interaction. This requires designing protein therapeutics for ''affinity'' toward their partner.

Protein–protein interactions can be designed using protein design algorithms because the principles that rule protein stability also rule protein–protein binding. Protein–protein interaction design, however, presents challenges not commonly present in protein design.
One of the most important challenges is that, in general, the interfaces between proteins are more polar than protein cores, and binding involves a tradeoff between desolvation and hydrogen bond formation.<ref name="kuhlman2009">{{cite journal|last=Karanicolas|first=J|author2=Kuhlman, B |title=Computational design of affinity and specificity at protein–protein interfaces.|journal=Current Opinion in Structural Biology|date=August 2009|volume=19|issue=4|pages=458–63|pmid=19646858|doi=10.1016/j.sbi.2009.07.005|pmc=2882636}}</ref> To overcome this challenge, Bruce Tidor and coworkers developed a method to improve the affinity of antibodies by focusing on electrostatic contributions. They found that, for the antibodies designed in the study, reducing the desolvation costs of the residues in the interface increased the affinity of the binding pair.<ref name=kuhlman2009 /><ref>{{cite journal|last=Shoichet|first=BK|title=No free energy lunch.|journal=Nature Biotechnology|date=October 2007|volume=25|issue=10|pages=1109–10|pmid=17921992|doi=10.1038/nbt1007-1109|s2cid=5527226}}</ref><ref>{{cite journal|last=Lippow|first=SM|author2=Wittrup, KD |author3=Tidor, B |title=Computational design of antibody-affinity improvement beyond in vivo maturation.|journal=Nature Biotechnology|date=October 2007|volume=25|issue=10|pages=1171–6|pmid=17891135|doi=10.1038/nbt1336|pmc=2803018}}</ref>

====Scoring binding predictions====
Protein design energy functions must be adapted to score binding predictions because binding involves a trade-off between the lowest-[[Thermodynamic free energy|energy]] conformations of the free proteins (<var>E<sub>P</sub></var> and <var>E<sub>L</sub></var>) and the lowest-energy conformation of the bound complex (<var>E<sub>PL</sub></var>):
: <math>\Delta G = E_{PL} - E_P - E_L </math>.
The K* algorithm approximates the binding constant of the complex by including conformational entropy in the free energy calculation.
The K* algorithm considers only the lowest-energy conformations of the free and bound complexes (denoted by the sets <var>P</var>, <var>L</var>, and <var>PL</var>) to approximate the partition functions of each complex:<ref name=donald10 />
: <math>K^* = \frac{\sum\limits_{x\in PL} e^{-E(x)/RT}}{\sum\limits_{x\in P} e^{-E(x)/RT}\sum\limits_{x\in L} e^{-E(x)/RT}}</math>

===Design for specificity ===
The design of protein–protein interactions must be highly specific because proteins can interact with a large number of proteins; successful design requires selective binders. Thus, protein design algorithms must be able to distinguish between on-target (or ''positive design'') and off-target binding (or ''negative design'').<ref name="richardson1989"/><ref name=kuhlman2009 /> One of the most prominent examples of design for specificity is the design of specific [[bZIP domain|bZIP]]-binding peptides by Amy Keating and coworkers for 19 out of the 20 bZIP families; 8 of these peptides were specific for their intended partner over competing peptides.<ref name="kuhlman2009" /><ref name="schreiber11">{{cite journal|last=Schreiber|first=G|author2=Keating, AE |title=Protein binding specificity versus promiscuity.|journal=Current Opinion in Structural Biology|date=February 2011|volume=21|issue=1|pages=50–61|pmid=21071205|doi=10.1016/j.sbi.2010.10.002|pmc=3053118}}</ref><ref>{{cite journal|last=Grigoryan|first=G|author2=Reinke, AW |author3=Keating, AE |title=Design of protein-interaction specificity gives selective bZIP-binding peptides.|journal=Nature|date=April 16, 2009|volume=458|issue=7240|pages=859–64|pmid=19370028|bibcode= 2009Natur.458..859G |doi= 10.1038/nature07885 |pmc=2748673}}</ref> Further, positive and negative design were also used by Anderson and coworkers to predict mutations in the active site of a drug target that conferred resistance to a new drug; positive design was used to maintain wild-type activity, while negative design was used to disrupt binding of the 
drug.<ref name="frey10">{{cite journal|last=Frey|first=KM|author2=Georgiev, I |author3=Donald, BR |author4= Anderson, AC |title=Predicting resistance mutations using protein design algorithms.|journal=Proceedings of the National Academy of Sciences of the United States of America|date=August 3, 2010|volume=107|issue=31|pages=13707–12|pmid=20643959|bibcode= 2010PNAS..10713707F |doi= 10.1073/pnas.1002162107 |pmc=2922245|doi-access=free}}</ref> Computational redesign by Costas Maranas and coworkers was also able to switch the [[cofactor (biochemistry)|cofactor]] specificity of ''Candida boidinii'' xylose reductase from [[Nicotinamide adenine dinucleotide phosphate|NADPH]] to [[Nicotinamide adenine dinucleotide|NADH]], as confirmed experimentally.<ref name="khoury">{{cite journal |title=Computational design of Candida boidinii xylose reductase for altered cofactor specificity |journal=Protein Science |volume=18 |issue=10 |pages=2125–38 |date=October 2009 |doi=10.1002/pro.227 |pmc=2786976 |pmid=19693930 |last1=Khoury |first1=GA |last2=Fazelinia |first2=H |last3=Chin |first3=JW |last4=Pantazes |first4=RJ |last5=Cirino |first5=PC |last6=Maranas |first6=CD}}</ref>

===Protein resurfacing===
Protein resurfacing consists of designing a protein's surface while keeping the overall fold, core, and boundary regions of the protein intact. Protein resurfacing is especially useful to alter the binding of a protein to other proteins. One of the most important applications of protein resurfacing was the design of the RSC3 probe to select broadly neutralizing HIV antibodies at the NIH Vaccine Research Center. First, residues outside of the binding interface between the gp120 HIV envelope protein and the previously discovered b12 antibody were selected to be designed. Then, the sequence space was selected based on evolutionary information, solubility, similarity with the wild-type, and other considerations.
Then the RosettaDesign software was used to find optimal sequences in the selected sequence space. RSC3 was later used to discover the broadly neutralizing antibody VRC01 in the serum of a long-term HIV-infected non-progressor individual.<ref>{{cite journal|last=Burton|first=DR|author2=Weiss, RA |title=AIDS/HIV. A boost for HIV vaccine design.|journal=Science|date=August 13, 2010|volume=329|issue=5993|pages=770–3|pmid=20705840|bibcode= 2010Sci...329..770B |doi= 10.1126/science.1194693|s2cid=206528638}}</ref>

===Design of globular proteins===
[[Globular protein]]s are proteins that contain a hydrophobic core and a hydrophilic surface. Globular proteins often assume a stable structure, unlike [[fibrous protein]]s, which have multiple conformations. The three-dimensional structure of globular proteins is typically easier to determine through [[X-ray crystallography]] and [[nuclear magnetic resonance]] than both fibrous proteins and [[membrane protein]]s, which makes globular proteins more attractive for protein design than the other types of proteins. Most successful protein designs have involved globular proteins. Both [[#Sequence space|RSD-1]] and [[#Target structure|Top7]] were ''de novo'' designs of globular proteins. Five more protein structures were designed, synthesized, and verified in 2012 by the Baker group. These new proteins serve no biotic function, but the structures are intended to act as building-blocks that can be expanded to incorporate functional active sites.
The structures were found computationally by using new heuristics based on analyzing the connecting loops between parts of the sequence that specify secondary structures.<ref>{{cite news |title=Proteins made to order |author=Jessica Marshall |url=http://www.nature.com/news/proteins-made-to-order-1.11767 |newspaper=Nature News |date=November 7, 2012 |access-date=November 17, 2012}}</ref>

===Design of membrane proteins===
Several transmembrane proteins have been successfully designed,<ref>[https://opm.phar.umich.edu/superfamilies/478 Designed transmembrane alpha-hairpin proteins] in [[OPM database]]</ref> along with many other membrane-associated peptides and proteins.<ref>[https://opm.phar.umich.edu/species/213 Designed membrane-associated peptides and proteins] in [[OPM database]]</ref> In 2018, Costas Maranas and his coworkers developed an automated tool<ref>{{Cite journal|last1=Chowdhury|first1=Ratul|last2=Kumar|first2=Manish|last3=Maranas|first3=Costas D.|last4=Golbeck|first4=John H.|last5=Baker|first5=Carol|last6=Prabhakar|first6=Jeevan|last7=Grisewood|first7=Matthew|last8=Decker|first8=Karl|last9=Shankla|first9=Manish|date=2018-09-10|title=PoreDesigner for tuning solute selectivity in a robust and highly permeable outer membrane pore|journal=Nature Communications|language=en|volume=9|issue=1|pages=3661|doi=10.1038/s41467-018-06097-1|issn=2041-1723|pmc=6131167|pmid=30202038|bibcode=2018NatCo...9.3661C}}</ref> to redesign the pore size of Outer Membrane Porin Type-F (OmpF) from ''E. coli'' to any desired sub-nm size, and assembled the redesigned pores in membranes to perform precise angstrom-scale separation.

===Other applications ===
One of the most desirable uses for protein design is for [[biosensor]]s, proteins that will sense the presence of specific compounds. Some attempts in the design of biosensors include sensors for unnatural molecules including [[TNT]].<ref>{{cite journal |last1=Looger |first1=Loren L. |last2=Dwyer |first2=Mary A. |last3=Smith |first3=James J. 
|last4=Hellinga |first4=Homme W. |name-list-style=amp |year=2003 |title=Computational design of receptor and sensor proteins with novel functions |journal=[[Nature (journal)|Nature]] |pmid=12736688 |volume=423 |issue=6936 |pages=185–190 |doi=10.1038/nature01556 |bibcode= 2003Natur.423..185L|s2cid=4387641 }}</ref> More recently, Kuhlman and coworkers designed a biosensor of the [[p21 activated kinase|PAK1]].<ref>{{cite journal|last=Jha|first=RK|author2=Wu, YI |author3=Zawistowski, JS |author4=MacNevin, C |author5=Hahn, KM |author6= Kuhlman, B |title=Redesign of the PAK1 autoinhibitory domain for enhanced stability and affinity in biosensor applications.|journal=Journal of Molecular Biology|date=October 21, 2011|volume=413|issue=2|pages=513–22|pmid=21888918|doi=10.1016/j.jmb.2011.08.022 |pmc=3202338}}</ref>

In a sense, protein design is a subset of [[circuit design|battery design]].{{Explain|date=April 2022}}

==See also==
* {{annotated link|Protein engineering}}
* {{annotated link|Molecular design software}}
** {{annotated link|Comparison of software for molecular mechanics modeling}}
** {{annotated link|Protein structure prediction software}}
* {{annotated link|Synthetic biology}}

==References==
{{Reflist|30em}}

==Further reading==
* {{Cite book |last1=Donald |first1=Bruce R. 
|author-link1=Bruce Donald |year=2011 |title=Algorithms in Structural Molecular Biology |url=https://books.google.com/books?id=GSw3AgAAQBAJ |series=Computational Molecular Biology |location=Cambridge, MA |publisher=The MIT Press |isbn=9780262015592 |oclc=1200909148}}
* {{Cite journal |last1=Jin |first1=Wenzhen |last2=Kambara |first2=Ohki |last3=Sasakawa |first3=Hiroaki |last4=Tamura |first4=Atsuo |last5=Takada |first5=Shoji |name-list-style=amp |date=May 2003 |title=De Novo Design of Foldable Proteins with Smooth Folding Funnel: Automated Negative Design and Experimental Verification |journal=Structure |volume=11 |issue=5 |pages=581–590 |doi=10.1016/S0969-2126(03)00075-3 |doi-access=free |pmid=12737823}}
* {{Cite journal |last1=Pokala |first1=Navin |last2=Handel |first2=Tracy M. |name-list-style=amp |year=2005 |title=Energy Functions for Protein Design: Adjustment with Protein–Protein Complex Affinities, Models for the Unfolded State, and Negative Design of Solubility and Specificity |journal=Journal of Molecular Biology |volume=347 |issue=1 |pages=203–227 |doi=10.1016/j.jmb.2004.12.019 |pmid=15733929}}
* {{Cite journal |last1=Sander |first1=Chris |last2=Vriend |first2=Gerrit |last3=Bazan |first3=Fernando |last4=Horovitz |first4=Amnon |last5=Nakamura |first5=Haruki |last6=Ribas |first6=Luis |last7=Finkelstein |first7=Alexei V. |last8=Lockhart |first8=Andrew |last9=Merkl |first9=Rainer |display-authors=etal |date=February 1992 |title=Protein Design on Computers. Five New Proteins: Shpilka, Grendel, Fingerclasp, Leather and Aida |journal=Proteins: Structure, Function, and Bioinformatics |volume=12 |pmid=1603799 |issue=2 |pages=105–110 |doi=10.1002/prot.340120203|s2cid=38986245 }}

{{Biomolecular structure}}
{{Design}}

[[Category:Protein engineering]]
[[Category:Protein structure]]