== Prediction ==
{{See also|Protein structure prediction|List of protein secondary structure prediction programs}}

Predicting a protein's tertiary structure from its amino acid sequence alone is a very challenging problem (see [[protein structure prediction]]), but predicting the simpler secondary-structure states is more tractable.

Early methods of secondary-structure prediction were restricted to predicting the three predominant states: helix, sheet, or random coil. These methods were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. The first widely used techniques to predict protein secondary structure from the amino acid sequence were the [[Chou–Fasman method]]<ref name="Chou_predict0">{{cite journal | vauthors = Chou PY, Fasman GD | title = Prediction of protein conformation | journal = Biochemistry | volume = 13 | issue = 2 | pages = 222–45 | date = Jan 1974 | pmid = 4358940 | doi = 10.1021/bi00699a002 }}</ref><ref name="Chou_predict1">{{cite journal | vauthors = Chou PY, Fasman GD | title = Empirical predictions of protein conformation | journal = Annual Review of Biochemistry | volume = 47 | pages = 251–76 | year = 1978 | issue = 1 | pmid = 354496 | doi = 10.1146/annurev.bi.47.070178.001343 }}</ref><ref name="Chou_predict2">{{cite book | vauthors = Chou PY, Fasman GD | chapter = Prediction of the secondary structure of proteins from their amino acid sequence | volume = 47 | pages = [https://archive.org/details/advancesinenzymo0047unse/page/45 45–148] | year = 1978 | pmid = 364941 | doi = 10.1002/9780470122921.ch2 | series = Advances in Enzymology - and Related Areas of Molecular Biology | isbn = 9780470122921 | title = Advances in Enzymology and Related Areas of Molecular Biology | publisher = Wiley | chapter-url = https://archive.org/details/advancesinenzymo0047unse/page/45 }}</ref> and the [[GOR method]].<ref name="Garnier">{{cite journal | vauthors = Garnier J, Osguthorpe DJ, Robson B | title = Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins | journal = Journal of Molecular Biology | volume = 120 | issue = 1 | pages = 97–120 | date = March 1978 | pmid = 642007 | doi = 10.1016/0022-2836(78)90297-8 }}</ref> Although such methods claimed ~60% accuracy in predicting which of the three states (helix/sheet/coil) a residue adopts, blind computing assessments later showed that the actual accuracy was much lower.<ref name="Kabsch">{{cite journal | vauthors = Kabsch W, Sander C | title = How good are predictions of protein secondary structure? | journal = FEBS Letters | volume = 155 | issue = 2 | pages = 179–82 | date = May 1983 | pmid = 6852232 | doi = 10.1016/0014-5793(82)80597-8 | bibcode = 1983FEBSL.155..179K | s2cid = 41477827 }}</ref>
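
The propensity idea behind these early methods can be illustrated with a minimal Python sketch. It is not the Chou–Fasman or GOR algorithm: the propensity values, the window size and the threshold are illustrative assumptions chosen for this example, not published parameters, and the function name <code>predict_three_state</code> is hypothetical.

```python
# Illustrative per-residue propensities (ASSUMED values for this sketch,
# NOT the published Chou-Fasman table). Values > 1 favour the state.
HELIX = {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "G": 0.6,
         "P": 0.6, "V": 1.1, "I": 1.1, "S": 0.8, "T": 0.8}
SHEET = {"A": 0.8, "E": 0.4, "L": 1.3, "M": 1.0, "G": 0.75,
         "P": 0.55, "V": 1.7, "I": 1.6, "S": 0.75, "T": 1.2}


def predict_three_state(seq, window=5, threshold=1.05):
    """Assign H (helix), E (sheet) or C (coil) to every residue.

    Each residue is scored by the mean helix and sheet propensity of a
    window centred on it; residues missing from the tables count as
    neutral (1.0). The window is truncated at the sequence ends.
    """
    half = window // 2
    states = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half):min(len(seq), i + half + 1)]
        helix = sum(HELIX.get(aa, 1.0) for aa in segment) / len(segment)
        sheet = sum(SHEET.get(aa, 1.0) for aa in segment) / len(segment)
        if max(helix, sheet) < threshold:
            states.append("C")   # neither state is favoured
        elif helix >= sheet:
            states.append("H")
        else:
            states.append("E")
    return "".join(states)


if __name__ == "__main__":
    for example in ("AEAEAEAE", "VIVIVIVI", "GPGPGPGP"):
        print(example, predict_three_state(example))
```

Because every residue is scored only from its immediate neighbours, a sketch like this cannot use evolutionary information or long-range contacts, which is one reason single-sequence propensity methods have limited accuracy.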

A significant increase in accuracy (to nearly 80%) was achieved by exploiting [[multiple sequence alignment]]; knowing the full distribution of amino acids that occur at a position (and in its vicinity, typically ~7 residues on either side) throughout [[evolution]] provides a much better picture of the structural tendencies near that position.<ref name="Simossis_2004">{{cite journal | vauthors = Simossis VA, Heringa J | title = Integrating protein secondary structure prediction and multiple sequence alignment | journal = Current Protein & Peptide Science | volume = 5 | issue = 4 | pages = 249–66 | date = Aug 2004 | pmid = 15320732 | doi = 10.2174/1389203043379675 }}</ref><ref name="pmid20221928">{{cite book | vauthors = Pirovano W, Heringa J | chapter = Protein Secondary Structure Prediction | title = Data Mining Techniques for the Life Sciences | volume = 609 | pages = 327–48 | year = 2010 | pmid = 20221928 | doi = 10.1007/978-1-60327-241-4_19 | series = Methods in Molecular Biology | publisher = Humana Press | location = Totowa, NJ | isbn = 978-1-60327-240-7 }}</ref> For illustration, a given protein might have a [[glycine]] at a given position, which by itself might suggest a random coil there. However, multiple sequence alignment might reveal that helix-favoring amino acids occur at that position (and nearby positions) in 95% of homologous proteins spanning nearly a billion years of evolution.
Moreover, by examining the average [[hydrophobicity]] at that and nearby positions, the same alignment might also suggest a pattern of residue [[accessible surface area|solvent accessibility]] consistent with an α-helix.<ref>{{cite journal|title=Fourier–based classification of protein secondary structures|journal=Biochemical and Biophysical Research Communications|date=15 April 2017|first1=Jian-Jun|last1=Shu|first2=K.-Y.|last2=Yong|volume=485|issue=4|pages=731–735|doi=10.1016/j.bbrc.2017.02.117|pmid=28246013|s2cid=1240804|arxiv=1704.08994}}</ref> Taken together, these factors would suggest that the glycine of the original protein adopts α-helical structure, rather than random coil. Several types of methods are used to combine all the available data to form a 3-state prediction, including [[Artificial neural network|neural networks]], [[hidden Markov model]]s and [[support vector machine]]s. Modern prediction methods also provide a confidence score for their predictions at every position.
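
One way such alignment information can be turned into input for a classifier is sketched below in Python. This is an assumption-laden illustration rather than the procedure of any particular program: the function names <code>column_frequencies</code> and <code>window_features</code> are hypothetical, gaps are simply ignored, and real methods typically use weighted or log-odds profiles rather than raw frequencies.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Return per-column amino-acid frequencies for an aligned set of
    equal-length sequences, ignoring gaps ('-') and unknown symbols."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment
                         if row[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: (counts[aa] / total if total else 0.0)
                        for aa in AMINO_ACIDS})
    return profile


def window_features(profile, center, flank=7):
    """Concatenate the frequency vectors of columns center-flank ..
    center+flank; positions beyond either end are zero-padded."""
    features = []
    for pos in range(center - flank, center + flank + 1):
        if 0 <= pos < len(profile):
            features.extend(profile[pos][aa] for aa in AMINO_ACIDS)
        else:
            features.extend([0.0] * len(AMINO_ACIDS))
    return features


if __name__ == "__main__":
    # Toy alignment: the query (first row) has G in the middle column,
    # while most homologs have A there.
    toy = ["AGA", "AAA", "A-A", "AAL"]
    prof = column_frequencies(toy)
    print(prof[1]["G"], prof[1]["A"])
    print(len(window_features(prof, 1)))  # (2*7 + 1) columns * 20 residues
```

With the default flank of 7, each position is described by a fixed-length vector of 15 × 20 values, which is a convenient input shape for a neural network or support vector machine.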

Secondary-structure prediction methods were evaluated by the [http://predictioncenter.org/ Critical Assessment of protein Structure Prediction (CASP) experiments] and continuously benchmarked, e.g. by [[EVA (benchmark)]].  Based on these tests, the most accurate methods were [[Psipred]], SAM,<ref name="pmid19483096">{{cite journal | vauthors = Karplus K | title = SAM-T08, HMM-based protein structure prediction | journal = Nucleic Acids Res. | volume = 37 | issue = Web Server issue | pages = W492–97 | year = 2009 | pmid = 19483096 | pmc = 2703928 | doi = 10.1093/nar/gkp403 }}</ref> PORTER,<ref name="pmid15585524">{{cite journal | vauthors = Pollastri G, McLysaght A | title = Porter: a new, accurate server for protein secondary structure prediction | journal = Bioinformatics | volume = 21 | issue = 8 | pages = 1719–20 | year = 2005 | pmid = 15585524 | doi = 10.1093/bioinformatics/bti203 | doi-access = free | hdl = 2262/39594 | hdl-access = free }}</ref> PROF,<ref name="pmid24799431">{{cite journal | vauthors = Yachdav G, Kloppmann E, Kajan L, Hecht M, Goldberg T, Hamp T, Hönigschmid P, Schafferhans A, Roos M, Bernhofer M, Richter L, Ashkenazy H, Punta M, Schlessinger A, Bromberg Y, Schneider R, Vriend G, Sander C, Ben-Tal N, Rost B | title = PredictProtein—an open resource for online prediction of protein structural and functional features | journal = Nucleic Acids Res. 
| volume = 42 | issue = Web Server issue | pages = W337–43 | year = 2014 | pmid = 24799431 | pmc = 4086098 | doi = 10.1093/nar/gku366 }}</ref> and SABLE.<ref name="pmid15768403">{{cite journal | vauthors = Adamczak R, Porollo A, Meller J | title = Combining prediction of secondary structure and solvent accessibility in proteins | journal = Proteins | volume = 59 | issue = 3 | pages = 467–75 | year = 2005 | pmid = 15768403 | doi = 10.1002/prot.20441 | s2cid = 13267624 }}</ref> The chief area for improvement appears to be the prediction of β-strands; residues confidently predicted as β-strand are likely to be so, but the methods are apt to overlook some β-strand segments (false negatives). There is likely an upper limit of ~90% prediction accuracy overall, due to the idiosyncrasies of the standard method ([[DSSP (algorithm)|DSSP]]) for assigning secondary-structure classes (helix/strand/coil) to PDB structures, against which the predictions are benchmarked.<ref>{{cite journal | vauthors = Kihara D | title = The effect of long-range interactions on the secondary structure formation of proteins | journal = Protein Science | volume = 14 | issue = 8 | pages = 1955–1963 | date = Aug 2005 | pmid = 15987894 | pmc = 2279307 | doi = 10.1110/ps.051479505 }}</ref>
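
Three-state accuracy of this kind is commonly reported as the fraction of residues whose predicted state matches the assigned state (often called Q3). The Python sketch below shows that calculation together with per-state precision and recall, which make the β-strand false-negative problem visible. The function names and the toy strings are assumptions for illustration, with H, E and C standing for helix, strand and coil.

```python
def q3(predicted, observed):
    """Fraction of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def state_precision_recall(predicted, observed, state="E"):
    """Precision and recall for one state (default: strand, 'E').

    High precision with lower recall means that predicted strands are
    usually right but some real strands are missed (false negatives).
    """
    pairs = list(zip(predicted, observed))
    tp = sum(p == state and o == state for p, o in pairs)
    fp = sum(p == state and o != state for p, o in pairs)
    fn = sum(p != state and o == state for p, o in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    observed = "CEEEECCHHHHC"    # toy reference assignment
    predicted = "CCEEECCHHHHC"   # toy prediction missing one strand residue
    print(q3(predicted, observed))
    print(state_precision_recall(predicted, observed, "E"))
```

In the toy case the single missed strand residue lowers strand recall while strand precision stays at 1.0, which is the pattern described above.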

Accurate secondary-structure prediction is a key element in the prediction of [[tertiary structure]], in all but the simplest ([[protein structure prediction|homology modeling]]) cases. For example, a confidently predicted pattern of six secondary structure elements βαββαβ is the signature of a [[ferredoxin]] fold.<ref name="pmid15558583">{{cite journal | vauthors = Qi Y, Grishin NV | title = Structural classification of thioredoxin-like fold proteins | journal = Proteins | volume = 58 | issue = 2 | pages = 376–88 | year = 2005 | pmid = 15558583 | doi = 10.1002/prot.20329 | url = http://prodata.swmed.edu/Lab/Thiored_Proteins04.pdf | quote = Since the fold definition should include only the core secondary structural elements that are present in the majority of homologs, we define the thioredoxin-like fold as a two-layer α/β sandwich with the βαβββα secondary-structure pattern. | citeseerx = 10.1.1.644.8150 | s2cid = 823339 }}</ref>
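
As a rough illustration of how an ordered pattern of elements can be read off a per-residue prediction, the Python sketch below collapses a string of H/E/C states into a sequence of elements and checks it for the βαββαβ order. The function names, the minimum run length and the use of ASCII <code>a</code> and <code>b</code> for α and β are assumptions of this sketch; actual fold recognition relies on far more than element order.

```python
from itertools import groupby


def elements(per_residue, min_len=3):
    """Collapse a per-residue string (H = helix, E = strand, C = coil)
    into an ordered string of elements: 'a' for each helix and 'b' for
    each strand. Runs shorter than min_len are ignored (an assumed
    noise filter), and coil only serves to separate elements."""
    symbols = {"H": "a", "E": "b"}
    found = []
    for state, run in groupby(per_residue):
        run_length = len(list(run))
        if state in symbols and run_length >= min_len:
            found.append(symbols[state])
    return "".join(found)


def has_pattern(per_residue, pattern="babbab"):
    """True if the ordered elements contain the given pattern
    ('babbab' encodes beta-alpha-beta-beta-alpha-beta)."""
    return pattern in elements(per_residue)


if __name__ == "__main__":
    toy = ("C" + "EEEE" + "CC" + "HHHHHH" + "CC" + "EEEE" + "CC"
           + "EEEE" + "CC" + "HHHHHH" + "CC" + "EEEE" + "C")
    print(elements(toy), has_pattern(toy))
```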