Protein structure prediction
==Software==
{{Main|Protein structure prediction software}}
Many software tools for protein structure prediction exist. Approaches include [[homology modeling]], [[protein threading]], ''ab initio'' methods, [[#Secondary structure|secondary structure prediction]], and transmembrane helix and signal peptide prediction. In particular, [[deep learning]] based on [[long short-term memory]] has been used for this purpose since 2007, when it was successfully applied to protein homology detection<ref name="hochreiter2007">{{Cite journal |last1=Hochreiter |first1=S. |last2=Heusel |first2=M. |last3=Obermayer |first3=K. |year=2007 |doi=10.1093/bioinformatics/btm247 |title=Fast model-based protein homology detection without alignment |journal=Bioinformatics |volume=23 |issue=14 |pages=1728–1736 |pmid= 17488755 |doi-access=free }}</ref> and to predicting the subcellular localization of proteins.<ref name="thireou2007">{{cite journal |last1=Thireou |first1=T. |last2=Reczko |first2=M. |year=2007 |title=Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins |url=|journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |volume=4 |issue=3| pages=441–446 |doi=10.1109/tcbb.2007.1015| pmid=17666763 |s2cid=11787259}}</ref> Methods that have performed well in recent [[CASP]] experiments include [[I-TASSER]], [[HH-suite|HHpred]] and [[AlphaFold]].
In 2021, AlphaFold was reported to perform best.<ref name=":0">{{Cite journal|last1=Jumper|first1=John|last2=Evans|first2=Richard|last3=Pritzel|first3=Alexander|last4=Green|first4=Tim|last5=Figurnov|first5=Michael|last6=Ronneberger|first6=Olaf|last7=Tunyasuvunakool|first7=Kathryn|last8=Bates|first8=Russ|last9=Žídek|first9=Augustin|last10=Potapenko|first10=Anna|last11=Bridgland|first11=Alex|date=August 2021|title=Highly accurate protein structure prediction with AlphaFold|journal=Nature|language=en|volume=596|issue=7873|pages=583–589|doi=10.1038/s41586-021-03819-2|issn=1476-4687|pmc=8371605|pmid=34265844|bibcode=2021Natur.596..583J}}</ref>

Knowing the structure of a protein often allows its function to be predicted as well. For instance, collagen folds into a long, extended, fiber-like chain, which makes it a fibrous protein. Several techniques, such as I-TASSER and AlphaFold, have recently been developed to predict protein folding and thus protein structure.

===AI methods===
[[AlphaFold]] was one of the first AI systems to predict protein structures. It was introduced by Google's DeepMind in the 13th CASP competition, which was held in 2018.<ref name=":0"/> [[AlphaFold]] relies on an [[artificial neural network|neural network]] approach, which directly predicts the 3D coordinates of all non-hydrogen atoms for a given protein using the amino acid sequence and aligned [[sequence homology|homologous sequences]]. The [[AlphaFold]] network consists of a trunk, which processes the inputs through repeated layers, and a structure module, which introduces an explicit 3D structure.<ref name=":0"/> Earlier neural networks for protein structure prediction used [[LSTM]].<ref name="hochreiter2007"/><ref name="thireou2007"/>
[[File:The performance of AlphaFold.png|thumb|alt=a, The performance of [[AlphaFold]] on the CASP14 dataset (n=87 protein domains) relative to the top-15 entries (out of 146 entries), group numbers correspond to the numbers assigned to entrants by CASP.
Data are median and the 95% confidence interval of the median, estimated from 10,000 bootstrap samples. b, Our prediction of CASP14 target T1049 (PDB 6Y4F, blue) compared with the true (experimental) structure (green). Four residues in the C terminus of the crystal structure are B-factor outliers and are not depicted. c, CASP14 target T1056 (PDB 6YJ1). An example of a well-predicted zinc-binding site (AlphaFold has accurate side chains even though it does not explicitly predict the zinc ion). d, CASP target T1044 (PDB 6VR4), a 2,180-residue single chain, was predicted with correct domain packing (the prediction was made after CASP using AlphaFold without intervention).]]
[[File:Model architecture.png|thumb|alt=Model architecture. Arrows show the information flow among the various components described in this paper. Array shapes are shown in parentheses with s, number of sequences (Nseq in the main text); r, number of residues (Nres in the main text); c, number of channels.]]
Since [[AlphaFold]] outputs protein coordinates directly, it produces predictions in graphics processing unit (GPU) minutes to GPU hours, depending on the length of the protein sequence.<ref name=":0"/> The [[European Bioinformatics Institute]], together with [[DeepMind]], has constructed the AlphaFold – EBI database<ref>{{cite web |author=<!--Not stated--> |title=AlphaFold Protein Structure Database |url=https://alphafold.ebi.ac.uk |access-date=November 30, 2022 |website=EMBL-EBI |publisher=}}</ref> for predicted protein structures.<ref name="pmid34791371">{{cite journal |display-authors=6 |vauthors=Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S |date=January 2022 |title=AlphaFold Protein Structure Database: massively expanding the structural coverage of
protein-sequence space with high-accuracy models |journal=Nucleic Acids Res |volume=50 |issue=D1 |pages=D439–D444 |doi=10.1093/nar/gkab1061 |pmc=8728224 |pmid=34791371}}</ref>

===Current AI methods and databases of predicted protein structures===
AlphaFold2 was introduced in CASP14 and is capable of predicting protein structures to near-experimental accuracy.<ref name="pmid34265844">{{cite journal |vauthors=Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D |display-authors=6| title=Highly accurate protein structure prediction with AlphaFold |journal=Nature |volume=596 |issue=7873 |pages=583–589 |date=August 2021 |pmid=34265844 |pmc=8371605 |doi=10.1038/s41586-021-03819-2|bibcode=2021Natur.596..583J}}</ref> AlphaFold2 was swiftly followed by RoseTTAFold<ref name="pmid34282049">{{cite journal |vauthors=Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Millán C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ, Sagmeister T, Buhlheller C, Pavkov-Keller T, Rathinaswamy MK, Dalwadi U, Yip CK, Burke JE, Garcia KC, Grishin NV, Adams PD, Read RJ, Baker D|display-authors=6 |title=Accurate prediction of protein structures and interactions using a three-track neural network |journal=Science |volume=373 |issue=6557 |pages=871–876 |date=August 2021 |pmid=34282049 |pmc=7612213 |doi=10.1126/science.abj8754|bibcode=2021Sci...373..871B }}</ref> and later by OmegaFold <!--preprint <ref>https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1 {{bare URL inline|date=December 2022}}</ref> --> and the ESM Metagenomic Atlas.<ref
name="pmid36319775">{{cite journal |vauthors=Callaway E |title=AlphaFold's new rival? Meta AI predicts shape of 600 million proteins |journal=Nature |volume=611 |issue=7935 |pages=211–212 |date=November 2022 |pmid=36319775 |doi=10.1038/d41586-022-03539-1 |s2cid=253257926 |doi-access=|bibcode=2022Natur.611..211C }}</ref>

Sommer et al. (2022) demonstrated the application of protein structure prediction to genome annotation, specifically to identifying functional protein isoforms using computationally predicted structures, which are available at https://www.isoform.io.<ref>{{Cite journal |last1=Sommer |first1=Markus J. |last2=Cha |first2=Sooyoung |last3=Varabyou |first3=Ales |last4=Rincon |first4=Natalia |last5=Park |first5=Sukhwan |last6=Minkin |first6=Ilia |last7=Pertea |first7=Mihaela |last8=Steinegger |first8=Martin |last9=Salzberg |first9=Steven L. |date=2022-12-15 |title=Structure-guided isoform identification for the human transcriptome |journal=eLife |volume=11 |language=en |pages=e82556 |doi=10.7554/eLife.82556|pmid=36519529 |pmc=9812405 |doi-access=free}}</ref> The study highlights the promise of protein structure prediction as a genome annotation tool and presents a practical, structure-guided approach that can be used to enhance the annotation of any genome.

In 2024, the [[Nobel Prize in Chemistry]]<ref>{{Cite web |title=Nobel Prize in Chemistry 2024 |url=https://www.nobelprize.org/prizes/chemistry/2024/summary/?utm_source=chatgpt.com |access-date=2025-02-03 |website=NobelPrize.org |language=en-US}}</ref> was awarded to [[David Baker (biochemist)|David Baker]] for computational protein design and to [[Demis Hassabis]] and [[John Jumper]] for protein structure prediction, recognizing the development of AlphaFold2, an AI-based model for protein structure prediction.
AlphaFold2's accuracy has been evaluated against experimentally determined protein structures using metrics such as [[Root mean square deviation of atomic positions|root-mean-square deviation]] (RMSD).<ref>{{Cite web |title=Computational protein design and protein structure prediction |url=https://www.nobelprize.org/uploads/2024/10/advanced-chemistryprize2024.pdf}}</ref> The median RMSD between different experimental structures of the same protein is approximately 0.6 Å, while the median RMSD between AlphaFold2 predictions and experimental structures is around 1 Å. For regions where AlphaFold2 assigns high confidence, the median RMSD is about 0.6 Å, comparable to the variability observed between different experimental structures. In low-confidence regions, however, the RMSD can exceed 2 Å, indicating greater deviations. In proteins with multiple domains connected by flexible linkers, AlphaFold2 predicts individual domain structures accurately but may assign random relative positions to these domains. Additionally, AlphaFold2 does not account for structural constraints such as the membrane plane, and it sometimes places protein domains in positions that would physically clash with the membrane.<ref>{{Cite web |last=EMBL-EBI |title=How accurate are AlphaFold2 structure predictions? {{!}} AlphaFold |url=https://www.ebi.ac.uk/training/online/courses/alphafold/validation-and-impact/how-accurate-are-alphafold-structure-predictions/#:~:text=Analogous%20data%20for%20the%20experimental,less%20reliable%20than%20experimental%20structures. |access-date=2025-02-03 |language=en}}</ref>

===Evaluation of automatic structure prediction servers===
{{Main|CASP}}
[[CASP]], which stands for Critical Assessment of Techniques for Protein Structure Prediction, is a community-wide experiment for protein structure prediction that has taken place every two years since 1994.
CASP provides an opportunity to assess the quality of available human, non-automated methodology (human category) and automatic servers for protein structure prediction (server category, introduced in CASP7).<ref>{{cite journal |vauthors=Battey JN, Kopp J, Bordoli L, Read RJ, Clarke ND, Schwede T |title=Automated server predictions in CASP7 |journal=Proteins |volume=69 |issue=Suppl 8 |pages=68–82 |year=2007 |pmid=17894354 |doi=10.1002/prot.21761 |s2cid=29879391 |doi-access=free}}</ref> The [[CAMEO3D]] (Continuous Automated Model EvaluatiOn) server evaluates automated protein structure prediction servers on a weekly basis, using blind predictions for newly released protein structures. CAMEO publishes the results on its website.
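
As an illustration of the RMSD metric used to compare predictions with experimental structures, the following is a minimal sketch in Python. It is not the procedure used by CASP, CAMEO, or the AlphaFold authors. It assumes that the two structures have already been superposed and that both coordinate lists contain the same atoms in the same order (for example, one Cα atom per residue). The function name <code>rmsd</code> and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One atom position in ångströms: (x, y, z).
Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], experimental: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two matched coordinate sets.

    Assumption: the structures are already superposed and the i-th atom
    in one list corresponds to the i-th atom in the other.
    """
    if len(predicted) != len(experimental):
        raise ValueError("coordinate lists must have the same length")
    if not predicted:
        raise ValueError("coordinate lists must not be empty")

    squared_sum = 0.0
    for (px, py, pz), (ex, ey, ez) in zip(predicted, experimental):
        # Squared Euclidean distance between corresponding atoms.
        squared_sum += (px - ex) ** 2 + (py - ey) ** 2 + (pz - ez) ** 2

    return math.sqrt(squared_sum / len(predicted))


# Hypothetical example: every atom of the prediction is displaced
# by exactly 1 Å along z relative to the experimental structure.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(predicted, experimental))  # 1.0
```

In practice, the two structures are first optimally superposed (for example, with the Kabsch algorithm) before the RMSD is computed; that step is omitted from this sketch.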