
===Modelling-based methods===

====''ab initio'' modeling====
This approach uses protein sequence data and the chemical and physical interactions of the encoded amino acids to predict the 3-D structures of proteins that have no homology to solved protein structures. One highly successful method for ''ab initio'' modeling is the [[Rosetta@home|Rosetta]] program, which divides the protein into short segments and arranges each short polypeptide chain into a low-energy local conformation. Rosetta is available for commercial use, and for non-commercial use through its public program, Robetta.

====Sequence-based modeling====
This modeling technique compares the sequence of a protein of unknown structure with the sequences of proteins whose structures are known. Depending on the degree of similarity between the sequences, the structure of the known protein can be used as a model for solving the structure of the unknown protein. Highly accurate modeling is considered to require at least 50% amino acid sequence identity between the unknown protein and the solved structure. Sequence identity of 30–50% gives a model of intermediate accuracy, and sequence identity below 30% gives low-accuracy models. It has been predicted that at least 16,000 protein structures will need to be determined for all structural motifs to be represented at least once, which would allow the structure of any unknown protein to be solved accurately through modeling.<ref>{{cite journal |vauthors=Baker D, Sali A |title=Protein structure prediction and structural genomics |journal=Science |volume=294 |issue=5540 |pages=93–6 |date=October 2001 |pmid=11588250 |doi=10.1126/science.1065659 |bibcode=2001Sci...294...93B |s2cid=7193705 }}</ref> One disadvantage of this method, however, is that structure is more conserved than sequence, so sequence-based modeling may not be the most accurate way to predict protein structures.
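
The identity thresholds described above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as <code>-</code>) and computes identity over the columns in which neither sequence has a gap. That choice of denominator is an assumption, since other conventions exist, and the function names <code>percent_identity</code> and <code>expected_model_accuracy</code> are hypothetical names chosen for illustration rather than part of any published tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: identity is counted over columns where neither
    sequence has a gap ('-'); other denominators are also in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def expected_model_accuracy(identity: float) -> str:
    """Map percent identity to the accuracy tiers given in the text."""
    if identity >= 50.0:
        return "high"
    if identity >= 30.0:
        return "intermediate"
    return "low"


if __name__ == "__main__":
    # Toy aligned sequences, invented purely for illustration.
    identity = percent_identity("MKT-AYIAKQR", "MKTLAYLAKHR")
    print(f"{identity:.1f}% identity -> {expected_model_accuracy(identity)} accuracy")
```

In practice the alignment itself would come from a dedicated alignment program, and the tiers are rough guidelines rather than sharp cut-offs.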

====Threading====
[[Threading (protein sequence)|Threading]] bases structural modeling on fold similarities rather than sequence identity.  This method may help identify distantly related proteins and can be used to infer molecular functions.