Structural alignment
==Types of comparisons==
Because protein structures are composed of [[amino acid]]s whose [[side chain]]s are linked by a common protein backbone, a number of different possible subsets of the atoms that make up a protein macromolecule can be used in producing a structural alignment and calculating the corresponding RMSD values. When aligning structures with very different sequences, the side chain atoms generally are not taken into account because their identities differ between many aligned residues. For this reason it is common for structural alignment methods to use by default only the backbone atoms included in the [[peptide bond]]. For simplicity and efficiency, often only the [[alpha carbon]] positions are considered, since the peptide bond has a minimally variant [[Plane (geometry)|planar]] conformation. Only when the structures to be aligned are highly similar or even identical is it meaningful to align side-chain atom positions, in which case the RMSD reflects not only the conformation of the protein backbone but also the [[rotamer]]ic states of the side chains. Other comparison criteria that reduce noise and bolster positive matches include [[secondary structure]] assignment, [[native contact]] maps or residue interaction patterns, measures of side chain packing, and measures of [[hydrogen bond]] retention.<ref name="godzik"/>

===Structural superposition===
The most basic possible comparison between protein structures makes no attempt to align the input structures and requires a precalculated alignment as input to determine which of the residues in the sequence are intended to be considered in the RMSD calculation. Structural superposition is commonly used to compare multiple conformations of the same protein (in which case no alignment is necessary, since the sequences are the same) and to evaluate the quality of alignments produced using only sequence information between two or more sequences whose structures are known.
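
As a minimal sketch of superposition over a precalculated alignment, the following Python code fits one set of paired alpha-carbon coordinates onto another and reports the RMSD. It assumes the alignment has already been reduced to two equally sized coordinate arrays, and it uses the SVD-based Kabsch procedure as one common way to solve the least-squares fit; the function name <code>superpose_and_rmsd</code> is hypothetical and is not taken from any particular package.

```python
import numpy as np


def superpose_and_rmsd(mobile, target):
    """Least-squares fit of `mobile` onto `target`; returns (rmsd, fitted).

    Both inputs are (N, 3) arrays of paired atom positions, for example
    alpha carbons of residues matched by a precalculated alignment.
    This is an illustrative Kabsch-style sketch, not a reference tool.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of paired coordinates")

    # Remove the translational component by centering both point sets.
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    mobile_c = mobile - mobile_center
    target_c = target - target_center

    # 3x3 cross-covariance matrix and its singular value decomposition.
    covariance = mobile_c.T @ target_c
    u, _, vt = np.linalg.svd(covariance)

    # Guard against an improper rotation (a reflection).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    # Rotate the centered mobile set, then move it onto the target centroid.
    fitted = mobile_c @ rotation.T + target_center
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1))))
    return rmsd, fitted
```

For two conformations of the same protein, the arrays can simply hold the same atoms in the same order. For different proteins, only the residue pairs selected by the alignment would be included.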
Structural superposition traditionally uses a simple least-squares fitting algorithm, in which the optimal rotations and translations are found by minimizing the sum of the squared distances among all structures in the superposition.<ref name="martin"/> More recently, maximum likelihood and Bayesian methods have greatly increased the accuracy of the estimated rotations, translations, and covariance matrices for the superposition.<ref name="theobald"/><ref name="theobald2"/>

Algorithms based on multidimensional rotations and modified [[quaternion]]s have been developed to identify topological relationships between protein structures without the need for a predetermined alignment. Such algorithms have successfully identified canonical folds such as the [[helix bundle|four-helix bundle]].<ref name="Diederichs"/> The [http://wishart.biology.ualberta.ca/SuperPose/ SuperPose] {{Webarchive|url=https://web.archive.org/web/20151031151001/http://wishart.biology.ualberta.ca/SuperPose/ |date=2015-10-31 }} method is sufficiently extensible to correct for relative domain rotations and other structural pitfalls.<ref name="Maiti"/>

===Evaluating similarity===
Often the purpose of seeking a structural superposition is not so much the superposition itself as an evaluation of the similarity of two structures or of the confidence in a remote alignment.<ref name="casp11"/><ref name="Malmstrom" /><ref name="robetta"/> A subtle but important distinction from maximal structural superposition is the conversion of an alignment to a meaningful similarity score.<ref name="Mammoth" /><ref name="ZhangTMscore"/> Most methods output some sort of "score" indicating the quality of the superposition.<ref name="zemla" /><ref name="fischer"/><ref name="poleksic"/><ref name="Mammoth"/><ref name="ZhangTMscore"/> However, what one actually wants is ''not'' merely an ''estimated'' "Z-score" or an ''estimated'' E-value for observing the superposition by chance, but an ''estimated'' E-value that is tightly correlated with the true E-value. Critically, even if a method's estimated E-value is precisely correct ''on average'', if its estimates have a high standard deviation, then the rank ordering of the relative similarities of a query protein to a comparison set will rarely agree with the "true" ordering.<ref name="Mammoth"/><ref name="ZhangTMscore"/>

Different methods will superimpose different numbers of residues because they use different quality assurances and different definitions of "overlap"; some only include residues meeting multiple local and global superposition criteria, while others are more greedy, flexible, and promiscuous. A greater number of superposed atoms can mean more similarity, but it does not always produce the best E-value quantifying the unlikeliness of the superposition, and thus may be less useful for assessing similarity, especially in remote homologs.<ref name="casp11"/><ref name="Malmstrom" /><ref name="robetta" /><ref name="skolnick" />
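
The point about rank ordering can be illustrated with a small simulation. The sketch below is not drawn from any of the cited methods. It assumes 200 evenly spaced synthetic "true" scores standing in for true similarity values, adds zero-mean Gaussian noise to mimic an estimator that is correct on average, and measures how well the estimated ranking agrees with the true one using a Spearman-style rank correlation. All names and parameter values are hypothetical.

```python
import numpy as np


def ranks(values):
    """Return the rank (0 = smallest) of each entry; assumes no ties."""
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(len(values))
    return result


def rank_agreement(true_scores, noise_sd, rng):
    """Rank correlation between true scores and unbiased noisy estimates."""
    estimates = true_scores + rng.normal(0.0, noise_sd, size=true_scores.shape)
    return float(np.corrcoef(ranks(true_scores), ranks(estimates))[0, 1])


rng = np.random.default_rng(0)
true_scores = np.linspace(0.0, 1.0, 200)  # synthetic comparison set

# Both estimators are unbiased; they differ only in their spread.
low_spread = rank_agreement(true_scores, noise_sd=0.01, rng=rng)
high_spread = rank_agreement(true_scores, noise_sd=1.0, rng=rng)

print(f"rank agreement, low standard deviation:  {low_spread:.3f}")
print(f"rank agreement, high standard deviation: {high_spread:.3f}")
```

Under these assumptions the low-spread estimator should reproduce the true ordering almost exactly, while the high-spread estimator should scramble it, even though neither is biased.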