Protein sequencing
== Identification by mass spectrometry ==
{{Main article|protein mass spectrometry|de novo peptide sequencing}}
Protein identification is the process of assigning a name to a protein of interest (POI), based on its amino-acid sequence. Typically, only part of the protein’s sequence needs to be determined experimentally in order to identify the protein with reference to databases of protein sequences deduced from the DNA sequences of their genes. Further protein characterization may include confirmation of the actual N- and C-termini of the POI, determination of sequence variants and identification of any post-translational modifications present.

=== Proteolytic digests ===
A general scheme for protein identification is as follows.<ref>{{cite journal | vauthors = Shevchenko A, Tomas H, Havlis J, Olsen JV, Mann M | title = In-gel digestion for mass spectrometric characterization of proteins and proteomes | journal = Nature Protocols | volume = 1 | issue = 6 | pages = 2856–60 | date = 2006 | pmid = 17406544 | doi = 10.1038/nprot.2006.468 | s2cid = 8248224 }}</ref><ref>{{cite journal | vauthors = Gundry RL, White MY, Murray CI, Kane LA, Fu Q, Stanley BA, Van Eyk JE | title = Preparation of proteins and peptides for mass spectrometry analysis in a bottom-up proteomics workflow | journal = Current Protocols in Molecular Biology | volume = Chapter 10 | pages = Unit10.25 | date = October 2009 | pmid = 19816929 | doi = 10.1002/0471142727.mb1025s88 | pmc=2905857| isbn = 978-0471142720 }}</ref>
# The POI is isolated, typically by [[SDS-PAGE]] or [[chromatography]].
# The isolated POI may be chemically modified to stabilise cysteine residues (e.g. by S-amidomethylation or S-carboxymethylation).
# The POI is digested with a specific protease to generate peptides. [[Trypsin]], which cleaves selectively on the C-terminal side of lysine or arginine residues, is the most commonly used protease. Its advantages include i) the frequency of Lys and Arg residues in proteins, ii) the high specificity of the enzyme, iii) the stability of the enzyme and iv) the suitability of tryptic peptides for mass spectrometry.
# The peptides may be desalted to remove ionizable contaminants and subjected to [[Matrix-assisted laser desorption/ionization#Time of flight|MALDI-TOF]] mass spectrometry. Direct measurement of the masses of the peptides may provide sufficient information to identify the protein (see [[Peptide mass fingerprinting]]), but further fragmentation of the peptides inside the mass spectrometer is often used to gain information about the peptides’ sequences. Alternatively, peptides may be desalted and separated by [[High-performance liquid chromatography#Reversed-phase chromatography .28RPC.29|reversed phase HPLC]] and introduced into a mass spectrometer via an [[Electrospray ionization|ESI]] source. LC-ESI-MS may provide more information than MALDI-MS for protein identification but uses more instrument time.
# Depending on the type of mass spectrometer, fragmentation of peptide ions may occur via a variety of mechanisms such as [[collision-induced dissociation]] (CID) or [[Reflectron#Post-source decay|post-source decay]] (PSD). In each case, the pattern of fragment ions of a peptide provides information about its sequence.
# The measured masses of the putative peptide ions and of their fragment ions are then matched against mass values calculated from the conceptual (''in silico'') proteolysis and fragmentation of databases of protein sequences. A match is considered successful if its score exceeds a threshold based on the analysis parameters. Even if the actual protein is not represented in the database, error-tolerant matching allows for the putative identification of a protein based on similarity to [[Sequence homology|homologous]] proteins. A variety of software packages are available to perform this analysis.
# Software packages usually generate a report showing the identity (accession code) of each identified protein and its matching score, and provide a measure of the relative strength of the matching where multiple proteins are identified.
# A diagram of the matched peptides on the sequence of the identified protein is often used to show the sequence coverage (% of the protein detected as peptides). Where the POI is thought to be significantly smaller than the matched protein, the diagram may suggest whether the POI is an N- or C-terminal fragment of the identified protein.

=== De novo sequencing ===
The pattern of fragmentation of a peptide allows for direct determination of its sequence by [[de novo peptide sequencing|''de novo'' sequencing]]. This sequence may be used to search databases of protein sequences or to investigate [[Post-translational modification|post-translational]] or chemical modifications. It may also provide additional evidence for protein identifications performed as above.

=== N- and C-termini ===
The peptides matched during protein identification do not necessarily include the N- or C-termini predicted for the matched protein. This may result from the N- or C-terminal peptides being difficult to identify by MS (e.g. being either too short or too long), being post-translationally modified (e.g. N-terminal acetylation) or genuinely differing from the prediction. Post-translational modifications or truncated termini may be identified by closer examination of the data (i.e. by ''de novo'' sequencing). A repeat digest using a protease of different specificity may also be useful.

=== Post-translational modifications ===
Whilst detailed comparison of the MS data with predictions based on the known protein sequence may be used to define post-translational modifications, targeted approaches to data acquisition may also be used. For instance, specific enrichment of phosphopeptides may assist in identifying [[phosphorylation]] sites in a protein. Alternative methods of peptide fragmentation in the mass spectrometer, such as [[Electron-transfer dissociation|ETD]] or [[Electron-capture dissociation|ECD]], may give complementary sequence information.

=== Whole-mass determination ===
The protein’s whole mass is the sum of the masses of its amino-acid residues plus the mass of a water molecule, adjusted for any post-translational modifications. Although proteins ionize less well than the peptides derived from them, a protein in solution can often be analysed by ESI-MS and its mass measured to an accuracy of 1 part in 20,000 or better. This is often sufficient to confirm the termini (that is, that the protein’s measured mass matches that predicted from its sequence) and to infer the presence or absence of many post-translational modifications.

=== Limitations ===
Proteolysis does not always yield a set of readily analyzable peptides covering the entire sequence of the POI. The fragmentation of peptides in the mass spectrometer often does not yield ions corresponding to cleavage at each peptide bond. Thus, the deduced sequence for each peptide is not necessarily complete. The standard methods of fragmentation do not distinguish between leucine and isoleucine residues since they are isomeric.

By comparison, because Edman degradation proceeds from the N-terminus of the protein, it will not work if the N-terminus has been chemically modified (e.g. by acetylation or formation of pyroglutamic acid). Edman degradation is generally not useful for determining the positions of disulfide bridges. It also requires peptide amounts of 1 picomole or above for discernible results, making it less sensitive than [[#Identification by mass spectrometry|mass spectrometry]].
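
The ''in silico'' proteolysis used when matching peptide masses, and the whole-mass rule stated above (sum of the residue masses plus one water molecule), can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the method of any particular software package: the function names (<code>tryptic_digest</code>, <code>neutral_mass</code>, <code>mz</code>) and the example sequence are hypothetical, the cleavage rule is simplified (cleavage after every Lys or Arg, with no missed cleavages and no special handling of a following proline), and the mass table holds rounded monoisotopic values for the 20 standard residues only.

```python
# Rounded monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O (Da)
PROTON = 1.00728   # mass of a proton (Da)


def tryptic_digest(sequence):
    """Split a sequence after every K or R (simplified trypsin rule).

    Assumption: no missed cleavages and no proline exception.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def neutral_mass(sequence, modification_shift=0.0):
    """Sum of residue masses plus one water, adjusted for modifications."""
    return sum(MONO[aa] for aa in sequence) + WATER + modification_shift


def mz(mass, charge=1):
    """m/z of an ion carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    protein = "AGKLIRGG"  # hypothetical example sequence
    for peptide in tryptic_digest(protein):
        m = neutral_mass(peptide)
        print(f"{peptide:>5}  M = {m:.4f}  [M+H]+ = {mz(m):.4f}")
    print(f"whole protein  M = {neutral_mass(protein):.4f}")
```

Because leucine and isoleucine have identical entries in the mass table, <code>neutral_mass("L")</code> and <code>neutral_mass("I")</code> return the same value, which is the isomer ambiguity noted under Limitations. A known modification can be supplied through <code>modification_shift</code>; the shift value for a given modification is left to the reader to look up.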