Representative sequences
In the social sciences and other domains, representative sequences are whole sequences that best characterize or summarize a set of sequences.<ref name=":0" /> In bioinformatics, representative sequences also designate substrings of a sequence that characterize that sequence.<ref>{{Citation |last1=Kuri-Morales |first1=Angel F. |title=A New Approach to Sequence Representation of Proteins in Bioinformatics |date=2005 |url=http://link.springer.com/10.1007/11579427_90 |work=MICAI 2005: Advances in Artificial Intelligence |volume=3789 |pages=880–889 |editor-last=Gelbukh |editor-first=Alexander |access-date=2023-06-12 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/11579427_90 |isbn=978-3-540-29896-0 |last2=Ortiz-Posadas |first2=Martha R. |editor2-last=de Albornoz |editor2-first=Álvaro |editor3-last=Terashima-Marín |editor3-first=Hugo|url-access=subscription }}</ref><ref>{{Cite journal |last1=Chen |first1=William L. |last2=Leland |first2=Burton A. |last3=Durant |first3=Joseph L. |last4=Grier |first4=David L. |last5=Christie |first5=Bradley D. |last6=Nourse |first6=James G. |last7=Taylor |first7=Keith T. 
|date=2011-09-26 |title=Self-Contained Sequence Representation: Bridging the Gap between Bioinformatics and Cheminformatics |url=https://pubs.acs.org/doi/10.1021/ci2001988 |journal=Journal of Chemical Information and Modeling |language=en |volume=51 |issue=9 |pages=2186–2208 |doi=10.1021/ci2001988 |pmid=21800899 |issn=1549-9596|url-access=subscription }}</ref>

== Social sciences ==
[[File:Fg-rep-seq-biofam.png|thumb|Representative sequences covering 27% of 2000 cohabitation sequences between ages 15 and 30 (extract of biographical data from the Swiss Household Panel)]]
In [[sequence analysis in social sciences]], representative sequences are used to summarize sets of sequences describing, for example, the family life courses or professional careers of several thousand individuals.<ref name=":18">{{Cite journal |last1=Gabadinho |first1=Alexis |last2=Ritschard |first2=Gilbert |date=2013 |editor-last=Levy |editor-first=René |editor2-last=Widmer |editor2-first=Eric D. |title=Searching for typical life trajectories, applied to childbirth histories |url=https://www.researchgate.net/publication/287202533 |journal=Gendered Life Courses, Between Standardization and Individualization: A European Approach Applied to Switzerland |publisher=LIT |publication-place=Zurich |pages=287–312}}</ref>

The identification of representative sequences<ref name=":0">{{Citation |last1=Gabadinho |first1=Alexis |title=Extracting and Rendering Representative Sequences |date=2011 |url=http://link.springer.com/10.1007/978-3-642-19032-2_7 |work=Knowledge Discovery, Knowledge Engineering and Knowledge Management |volume=128 |pages=94–106 |editor-last=Fred |editor-first=Ana |access-date=2023-06-12 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-19032-2_7 |isbn=978-3-642-19031-5 |last2=Ritschard |first2=Gilbert |last3=Studer |first3=Matthias |last4=Müller |first4=Nicolas S. 
|series=Communications in Computer and Information Science |editor2-last=Dietz |editor2-first=Jan L. G. |editor3-last=Liu |editor3-first=Kecheng |editor4-last=Filipe |editor4-first=Joaquim|url-access=subscription }}</ref><ref name=":18" /> proceeds from the pairwise dissimilarities between sequences. One typical solution is the medoid sequence, i.e., the observed sequence that minimizes the sum of its distances to all other sequences in the set. Another solution is the densest observed sequence, i.e., the sequence with the greatest number of other sequences in its neighborhood.

When the diversity of the sequences is large, a single representative is often insufficient to characterize the set adequately. In such cases, one searches for the smallest possible set of representative sequences that covers a given percentage of all sequences, where a sequence is covered if it lies in the neighborhood of at least one representative. Another approach is to select the medoids of relative frequency groups. More specifically, the sequences are sorted (for example, according to the first principal coordinate of the pairwise dissimilarity matrix), the sorted list is split into equal-sized groups (called relative frequency groups), and the medoid of each group is selected.<ref>{{Cite journal |last1=Fasang |first1=Anette Eva |last2=Liao |first2=Tim Futing |date=2014 |title=Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots |url=http://journals.sagepub.com/doi/10.1177/0049124113506563 |journal=Sociological Methods & Research |language=en |volume=43 |issue=4 |pages=643–676 |doi=10.1177/0049124113506563 |issn=0049-1241 |s2cid=61487252 |hdl=10419/209702|hdl-access=free }}</ref>

The methods for identifying representative sequences described above are implemented in the R package [https://cran.r-project.org/package=TraMineR TraMineR].<ref>{{Cite journal |last1=Gabadinho |first1=Alexis |last2=Ritschard |first2=Gilbert |last3=Müller 
|first3=Nicolas S. |last4=Studer |first4=Matthias |date=2011 |title=Analyzing and Visualizing State Sequences in R with TraMineR |url=http://www.jstatsoft.org/v40/i04/ |journal=Journal of Statistical Software |language=en |volume=40 |issue=4 |doi=10.18637/jss.v040.i04 |issn=1548-7660|doi-access=free }}</ref>

== Bioinformatics ==
{{Multiple issues|
{{context|date=October 2009}}
{{refimprove|date=July 2018}}
}}
'''Representative sequences''' are short regions within [[Protein primary structure|protein sequences]] that can be used to approximate the [[Molecular evolution|evolutionary relationships]] of those proteins, or of the organisms from which they come. Representative sequences are contiguous subsequences (typically 300 [[amino acid|residues]]) from [[Housekeeping gene|ubiquitous]], conserved proteins, such that each [[orthologous]] family of representative sequences taken alone gives a [[Distance matrices in phylogeny|distance matrix]] in close agreement with the consensus matrix.<ref>{{cite journal |last1= Bern|first1= Marshall|last2= Goldberg|first2= David|date= November 2, 2004|title= Automatic selection of representative proteins for bacterial phylogeny|journal= BMC Evolutionary Biology|volume= 5|issue= 34|pages= 34|doi= 10.1186/1471-2148-5-34|pmid= 15927057|pmc= 1175084|doi-access= free}}</ref>

=== Use ===
[[Protein sequence]]s can provide data about the [[Function (biology)|biological function]] and [[evolution]] of proteins and [[protein domain]]s. Grouping and interrelating protein sequences can therefore provide information about both human biological processes and the evolutionary development of biological processes on Earth. Such [[sequence cluster]]s allow for effective coverage of sequence space. Sequence clusters can reduce a large database of sequences to a smaller set of ''sequence representatives'', each of which should represent its cluster at the sequence level. 
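
As a rough illustration of how a set of sequences might be reduced to sequence representatives, the following Python sketch applies greedy incremental clustering with a similarity threshold. It is only a sketch under stated assumptions: the function names <code>identity</code> and <code>greedy_representatives</code>, the crude similarity measure taken from Python's standard <code>difflib</code> module (standing in for a proper alignment-based identity), and the default threshold of 0.8 are all illustrative choices and do not reproduce the method of any particular tool or of the sources cited here.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_representatives(sequences, threshold=0.8):
    """Return a dict mapping each representative to the members of its cluster.

    Sequences are processed longest first. Each sequence joins the first
    existing representative whose similarity is at or above `threshold`;
    otherwise it becomes a new representative (hypothetical helper).
    """
    clusters = {}
    for seq in sorted(sequences, key=len, reverse=True):
        for rep in clusters:
            if identity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No existing representative is similar enough.
            clusters[seq] = [seq]
    return clusters
```

For example, <code>greedy_representatives(["MKTAYIAKQR", "MKTAYIAKQK", "GGGGGGGGGG", "MKTAYIAK"])</code> groups the three MKTAYIAK-like sequences under the longest of them and leaves the poly-G sequence as its own representative, so the keys of the returned dictionary form the reduced set.
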
Sequence representatives allow the effective coverage of the original database with fewer sequences. The database of sequence representatives is called ''non-redundant'', as similar (or redundant) sequences have been removed at a certain similarity threshold.

== See also ==
* [[Sequence analysis in social sciences]]
* [[Sequence analysis]] in bioinformatics

== References ==
{{reflist}}

[[Category:Protein structure]]
[[Category:Bioinformatics]]
[[Category:Social sciences]]