Bioinformatics
=== Sequences ===
[[File:Example DNA sequence.png|thumbnail|right|Sequences of genetic material are frequently used in bioinformatics and are easier to manage using computers than manually.]]
[[File:Muscle alignment view.png|thumb|369x369px|Sequences being compared in a MUSCLE multiple sequence alignment (MSA). Each sequence (named in the leftmost column) comes from a louse species, and the sequences themselves appear in the second column.]]

Since the completion of the Human Genome Project, sequencing has become dramatically faster and cheaper: some labs can [[DNA sequencing|sequence]] over 100,000 billion bases each year, and a full genome can be sequenced for $1,000 or less.<ref>{{cite web | vauthors = Colby B | date = 2022 | work = Sequencing.com | title = Whole Genome Sequencing Cost | url = https://sequencing.com/education-center/whole-genome-sequencing/whole-genome-sequencing-cost | access-date = 8 April 2022 | archive-date = 15 March 2022 | archive-url = https://web.archive.org/web/20220315025036/https://sequencing.com/education-center/whole-genome-sequencing/whole-genome-sequencing-cost | url-status = live }}</ref>

Computers became essential in molecular biology when [[protein sequences]] became available after [[Frederick Sanger]] determined the sequence of [[insulin]] in the early 1950s.<ref name="Sanger1951">{{cite journal |vauthors=Sanger F, Tuppy H |title=The Amino-acid Sequence in the Phenylalanyl Chain of Insulin. I. The identification of lower peptides from partial hydrolysates |journal=Biochemical Journal |volume=49 |issue=4 |pages=463–81 |date=1951 |pmid=14886310 |doi=10.1042/bj0490463 |pmc=1197535 }}</ref><ref name="Sanger1953">{{cite journal |vauthors=Sanger F, Thompson EO |title=The Amino-acid Sequence in the Glycyl Chain of Insulin. I. The identification of lower peptides from partial hydrolysates |journal=Biochemical Journal |volume=53 |issue=3 |pages=353–66 |date=1953 |pmid=13032078 |doi=10.1042/bj0530353 |pmc=1198157 }}</ref> Comparing multiple sequences manually turned out to be impractical. [[Margaret Oakley Dayhoff]], a pioneer in the field,<ref>{{cite book | vauthors=Moody G |year=2004 |title=Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine, and Business |publisher=John Wiley & Sons |location=Hoboken, NJ, USA |isbn=978-0-471-32788-2 |url-access=registration |url=https://archive.org/details/digitalcodeoflif0000mood }}</ref> compiled one of the first protein sequence databases, initially published as books,<ref name="Dayhoff1965">{{cite book |vauthors=Dayhoff MO, Eck RV, Chang MA, Sochard MR |date=1965 |title=ATLAS of PROTEIN SEQUENCE and STRUCTURE |publisher=National Biomedical Research Foundation |location=Silver Spring, MD, USA |url=https://ntrs.nasa.gov/api/citations/19660014530/downloads/19660014530.pdf |lccn=65-29342 }}</ref> and developed methods of sequence alignment and [[molecular evolution]].<ref name="pmid17775169">{{cite journal |vauthors=Eck RV, Dayhoff MO |title= Evolution of the Structure of Ferredoxin Based on Living Relics of Primitive Amino Acid Sequences | journal = Science | volume = 152 | issue = 3720 | pages = 363–6 | date = April 1966 | pmid = 17775169 | doi = 10.1126/science.152.3720.363 | s2cid = 23208558 | bibcode = 1966Sci...152..363E }}</ref> Another early contributor to bioinformatics was [[Elvin A. Kabat]], who pioneered biological sequence analysis in 1970 with his comprehensive volumes of antibody sequences, released online with Tai Te Wu between 1980 and 1991.<ref>{{cite journal | vauthors = Johnson G, Wu TT | title = Kabat database and its applications: 30 years after the first variability plot | journal = Nucleic Acids Research | volume = 28 | issue = 1 | pages = 214–8 | date = January 2000 | pmid = 10592229 | pmc = 102431 | doi = 10.1093/nar/28.1.214 }}</ref>

In the 1970s, new techniques for sequencing DNA were applied to bacteriophage MS2 and øX174, and the extended nucleotide sequences were then parsed with informational and statistical algorithms. These studies showed that well-known features, such as the coding segments and the triplet code, are revealed by straightforward statistical analyses, providing proof of concept that bioinformatics would be insightful.<ref>{{cite journal | vauthors = Erickson JW, Altman GG |title=A Search for Patterns in the Nucleotide Sequence of the MS2 Genome |journal=Journal of Mathematical Biology |date=1979 |volume=7 |issue=3 |pages=219–230 |doi=10.1007/BF00275725 |s2cid=85199492 }}</ref><ref>{{cite journal | vauthors = Shulman MJ, Steinberg CM, Westmoreland N | title = The coding function of nucleotide sequences can be discerned by statistical analysis | journal = Journal of Theoretical Biology | volume = 88 | issue = 3 | pages = 409–20 | date = February 1981 | pmid = 6456380 | doi = 10.1016/0022-5193(81)90274-5 | bibcode = 1981JThBi..88..409S }}</ref>