== History ==
The term ''bioinformatics'' was coined by [[Paulien Hogeweg]] and [[Ben Hesper]] in 1970 to refer to the study of information processes in biotic systems.<ref>{{cite journal |last1=Ouzounis |first1=C. A. |last2=Valencia |first2=A. |date=2003 |title=Early bioinformatics: the birth of a discipline–a personal view |journal=Bioinformatics |volume=19 |issue=17 |pages=2176–2190 | pmid=14630646 | doi=10.1093/bioinformatics/btg309| doi-access=free}}</ref><ref name="Hogeweg2011">{{cite journal |vauthors=Hogeweg P |title=The Roots of Bioinformatics in Theoretical Biology |journal=PLOS Computational Biology |volume=7 |issue=3 |pages=e1002021 |date=2011 |pmid=21483479 |pmc=3068925 | doi=10.1371/journal.pcbi.1002021 | bibcode = 2011PLSCB...7E2021H | doi-access = free }}</ref><ref>{{Cite journal| vauthors = Hesper B, Hogeweg P |year=1970|title=BIO-INFORMATICA: een werkconcept |trans-title=BIO-INFORMATICS: a working concept |language=nl |journal=Het Kameleon|volume=1 |issue=6| pages=28–29}}</ref><ref>{{cite arXiv |vauthors=Hesper B, Hogeweg P |eprint=2111.11832v1 |title=Bio-informatics: a working concept. A translation of "Bio-informatica: een werkconcept" by B. Hesper and P. Hogeweg |date=2021 |class=q-bio.OT}}</ref><ref>{{cite journal |vauthors = Hogeweg P |title=Simulating the growth of cellular forms |journal=Simulation |volume=31 |issue=3 |pages=90–96 |year=1978 |doi=10.1177/003754977803100305 |s2cid=61206099 }}</ref> This definition placed bioinformatics as a field parallel to [[biochemistry]] (the study of chemical processes in biological systems).<ref name="Hogeweg2011" /> Bioinformatics and computational biology involve the analysis of biological data, particularly DNA, RNA, and protein sequences.
The field of bioinformatics experienced explosive growth starting in the mid-1990s, driven largely by the [[Human Genome Project]] and by rapid advances in DNA sequencing technology.{{cn|date=February 2025}} Analyzing biological data to produce meaningful information involves writing and running software programs that use [[algorithm]]s from [[graph theory]], [[artificial intelligence]], [[soft computing]], [[data mining]], [[image processing]], and [[computer simulation]]. The algorithms in turn depend on theoretical foundations such as [[discrete mathematics]], [[control theory]], [[system theory]], [[information theory]], and [[statistics]].{{cn|date=May 2024}}

=== Sequences ===
[[File:Example DNA sequence.png|thumbnail|right|Sequences of genetic material are frequently used in bioinformatics and are easier to manage using computers than manually.]]
[[File:Muscle alignment view.png|thumb|369x369px|Sequences being compared in a MUSCLE multiple sequence alignment (MSA). Each sequence name (leftmost column) identifies a louse species, and the sequences themselves appear in the second column.]]
Since the completion of the Human Genome Project, sequencing has become dramatically faster and cheaper: some labs can [[DNA sequencing|sequence]] over 100,000 billion bases each year, and a full genome can be sequenced for $1,000 or less.<ref>{{cite web | vauthors = Colby B | date = 2022 | work = Sequencing.com | title = Whole Genome Sequencing Cost | url = https://sequencing.com/education-center/whole-genome-sequencing/whole-genome-sequencing-cost | access-date = 8 April 2022 | archive-date = 15 March 2022 | archive-url = https://web.archive.org/web/20220315025036/https://sequencing.com/education-center/whole-genome-sequencing/whole-genome-sequencing-cost | url-status = live }}</ref> Computers became essential in molecular biology when [[protein sequences]] became available after [[Frederick Sanger]] determined the sequence of [[insulin]] in
the early 1950s.<ref name="Sanger1951">{{cite journal |vauthors=Sanger F, Tuppy H |title=The Amino-acid Sequence in the Phenylalanyl Chain of Insulin. I. The identification of lower peptides from partial hydrolysates |journal=Biochemical Journal |volume=49 |issue=4 |pages=463–81 |date=1951 |pmid=14886310 |doi=10.1042/bj0490463 |pmc=1197535 }}</ref><ref name="Sanger1953">{{cite journal |vauthors=Sanger F, Thompson EO |title=The Amino-acid Sequence in the Glycyl Chain of Insulin. I. The identification of lower peptides from partial hydrolysates |journal=Biochemical Journal |volume=53 |issue=3 |pages=353–66 |date=1953 |pmid=13032078 |doi=10.1042/bj0530353 |pmc=1198157 }}</ref> Comparing multiple sequences manually turned out to be impractical. [[Margaret Oakley Dayhoff]], a pioneer in the field,<ref>{{cite book | vauthors=Moody G |year=2004 |title=Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine, and Business |publisher=John Wiley & Sons |location=Hoboken, NJ, USA |isbn=978-0-471-32788-2 |url-access=registration |url=https://archive.org/details/digitalcodeoflif0000mood }}</ref> compiled one of the first protein sequence databases, initially published as books,<ref name="Dayhoff1965">{{cite book |vauthors=Dayhoff MO, Eck RV, Chang MA, Sochard MR |date=1965 |title=ATLAS of PROTEIN SEQUENCE and STRUCTURE |publisher=National Biomedical Research Foundation |location=Silver Spring, MD, USA |url=https://ntrs.nasa.gov/api/citations/19660014530/downloads/19660014530.pdf |lccn=65-29342 }}</ref> and pioneered methods of sequence alignment and [[molecular evolution]].<ref name="pmid17775169">{{cite journal |vauthors=Eck RV, Dayhoff MO |title= Evolution of the Structure of Ferredoxin Based on Living Relics of Primitive Amino Acid Sequences | journal = Science | volume = 152 | issue = 3720 | pages = 363–6 | date = April 1966 | pmid = 17775169 | doi = 10.1126/science.152.3720.363 | s2cid = 23208558 | bibcode = 1966Sci...152..363E }}</ref> Another early 
contributor to bioinformatics was [[Elvin A. Kabat]], who pioneered biological sequence analysis in 1970 and, with Tai Te Wu, released comprehensive volumes of antibody sequences online between 1980 and 1991.<ref>{{cite journal | vauthors = Johnson G, Wu TT | title = Kabat database and its applications: 30 years after the first variability plot | journal = Nucleic Acids Research | volume = 28 | issue = 1 | pages = 214–8 | date = January 2000 | pmid = 10592229 | pmc = 102431 | doi = 10.1093/nar/28.1.214 }}</ref> In the 1970s, new techniques for sequencing DNA were applied to bacteriophage MS2 and ΦX174, and the extended nucleotide sequences were then parsed with informational and statistical algorithms. These studies illustrated that well-known features, such as the coding segments and the triplet code, can be revealed by straightforward statistical analyses, and they served as proof of concept that bioinformatics would be insightful.<ref>{{cite journal | vauthors = Erickson JW, Altman GG |title=A Search for Patterns in the Nucleotide Sequence of the MS2 Genome |journal=Journal of Mathematical Biology |date=1979 |volume=7 |issue=3 |pages=219–230 |doi=10.1007/BF00275725 |s2cid=85199492 }}</ref><ref>{{cite journal | vauthors = Shulman MJ, Steinberg CM, Westmoreland N | title = The coding function of nucleotide sequences can be discerned by statistical analysis | journal = Journal of Theoretical Biology | volume = 88 | issue = 3 | pages = 409–20 | date = February 1981 | pmid = 6456380 | doi = 10.1016/0022-5193(81)90274-5 | bibcode = 1981JThBi..88..409S }}</ref>
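
As a rough illustration of how a simple statistic can expose a triplet pattern, the following Python sketch counts how often each base occurs at each position modulo 3 and sums the chi-square-style deviations from what position-independent base usage would predict. It is not the method of the studies cited above; the function names, the toy sequence, and the choice of statistic are all assumptions made for this example.

```python
import random
from collections import Counter


def position_counts(seq, period=3):
    """Count each base separately at every position modulo `period`."""
    counts = [Counter() for _ in range(period)]
    for i, base in enumerate(seq):
        counts[i % period][base] += 1
    return counts


def periodicity_score(seq, period=3):
    """Chi-square-style sum of deviations between observed per-position
    base counts and the counts expected if base usage did not depend on
    position. Larger values suggest a stronger periodic signal."""
    counts = position_counts(seq, period)
    totals = Counter(seq)
    score = 0.0
    for per_position in counts:
        n_here = sum(per_position.values())
        for base, total in totals.items():
            expected = total * n_here / len(seq)
            if expected > 0:  # skip empty positions in very short inputs
                score += (per_position[base] - expected) ** 2 / expected
    return score


# Hypothetical toy data: four codons that all begin with G, repeated,
# so base usage depends strongly on codon position.
coding_like = "GCTGAAGATGTC" * 10

# Control: the same bases in a shuffled order (fixed seed for repeatability).
bases = list(coding_like)
random.Random(0).shuffle(bases)
shuffled = "".join(bases)

print("coding-like:", periodicity_score(coding_like))
print("shuffled:   ", periodicity_score(shuffled))
```

The toy sequence scores far higher than its shuffled control because the shuffle preserves overall base composition while destroying the position-dependent pattern. Real coding regions show a much weaker bias than this contrived example, so the score would need a proper significance test before any conclusion could be drawn from it.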