Biological database
{{Short description|Database of biological information}}
[[File:String home page.png|thumb|Home page of a biological database called STRING, which characterises functional links between proteins<ref name="pmid21045058">{{cite journal |author=Szklarczyk D|author2=Franceschini A|author3=Kuhn M|title=The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored |journal=Nucleic Acids Res. |volume=39 |issue=Database issue |pages=D561–8 |date=January 2011 |pmid=21045058 |pmc=3013807 |doi=10.1093/nar/gkq973 |display-authors=etal}}</ref>|350px|right]]
'''Biological databases''' are libraries of biological information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis.{{Citation needed|date=December 2019|reason=removed citation to predatory publisher content}} They contain information from research areas including [[genomics]], [[proteomics]], [[metabolomics]], [[microarray]] gene expression, and [[phylogenetics]].<ref>{{cite journal |author=Altman RB |title=Building successful biological databases |journal=Brief. Bioinformatics |volume=5 |issue=1 |pages=4–5 |date=March 2004 |pmid=15153301 |doi= 10.1093/bib/5.1.4|doi-access=free }}</ref> Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and clinical effects of mutations, as well as similarities of biological sequences and structures.

Biological databases can be classified by the '''kind of data''' they collect (see below). Broadly, there are molecular databases (for sequences, molecules, etc.), functional databases (for physiology, enzyme activities, phenotypes, ecology, etc.), taxonomic databases (for species and other taxonomic ranks), databases of images and other media, and specimen databases (for museum collections, etc.).

Databases are important tools in assisting scientists to analyze and explain a host of biological phenomena, from the structure of [[biomolecule]]s and their interactions, to the whole [[metabolism]] of organisms, to the [[evolution]] of [[species]]. This knowledge helps in fighting diseases, developing [[medication]]s, predicting certain genetic diseases, and discovering basic relationships among species in the [[evolutionary timeline|history of life]].

== Technical basis and theoretical concepts ==
[[Relational database]] concepts of [[computer science]] and [[information retrieval]] concepts of [[Digital library|digital libraries]] are important for understanding biological databases. Biological database design, development, and long-term management are a core area of the discipline of [[bioinformatics]].<ref>{{cite journal |author=Bourne P |title=Will a biological database be different from a biological journal? |journal=PLOS Comput. Biol. |volume=1 |issue=3 |pages=179–81 |date=August 2005 |pmid=16158097 |doi=10.1371/journal.pcbi.0010034 |pmc=1193993|bibcode=2005PLSCB...1...34B |doi-access=free }}</ref> Data contents include gene sequences, textual descriptions, attributes and [[Ontology (information science)|ontology]] classifications, citations, and tabular data. These are often described as semi-[[structured data]], and can be represented as tables, key-delimited records, and [[XML]] structures.{{citation needed|date=April 2023}}

==Access==
Most biological databases are available through web sites that organise data such that users can browse through the data online. In addition, the underlying data is usually available for download in a variety of formats. [[Biological data]] comes in many formats, including text, sequence data, protein structures, and links.
Each of these can be found from certain sources, for example:{{citation needed|date=April 2023}}
* Text formats are provided by [[PubMed]] and [[OMIM]].
* Sequence data is provided by [[GenBank]] (DNA) and [[UniProt]] (protein).
* Protein structures are provided by [[Protein Data Bank|PDB]], [[Structural Classification of Proteins|SCOP]], and [[CATH]].

==Problems and challenges==
Biological knowledge is distributed among countless databases. This sometimes makes it difficult to ensure the '''consistency''' of information, e.g. when different names are used for the same species or when data formats differ. As a consequence, '''inter-operability''' is a constant challenge for information exchange. For instance, if a DNA sequence database stores the DNA sequence along with the name of a species, a name change of that species may break the links to other databases which may use a different name. [[Integrative bioinformatics]] is one field attempting to tackle this problem by providing unified access. One solution is for biological databases to [[cross-reference]] other databases using [[Accession number (bioinformatics)|accession numbers]], which link related knowledge together and stay the same even if a species name changes.

'''Redundancy''' is another problem, as many databases must store the same information, e.g. [[protein structure database]]s also contain the sequences of the proteins they cover and their bibliographic information.

== Model-organism databases ==
{{Main|Model organism database}}
Species-specific databases are available for some species, mainly those that are often used in research ([[Model organism|''model organisms'']]). For example, EcoCyc is an ''E. coli'' database.
Other popular [[model organism databases]] include [[Mouse Genome Informatics]] for the [[laboratory mouse]] (''Mus musculus''), the [[Rat Genome Database]] for ''Rattus'', [[ZFIN]] for ''Danio rerio'' (zebrafish), [[PomBase]]<ref name="pmid38376816">{{cite journal | vauthors = Rutherford KM, Lera-Ramírez M, Wood V | title = PomBase: a Global Core Biodata Resource—growth, collaboration, and sustainability | journal = Genetics | volume = 227 | issue = 1 | date = May 2024 | pmid = 38376816 | pmc = 11075564 | doi = 10.1093/genetics/iyae007 }}</ref> for the fission yeast ''Schizosaccharomyces pombe'', [[FlyBase]] for ''Drosophila'', [[WormBase]] for the nematodes ''[[Caenorhabditis elegans]]'' and ''[[Caenorhabditis briggsae]]'', and [[Xenbase]] for the frogs ''[[Xenopus tropicalis]]'' and ''[[Xenopus laevis]]''.

== Biodiversity and species databases ==
[[File:Animal kingdom chart from Catalogue of Life.png|thumb|Animal groups and their number of species from the [[Catalogue of Life]]<ref>{{cite web |author=Catalogue of Life |date=2001 |title=Homepage |url=https://www.catalogueoflife.org/about/catalogueoflife |accessdate=2022-05-05 |work=Search |publisher=Species 2000 |archive-date=2022-05-05 |archive-url=https://web.archive.org/web/20220505190235/https://www.catalogueoflife.org/about/catalogueoflife |url-status=live }}</ref>]]
Numerous databases attempt to document the diversity of life on Earth. A prominent example is the [[Catalogue of Life]], first created in 2001 by Species 2000 and the Integrated Taxonomic Information System.<ref>{{cite journal |title=Identifying and Relating Biological Concepts in the Catalogue of Life |journal=Journal of Biomedical Semantics |year=2011 |last=Jones |first=Andrew C. 
|volume=2 |issue=1 |page=7 |doi=10.1186/2041-1480-2-7 |pmid=22004596 |pmc=3245425 |doi-access=free }}</ref> The Catalogue of Life is a collaborative project that aims to document the taxonomic categorization of all currently accepted species in the world.<ref>{{cite web |url=https://www.catalogueoflife.org/about/catalogueoflife#our-mission. |title=What is Catalogue of Life? |author=Catalogue of Life |work=Our Mission |publisher=Species 2000 |date=2001 |accessdate=2022-05-05 |archive-date=2022-05-05 |archive-url=https://web.archive.org/web/20220505190235/https://www.catalogueoflife.org/about/catalogueoflife#our-mission. |url-status=live }}</ref> It provides a consolidated and consistent database for researchers and policymakers to reference, and curates up-to-date datasets from other sources such as Conifer Database, [[International Committee on Taxonomy of Viruses|ICTV]] MSL (for viruses), and LepIndex (for butterflies and moths). In total, the Catalogue of Life draws from 165 databases as of May 2022.<ref>{{cite web |url=https://www.catalogueoflife.org/data/source-datasets |title=Source Datasets |author=Catalogue of Life |publisher=Species 2000 |date=2001 |accessdate=2022-05-05 |archive-date=2022-05-14 |archive-url=https://web.archive.org/web/20220514042628/https://www.catalogueoflife.org/data/source-datasets |url-status=live }}</ref> Operational costs of the Catalogue of Life are paid for by the [[Global Biodiversity Information Facility]], the [[Illinois Natural History Survey]], the [[Naturalis Biodiversity Center]], and the [[Smithsonian Institution]].<ref>{{cite web |url=https://www.catalogueoflife.org/about/funding |title=Funding |author=Catalogue of Life |publisher=Species 2000 |date=2001 |accessdate=2022-05-05 |archive-date=2022-05-05 |archive-url=https://web.archive.org/web/20220505190235/https://www.catalogueoflife.org/about/funding |url-status=live }}</ref>

Some biological databases also document the geographical 
distribution of different species. Shuang Dai et al. created a new multi-source database to document the spatial distribution of 1,371 bird species in China, as existing databases severely lacked spatial distribution data for many species.<ref>{{cite journal |title=A Spatialized Digital Database for All Bird Species in China |journal=Science China Life Sciences |year=2019 |last=Dai |first=Shuang |volume=62 |issue=5 |pages=661–667 |doi=10.1007/s11427-018-9419-2 |pmid=30900164 |s2cid=84845653 |url=https://doi.org/10.1007/s11427-018-9419-2 |accessdate=2022-05-05 |url-access=subscription }}</ref> Sources for this database included books, literature, GPS tracking, and online webpage data. The database displayed taxonomy, distribution, species information, and data sources for each species. After the database was completed, 61% of known species in China were found to be distributed in regions beyond where they were previously known.<ref>{{cite journal |title=A Spatialized Digital Database for All Bird Species in China |journal=Science China Life Sciences |year=2019 |last=Dai |first=Shuang |volume=62 |issue=5 |pages=661–667 |doi=10.1007/s11427-018-9419-2 |pmid=30900164 |s2cid=84845653 |url=https://doi.org/10.1007/s11427-018-9419-2 |accessdate=2022-05-05 |url-access=subscription }}</ref>

== Medical databases ==
[[File:Woundsdb.png|thumb|Foot wounds from WoundsDB<ref>{{cite web |url=https://chronicwounddatabase.eu/ |title=Chronic Wound Database |work=WoundsDB |publisher=Silesian University of Technology |date=2020 |accessdate=2022-05-05 }}</ref>]]
Medical databases are a special case of biomedical data resource and can range from bibliographies, such as [[PubMed]], to image databases for the development of AI-based diagnostic software.
For instance, one such image database was developed to aid the development of wound monitoring algorithms.<ref>{{cite journal |title=Chronic Wounds Multimodal Image Database |journal=Computerized Medical Imaging and Graphics |year=2021 |last=Kręcichwost |first=Michał |volume=88 |page=101844 |doi=10.1016/j.compmedimag.2020.101844 |pmid=33477091 |s2cid=231676950 |url=https://doi.org/10.1016/j.compmedimag.2020.101844 |accessdate=2022-05-05 |url-access=subscription }}</ref> Over 188 multi-modal image sets were curated from 79 patient visits, consisting of photographs, thermal images, and 3D mesh depth maps. Wound outlines were manually drawn and added to the photo datasets.<ref>{{cite web |url=https://chronicwounddatabase.eu/ |title=Chronic Wound Database |work=WoundsDB |publisher=Silesian University of Technology |date=2020 |accessdate=2022-05-05 }}</ref> The database was made publicly available in the form of a program called WoundsDB, downloadable from the Chronic Wound Database website.

== ''Nucleic Acids Research'' Database Issue ==
An important resource for finding biological databases is a special yearly issue of the journal ''[[Nucleic Acids Research]]'' (NAR). The Database Issue of NAR is freely available, and categorizes many of the public biological databases. A companion database to the issue, the Online Molecular Biology Database Collection, listed 1,380 online databases as of 2012.<ref name="pmid22144685">{{cite journal |author=Galperin MY|author2=Fernández-Suárez XM |title=The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection |journal=Nucleic Acids Res. 
|volume=40 |issue=Database issue |pages=D1–8 |date=January 2012 |pmid=22144685 |pmc=3245068 |doi=10.1093/nar/gkr1196 }}</ref> Other collections of databases exist, such as MetaBase and the Bioinformatics Links Directory.<ref name="pmid22139927">{{cite journal |author=Bolser DM|author2=Chibon PY|author3=Palopoli N|title=MetaBase--the wiki-database of biological databases |journal=Nucleic Acids Res. |volume=40 |issue=Database issue |pages=D1250–4 |date=January 2012 |pmid=22139927 |pmc=3245051 |doi=10.1093/nar/gkr1099 |display-authors=etal}}</ref><ref name="pmid21715385">{{cite journal |author=Brazas MD|author2=Yim DS|author3=Yamada JT|author4=Ouellette BF |title=The 2011 Bioinformatics Links Directory update: more resources, tools and databases and features to empower the bioinformatics community |journal=Nucleic Acids Res. |volume=39 |issue=Web Server issue |pages=W3–7 |date=July 2011 |pmid=21715385 |pmc=3125814 |doi=10.1093/nar/gkr514 }}</ref>

==See also==
* [[Biobank]]
* [[Biological data]]
* [[Chemical database]]
* [[Death Domain database]]
* [[European Bioinformatics Institute]]
* [[Gene Disease Database]]
* [[Integrative bioinformatics]]
* [[List of biological databases]]
* [[Model organism databases]]
* [[National Center for Biotechnology Information|NCBI]]
* [[PubMed]] (a database of biomedical literature)

==References==
{{Reflist|30em}}

==External links==
* [https://web.archive.org/web/20060112045100/http://www.oxfordjournals.org/nar/database/c Interactive list of biological databases], classified by categories, from [[Nucleic Acids Research]], 2010
* [https://web.archive.org/web/20191202045455/http://www.biodbs.info/ DBD: Database of Biological Databases]
* [http://www.Biosharing.org Biosharing] (a database of biological databases)
* [https://chronicwounddatabase.eu/ Chronic Wounds Database] WoundsDB
* [https://www.catalogueoflife.org/ Catalogue of Life]

{{Personal genomics}}

[[Category:Biological databases| ]]