
{{Short description|Sub-sequence DNA in genetics}}
In [[genetics]], an '''expressed sequence tag''' ('''EST''') is a short sub-sequence of a [[cDNA]] sequence.<ref>[https://web.archive.org/web/20020507082436/http://www.ncbi.nlm.nih.gov/About/primer/est.html ESTs Factsheet]. [[National Center for Biotechnology Information]].</ref> ESTs may be used to identify gene [[Transcription (genetics)|transcripts]], and were instrumental in gene discovery and in gene-sequence determination.<ref name=adams>{{cite journal|vauthors=Adams MD, Kelley JM, Gocayne JD |title=Complementary DNA sequencing: expressed sequence tags and human genome project |journal=Science |volume=252 |issue=5013 |pages=1651–6 |date=Jun 1991 |pmid=2047873 |doi=10.1126/science.2047873 |bibcode=1991Sci...252.1651A |s2cid=13436211 |display-authors=etal }}</ref> The identification of ESTs proceeded rapidly: approximately 74.2 million ESTs were available in public databases as of 1 January 2013 (e.g. [[GenBank]], all species). EST approaches have largely been superseded by whole-genome, transcriptome, and metagenome sequencing.

An EST results from one-shot [[sequencing]] of a [[cloned]] cDNA. The cDNAs used for EST generation are typically individual clones from a [[cDNA library]]. The resulting sequence is a relatively low-quality fragment whose length is limited by current technology to approximately 500 to 800 [[nucleotide]]s. Because these clones consist of DNA that is complementary to mRNA, the ESTs represent portions of expressed genes. They may be represented in databases as either cDNA/mRNA sequence or as the reverse complement of the mRNA, the [[template strand]].
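
The two database representations differ only by reverse complementation. The following is a minimal Python sketch of that conversion; the function names are hypothetical and are not taken from any EST toolkit, and the sequences are assumed to contain only A, C, G, T, and N.

```python
# Illustrative helpers (hypothetical names, not from any EST toolkit).
# Assumes sequences use only the letters A, C, G, T and N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, N kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def same_sequence_either_strand(est_a: str, est_b: str) -> bool:
    """True if two reads are identical in either orientation (case-insensitive)."""
    a, b = est_a.upper(), est_b.upper()
    return a == b or a == reverse_complement(b)
```

Comparing reads in both orientations, as in the second function, is one way to recognise that an entry stored as mRNA sequence and an entry stored as template strand describe the same fragment.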

One can map ESTs to specific chromosome locations using [[gene mapping|physical mapping]] techniques, such as [[radiation hybrid mapping]], [[Happy mapping|HAPPY mapping]], or [[Fluorescent in situ hybridization|FISH]]. Alternatively, if the genome of the organism that originated the EST has been sequenced, one can align the EST sequence to that genome using a computer.
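
As a rough illustration of the computational approach, the Python sketch below searches a genome string for exact occurrences of an EST on both strands. It is a simplification under stated assumptions: practical alignment must tolerate sequencing errors and gaps where introns have been spliced out, which exact matching cannot do. The function names are hypothetical.

```python
# Simplified sketch: exact-match search only. Real EST-to-genome alignment
# must tolerate sequencing errors and intron gaps; this does neither.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_est(est: str, genome: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based start) for every exact occurrence of the EST."""
    hits = []
    for strand, query in (("+", est), ("-", reverse_complement(est))):
        start = genome.find(query)
        while start != -1:
            hits.append((strand, start))
            start = genome.find(query, start + 1)  # allow overlapping hits
    return hits
```

Both strands are searched because an EST may be stored in either orientation.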

The current understanding of the [[human genome|human set of genes]] ({{as of | 2006 | lc=on}}) includes the existence of thousands of genes based solely on EST evidence. In this respect, ESTs have become a tool to refine the predicted transcripts for those genes, which leads to the prediction of their protein products and ultimately of their function. Moreover, the context in which those ESTs were obtained (tissue, organ, or disease state, e.g. [[cancer]]) gives information on the conditions in which the corresponding gene is active. ESTs contain enough information to permit the design of precise probes for [[DNA microarray]]s, which can then be used to determine [[gene expression]] profiles.

Some authors use the term "EST" to describe genes for which little or no further information exists besides the tag.<ref>[https://www.ncbi.nlm.nih.gov/dbEST/how_to_submit.html dbEST<!-- Bot generated title -->]</ref>

==History==
In 1979, teams at Harvard and Caltech extended the basic idea of making DNA copies of mRNAs in vitro to amplifying a library of such in bacterial plasmids.<ref>{{cite journal| journal=Cell |date= December 1979 | volume=18 |issue=4|pages=1303–16 | title=Use of a cDNA library for studies on evolution and developmental expression of the chorion multigene families |vauthors=Sim GK, Kafatos FC, Jones CW, Koehler MD, Efstratiadis A, Maniatis T | doi=10.1016/0092-8674(79)90241-1|pmid= 519770 |doi-access=free }}</ref>

In 1982, the idea of selecting random or semi-random clones from such a cDNA library for sequencing was explored by Greg Sutcliffe and coworkers.<ref>{{cite journal | title=Common 82-nucleotide sequence unique to brain RNA |vauthors=Sutcliffe JG, Milner RJ, Bloom FE, Lerner RA | journal=Proc Natl Acad Sci U S A |date=August 1982 |volume=79 |issue=16 |pages=4942–6 | doi=10.1073/pnas.79.16.4942|pmid=6956902 |pmc=346801 |bibcode=1982PNAS...79.4942S |doi-access=free }}</ref>

In 1983, Putney et al. sequenced 178 clones from a rabbit muscle cDNA library.<ref>{{cite journal | title=A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing|vauthors=Putney SD, Herlihy WC, Schimmel P | journal=Nature |volume=302 |issue=5910 |pages=718–21 | year=1983 | doi=10.1038/302718a0|pmid=6687628 |bibcode=1983Natur.302..718P |s2cid=4364361 }}</ref>

In 1991, Adams and co-workers coined the term EST and initiated more systematic sequencing as a project (starting with 600 brain cDNAs).<ref name=adams />

== Sources of data and annotations ==

=== dbEST ===
dbEST is a division of [[GenBank]] established in 1992. As with GenBank, data in dbEST are submitted directly by laboratories worldwide and are not curated.

=== EST contigs ===
Because of the way ESTs are sequenced, many distinct expressed sequence tags are often partial sequences that correspond to the same mRNA of an organism. In an effort to reduce the number of expressed sequence tags for downstream gene discovery analyses, several groups assembled expressed sequence tags into EST [[contig]]s. Examples of resources that provide EST contigs include the TIGR gene indices,<ref>{{cite journal |vauthors=Lee Y, Tsai J, Sunkara S |title=The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes |journal=Nucleic Acids Res. |volume=33 |issue=Database issue |pages=D71–4 |date=Jan 2005 |pmid=15608288 |pmc=540018 |doi=10.1093/nar/gki064 |display-authors=etal}}</ref> Unigene,<ref>{{cite journal |vauthors=Stanton JA, Macgregor AB, Green DP |title=Identifying tissue-enriched gene expression in mouse tissues using the NIH UniGene database |journal=Appl Bioinform |volume=2 |issue=3 Suppl |pages=S65–73 |year=2003 |pmid=15130819 }}</ref> and STACK.<ref>{{cite journal |vauthors=Christoffels A, van Gelder A, Greyling G, Miller R, Hide T, Hide W |title=STACK: Sequence Tag Alignment and Consensus Knowledgebase |journal=Nucleic Acids Res. |volume=29 |issue=1 |pages=234–8 |date=Jan 2001 |pmid=11125101 |pmc=29830 |url=|doi=10.1093/nar/29.1.234}}</ref>

Constructing EST contigs is not trivial and may yield artifacts (contigs that contain two distinct gene products). When the complete genome sequence of an organism is available and transcripts are annotated, it is possible to bypass contig assembly and directly match transcripts with ESTs. This approach is used in the TissueInfo system (see below) and makes it easy to link annotations in the genomic database to tissue information provided by EST data.
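
The idea behind contig construction can be illustrated with a greedy suffix-prefix merge. The Python sketch below is a toy under stated assumptions and is not the method used by any of the resources named above: it requires exact overlaps, ignores sequencing errors and reverse-complement reads, and uses hypothetical function names and an arbitrary default overlap threshold.

```python
# Toy greedy assembler (hypothetical; not the algorithm of any named resource).
# Assumes error-free reads that are all in the same orientation.

def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` that equals a prefix of `right`, or 0 if below threshold."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble_contigs(ests: list[str], min_overlap: int = 4) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(ests))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining pair overlaps enough
        joined = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(joined)
```

With a small `min_overlap`, unrelated transcripts that happen to share a short stretch of sequence would be merged into a single contig, which is one way the artifacts mentioned above can arise.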

=== Tissue information===

High-throughput analyses of ESTs often encounter similar data management challenges. A first challenge is that tissue provenance of EST libraries is described in plain English in dbEST.<ref>{{cite journal |vauthors=Skrabanek L, Campagne F |title=TissueInfo: high-throughput identification of tissue expression profiles and specificity |journal=Nucleic Acids Res. |volume=29 |issue=21 |pages=E102–2 |date=Nov 2001 |pmid=11691939 |pmc=60201 |url=|doi=10.1093/nar/29.21.e102}}</ref> This makes it difficult to write programs that can unambiguously determine that two EST libraries were sequenced from the same tissue. Similarly, disease conditions for the tissue are not annotated in a computationally friendly manner. For instance, cancer origin of a library is often mixed with the tissue name (e.g., the tissue name "[[glioblastoma]]" indicates that the EST library was sequenced from brain tissue and the disease condition is cancer).<ref>{{cite journal |vauthors=Campagne F, Skrabanek L |title=Mining expressed sequence tags identifies cancer markers of clinical interest |journal=BMC Bioinformatics |volume=7|page=481 |year=2006 |pmid=17078886 |pmc=1635568 |doi=10.1186/1471-2105-7-481 |doi-access=free }}</ref> With the notable exception of cancer, the disease condition is often not recorded in dbEST entries. The TissueInfo project was started in 2000 to help with these challenges.
The project provides curated data (updated daily) to disambiguate tissue origin and disease state (cancer/non-cancer), offers a tissue ontology that links tissues and organs by "is part of" relationships (i.e., formalizes knowledge that hypothalamus is part of brain, and that brain is part of the central nervous system), and distributes open-source software for linking transcript annotations from sequenced genomes to tissue expression profiles calculated with data in dbEST.<ref>[http://icb.med.cornell.edu/crt/tissueinfo/ :institute for computational biomedicine::TissueInfo<!-- Bot generated title -->] {{webarchive |url=https://web.archive.org/web/20080604225500/http://icb.med.cornell.edu/crt/tissueinfo/ |date=June 4, 2008 }}</ref>

== See also ==
* [[Gene expression]]
* [[Complementary DNA]] (cDNA)
* [[Transcriptomics]]
* [[IMAGE cDNA clones]]
* [[Whole genome sequencing]] (WGS)

==References==
{{reflist|2}}

== External links ==
* {{cite web |url= http://www.ncbi.nlm.nih.gov/About/primer/est.html |title= ESTs: Gene Discovery Made Easier |work= Science Primer |publisher= NCBI |date= Mar 29, 2004 |url-status= dead |archive-date= February 28, 2007 |archive-url= https://web.archive.org/web/20070228181445/http://www.ncbi.nlm.nih.gov/About/primer/est.html }}
* {{cite book |chapter-url= https://www.ncbi.nlm.nih.gov/books/NBK21083/#A858 |title= NCBI Handbook |chapter=21 UniGene: A Unified View of the Transcriptome § Expressed Sequence Tags (ESTs) |first1=Joan U. |last1=Pontius |first2=Lukas |last2=Wagner |first3=Gregory D. |last3=Schuler |editor-last1= McEntyre |editor-first1= J |editor-last2= Ostell |editor-first2= J |orig-year= 2002 |date=2003 |publisher=National Center for Biotechnology Information |quote=This publication is provided for historical reference only and the information may be out of date |id=NBK21101 |url=https://www.ncbi.nlm.nih.gov/books/NBK21101/}}
* {{cite journal |journal= Bioinformatics |date= Apr 15, 2005 |volume= 21 |issue= 8 |pages= 1383–8 |title= Support vector machines for separation of mixed plant-pathogen EST collections based on codon usage (ECLAT) |last1= Friedel |first1= CC |last2= Jahn |first2= KH |last3= Sommer |first3= S |last4= Rudd |first4= S |last5= Mewes |first5= HW |last6= Tetko |first6= IV |doi= 10.1093/bioinformatics/bti200 |pmid=15585526|doi-access= free }}
** {{cite web |url= http://mips.gsf.de/proj/est |title= ECLAT |publisher= [[Munich Information Center for Protein Sequences|MIPS]] |quote= Server for the classification of ESTs from mixed EST pools (from fungus infected plants) using codon usage |url-status= live |archive-date= September 27, 2008 |archive-url= https://web.archive.org/web/20080927181213/http://mips.gsf.de/proj/est/ }}
* {{cite web |url= https://www.ncbi.nlm.nih.gov/genbank/dbest/ |title= dbEST |publisher= [[GenBank]] }}
** {{cite web |url= https://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html |title= dbEST summary |date= Jan 1, 2013 |publisher= [[GenBank]] |url-status= dead |archive-date= June 7, 2019 |archive-url= https://web.archive.org/web/20190607020219/https://www.ncbi.nlm.nih.gov/genbank/dbest/dbest_summary/ }}
* {{cite web |first= Shoba |last= Ranganathan |title= Bioinformatics |url= https://research.science.mq.edu.au/biolinfo/ }}
** {{cite web |url= http://www.biolinfo.org/EST/ |title= Web Resources for EST data and analysis |url-status= usurped |archive-date= August 29, 2007 |archive-url= https://web.archive.org/web/20070829194022/http://www.biolinfo.org/EST/ }}

=== Tissue Info ===
* {{cite web |url= https://icb.med.cornell.edu/wiki/index.php/TissueInfo |title= TissueInfo |work= Wiki }}
* {{cite web |url= http://icb.med.cornell.edu/crt/tissueinfo/ |title= TissueInfo |quote= Curated EST tissue provenance, tissue ontology, open-source software |url-status= dead |archive-date= June 4, 2008 |archive-url= https://web.archive.org/web/20080604225500/http://icb.med.cornell.edu/crt/tissueinfo/ }}
* {{cite journal |journal= Nucleic Acids Res. |date= Nov 1, 2001 |volume= 29 |issue= 21 |doi= 10.1093/nar/29.21.e102 |pmid= 11691939 |title= TissueInfo: high-throughput identification of tissue expression profiles and specificity |pmc=60201 |pages=E102–2 |vauthors=Skrabanek L, Campagne F}}


{{DEFAULTSORT:Expressed Sequence Tag}}
[[Category:Gene expression]]
[[Category:Genomics]]
[[Category:DNA]]