FASTA format
==Description line==
The description line (defline), also called the header or identifier line, begins with ">" and gives a name and/or a unique identifier for the sequence. It may also contain additional information. In a deprecated practice, the header line sometimes contained more than one header, separated by a ^A (Control-A) character. In the original [[William Pearson (scientist)|Pearson]] FASTA format, one or more comments, each distinguished by a semicolon at the beginning of the line, may occur after the header. Some databases and bioinformatics applications do not recognize these comments and instead follow [https://www.ncbi.nlm.nih.gov/blast/fasta.shtml the NCBI FASTA specification].

An example of a multiple-sequence FASTA file follows:
<syntaxhighlight lang="text">
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
</syntaxhighlight>

=== NCBI identifiers ===
The [[National Center for Biotechnology Information|NCBI]] defined a standard for the unique identifier used for the sequence (SeqID) in the header line. This allows a sequence that was obtained from a database to be labelled with a reference to its database record. The database identifier format is understood by NCBI tools such as <code>makeblastdb</code> and <code>table2asn</code>. The following table describes the NCBI-defined FASTA format for sequence identifiers.<ref>{{cite book |title=NCBI C++ Toolkit Book |publisher=National Center for Biotechnology Information |url=https://ncbi.github.io/cxx-toolkit/pages/ch_demo#ch_demo.id1_fetch.html_ref_fasta |access-date=2018-12-19}}</ref>

{| class="wikitable sortable" style="border: 1px solid black; margin-bottom: 10px;"
|-
! Type
! Format(s)
! Example(s)
|-
| local (i.e. no database reference)
| <code>lcl|''integer''</code><br /> <code>lcl|''string''</code>
| <code>lcl|123</code><br /> <code>lcl|hmm271</code>
|-
| GenInfo backbone seqid
| <code>bbs|''integer''</code>
| <code>bbs|123</code>
|-
| GenInfo backbone moltype
| <code>bbm|''integer''</code>
| <code>bbm|123</code>
|-
| GenInfo import ID
| <code>gim|''integer''</code>
| <code>gim|123</code>
|-
| [https://www.ncbi.nlm.nih.gov/Genbank/index.html GenBank]
| <code>gb|''accession''|''locus''</code>
| <code>gb|M73307|AGMA13GT</code>
|-
| [http://www.embl-heidelberg.de EMBL]
| <code>emb|''accession''|''locus''</code>
| <code>emb|CAM43271.1|</code>
|-
| [https://web.archive.org/web/20140312021627/http://pir.georgetown.edu/ PIR]
| <code>pir|''accession''|''name''</code>
| <code>pir||G36364</code>
|-
| [http://www.ebi.ac.uk/swissprot SWISS-PROT]
| <code>sp|''accession''|''name''</code>
| <code>sp|P01013|OVAX_CHICK</code>
|-
| patent
| <code>pat|''country''|''patent''|''sequence-number''</code>
| <code>pat|US|RE33188|1</code>
|-
| pre-grant patent
| <code>pgp|''country''|''application-number''|''sequence-number''</code>
| <code>pgp|EP|0238993|7</code>
|-
| [https://www.ncbi.nlm.nih.gov/projects/RefSeq RefSeq]
| <code>ref|''accession''|''name''</code>
| <code>ref|NM_010450.1|</code>
|-
| general database reference<br />(a reference to a database that is not in this list)
| <code>gnl|''database''|''integer''</code><br /> <code>gnl|''database''|''string''</code>
| <code>gnl|taxon|9606</code><br /> <code>gnl|PID|e1632</code>
|-
| GenInfo integrated database
| <code>gi|''integer''</code>
| <code>gi|21434723</code>
|-
| [http://www.ddbj.nig.ac.jp DDBJ]
| <code>dbj|''accession''|''locus''</code>
| <code>dbj|BAC85684.1|</code>
|-
| [http://www.prf.or.jp PRF]
| <code>prf|''accession''|''name''</code>
| <code>prf||0806162C</code>
|-
| [https://web.archive.org/web/20080828002005/http://www.rcsb.org./pdb PDB]
| <code>pdb|''entry''|''chain''</code>
| <code>pdb|1I4L|D</code>
|-
| third-party [https://www.ncbi.nlm.nih.gov/Genbank/index.html GenBank]
| <code>tpg|''accession''|''name''</code>
| <code>tpg|BK003456|</code>
|-
| third-party [http://www.embl-heidelberg.de EMBL]
| <code>tpe|''accession''|''name''</code>
| <code>tpe|BN000123|</code>
|-
| third-party [http://www.ddbj.nig.ac.jp DDBJ]
| <code>tpd|''accession''|''name''</code>
| <code>tpd|FAA00017|</code>
|-
| TrEMBL
| <code>tr|''accession''|''name''</code>
| <code>tr|Q90RT2|Q90RT2_9HIV1</code>
|}

The vertical bars ("|") in the above table are not separators in the sense of the [[Backus–Naur form]] but are part of the format. Multiple identifiers can be concatenated, also separated by vertical bars.
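
Because each identifier type carries a fixed number of fields, a concatenated identifier string can be split by reading a type tag and then consuming that many fields. The following Python code is a minimal sketch of that idea, not the parser used by any NCBI tool. The names <code>FIELD_COUNTS</code> and <code>parse_defline</code> are hypothetical, the field counts are read off the table above, and the sketch assumes that the identifier part of the description line ends at the first space.

```python
# Number of '|'-separated fields that follow each type tag (from the table above).
FIELD_COUNTS = {
    "lcl": 1, "bbs": 1, "bbm": 1, "gim": 1, "gi": 1,
    "gb": 2, "emb": 2, "dbj": 2, "pir": 2, "sp": 2, "ref": 2,
    "gnl": 2, "prf": 2, "pdb": 2, "tpg": 2, "tpe": 2, "tpd": 2, "tr": 2,
    "pat": 3, "pgp": 3,
}


def parse_defline(defline):
    """Split a description line into (identifiers, description).

    identifiers is a list of (tag, fields) tuples; tag is None when the
    text does not start with a type tag from FIELD_COUNTS.
    Assumption: the identifier part ends at the first space.
    """
    text = defline[1:] if defline.startswith(">") else defline
    id_part, _, description = text.partition(" ")
    tokens = id_part.split("|")

    identifiers = []
    i = 0
    while i < len(tokens):
        tag = tokens[i]
        if tag == "":
            # Left over from a trailing '|' after a complete identifier.
            i += 1
            continue
        if tag not in FIELD_COUNTS:
            # Not a known type tag: keep the remainder as one opaque name.
            identifiers.append((None, ("|".join(tokens[i:]),)))
            break
        n = FIELD_COUNTS[tag]
        fields = tokens[i + 1:i + 1 + n]
        fields += [""] * (n - len(fields))  # pad omitted trailing fields
        identifiers.append((tag, tuple(fields)))
        i += 1 + n
    return identifiers, description


if __name__ == "__main__":
    # Illustrative header built by concatenating two examples from the table.
    print(parse_defline(">gi|21434723|gb|M73307|AGMA13GT example description"))
```

Driving the split by field count rather than by searching for the next known tag keeps empty fields, as in <code>pir||G36364</code> or <code>ref|NM_010450.1|</code>, in their correct positions.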