Editing FASTA format (section)

=== Original format ===
The original FASTA/[[William Pearson (scientist)|Pearson]] format is described in the documentation for the [[FASTA]] suite of programs. It can be downloaded with any free distribution of FASTA (see fasta20.doc, fastaVN.doc, or fastaVN.me—where VN is the Version Number).

In the original format, a sequence was represented as a series of lines, each of which was no longer than 120 characters and usually did not exceed 80 characters. This probably was to allow for the preallocation of fixed line sizes in software: at the time most users relied on [[Digital Equipment Corporation]] (DEC) [[VT220]] (or compatible) terminals which could display 80 or 132 characters per line.<ref>{{Cite web |last=Landsteiner |first=mass:werk, Norbert |date=2019-02-20 |title=(Now Go Bang!) Raster CRT Typography (According to DEC) |url=https://www.masswerk.at/nowgobang/2019/dec-crt-typography |access-date=2024-03-15 |website=Now Go Bang! — mass:werk / Blog |language=en}}</ref><ref>{{Cite web |title=VT220 Built-in Glyphs |url=https://www.vt100.net/dec/vt220/glyphs |access-date=2024-03-15 |website=VT100}}</ref> Most people preferred the bigger font in 80-character modes and so it became the recommended fashion to use 80 characters or less (often 70) in FASTA lines. Also, the width of a standard printed page is 70 to 80 characters (depending on the font). Hence, 80 characters became the norm.<ref>{{Cite web |title=Why is 80 characters the 'standard' limit for code width? |url=https://softwareengineering.stackexchange.com/questions/148677/why-is-80-characters-the-standard-limit-for-code-width |access-date=2024-03-15 |website=Software Engineering Stack Exchange |language=en}}</ref>

The first line in a FASTA file started either with a ">" (greater-than) symbol or, less frequently, a ";"<ref>{{Cite web |date=2023-08-01 |title=FASTA Database Format |url=https://www.loc.gov/preservation/digital/formats/fdd/fdd000622.shtml |access-date=2024-03-15 |website=www.loc.gov}}</ref> (semicolon) was taken as a comment. Subsequent lines starting with a semicolon would be ignored by software. Since the only comment used was the first, it quickly became used to hold a summary description of the sequence, often starting with a unique library accession number, and with time it has become commonplace to always use ">" for the first line and to not use ";" comments (which would otherwise be ignored).

Following the initial line (used for a unique description of the sequence) was the actual sequence itself in the standard one-letter character string. Anything other than a valid character would be ignored (including spaces, tabulators, asterisks, etc...). It was also common to end the sequence with an "*" (asterisk) character (in analogy with use in PIR formatted sequences) and, for the same reason, to leave a blank line between the description and the sequence. Below are a few sample sequences:
<syntaxhighlight lang="text">
;LCBO - Prolactin precursor - Bovine
; a sample sequence in FASTA format
MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS
EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL
VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED
ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC*

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
</syntaxhighlight>
A multiple-sequence FASTA format, or multi-FASTA format, would be obtained by concatenating several single-sequence FASTA files in one file. This does not imply a contradiction with the format as only the first line in a FASTA file may start with a ";" or ">", forcing all subsequent sequences to start with a ">" in order to be taken as separate sequences (and further forcing the exclusive reservation of ">" for the sequence definition line). Thus, the examples above would be a multi-FASTA file if taken together.

Modern bioinformatics programs that rely on the FASTA format expect the sequence headers to be preceded by ">". The sequence is generally represented as "interleaved", or on multiple lines as in the above example, but may also be "sequential", or on a single line. Running different bioinformatics programs may require conversions between "sequential" and "interleaved" FASTA formats.