Open reading frame
{{Short description|Span of DNA sequence between start and stop codons}}
[[File:Sampleorf.png|thumb|600px|Sample sequence showing three different possible [[reading frame]]s. [[Start codon]]s are highlighted in purple, and [[stop codon]]s are highlighted in red.]]
In [[molecular biology]], '''open reading frames''' are commonly defined as spans of [[DNA]] sequence between a start and a stop [[Codon|codon]] in the same [[reading frame]]. Usually, this is considered within a studied region of a [[Prokaryote|prokaryotic]] DNA sequence, where only one of the [[#Six-frame translation|six possible reading frames]] will be "open" (the "reading", however, refers to the RNA produced by [[Transcription (biology)|transcription]] of the DNA and its subsequent interaction with the [[ribosome]] in [[Translation (biology)|translation]]). Such an open reading frame (ORF) may<ref name="Sieber_2018"/> contain a [[start codon]] (usually AUG in terms of [[RNA]]) and by definition cannot extend beyond a [[stop codon]] (usually UAA, UAG or UGA in RNA).<ref>{{cite web| vauthors = Brody LC |date=2021-08-25|title=Stop Codon|url=https://www.genome.gov/genetics-glossary/Stop-Codon|access-date=2021-08-25|website=National Human Genome Research Institute|publisher=National Institutes of Health}}</ref> The start codon (not necessarily the first one in the ORF) indicates where translation may begin. The [[transcription terminator|transcription termination]] site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete [[protein]] would be made during translation.<ref>{{Cite book | vauthors = Slonczewski J, Foster JW |title=Microbiology: An Evolving Science |location=New York |publisher=W.W. Norton & Co. |year=2009 |isbn=978-0-393-97857-5 |oclc=185042615}}</ref> In [[Eukaryote|eukaryotic]] [[gene]]s with multiple [[exon]]s, [[intron]]s are removed and the exons are joined together after transcription to yield the final [[mRNA]] for protein translation.
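A minimal sketch of the start-to-stop definition, written in Python for illustration only (it is not the method used by any tool mentioned in this article): it assumes the standard genetic code, with ATG as the only start codon and TAA, TAG and TGA as stop codons (the DNA equivalents of AUG, UAA, UAG and UGA), and scans all six reading frames. The names <code>find_orfs</code>, <code>reverse_complement</code> and <code>min_codons</code> are hypothetical.

```python
# Minimal six-frame ORF scan (illustrative sketch; standard genetic code only).
START = "ATG"                    # DNA equivalent of the AUG start codon
STOPS = {"TAA", "TAG", "TGA"}    # DNA equivalents of UAA, UAG and UGA


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence (other letters pass through)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[str, int, int, str]]:
    """Return (strand, frame, start, orf) tuples for start-to-stop ORFs in six frames.

    'start' is the 0-based offset of the ATG on the scanned strand (for '-', the
    reverse complement). Each ORF runs from the first ATG after the previous
    in-frame stop up to and including the next in-frame stop; stretches that
    reach the end of the sequence without a stop codon are not reported.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == START:
                    orf_start = i
                elif orf_start is not None and codon in STOPS:
                    orf = s[orf_start:i + 3]
                    if len(orf) // 3 >= min_codons:
                        hits.append((strand, frame, orf_start, orf))
                    orf_start = None
    return hits
```

For each stop-delimited stretch the sketch reports only the ORF beginning at the first ATG, i.e. the longest candidate; since translation does not necessarily start at the first start codon, a real gene finder would also consider internal ATGs and other evidence. For example, <code>find_orfs("CCATGAAATAGGG")</code> returns one three-codon ORF, <code>ATGAAATAG</code>, on the forward strand at frame offset 2.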
In the context of [[Gene prediction|gene finding]], the start-stop definition of an ORF applies only to spliced [[Messenger RNA|mRNAs]], not to genomic DNA, because introns may contain stop codons and/or cause shifts between reading frames. An alternative definition says that an ORF is a sequence whose length is divisible by three and that is bounded by stop codons.<ref name="Sieber_2018">{{cite journal | vauthors = Sieber P, Platzer M, Schuster S | title = The Definition of Open Reading Frame Revisited | journal = Trends in Genetics | volume = 34 | issue = 3 | pages = 167–170 | date = March 2018 | pmid = 29366605 | doi = 10.1016/j.tig.2017.12.009 }}</ref><ref name="Claverie_1997a">{{cite journal | vauthors = Claverie JM | title = Computational methods for the identification of genes in vertebrate genomic sequences | journal = Human Molecular Genetics | volume = 6 | issue = 10 | pages = 1735–44 | date = 1997 | pmid = 9300666 | doi = 10.1093/hmg/6.10.1735 | doi-access = free }}</ref> This more general definition can be useful in [[Transcriptomics technologies|transcriptomics]] and [[metagenomics]], where a start or stop codon may not be present in the obtained sequences. Such an ORF may then correspond to only part of a gene rather than the complete gene.

==Biological significance==
One common use of open reading frames (ORFs) is as one piece of evidence to assist in [[gene prediction]]. Long ORFs are often used, along with other evidence, to initially identify candidate [[Genetic code|protein-coding]] regions or [[Non-coding RNA|functional RNA]]-coding regions in a [[DNA]] sequence.<ref name=deonier2005p25 /> The presence of an ORF does not necessarily mean that the region is [[Translation (genetics)|translated]].
For example, in a random DNA sequence with equal proportions of each [[nucleotide]], 3 of the 64 possible [[codon]]s are [[stop codon]]s, so a stop codon would be expected about once every 21 codons.<ref name=deonier2005p25 /> A simple gene prediction algorithm for [[prokaryotes]] might look for a [[start codon]] followed by an open reading frame that is long enough to encode a typical protein and whose [[Codon usage bias|codon usage]] matches the frequencies characteristic of the given organism's coding regions.<ref name=deonier2005p25 /> Therefore, some authors require an ORF to have a minimum length, e.g. 100 codons<ref name="Claverie_1997">{{cite journal | vauthors = Claverie JM, Poirot O, Lopez F | title = The difficulty of identifying genes in anonymous vertebrate sequences | journal = Computers & Chemistry | volume = 21 | issue = 4 | pages = 203–14 | date = 1997 | pmid = 9415985 | doi = 10.1016/s0097-8485(96)00039-3 }}</ref> or 150 codons.<ref name=deonier2005p25>{{cite book | vauthors = Deonier R, Tavaré S, Waterman M |title = Computational Genome Analysis: an introduction |publisher = [[Springer-Verlag]] |year = 2005 |isbn = 978-0-387-98785-9 |page=25|author2-link = Simon Tavaré }}</ref> Even a long open reading frame is not, by itself, conclusive evidence for the presence of a [[gene]].<ref name=deonier2005p25 />

===Short open reading frames===
'''Short open reading frames''',<ref>{{cite journal | pmid=35300685 | doi=10.1186/s12929-022-00802-5 | title=Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures | year=2022 | journal=Journal of Biomedical Science| doi-access=free | last1=Leong | first1=Alyssa Zi-Xin | last2=Lee | first2=Pey Yee | last3=Mohtar | first3=M. Aiman | last4=Syafruddin | first4=Saiful Effendi | last5=Pung | first5=Yuh-Fen | last6=Low | first6=Teck Yew | volume=29 | issue=1 | page=19 | pmc=8928697 }}</ref> also called '''small open reading frames'''<ref>{{cite journal | pmid=36543139 | doi=10.1016/j.celrep.2022.111808 | title=De novo birth of functional microproteins in the human lineage | year=2022 | last1=Vakirlis | first1=Nikolaos | last2=Vance | first2=Zoe | last3=Duggan | first3=Kate M. | last4=McLysaght | first4=Aoife | journal=Cell Reports | volume=41 | issue=12 | page=111808 | pmc=10073203 | s2cid=254966620 }}</ref> and abbreviated as '''sORFs''' or '''smORFs''', are usually fewer than 100 codons in length.<ref>{{cite journal | doi=10.3389/fgene.2021.796060 | doi-access=free | title=Small Open Reading Frames, How to Find Them and Determine Their Function | year=2022 | last1=Kute | first1=Preeti Madhav | last2=Soukarieh | first2=Omar | last3=Tjeldnes | first3=Håkon | last4=Trégouët | first4=David-Alexandre | last5=Valen | first5=Eivind | journal=Frontiers in Genetics | volume=12 | page=796060 | pmid=35154250 | pmc=8831751 }}</ref> Some sORFs that lack the classical hallmarks of protein-coding genes, whether located in ncRNAs or in mRNAs, can produce functional [[peptide]]s.<ref name="ZanetBenrabah2015">{{cite journal | vauthors = Zanet J, Benrabah E, Li T, Pélissier-Monier A, Chanut-Delalande H, Ronsin B, Bellen HJ, Payre F, Plaza S | display-authors = 6 | title = Pri sORF peptides induce selective proteasome-mediated protein processing | journal = Science | volume = 349 | issue = 6254 | pages = 1356–1358 | date = September 2015 | pmid = 26383956 | doi = 10.1126/science.aac5677 | s2cid = 206639549 | bibcode = 2015Sci...349.1356Z | url = https://hal.inrae.fr/hal-04767052v1/file/zanet%202015.pdf }}</ref> sORFs encode [[microprotein]]s, also known as sORF-encoded proteins (SEPs).
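The 21-codon figure and the 100- and 150-codon minimum lengths mentioned earlier, as well as the usual 100-codon bound for sORFs, can be related with simple arithmetic. The Python sketch below assumes, as a simplification, that each codon of a random sequence is drawn independently and uniformly from the 64 triplets, 3 of which are stop codons. Under that assumption the number of codons up to and including the first stop follows a geometric distribution with mean 64/3 ≈ 21.3, and the probability that n consecutive codons contain no stop is (61/64)<sup>n</sup>. The function names are hypothetical.

```python
# Chance ORF lengths under a uniform random-codon model (simplifying assumption:
# every codon is one of 64 equally likely triplets, independent of its neighbours).
from fractions import Fraction

STOP_CODONS = 3      # TAA, TAG, TGA in the standard genetic code
TOTAL_CODONS = 64    # 4 ** 3 possible triplets

p_stop = Fraction(STOP_CODONS, TOTAL_CODONS)


def expected_codons_per_stop() -> float:
    """Mean codons read up to and including a stop (geometric distribution, 1/p)."""
    return float(1 / p_stop)


def prob_no_stop(n_codons: int) -> float:
    """Probability that n consecutive random codons contain no stop codon."""
    return float((1 - p_stop) ** n_codons)


if __name__ == "__main__":
    print(f"codons per stop: {expected_codons_per_stop():.1f}")
    for n in (21, 100, 150):
        print(f"P(no stop in {n} codons) = {prob_no_stop(n):.4g}")
```

Under this model a stop-free run of 100 codons occurs in a given frame with a probability of about 0.8%, and one of 150 codons with a probability below 0.1%, which is why such thresholds remove most chance ORFs and why length alone is a weak signal for sORFs. Real genomes deviate from the model (unequal base composition, six frames, codon bias), so the numbers are only indicative.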
The 5' UTRs of about 50% of mammalian mRNAs are known to contain one or more sORFs,<ref>{{cite journal | vauthors = Wethmar K, Barbosa-Silva A, Andrade-Navarro MA, Leutz A | title = uORFdb--a comprehensive literature database on eukaryotic uORF biology | journal = Nucleic Acids Research | volume = 42 | issue = Database issue | pages = D60–D67 | date = January 2014 | pmid = 24163100 | pmc = 3964959 | doi = 10.1093/nar/gkt952 }}</ref> also called [[upstream ORF]]s or uORFs. However, an older study found AUG codons upstream of the major ORF in fewer than 10% of the vertebrate mRNAs surveyed, whereas uORFs were present in two thirds of proto-oncogenes and related genes.<ref>{{Cite journal|last1=Geballe|first1=A. P.|last2=Morris|first2=D. R.|date=April 1994|title=Initiation codons within 5'-leaders of mRNAs as regulators of translation|url=https://pubmed.ncbi.nlm.nih.gov/8016865|journal=Trends in Biochemical Sciences|volume=19|issue=4|pages=159–164|doi=10.1016/0968-0004(94)90277-1|issn=0968-0004|pmid=8016865}}</ref> Between 64% and 75% of experimentally identified sORF translation initiation sites are conserved between the human and mouse genomes, which may indicate that these elements are functional.<ref>{{cite journal | vauthors = Lee S, Liu B, Lee S, Huang SX, Shen B, Qian SB | title = Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 109 | issue = 37 | pages = E2424–E2432 | date = September 2012 | pmid = 22927429 | pmc = 3443142 | doi = 10.1073/pnas.1207846109 | doi-access = free }}</ref> However, sORFs are often found only in minor mRNA isoforms and may escape selection; the high conservation of their initiation sites may instead be connected with their location inside the promoters of the relevant genes.
This is the case for the [[SLAMF1]] gene, for example.<ref>{{cite journal | vauthors = Schwartz AM, Putlyaeva LV, Covich M, Klepikova AV, Akulich KA, Vorontsov IE, Korneev KV, Dmitriev SE, Polanovsky OL, Sidorenko SP, Kulakovskiy IV, Kuprash DV | display-authors = 6 | title = Early B-cell factor 1 (EBF1) is critical for transcriptional control of SLAMF1 gene in human B cells | journal = Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms | volume = 1859 | issue = 10 | pages = 1259–1268 | date = October 2016 | pmid = 27424222 | doi = 10.1016/j.bbagrm.2016.07.004 }}</ref>

==Six-frame translation==
Since DNA is read in groups of three nucleotides (codons), a DNA strand has three distinct reading frames, beginning at its first, second or third nucleotide.<ref name=":0">{{cite journal | vauthors = Pearson WR, Wood T, Zhang Z, Miller W | title = Comparison of DNA sequences with protein sequences | journal = Genomics | volume = 46 | issue = 1 | pages = 24–36 | date = November 1997 | pmid = 9403055 | doi = 10.1006/geno.1997.4995 | s2cid = 6413018 }}</ref> The double helix of a DNA molecule has two antiparallel strands; since each strand has three reading frames, there are six possible frame translations.<ref name=":0" />
[[File:Open reading frame.jpg|thumb|440x440px|Example of a six-frame translation. The nucleotide sequence is shown in the middle with forward translations above and reverse translations below. Two possible open reading frames within the sequence are highlighted.|alt=]]

==Software==
===Finder===
The ORF Finder (Open Reading Frame Finder)<ref>{{cite web|url=https://www.ncbi.nlm.nih.gov/orffinder/|title=ORFfinder|website=National Center for Biotechnology Information }}</ref> is a graphical analysis tool that finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database. It can identify open reading frames using the standard or alternative genetic codes.
The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the [[BLAST (biotechnology)|basic local alignment search tool]] (BLAST) server. The ORF Finder can help in preparing complete and accurate sequence submissions. It is also packaged with the Sequin sequence submission software.

===Investigator===
ORF Investigator<ref>{{cite journal | vauthors = Dhar DV, Kumar MS |title=ORF Investigator: A New ORF finding tool combining Pairwise Global Gene Alignment |journal=Research Journal of Recent Sciences |date=2012 |volume=1 |issue=11 |pages=32–35}}</ref> is a program that not only provides information about coding and non-coding sequences but can also perform pairwise global alignment of sequences from different genes or DNA regions. The tool finds ORFs, translates them into single-letter amino acid code and reports their locations in the sequence. The pairwise global alignment makes it convenient to detect mutations, including [[single nucleotide polymorphism]]s. The [[Needleman–Wunsch algorithm]] is used for the alignment. ORF Investigator is written in the portable [[Perl]] [[programming language]] and is therefore available to users of all common operating systems.

===Predictor===
OrfPredictor<ref>{{cite web|url=http://bioinformatics.ysu.edu/tools/OrfPredictor.html|title=OrfPredictor|website=bioinformatics.ysu.edu|access-date=2015-12-17|archive-date=2015-12-22|archive-url=https://web.archive.org/web/20151222082631/http://bioinformatics.ysu.edu/tools/OrfPredictor.html|url-status=dead}}</ref> is a web server designed to identify protein-coding regions in expressed sequence tag (EST)-derived sequences.
For query sequences with a hit in BLASTX, the program predicts the coding region from the translation reading frame identified in the BLASTX alignment; otherwise, it predicts the most probable coding region from intrinsic signals in the query sequence. The output consists of the predicted peptide sequences in [[FASTA format]], each with a definition line that includes the query ID, the translation reading frame and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly in large-scale EST projects. OrfPredictor combines the two ORF definitions given above: it searches for stretches that begin with a start codon and end at a stop codon and, as an additional criterion, for a stop codon in the 5' [[untranslated region]] (UTR or NTR, ''nontranslated region''<ref name="Carrington_1990">{{cite journal | vauthors = Carrington JC, Freed DD | title = Cap-independent enhancement of translation by a plant potyvirus 5' nontranslated region | journal = Journal of Virology | volume = 64 | issue = 4 | pages = 1590–7 | date = April 1990 | pmid = 2319646 | pmc = 249294 | doi = 10.1128/JVI.64.4.1590-1597.1990 }}</ref>). The OrfPredictor web server is no longer supported, but the standalone OrfPredictor tool can be downloaded from the following site: http://bioinformatics.ysu.edu/publication/tools_download/
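One possible reading of this combined criterion is sketched below in Python. It is a hypothetical illustration, not OrfPredictor's implementation: it assumes the standard stop codons TAA, TAG and TGA, works on a single strand and frame, first splits the frame into stop-to-stop segments (the alternative definition), then trims each segment to its first ATG (the start-to-stop definition), and records whether an in-frame stop codon lies upstream of that ATG, as it would if the 5' untranslated region were present in the sequence. All function names are hypothetical.

```python
# Hypothetical sketch of combining the two ORF definitions on one strand and frame.
# It is NOT OrfPredictor's code; it only illustrates the criteria in the text.
STOPS = {"TAA", "TAG", "TGA"}  # standard-code stop codons (DNA alphabet)


def stop_to_stop_segments(seq: str, frame: int) -> list[tuple[int, int, bool]]:
    """Split one reading frame into stop-bounded segments.

    Returns (start, end, upstream_stop) tuples with 0-based, end-exclusive
    coordinates; each segment ends with (and includes) an in-frame stop codon.
    upstream_stop is True when the segment is preceded by an in-frame stop,
    False when it begins at the start of the frame. A trailing stretch with
    no stop codon is not reported.
    """
    seq = seq.upper()
    segments = []
    seg_start, upstream_stop = frame, False
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            segments.append((seg_start, i + 3, upstream_stop))
            seg_start, upstream_stop = i + 3, True
    return segments


def start_to_stop_orfs(seq: str, frame: int) -> list[tuple[int, int, bool]]:
    """Trim each stop-to-stop segment to its first in-frame ATG.

    Segments without an ATG are dropped; the upstream_stop flag is kept, so a
    True value means an in-frame stop lies upstream of the chosen ATG.
    """
    seq = seq.upper()
    orfs = []
    for start, end, upstream_stop in stop_to_stop_segments(seq, frame):
        for i in range(start, end - 3, 3):  # the stop codon itself is excluded
            if seq[i:i + 3] == "ATG":
                orfs.append((i, end, upstream_stop))
                break
    return orfs
```

For instance, in <code>TAGATGAAATAA</code> the ORF <code>ATGAAATAA</code> is preceded by the in-frame stop TAG and is flagged accordingly, whereas an ORF whose segment reaches the beginning of the sequence has no upstream stop, so its true start codon could lie further upstream, outside the sequence.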
===ORFik===
ORFik is an R package in Bioconductor for finding open reading frames and for using next-generation sequencing data to support ORF predictions.<ref>{{cite journal|url=https://bioconductor.org/packages/release/bioc/html/ORFik.html|title=ORFik - Open reading frames in genomics|website=bioconductor.org|year=2018|doi=10.18129/B9.bioc.ORFik|last1=Labun|first1=Kornel|last2=Tjeldnes|first2=Haakon}}</ref><ref>{{cite journal | doi=10.1186/s12859-021-04254-w | title=ORFik: A comprehensive R toolkit for the analysis of translation | year=2021 | last1=Tjeldnes | first1=Håkon | last2=Labun | first2=Kornel | last3=Torres Cleuren | first3=Yamila | last4=Chyżyńska | first4=Katarzyna | last5=Świrski | first5=Michał | last6=Valen | first6=Eivind | journal=BMC Bioinformatics | volume=22 | issue=1 | page=336 | pmid=34147079 | pmc=8214792 | doi-access=free }}</ref>

===orfipy===
orfipy is a tool written in [[Python (programming language)|Python]] and [[Cython]] for extracting ORFs quickly and flexibly.<ref>{{cite journal | vauthors = Singh U, Wurtele ES | title = orfipy: a fast and flexible tool for extracting ORFs | journal = Bioinformatics | date = February 2021 | volume = 37 | issue = 18 | pages = 3019–3020 | pmid = 33576786 | doi = 10.1093/bioinformatics/btab090| issn=1367-4803 | pmc = 8479652 | doi-access = free }}</ref> orfipy can work with plain or gzipped FASTA and FASTQ sequences and provides several options for fine-tuning ORF searches, including specifying the start and stop codons, reporting partial ORFs and using custom translation tables. The results can be saved in multiple formats, including the space-efficient BED format.
orfipy is particularly fast for data containing many small FASTA sequences, such as de novo transcriptome assemblies.<ref>{{Citation| vauthors = Singh U |title=urmi-21/orfipy|date=2021-02-13|url=https://github.com/urmi-21/orfipy|access-date=2021-02-13}}</ref>

== See also ==
* [[Coding region]]
* [[Putative gene]]
* [[Sequerome]] – A [[sequence profiling tool]] that links each [[BLAST (biotechnology)|BLAST]] record to the [[National Center for Biotechnology Information|NCBI]] ORF, enabling complete ORF analysis of a BLAST report.
* [[Micropeptide]]

== References ==
{{Reflist}}

== External links ==
* [http://bioweb.uwlax.edu/GenWeb/Molecular/Seq_Anal/Translation/translation.html Translation and Open Reading Frames]
* [http://horfdb.dfci.harvard.edu/ hORFeome V5.1] - A web-based interactive tool for the CCSB Human ORFeome Collection
* [https://web.archive.org/web/20091128111853/http://ugene.unipro.ru/plugin_orf_marker.html ORF Marker] - A free, fast and multi-platform desktop GUI tool for predicting and analyzing ORFs
* [http://web.mit.edu/star/orf/ StarORF] - A multi-platform, Java-based GUI tool for predicting and analyzing ORFs and obtaining reverse-complement sequences
* [http://bioinformatics.ysu.edu/tools/OrfPredictor.html ORFPredictor] {{Webarchive|url=https://web.archive.org/web/20151222082631/http://bioinformatics.ysu.edu/tools/OrfPredictor.html |date=2015-12-22 }} - A web server designed for ORF prediction and translation of a batch of EST or cDNA sequences

[[Category:Molecular genetics]]
[[Category:Bioinformatics]]
[[he:מסגרת קריאה#מסגרת קריאה פתוחה]]