== Gene prediction ==
{{Main|Gene prediction}}
Gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode [[genes]]. This includes protein-coding [[gene]]s as well as [[RNA gene]]s, but may also include the prediction of other functional elements such as [[regulatory regions]]. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been [[Sequencing|sequenced]]. In general, the prediction of bacterial genes is significantly simpler and more accurate than the prediction of genes in eukaryotic species, which usually have complex [[intron]]/[[exon]] patterns. Identifying genes in long sequences remains a problem, especially when the number of genes is unknown. [[Hidden Markov model]]s can be part of the solution.<ref>{{cite journal|last1=Stanke|first1=M|last2=Waack|first2=S|title=Gene prediction with a hidden Markov model and a new intron submodel|journal=Bioinformatics|volume=19 Suppl 2|date=Oct 19, 2003|issue=2|pages=215–25|pmid=14534192|doi=10.1093/bioinformatics/btg1080|doi-access=free}}</ref> Machine learning has played a significant role in predicting the sequence specificities of transcription factors.<ref>{{cite journal|last1=Alipanahi|first1=B|last2=Delong|first2=A|last3=Weirauch|first3=MT|last4=Frey|first4=BJ|title=Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning|journal=Nat Biotechnol|date=Aug 2015|volume=33|issue=8|pages=831–8|doi=10.1038/nbt.3300|pmid=26213851|doi-access=free}}</ref> Traditional sequence analysis focused on the statistical parameters of the nucleotide sequence itself (the most common programs used are listed in [https://www.ncbi.nlm.nih.gov/books/NBK20261/table/A181/?report=objectonly Table 4.1]).
Another method is to identify homologous sequences based on other known gene sequences (for tools, see [https://www.ncbi.nlm.nih.gov/books/NBK20261/table/A190/?report=objectonly Table 4.3]).<ref>{{cite journal|last1=Wooley|first1=JC|last2=Godzik|first2=A|last3=Friedberg|first3=I|title=A primer on metagenomics|journal=PLOS Comput Biol|date=Feb 26, 2010|volume=6|issue=2|doi=10.1371/journal.pcbi.1000667|pmid=20195499|pmc=2829047|pages=e1000667|bibcode=2010PLSCB...6E0667W |doi-access=free }}</ref> The two methods described here focus on the sequence. However, the shape features of molecules such as DNA and proteins have also been studied and proposed to have an equivalent, if not higher, influence on the behavior of these molecules.<ref>{{cite journal|last1=Abe|first1=N|last2=Dror|first2=I|last3=Yang|first3=L|last4=Slattery|first4=M|last5=Zhou|first5=T|last6=Bussemaker|first6=HJ|last7=Rohs|first7=R|last8=Mann|first8=RS|title=Deconvolving the recognition of DNA shape from sequence|journal=Cell|date=Apr 9, 2015|volume=161|issue=2|pages=307–18|doi=10.1016/j.cell.2015.02.008|pmid=25843630|pmc=4422406}}</ref>
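
As an illustration of how a hidden Markov model can label regions of a sequence, the following minimal sketch decodes a DNA string with a two-state model using the Viterbi algorithm. It is not the method of any program cited above. The state labels (<code>+</code> for a hypothetical GC-rich "coding" state, <code>-</code> for an AT-rich "non-coding" state), the function name <code>viterbi</code>, and all probabilities are assumptions chosen only for demonstration; real gene finders use far richer models with codon, intron and splice-site submodels.

```python
import math

# Toy two-state HMM. All numbers below are illustrative assumptions,
# not parameters estimated from any real genome.
STATES = ("-", "+")  # "-" = non-coding (AT-rich), "+" = coding (GC-rich)
START = {"-": 0.5, "+": 0.5}
TRANS = {
    "-": {"-": 0.9, "+": 0.1},
    "+": {"-": 0.1, "+": 0.9},
}
EMIT = {
    "-": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "+": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of '-' and '+'.

    Works in log space to avoid underflow on long sequences.
    Only the bases A, C, G and T are supported; others raise KeyError.
    """
    seq = seq.upper()
    if not seq:
        return ""

    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1

    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor that maximises the path score into s.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("AAAAAAAAGGGGGGGGAAAAAAAA"))
```

With these assumed parameters, a long GC-rich run is labelled <code>+</code>, while a single G inside an AT-rich stretch stays <code>-</code>, because two state switches cost more than one unlikely emission. This smoothing is the property that makes such models useful for segmenting sequences.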