== Comparative genomics approaches ==
As the entire genomes of many different species are sequenced, a promising direction in current research on gene finding is a [[comparative genomics]] approach.

This is based on the principle that [[natural selection]] causes genes and other functional elements to accumulate mutations at a slower rate than the rest of the genome, since mutations in functional elements are more likely to negatively impact the organism than mutations elsewhere. Genes can thus be detected by comparing the genomes of related species to reveal this evolutionary pressure for conservation. This approach was first applied to the mouse and human genomes, using programs such as SLAM, SGP, TWINSCAN/N-SCAN and CONTRAST.<ref name=":0">{{cite journal | vauthors = Gross SS, Do CB, Sirota M, Batzoglou S | title = CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction | language = En | journal = Genome Biology | volume = 8 | issue = 12 | pages = R269 | date = 2007 | pmid = 18096039 | pmc = 2246271 | doi = 10.1186/gb-2007-8-12-r269 | doi-access = free }}</ref>
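The conservation signal can be illustrated with a toy sketch. The Python fragment below is not the method used by any of the programs named above, which rely on far richer statistical models. It assumes two pre-aligned sequences of equal length with <code>-</code> marking gaps, and the function names <code>window_identity</code> and <code>conserved_windows</code>, the window size and the identity threshold are all hypothetical choices made for illustration.

```python
def window_identity(target, informant, start, size):
    """Fraction of alignment columns in [start, start + size) where both
    sequences carry the same base (gap columns never count as matches)."""
    matches = 0
    for t, i in zip(target[start:start + size], informant[start:start + size]):
        if t == i and t != "-":
            matches += 1
    return matches / size


def conserved_windows(target, informant, size=6, threshold=0.8):
    """Return (start, end, identity) for non-overlapping windows whose
    identity meets the (assumed, arbitrary) threshold."""
    if len(target) != len(informant):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(0, len(target) - size + 1, size):
        score = window_identity(target, informant, start, size)
        if score >= threshold:
            hits.append((start, start + size, score))
    return hits
```

Windows that pass the threshold are candidates for functional elements under this simplification; a fixed cut-off of this kind cannot distinguish coding from non-coding conservation, which is one reason actual gene finders model the pattern of substitutions rather than raw identity.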

=== Multiple informants ===

TWINSCAN examined only human-mouse synteny to look for orthologous genes. Later programs such as N-SCAN and CONTRAST allowed alignments from multiple organisms to be incorporated or, in the case of N-SCAN, an alignment from a single informant organism other than the target. The use of multiple informants can lead to significant improvements in accuracy.<ref name=":0" />

CONTRAST is composed of two elements. The first is a smaller classifier that identifies donor and acceptor splice sites as well as start and stop codons. The second is a full model constructed using machine learning. Breaking the problem into two means that smaller, targeted data sets can be used to train the classifier, and that the classifier can operate independently and be trained with smaller windows. The full model can use the output of the independent classifier, and does not have to spend computational time or model complexity re-classifying intron-exon boundaries. The paper in which CONTRAST is introduced proposes that its method (and those of TWINSCAN, etc.) be classified as ''de novo'' gene prediction, which uses alignments from alternate 'informant' genomes, as distinct from ''ab initio'' prediction, which uses the target genome alone.<ref name=":0" />
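The division of labour between a local boundary classifier and a full model can be sketched as follows. This is a simplified illustration, not CONTRAST's actual classifier or model: the hand-set weight tables, the function names and the exhaustive pair search that stands in for the full model are all assumptions introduced here.

```python
def candidate_sites(seq, motif):
    """Start positions where a boundary motif (e.g. 'GT' for donors) occurs."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def local_score(seq, pos, weights, flank=2):
    """Stage 1: score one candidate site from a small window around it.
    `weights` maps (offset, base) to a value; the weights are hypothetical
    and would normally be learned from a small, targeted training set."""
    score = 0.0
    for offset in range(-flank, flank + 1):
        j = pos + offset
        if 0 <= j < len(seq):
            score += weights.get((offset, seq[j]), 0.0)
    return score


def best_intron(seq, donor_w, acceptor_w, min_len=4):
    """Stage 2 (stand-in for the full model): reuse the precomputed local
    scores and pick the donor/acceptor pair with the highest combined score.
    Returns (intron_start, intron_end_exclusive, score) or None."""
    donors = {p: local_score(seq, p, donor_w) for p in candidate_sites(seq, "GT")}
    acceptors = {p: local_score(seq, p, acceptor_w) for p in candidate_sites(seq, "AG")}
    best = None
    for d, d_score in donors.items():
        for a, a_score in acceptors.items():
            if a + 2 - d >= min_len:  # intron spans GT ... AG inclusive
                total = d_score + a_score
                if best is None or total > best[2]:
                    best = (d, a + 2, total)
    return best
```

Because the boundary scores are computed once and then only looked up, the second stage never re-examines the sequence around each site, which mirrors the saving in computation and model complexity described above.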

Comparative gene finding can also be used to project high-quality annotations from one genome to another. Notable examples include Projector, GeneWise, GeneMapper and GeMoMa. Such techniques now play a central role in the annotation of all genomes.
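The idea of projecting an annotation can be illustrated by mapping coordinates through a pairwise alignment. The sketch below is a minimal illustration only; the tools named above use considerably more elaborate models. It assumes a gapped pairwise alignment given as two equal-length strings with <code>-</code> for gaps, zero-based half-open intervals, and hypothetical function names.

```python
def build_coordinate_map(aligned_source, aligned_target):
    """Map each ungapped source position to its target position, or to
    None where the source base is aligned to a gap in the target."""
    mapping = {}
    s_pos = 0
    t_pos = 0
    for s_char, t_char in zip(aligned_source, aligned_target):
        if s_char != "-":
            mapping[s_pos] = t_pos if t_char != "-" else None
            s_pos += 1
        if t_char != "-":
            t_pos += 1
    return mapping


def project_interval(mapping, start, end):
    """Project the source interval [start, end) onto the target.
    Returns a half-open target interval, or None if nothing maps."""
    mapped = [mapping[p] for p in range(start, end) if mapping.get(p) is not None]
    if not mapped:
        return None
    return (min(mapped), max(mapped) + 1)
```

An annotated exon whose projection returns <code>None</code>, or a much shorter interval than the original, would in practice be flagged for closer inspection rather than transferred automatically.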