
===Criteria and single genome scans===
The two genes that exist after a gene duplication event are called [[Paralog#Orthology and paralogy|paralogs]] and usually code for [[protein]]s with a similar function and/or structure. By contrast, [[Paralog#Orthology and paralogy|orthologous]] genes are present in different species and are each derived from the same ancestral sequence (see [[Homology (biology)#Sequence homology|Homology of sequences in genetics]]).

It is important (but often difficult) to differentiate between paralogs and orthologs in biological research. Experiments on human gene function can often be carried out on other [[species]] if a homolog to a human gene can be found in the genome of that species, but only if the homolog is orthologous. If the two genes are instead paralogs that arose from a gene duplication event, their functions are likely to have diverged too far for the results to transfer. One or more copies of the duplicated genes that constitute a gene family may be affected by the insertion of [[transposable elements]], which causes significant sequence variation between the copies and may ultimately drive [[divergent evolution]]. This divergence may also reduce the likelihood and rate of [[gene conversion]] between the duplicates, because their sequences share little or no similarity.

Paralogs can be identified in single genomes through a sequence comparison of all annotated gene models to one another. Such a comparison can be performed on translated amino acid sequences (e.g. BLASTp, tBLASTx) to identify ancient duplications or on DNA nucleotide sequences (e.g. BLASTn, megablast) to identify more recent duplications. Most studies that identify gene duplications require reciprocal-best-hits or fuzzy reciprocal-best-hits, where each paralog must be the other's single best match in a sequence comparison.<ref name= Hahn>{{cite journal | vauthors = Hahn MW, Han MV, Han SG | title = Gene family evolution across 12 Drosophila genomes | journal = PLOS Genetics | volume = 3 | issue = 11 | pages = e197 | date = November 2007 | pmid = 17997610 | pmc = 2065885 | doi = 10.1371/journal.pgen.0030197 | doi-access = free }}</ref>
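
The reciprocal-best-hit criterion described above can be illustrated with a minimal sketch. It assumes the all-against-all comparison has already been run and reduced to (query, subject, score) triples, for example the bit scores reported by a BLAST search; the function names and the toy gene identifiers are hypothetical and are not taken from any cited study. Self-hits are skipped, and when two hits tie, the first one encountered is kept, which is a simplification of how real pipelines handle ties.

```python
def best_hits(hits):
    """Map each query gene to its single best non-self subject by score."""
    best = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # a gene always matches itself; ignore self-hits
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits):
    """Return sorted gene pairs in which each gene is the other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for gene_a, gene_b in best.items():
        if best.get(gene_b) == gene_a:
            # Store each pair once, regardless of search direction.
            pairs.add(tuple(sorted((gene_a, gene_b))))
    return sorted(pairs)


# Toy all-against-all result for three hypothetical genes (illustrative scores).
hits = [
    ("geneA", "geneA", 500.0),
    ("geneA", "geneB", 310.0),
    ("geneA", "geneC", 120.0),
    ("geneB", "geneA", 305.0),
    ("geneB", "geneC", 90.0),
    ("geneC", "geneA", 118.0),
    ("geneC", "geneB", 92.0),
]

print(reciprocal_best_hits(hits))  # [('geneA', 'geneB')]
```

In this toy input, geneC's best match is geneA, but geneA's best match is geneB, so only the geneA and geneB pair satisfies the reciprocal criterion. A "fuzzy" variant would relax the equality test, for instance by accepting any hit within some tolerance of the best score; the exact tolerance is study-specific and is not defined here.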

Most gene duplications exist as [[low copy repeats]] (LCRs), rather than as highly repetitive sequences like transposable elements. They are mostly found in [[Chromosome regions|pericentromeric]], [[subtelomeric]] and [[Chromosome regions|interstitial]] regions of a chromosome. Many LCRs, due to their size (>1 kb), similarity, and orientation, are highly susceptible to duplications and deletions.