Nucleotide diversity
Nucleotide diversity is a concept in molecular genetics used to measure the degree of polymorphism within a population. <ref> Template:Cite journal </ref>
One commonly used measure of nucleotide diversity was first introduced by Nei and Li in 1979. It is defined as the average number of nucleotide differences per site between two DNA sequences, taken over all possible pairs of sequences in the sample, and is denoted by <math>\pi</math>.
An estimator for <math>\pi</math> is given by:
- <math>\hat{\pi} = \frac{n}{n-1} \sum_{ij} x_i x_j \pi_{ij} = \frac{n}{n-1} \sum_{i=2}^n \sum_{j=1}^{i-1} 2 x_i x_j \pi_{ij}</math>
where <math>x_i</math> and <math>x_j</math> are the respective frequencies of the <math>i</math>th and <math>j</math>th sequences, <math>\pi_{ij}</math> is the number of nucleotide differences per nucleotide site between the <math>i</math>th and <math>j</math>th sequences, and <math>n</math> is the number of sequences in the sample. The factor <math>\frac{n}{n-1}</math> in front of the sums makes the estimator unbiased, so that its expected value does not depend on the number of sequences sampled.<ref>Template:Cite journal</ref>
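The estimator can be illustrated with a minimal Python sketch. It assumes that the input is a list of aligned sequences of equal length, one per sampled individual, and that every mismatching character counts as one difference; gaps, missing data and ambiguous bases are not treated specially. The function names and the sample data are hypothetical and are not taken from any of the software packages listed in this article.

```python
from collections import Counter
from itertools import combinations


def pairwise_diff_per_site(a, b):
    """pi_ij: proportion of aligned sites at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def nucleotide_diversity(sequences):
    """Estimate pi as n/(n-1) * sum over i>j of 2 * x_i * x_j * pi_ij."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    # x_i: frequency of each distinct sequence in the sample
    freqs = {seq: count / n for seq, count in Counter(sequences).items()}
    total = 0.0
    # each unordered pair of distinct sequences is visited once, hence the factor 2
    for s_i, s_j in combinations(freqs, 2):
        total += 2 * freqs[s_i] * freqs[s_j] * pairwise_diff_per_site(s_i, s_j)
    return n / (n - 1) * total


def mean_pairwise_difference(sequences):
    """Cross-check: average pi_ij over all n*(n-1)/2 pairs of sampled sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(pairwise_diff_per_site(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sample: four aligned sequences of eight sites each
sample = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACCTACGA"]
pi_hat = nucleotide_diversity(sample)
pi_check = mean_pairwise_difference(sample)
```

In this sample the six pairs of sequences differ at 0, 1, 2, 1, 2 and 1 sites, giving 7 differences over 6 pairs and 8 sites, so both functions return 7/48, or about 0.146. The agreement shows that the frequency-weighted sum with the <math>\frac{n}{n-1}</math> factor is the same quantity as the plain average over all pairs of sampled sequences.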
Nucleotide diversity is a measure of genetic variation. It is usually associated with other statistical measures of population diversity, and is similar to expected heterozygosity. This statistic may be used to monitor diversity within or between ecological populations, to examine the genetic variation in crops and related species,<ref>Template:Cite journal</ref> or to determine evolutionary relationships.<ref>Template:Cite journal</ref>
Nucleotide diversity can be calculated by examining the DNA sequences directly, or may be estimated from molecular marker data, such as Random Amplified Polymorphic DNA (RAPD) data<ref>Template:Cite journal</ref> and Amplified Fragment Length Polymorphism (AFLP) data.<ref>Template:Cite journal</ref>
Software
- DnaSP (DNA Sequence Polymorphism) is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data.
- MEGA (Molecular Evolutionary Genetics Analysis) is a software package used for estimating rates of molecular evolution, generating phylogenetic trees, and aligning DNA sequences. Available for Windows, Linux and Mac OS X (since ver. 5.x).
- Arlequin3 software can be used for calculations of nucleotide diversity and a variety of other statistical tests for intra-population and inter-population analyses. Available for Windows.
- Variscan
- R package PopGenome
- pixy
- R package QSutils