{{Short description|Project that developed a haplotype map of the human genome}}
{{primary sources|date=October 2012}}
The '''International HapMap Project''' was an organization that aimed to develop a [[haplotype]] [[map]] ('''HapMap''') of the [[human genome]], to describe the common patterns of human [[genetic variability|genetic variation]]. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.

The International HapMap Project was a collaboration among researchers at academic centers, non-profit biomedical research groups and private companies in [[Canada]], [[China]] (including [[Hong Kong]]), [[Japan]], [[Nigeria]], the [[United Kingdom]], and the [[United States]]. It officially started with a meeting on October 27 to 29, 2002, and was expected to take about three years. It comprised three phases; the complete data obtained in Phase I were published on 27 October 2005.<ref>{{Cite journal|last1=Altshuler|first1=David|last2=Donnelly|first2=Peter|last3=The International HapMap Consortium|date=October 2005|title=A haplotype map of the human genome|journal=Nature|language=en|volume=437|issue=7063|pages=1299–1320|doi=10.1038/nature04226|pmid=16255080|pmc=1880871|bibcode=2005Natur.437.1299T|issn=1476-4687|doi-access=free}}</ref> The analysis of the Phase II dataset was published in October 2007.<ref>{{Cite journal|last1=Frazer|first1=Kelly A.|last2=Ballinger|first2=Dennis G.|last3=Cox|first3=David R.|last4=Hinds|first4=David A.|last5=Stuve|first5=Laura L.|last6=Gibbs|first6=Richard A.|last7=Belmont|first7=John W.|last8=Boudreau|first8=Andrew|last9=Hardenbol|first9=Paul|last10=Leal|first10=Suzanne M.|last11=Pasternak|first11=Shiran|date=October 2007|title=A second generation human haplotype map of over 3.1 million SNPs|url= |journal=Nature|language=en|volume=449|issue=7164|pages=851–861|doi=10.1038/nature06258|pmid=17943122|pmc=2689609|bibcode=2007Natur.449..851F|issn=1476-4687|hdl=2027.42/62863|hdl-access=free}}</ref> The Phase III dataset was released in spring 2009, and the publication presenting the final results was published in September 2010.<ref>{{Cite journal|last1=Altshuler|first1=David M.|last2=Gibbs|first2=Richard A.|last3=Peltonen|first3=Leena|last4=Altshuler|first4=David M.|last5=Gibbs|first5=Richard A.|last6=Peltonen|first6=Leena|last7=Dermitzakis|first7=Emmanouil|last8=Schaffner|first8=Stephen F.|last9=Yu|first9=Fuli|last10=Peltonen|first10=Leena|last11=Dermitzakis|first11=Emmanouil|date=September 2010|title=Integrating common and rare genetic variation in diverse human populations|url= |journal=Nature|language=en|volume=467|issue=7311|pages=52–58|doi=10.1038/nature09298|pmid=20811451|issn=1476-4687|pmc=3173859|bibcode=2010Natur.467...52T}}</ref>

== Background ==
Unlike with the [[rare disease|rarer]] [[Mendelian]] diseases, combinations of different [[genes]] and the environment play a role in the development and progression of common diseases (such as [[diabetes]], [[cancer]], [[heart disease]], [[stroke]], [[clinical depression|depression]], and [[asthma]]), or in the individual response to [[pharmacological]] agents.<ref>{{cite journal |last1=Crouch |first1=Daniel J. M. |last2=Bodmer |first2=Walter F. |title=Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants |journal=Proceedings of the National Academy of Sciences |date=11 August 2020 |volume=117 |issue=32 |pages=18924–18933 |doi=10.1073/pnas.2005634117|pmid=32753378 |pmc=7431089 |bibcode=2020PNAS..11718924C |doi-access=free }}</ref> To find the genetic factors involved in these diseases, one could in principle do a [[genome-wide association study]]: obtain the complete genetic sequence of several individuals, some with the disease and some without, and then search for differences between the two sets of genomes. At the time, this approach was not feasible because of the cost of [[full genome sequencing]]. The HapMap project proposed a shortcut.

Although any two unrelated people share about 99.5% of their [[DNA]] sequence, their [[genome]]s differ at specific [[nucleotide]] locations. Such sites are known as [[single nucleotide polymorphisms]] (SNPs), and each of the possible resulting gene forms is called an [[allele]].<ref name="NHGRI_allele">{{cite web |title=Allele |url=https://www.genome.gov/genetics-glossary/Allele |website=Genome.gov |publisher=National Human Genome Research Institute |language=en}}</ref> The HapMap project focuses only on common SNPs, those where each allele occurs in at least 1% of the population.
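
As a rough illustration of the 1% criterion, the following Python sketch computes the frequency of the least common allele observed at one site and compares it with the threshold. The function names and the toy samples are hypothetical and are not taken from the project's own software.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the frequency of the least common allele observed at one site."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # only one allele seen: the site is not polymorphic in this sample
    total = sum(counts.values())
    return min(counts.values()) / total


def is_common_snp(alleles, threshold=0.01):
    """True if every observed allele reaches the threshold frequency (1% by default)."""
    return minor_allele_frequency(alleles) >= threshold


# Hypothetical samples: 200 chromosomes observed at a single site.
common_site = ["A"] * 195 + ["G"] * 5   # minor allele G at 2.5%
rare_site = ["A"] * 199 + ["G"] * 1     # minor allele G at 0.5%

print(is_common_snp(common_site))
print(is_common_snp(rare_site))
```

In this sketch the first site would count as a common SNP and the second would not.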

Each person has two copies of all [[chromosomes]], except the [[sex chromosomes]] in [[male]]s. For each SNP, the combination of alleles a person has is called a [[genotype]]. [[Genotyping]] refers to uncovering what genotype a person has at a particular site. The HapMap project chose a sample of 269 individuals and selected several million well-defined SNPs, genotyped the individuals for these SNPs, and published the results.<ref name="HapMapNaturePaper2003">{{cite journal |last1=The International HapMap Consortium |title=The International HapMap Project |journal=Nature |date=December 2003 |volume=426 |issue=6968 |pages=789–796 |doi=10.1038/nature02168 |pmid=14685227 |hdl=2027.42/62838 |s2cid=8151693 |doi-access=free |hdl-access=free }}</ref>

The alleles of nearby SNPs on a single chromosome are correlated. Specifically, if the allele of one SNP for a given individual is known, the alleles of nearby SNPs can often be predicted, a process known as ''genotype imputation''.<ref name="Deng2022">{{cite journal |last1=Deng |first1=Tianyu |last2=Zhang |first2=Pengfei |last3=Garrick |first3=Dorian |last4=Gao |first4=Huijiang |last5=Wang |first5=Lixian |last6=Zhao |first6=Fuping |title=Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data |journal=Frontiers in Genetics |date=2022 |volume=12 |page=704118 |doi=10.3389/fgene.2021.704118 |pmid=35046990 |pmc=8762119 |doi-access=free }}</ref> This is because each SNP arose in evolutionary history as a single point [[mutation]], and was then passed down on the chromosome surrounded by other, earlier, point mutations. SNPs that are separated by a large distance on the chromosome are typically not very well correlated, because [[Genetic recombination|recombination]] occurs in each generation and mixes the allele sequences of the two chromosomes. A sequence of consecutive alleles on a particular chromosome is known as a [[haplotype]].<ref name="NHGRI_Haplotype">{{cite web |title=Haplotype |url=https://www.genome.gov/genetics-glossary/haplotype |website=Genome.gov |publisher=National Human Genome Research Institute |access-date=25 June 2022 |language=en}}</ref>
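
The correlation between two SNPs is often summarised by the squared correlation coefficient ''r''<sup>2</sup> of their alleles across a set of chromosomes. The Python sketch below computes it for a few toy haplotypes. It assumes that each SNP has two alleles coded as 0 and 1; the data and the function name are illustrative only.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation between the alleles at sites i and j.

    `haplotypes` is a list of phased haplotypes, each a sequence of 0/1 alleles.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n   # frequency of allele 1 at site i
    p_j = sum(h[j] for h in haplotypes) / n   # frequency of allele 1 at site j
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ij - p_i * p_j                      # deviation from independence
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0                            # a monomorphic site carries no information
    return d * d / denom


# Toy data: sites 0 and 1 always travel together; site 2 varies independently.
haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]

print(r_squared(haps, 0, 1))  # 1.0: knowing site 0 determines site 1
print(r_squared(haps, 0, 2))  # 0.0: site 0 says nothing about site 2
```

A value near 1 means the allele at one site predicts the allele at the other, which is what imputation relies on. A value near 0 is what one would expect for sites that recombination has separated.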

To find the genetic factors involved in a particular disease, one can proceed as follows. First, a region of interest in the genome is identified, possibly from earlier inheritance studies. In this region one locates a set of [[tag SNP]]s from the HapMap data; these are SNPs that are very well correlated with all the other SNPs in the region. Next, one determines the genotype for these tag SNPs in several individuals, some with the disease and some without. From the tag SNP genotypes, genotype imputation can then determine (impute) the other SNPs, and thus the entire haplotype, with high confidence. By comparing the two groups, one determines the likely locations and haplotypes that are involved in the disease.
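
One simple way to pick tag SNPs is a greedy search: repeatedly choose the SNP that is well correlated with the largest number of SNPs not yet covered. The Python sketch below illustrates this. It is a minimal heuristic, not the method used by the project's own tools. The 0/1 allele coding, the ''r''<sup>2</sup> threshold of 0.8 and the toy haplotypes are all assumptions made for the example.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation between 0/1 alleles at sites i and j."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0
    return (p_ij - p_i * p_j) ** 2 / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedily choose sites so that every site is itself a tag or is
    correlated (r^2 >= threshold) with a chosen tag."""
    n_sites = len(haplotypes[0])
    # covers[i]: the sites that site i predicts well, including itself.
    covers = {
        i: {j for j in range(n_sites)
            if i == j or r_squared(haplotypes, i, j) >= threshold}
        for i in range(n_sites)
    }
    untagged = set(range(n_sites))
    tags = []
    while untagged:
        # Pick the untagged site covering the most untagged sites;
        # ties go to the lowest index because max() keeps the first maximum.
        best = max(sorted(untagged), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy region: sites 0 and 1 form one correlated block, sites 2 and 3 another.
haps = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
print(greedy_tag_snps(haps))  # [0, 2]
```

In this toy region, genotyping only sites 0 and 2 would be enough to infer sites 1 and 3, which is the saving that tag SNPs are meant to provide.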

== Samples used ==

[[Haplotypes]] are generally shared between populations, but their frequency can differ widely. Four populations were selected for inclusion in the HapMap: 30 adult-and-both-parents [[Yoruba people|Yoruba]] trios from [[Ibadan]], [[Nigeria]] (YRI), 30 trios of Utah residents of northern and western [[European ethnic groups|European]] ancestry (CEU), 44 unrelated Japanese individuals from [[Tokyo]], [[Japan]] (JPT) and 45 unrelated [[Han Chinese]] individuals from [[Beijing]], [[China]] (CHB). Although the haplotypes revealed from these populations should be useful for studying many other populations, parallel studies are currently examining the usefulness of including additional populations in the project.

All samples were collected through a community engagement process with appropriate informed consent. The community engagement process was designed to identify and attempt to respond to culturally specific concerns and give participating communities input into the informed consent and sample collection processes.<ref>{{Cite journal|last1=Rotimi|first1=Charles|last2=Leppert|first2=Mark|last3=Matsuda|first3=Ichiro|last4=Zeng|first4=Changqing|last5=Zhang|first5=Houcan|last6=Adebamowo|first6=Clement|last7=Ajayi|first7=Ike|last8=Aniagwu|first8=Toyin|last9=Dixon|first9=Missy|last10=Fukushima|first10=Yoshimitsu|last11=Macer|first11=Darryl|date=2007|title=Community Engagement and Informed Consent in the International HapMap Project|url=https://www.karger.com/Article/FullText/101761|journal=Public Health Genomics|language=english|volume=10|issue=3|pages=186–198|doi=10.1159/000101761|issn=1662-4246|pmid=17575464|s2cid=10844405|url-access=subscription}}</ref>

In phase III, 11 global ancestry groups have been assembled: ASW (African ancestry in Southwest USA); CEU (Utah residents with Northern and Western European ancestry from the CEPH collection); CHB (Han Chinese in Beijing, China); CHD (Chinese in Metropolitan Denver, Colorado); GIH (Gujarati Indians in Houston, Texas); JPT (Japanese in Tokyo, Japan); LWK (Luhya in Webuye, Kenya); MEX (Mexican ancestry in Los Angeles, California); MKK (Maasai in Kinyawa, Kenya); TSI (Tuscans in Italy); YRI (Yoruba in Ibadan, Nigeria).<ref name="HapMap2010">International HapMap consortium et al. (2010). Integrating common and rare genetic variation in diverse human populations. ''Nature'', '''467''', 52-8. [https://dx.doi.org/10.1038/nature09298 doi]</ref>

{| class="wikitable"
! Phase
! ID
! Place
! Population
! Detail
|-
|I/II
|CEU
|{{flagicon|USA}}
|[[Utah]] residents with [[Northern European|Northern]] and [[Western European]] ancestry from the [[Fondation Jean Dausset-CEPH|CEPH]] collection
|[https://catalog.coriell.org/1/NIGMS/Collections/CEPH-Resources Detail]
|-
|I/II
|CHB
|{{flagicon|CHN}}
|[[Han Chinese]] in [[Beijing]], [[China]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Han-Chinese-in-Beijing-China-CHB Detail]
|-
|I/II
|JPT
|{{flagicon|JPN}}
|[[Yamato people|Japanese]] in [[Tokyo]], [[Japan]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Japanese-in-Tokyo-Japan-JPT Detail]
|-
|I/II
|YRI
|{{flagicon|NGR}}
|[[Yoruba people|Yoruba]] in [[Ibadan]], [[Nigeria]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Yoruba-in-Ibadan-Nigeria-YRI Detail]
|-
|III
|ASW
|{{flagicon|USA}}
|[[African Americans|African ancestry]] in the [[Southwestern United States|Southwest USA]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/African-Ancestry-in-SW-USA-ASW Detail]
|-
|III
|CHD
|{{flagicon|USA}}
|[[Chinese Americans|Chinese]] in [[metropolitan Denver]], [[Colorado|CO]], [[United States]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Chinese-in-Metropolitan-Denver-CO-USA-CHD Detail]
|-
|III
|GIH
|{{flagicon|USA}}
|[[Gujarati people|Gujarati]] [[Indian Americans|Indians]] in [[Houston]], [[Texas|TX]], [[United States]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Gujarati-Indians-in-Houston-TX-USA-GIH Detail]
|-
|III
|LWK
|{{flagicon|KEN}}
|[[Luhya people|Luhya]] in [[Webuye]], [[Kenya]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Luhya-in-Webuye-Kenya-LWK Detail]
|-
|III
|MKK
|{{flagicon|KEN}}
|[[Maasai people|Maasai]] in [[Kinyawa]], [[Kenya]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Maasai-in-Kinyawa-Kenya-MKK Detail]
|-
|III
|MXL
|{{flagicon|USA}}
|[[Mexican Americans|Mexican ancestry]] in [[Los Angeles]], [[California|CA]], [[United States]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Mexican-Ancestry-in-Los-Angeles-CA-USA-MXL Detail]
|-
|III
|TSI
|{{flagicon|ITA}}
|[[Tuscany|Toscani]] in [[Italy|Italia]]
|[https://catalog.coriell.org/1/NHGRI/Collections/HapMap-Collections/Toscani-in-Italia-TSI Detail]
|}

Three combined panels have also been created, which allow better identification of SNPs in groups outside the nine homogenous samples: CEU+TSI (Combined panel of Utah residents with Northern and Western European ancestry from the CEPH collection and Tuscans in Italy); JPT+CHB (Combined panel of Japanese in Tokyo, Japan and Han Chinese in Beijing, China) and JPT+CHB+CHD (Combined panel of Japanese in Tokyo, Japan, Han Chinese in Beijing, China and Chinese in Metropolitan Denver, Colorado). CEU+TSI, for instance, is a better model of UK British individuals than is CEU alone.<ref name="HapMap2010"/>

== Scientific strategy ==
In the 1990s, sequencing patients' whole genomes was expensive, so the [[National Institutes of Health]] embraced a "shortcut": looking only at sites on the genome where many people have a variant DNA unit. The theory behind the shortcut was that, since the major diseases are common, so too would be the genetic variants that caused them. [[Natural selection]] keeps the human genome free of variants that damage health before children are grown, the theory held, but fails against variants that strike later in life, allowing them to become quite common. In 2002 the National Institutes of Health started a $138 million project called the [[HapMap]] to catalog the common variants in European, East Asian and African genomes.<ref name=naid>{{cite journal | vauthors = Naidoo N, Pawitan Y, Soong R, Cooper DN, Ku CS | title = Human genetics and genomics a decade after the release of the draft sequence of the human genome | journal = Human Genomics | volume = 5 | issue = 6 | pages = 577–622 | date = October 2011 | pmid = 22155605 | pmc = 3525251 | doi = 10.1186/1479-7364-5-6-577 | doi-access = free }}</ref>

For Phase I, one common SNP was genotyped every 5,000 bases. Overall, more than one million SNPs were genotyped. The genotyping was carried out by 10 centres using five different genotyping technologies. Genotyping quality was assessed by using duplicate or related samples and by periodic quality checks in which centres had to genotype common sets of SNPs.

The Canadian team was led by [[Thomas J. Hudson]] at [[McGill University]] in [[Montreal]] and focused on chromosomes 2 and 4p. The Chinese team was led by [[Huanming Yang]] in [[Beijing]] and [[Shanghai]], and [[Lap-Chee Tsui]] in [[Hong Kong]] and focused on chromosomes 3, 8p and 21. The Japanese team was led by [[Yusuke Nakamura (geneticist)|Yusuke Nakamura]] at the [[University of Tokyo]] and focused on chromosomes 5, 11, 14, 15, 16, 17 and 19. The British team was led by [[David R. Bentley]] at the [[Sanger Institute]] and focused on chromosomes 1, 6, 10, 13 and 20. There were four United States genotyping centres: a team led by [[Mark Chee]] and [[Arnold Oliphant]] at [[Illumina (company)|Illumina Inc.]] in [[San Diego]] (studying chromosomes 8q, 9, 18q, 22 and X), a team led by [[David Altshuler (physician)|David Altshuler]] and [[Mark Daly (scientist)|Mark Daly]] at the [[Broad Institute]] in [[Cambridge, Massachusetts|Cambridge, USA]] (chromosomes 4q, 7q, 18p, Y and [[mitochondrion]]), a team led by [[Richard Gibbs (biologist)|Richard Gibbs]] at the [[Baylor College of Medicine]] in [[Houston]] (chromosome 12), and a team led by [[Pui-Yan Kwok]] at the [[University of California, San Francisco]] (chromosome 7p).

To obtain enough SNPs to create the Map, the Consortium funded a large re-sequencing project to discover millions of additional SNPs. These were submitted to the public [[dbSNP]] database. As a result, by August 2006, the database included more than ten million SNPs, and more than 40% of them were known to be [[Polymorphism (biology)#Genetic polymorphism|polymorphic]]. By comparison, at the start of the project, fewer than 3 million SNPs were identified, and no more than 10% of them were known to be polymorphic.

During Phase II, more than two million additional SNPs were genotyped throughout the genome by David R. Cox, [[Kelly A. Frazer]] and others at [[Perlegen Sciences]] and 500,000 by the company [[Affymetrix]].

== Data access ==
All of the data generated by the project, including SNP frequencies, [[genotypes]] and [[haplotypes]], were placed in the public domain and are available for download.<ref>{{Cite journal|last1=Thorisson|first1=Gudmundur A.|last2=Smith|first2=Albert V.|last3=Krishnan|first3=Lalitha|last4=Stein|first4=Lincoln D.|date=2005-11-01|title=The International HapMap Project Web site|url=http://genome.cshlp.org/content/15/11/1592|journal=Genome Research|language=en|volume=15|issue=11|pages=1592–1593|doi=10.1101/gr.4413105|issn=1088-9051|pmc=1310647|pmid=16251469}}</ref> The project website also contains a genome browser that allows users to find SNPs in any region of interest, along with their allele frequencies and their association with nearby SNPs. A tool that can determine tag SNPs for a given region of interest is also provided. These data can also be directly accessed from the widely used [[Haploview]] program.

==Publications==
* {{cite journal
| author = International HapMap Consortium
| date       = 2003
| title      = The International HapMap Project
| journal    = Nature
| doi        = 10.1038/nature02168
| pmid = 14685227
| volume     = 426
| issue       = 6968
| pages      = 789–796
| bibcode = 2003Natur.426..789G
| url= https://deepblue.lib.umich.edu/bitstream/2027.42/62838/1/nature02168.pdf
| hdl = 2027.42/62838
| s2cid = 4387110
| hdl-access= free
}}
* {{cite journal
| author = International HapMap Consortium
| date       = 2004
| title      = Integrating ethics and science in the International HapMap Project
| journal    = Nature Reviews Genetics
| doi        = 10.1038/nrg1351
| pmid       = 15153999
| volume     = 5
| issue       = 6
| pages      = 467–475
| pmc= 2271136
}}
* {{cite journal
| author = International HapMap Consortium
| date       = 2005
| title      = A haplotype map of the human genome
| journal    = Nature
| doi        = 10.1038/nature04226
| pmid = 16255080
| volume     = 437
| issue       = 7063
| pages      = 1299–1320
| pmc= 1880871
| bibcode = 2005Natur.437.1299T
}}
* {{cite journal
| author=International HapMap Consortium
| date       = 2007
| title      = A second generation human haplotype map of over 3.1 million SNPs
| journal    = Nature
| doi        = 10.1038/nature06258
| pmid       = 17943122
| volume     = 449
| issue       = 7164
| pages      = 851–861
| pmc= 2689609
| bibcode = 2007Natur.449..851F
}}
* {{cite journal
| author=International HapMap 3 Consortium
| date       = 2010
| title      = Integrating common and rare genetic variation in diverse human populations
| journal    = Nature
| doi        = 10.1038/nature09298
| pmid       = 20811451
| pmc       = 3173859
| volume     = 467
| issue       = 7311
| pages      = 52–58
| bibcode = 2010Natur.467...52T
}}
* {{cite journal
| vauthors= Deloukas P, Bentley D
| date       = 2004
| title      = The HapMap project and its application to genetic studies of drug response
| journal    = The Pharmacogenomics Journal
| doi        = 10.1038/sj.tpj.6500226
| pmid       = 14676823
| volume     = 4
| issue       = 2
| pages      = 88–90
| doi-access=
}}
* {{cite journal
| vauthors=Thorisson GA, Smith AV, Krishnan L, Stein LD
| date       = 2005
| title      = The International HapMap Project Web site
| journal    = Genome Research
| doi        = 10.1101/gr.4413105
| pmid       = 16251469
| pmc = 1310647
| volume     = 15
| issue       = 11
| pages      = 1592–1593
}}
* {{cite journal|vauthors=Terwilliger JD, Hiekkalinna T|date=2006|title=An utter refutation of the 'Fundamental Theorem of the HapMap'|journal=European Journal of Human Genetics|doi=10.1038/sj.ejhg.5201583|pmid=16479260|volume=14|issue=4|pages=426–437|doi-access=free}}
* Secko, David (2005). [http://www.the-scientist.com/news/20051026/01 "Phase I of the HapMap Complete"] {{Webarchive|url=https://web.archive.org/web/20110514112054/http://www.the-scientist.com/news/20051026/01/ |date=2011-05-14 }}. The Scientist

==See also==
* [[Genealogical DNA test]]
* [[The 1000 Genomes Project]]
* [[Population groups in biomedicine]]
* [[Human Variome Project]]
* [[Human genetic variation]]

==References==
{{Reflist}}

==External links==
* [http://www.hapmap.org/ International HapMap Project (HapMap Homepage)] {{Webarchive|url=https://web.archive.org/web/20140416084248/http://www.hapmap.org/ |date=2014-04-16 }}
* [http://www.genome.gov/10001688 National Human Genome Research Institute (NHGRI) HapMap Page]
* [http://www.cshprotocols.org/cgi/content/full/2008/8/pdb.prot5023 Browsing HapMap Data Using the Genome Browser]
* [https://archive.today/20100918023309/http://diversity.inmegen.gob.mx/gbrowse/cgi-bin/gbrowse/inmegen_diversity/ The Mexican Genome Diversity Project]

{{Personal genomics}}
{{Authority control}}

[[Category:Human genome projects]]
[[Category:Genetic genealogy projects]]
[[Category:Genealogy websites]]
[[Category:Biological databases]]
[[Category:Open science]]
[[Category:Single-nucleotide polymorphisms]]