GEDCOM
FamilySearch GEDCOM, or simply GEDCOM (an acronym of Genealogical Data Communication), is an open file format and the de facto standard specification for storing genealogical data.<ref name="defacto" /> It was developed by the Church of Jesus Christ of Latter-day Saints (LDS Church), the operators of FamilySearch, to aid in the research and sharing of genealogical information.<ref>Subject: rep: T Jenkins – open letter to GEDCOM-L – "The goal was to try and provide a standard to allow developers to provide a vehicle for their users to share genealogical conclusions and supporting evidence with others." From: "Jed R. Allen" Brigham Young University – Date: 29 Sep 1995 17:40:04 -0600 – GEDCOM-L Archives – September 1995, week 5 (#7)</ref> It is commonly used as a standard format for the backup and transfer of family tree data between different genealogy software and websites, most of which support importing from and exporting to GEDCOM format.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
GEDCOM is defined as a plain text file, using UTF-8 encoding as of version 7.0. This file contains genealogical information about individuals such as names, events, and relationships; metadata links these records together.
GEDCOM 7.0, released in 2021, is the most recent version of the GEDCOM specification.<ref name=specV7>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> However, its predecessor, GEDCOM 5.5.1, remains the industry's format standard for the exchange of genealogical data.[citation needed] First released as a draft standard in 1999, GEDCOM 5.5.1 received only minor updates in the subsequent 20 years leading up to the release of 5.5.1 final in 2019. To address its shortcomings, some genealogy programs introduced proprietary extensions to GEDCOM, such as GEDCOM 5.5 EL (Extended Locations), which are not always recognized by other programs.<ref>GEDCOM 5.5 EL Template:Webarchive (Extended Locations) specification</ref><ref>Ability to save information against places – "Support for parts of the GEDCOM 5.5EL proposal" – FHUG Wish List</ref><ref>0000688: Support for Gedcom 5.5EL Template:Webarchive – Gramps Bugtracker</ref> Efforts have been made to have 7.0 more widely adopted since its release. FamilySearch intends to be GEDCOM 7.0 compatible in the third quarter of 2022, and Ancestry.com is planning for 7.0 compatibility but has not yet specified an implementation date.[citation needed]
Data model
GEDCOM uses a lineage-linked data model based on the conceptual model of the nuclear family. The family (FAM) record type is therefore the only source of links between the individuals (INDI) in the file, assigning parents (as HUSB and WIFE) and children (as CHIL) by referring to individuals' unique ID numbers.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> These historical origins are described in the 7.0 specification document: "The FAM record was originally structured to represent families where a male HUSB (husband or father) and female WIFE (wife or mother) produce CHIL (children)."<ref name=":0"/>
Although the links in a GEDCOM family record still use the original naming indicating a husband and a wife, the specification now states that "sex, gender, titles, and roles of partners should not be inferred based on the partner that the HUSB or WIFE structure points to" and that these individuals within a family structure are collectively referred to as 'partners', 'parents' or 'spouses'. A FAM record can also be used for "cohabitation, fostering, adoption, and so on, regardless of the gender of the partners."<ref name=":0">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
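The lineage-linked model can be illustrated with a minimal Python sketch. The class and function names here (`Individual`, `Family`, `parents_of`) are illustrative assumptions and are not taken from the specification; the point is only that individuals hold no direct links to each other, so every relationship is found by going through a family record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Individual:
    xref: str   # unique ID, e.g. "@I1@"
    name: str


@dataclass
class Family:
    xref: str                                           # unique ID, e.g. "@F1@"
    partners: List[str] = field(default_factory=list)   # HUSB / WIFE pointers
    children: List[str] = field(default_factory=list)   # CHIL pointers


def parents_of(child_xref: str, families: List[Family]) -> List[str]:
    """Return the partner IDs of every family that lists the child as CHIL."""
    return [
        partner
        for fam in families
        if child_xref in fam.children
        for partner in fam.partners
    ]


# Hypothetical data mirroring a two-parent, one-child family.
families = [Family("@F1@", partners=["@I1@", "@I2@"], children=["@I3@"])]
print(parents_of("@I3@", families))
```

Storing both partners in one `partners` list, rather than in separate husband and wife fields, follows the specification's statement that roles should not be inferred from which pointer is used.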
File structure
A GEDCOM file consists of a header section, records, and a trailer section. Within these sections, records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous records, including notes. Every line of a GEDCOM file begins with a level number: each top-level record (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begins with a line at level 0, and every subordinate line carries a positive integer level.
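As a rough illustration of this line structure, the following Python sketch splits a single line into its level number, optional record ID, tag, and optional value. The regular expression and the `GedcomLine` and `parse_line` names are assumptions made for this example; it is simplified and not a conforming parser for any particular GEDCOM version.

```python
import re
from typing import NamedTuple, Optional

# level, optional @ID@, tag, optional value (simplified, illustrative pattern)
LINE_RE = re.compile(
    r"^(?P<level>\d+) "
    r"(?:(?P<xref>@[^@\s]+@) )?"
    r"(?P<tag>[A-Za-z0-9_]+)"
    r"(?: (?P<value>.*))?$"
)


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(text: str) -> GedcomLine:
    """Parse one GEDCOM line; raise ValueError if it does not fit the pattern."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    return GedcomLine(
        int(match["level"]), match["xref"], match["tag"], match["value"]
    )


print(parse_line("0 @I1@ INDI"))
print(parse_line("1 NAME John /Smith/"))
```

A line at level 0 starts a new top-level record; lines with higher level numbers belong to the nearest preceding line with a lower level.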
Although it is possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}Template:Dead link</ref> that can be used to validate the structure of a GEDCOM file is included as part of the PhpGedView project, though it is not meant to be a standalone validator. For standalone validation, "The Windows GEDCOM Validator"<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> can be used, or the older, unmaintained Gedcheck<ref>Gedcheck Template:Webarchive – "uses a grammar file for the specific version of GEDCOM to be checked against." The Church of Jesus Christ of Latter-day Saints</ref> from the LDS Church.
During 2001, The GEDCOM TestBook Project evaluated how well four popular genealogy programs conformed to the GEDCOM 5.5 standard using the Gedcheck program.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> Findings showed that a number of problems existed and that "The most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear."<ref>[GEDCOM and the GenTech Testbook Project] Genealogical Computing 7/1/2001 – Archive Summer 2001 Vol. 21.1 – Ancestry.com</ref> In 2005, the Genealogical Software Report Card was evaluated (by Bill Mumford who participated in the original GEDCOM Testbook Project)<ref>The Genealogical Software Report Card 2000 S W Mumford Last updated March 2005 Template:Unreliable source?</ref> and included testing the GEDCOM 5.5 standard using the Gedcheck program.<ref>Reviews from the NGS Newsmagazine and its Predecessors. Template:Webarchive – Test Result are in the PDF's</ref>
To assist with adoption of GEDCOM 7.0, validation tools now exist for that standard as well.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
Example
The following is a sample GEDCOM file.
sample.ged:
0 HEAD
1 SOUR PAF
2 NAME Personal Ancestral File
2 VERS 5.0
1 DATE 30 NOV 2000
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @U1@
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME James /Smith/
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 @U1@ SUBM
1 NAME Submitter
0 TRLR
The header (HEAD) includes the source program and version (Personal Ancestral File, 5.0), the GEDCOM version (5.5), the character encoding (ANSEL), and a link to information about the submitter of the file.
The individual records (INDI) define John Smith (ID I1), Elizabeth Stansfield (ID I2), and James Smith (ID I3).
The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
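The way the family record ties the three individuals together can be sketched in Python. The helper names (`load_records`, `describe_family`) are hypothetical, the header is shortened, and the parsing is deliberately naive (it splits on spaces and ignores nesting below level 1), so this is a reading aid under those assumptions and not a general GEDCOM reader.

```python
SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 FAMS @F1@
0 @I3@ INDI
1 NAME James /Smith/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
"""


def load_records(text):
    """Group lines into top-level records keyed by ID (or by tag for HEAD/TRLR)."""
    records, current = {}, None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            key = parts[1]
            tag = parts[2] if key.startswith("@") else key
            current = {"tag": tag, "lines": []}
            records[key] = current
        else:
            value = parts[2] if len(parts) > 2 else ""
            current["lines"].append((level, parts[1], value))
    return records


def values(record, *tags):
    """Return level-1 values whose tag is one of `tags`, in file order."""
    return [v for level, t, v in record["lines"] if level == 1 and t in tags]


def describe_family(records, fam_id):
    """Resolve HUSB/WIFE/CHIL pointers of one FAM record to display names."""
    def name(xref):
        return values(records[xref], "NAME")[0].replace("/", "")

    fam = records[fam_id]
    return {
        "partners": [name(x) for x in values(fam, "HUSB", "WIFE")],
        "children": [name(x) for x in values(fam, "CHIL")],
    }


print(describe_family(load_records(SAMPLE), "@F1@"))
```

The slashes around the surname in a NAME value are simply stripped here for display; a fuller reader would keep them to distinguish given names from surnames.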
Versions
The current version of the specification in wide use is GEDCOM 5.5.1 final, which was released on 15 November 2019. Its predecessor, GEDCOM 5.5.1 draft,<ref name="GEDCOM 5.5.1 draft">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> was issued in 1999, introducing nine new tags and adding UTF-8 as an approved character encoding. The draft was not formally approved, but its provisions were adopted in part by a number of genealogy programs<ref>GED-GEN is based on GEDCOM version 5.5.1 (draft) Template:Webarchive, dated 2 October 1999. The following record types are parsed: header, individual, family, notes, source, and repository. However not all elements within these records are processed. – Specifications – GED-GEN Introduction</ref><ref>0000688: Support for Gedcom 5.5EL Template:Webarchive(0008068) romjerome (developer) 2009-01-25 06:13 – "Note : GRAMPS 3.0.x supports a part of GEDCOM 5.5.1 on export, which is not supported by most programs" – Gramps Bugtracker</ref><ref>"MyBlood supports the GEDCOM 5.5 and 5.5.1 file format." Template:Webarchive – MyBlood Support – Forum, FAQ, Know Problems</ref> including FamilySearch.org.<ref name="GEDCOM 5.5.1 draft"/>
Lineage-linked GEDCOM is the deliberate de facto common denominator.<ref name="defacto" /> Despite version 5.5 of the GEDCOM standard first being published in 1996, many genealogical software suppliers have never fully supported the feature of multilingual Unicode text (instead of the ANSEL character set) introduced with that version of the specification. Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original Chinese, Japanese and Korean (CJK) characters, without which they could be ambiguous and of little use for genealogical or historical research.<ref name="ldscatalog.com">Personal Ancestral File 5.2 and PAF Companion 5.4 – Software Version Changes Template:Webarchive Release 5.0.1.4, 22 December 2000 – "10.GEDCOM improvements: Table:Destination:PAF 5 GEDCOM Version:5.5 Character Set:UTF-8</ref> PAF 5.2 is an example of software that uses UTF-8 as its internal character set, and can output a UTF-8 GEDCOM.<ref name="ldscatalog.com" /><ref>Personal Ancestral File 5.1 Template:Webarchive – "Also noted in a second test was the use of four tags from a later draft version of the Gedcom specification, FONE (phonetic name), ROMN (romanized name), EMAIL (e-mail), and _UID" Jan/Feb 2002 NGS Newsmagazine</ref>
GEDCOM 7.0 requires UTF-8 encoding throughout,<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> and resolves other long-standing issues with GEDCOM 5.5.1. Multimedia support in the form of an associated .zip file, called a GEDZip, is another inclusion. Efforts are underway to see 7.0 embraced as the new exchange standard.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> GEDCOM 7.0 allows explicitly identifying what standards other than GEDCOM may apply to a particular file. GEDCOM has always been extensible, but prior to 7.0 there was no standard way to identify such extensions. Also, GEDCOM 7.0 allows explicitly marking an event as nonexistent. This allows, for example, documenting that a particular individual never married.<ref name=":1" /> GEDCOM 7.0 was the first version to use semantic versioning, and is the most recent minor version of the specification.
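The non-event feature can be sketched as follows. In GEDCOM 7.0 a `NO` structure carries the tag of the event that did not occur; the example below places `NO MARR` on a hypothetical family record to state that the partners never married. The function name `asserted_never` and the sample record are assumptions for illustration, and the sketch does not check which event tags the specification permits in which record type.

```python
def asserted_never(record_lines, event_tag):
    """Return True if the record contains a level-1 'NO <event_tag>' line."""
    for line in record_lines:
        parts = line.split(" ", 2)
        if (
            len(parts) == 3
            and parts[0] == "1"
            and parts[1] == "NO"
            and parts[2].strip() == event_tag
        ):
            return True
    return False


# Hypothetical family record, not taken from the specification.
family = [
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "1 WIFE @I2@",
    "1 NO MARR",
]

print(asserted_never(family, "MARR"))
print(asserted_never(family, "DIV"))
```

This distinguishes a record that is silent about a marriage, which in earlier versions could mean either "unknown" or "did not happen", from one that explicitly states the event did not occur.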
The next planned minor release is v7.1, which is under development.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
Release history
GEDCOM version | Release date | Notes | ||
---|---|---|---|---|
Template:Version | 1984<ref>Subject:description of InterGED theory From:Gary Steiner – "The first GEDCOM standard, version 1.0, was released to the genealogical software development community in 1984." – GEDCOM-L Archives – July 1994, week 4 (#14)</ref> | – | ||
Template:Version | Dec 1985<ref name="timeline2000" >Subject:Timeline of GEDCOM versions and PAF By George Archer – GEDCOM-L Archives – November 2000, week 3 (#12)</ref> | PAF 2.0 | ||
Template:Version | Feb 1987<ref name="timeline2000" /> | GEDCOM for PAF 2.1 | ||
Template:Version | 7 August 1985<ref name="starkey59" >Subject:Re: GEDCOM standards help please From:Graham Starkey – "DRAFT VERSION 2.3–7 August 1985 with PAF2.0 GEDCOM implementation conventions" – GEDCOM-L Archives – June 2000, week 4 (#1)</ref> | with PAF2.0 GEDCOM implementation conventions | ||
Template:Version | 13 December 1985<ref name="starkey59" /> | with PAF2.0 GEDCOM implementation conventions | ||
Template:Version | 9 October 1987<ref>RootsWeb: ROOTS-L Re: Large Charts (fairly long):Date:Tue, 11 Jul 89 15:14:31 CDT From: Marty Hoag <NU021172@N...> Subject:Re: Printing trees with PAF? From soc.roots ... * GEDCOM release 3.0, 9 Oct 1987, 131 pages (!)</ref> | PAF 2.0 and 2.1 implementation of 3.0 | ||
Template:Version | August 1989 | PAF 2.1 – 2.31 | ||
Template:Version | – | – | ||
Template:Version | 25 January 1990<ref>Subject:4.x specs From:Rafal Prinke -"while this document has the date January 25, 1990. So maybe it is GEDCOM 4.2 ?" – GEDCOM-L Archives – May 1994, week 1 (#19)</ref> | – | ||
Template:Version | 31 December 1991<ref name="starkey59" /> | lineage-linked structures were introduced.<ref name="jedfuture" >Subject: GEDCOM (Future Direction) Announced by Family History From: "Jed R. Allen" Date: Fri, 1 May 1998 18:08:24 -0600</ref> | ||
Template:Version | 18 September 1992<ref name="timeline2000" /> | – | ||
Template:Version | 22 January 1992<ref>Subject:Re: GEDCOM standards help please From:Graham Starkey – "DRAFT Release 5.2–22 January 1992 120kb" – GEDCOM-L Archives – June 2000, week 4 (#1)</ref> | – | ||
Template:Version | 4 November 1993<ref>GEDCOM 5.3 draft Template:Webarchive – 4 November 1993</ref> | Unicode standard (ISO/IEC 10646) was introduced as an additional character set. | ||
Template:Version | 21 August 1995<ref>THE GEDCOM STANDARD – DRAFT Release 5.4–21 August 1995</ref> | – | ||
Template:Version | 11 December 1995<ref>Subject:Timeline of GEDCOM versions and PAF By George Archer – "5.5 11 Dec 1995 (Title Page for 5.5)"- GEDCOM-L Archives – November 2000, week 3 (#12)</ref> | PAF 3, 4 and 5 | ||
Template:Version | January 2, 1996<ref>GEDCOM 5.5 Standard Template:Webarchive (Executable file in Envoy format)</ref><ref>Re: Looking for GEDCOM versions 4 & 5.xx "Brian C. Madsen" – "A GEDCOM 5.5 Errata Sheet dated 10 January 1996 supposedly contains corrections to pages 23, 24, 25, 26, 29, 29, 29, 33, 34, 39, 57, 79, and 85."</ref> | PAF 3, 4 and 5 / 5.5 Standard<ref>Gedcom Documentation Library Template:Webarchive, Chronoplex Software</ref> | ||
Template:Version | <ref>Comments on the GEDCOM Future Directions document Michael H. Kay, 17 May 1998</ref> | "it used an entirely new data model"<ref>Subject:GEDCOM Future Directions – From:John Nairn – Date:Mon, 11 May 1998 13:38:45 -0600 – GEDCOM-L Archives – May 1998, week 2 (#3)</ref> | ||
Template:Version | October 2, 1999<ref name="GEDCOM 5.5.1 draft"/> | Used by FamilySearch.org<ref name="GEDCOM 5.5.1 draft"/> UTF-8 added as an approved character encoding. | ||
Template:Version | November 15, 2019 | current standard, minor text modifications to 5.5.1 Draft. | ||
Template:Version | -<ref>Subject:Re: GEDCOM History From:STEFANO BOSCOLO – Date:Tue, 20 Feb 2001 19:54:06 +0100 – GEDCOM-L Archives – February 2001, week 3 (#1)</ref> | "Jed Allen sent those two files to a few people only for sort of "private comments"<ref>Subject: Re: GEDCOM History From:"Rafal T. Prinke" – Date:Tue, 20 Feb 2001 22:14:55 +0100 – GEDCOM-L Archives – February 2001, week 3 (#4)</ref> | ||
Template:Version | | Was not a complete specification, and was not recommended as a basis for software implementations. | ||
Template:Version | | | ||
Template:Version | 27 May 2021 | Modernize character encoding, clarify ambiguities in 5.5.1 specification, introduce semantic versioning, improve multimedia handling | ||
Template:Version | 4 August 2023 | |||
Template:Version |
Limitations
Support for multi-person events and sources
A GEDCOM file can contain information on events such as births, deaths, census records, ship's records, marriages, etc.; a rule of thumb is that an event is something that took place at a specific time, at a specific place (even if time and place are not known). GEDCOM files can also contain attributes such as physical description, occupation, and total number of children; unlike events, attributes generally cannot be associated with a specific time or place.
The GEDCOM specification requires that each event or attribute is associated with exactly one individual or family.<ref name=gedcom2627>GEDCOM Standard 5.5, pp. 26–27.</ref> This causes redundancy for events such as census records, where the actual census entry often contains information on multiple individuals: in the GEDCOM file, a separate census (CENS) event must be added for each individual referenced. Some genealogy programs, such as Gramps and The Master Genealogist, have elaborate database structures for sources that are used, among other things, to represent multi-person events. When databases are exported from one of these programs to GEDCOM, these structures cannot be represented, so the event or source information, including all of the relevant citation information, must be duplicated in each place it is used. This duplication makes it difficult for the user to maintain the information related to sources.
In the GEDCOM specification, events that are associated with a family, such as a marriage, are stored only once, as part of the family (FAM) record, and both spouses are then linked to that single family record.<ref name=gedcom2627/>
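The duplication described for census records can be made concrete with a short Python sketch. The function name `census_events`, the household IDs, the place, and the source ID are all hypothetical; the sketch only shows that an exporter has to emit one identical CENS structure per household member, whereas a marriage is written once on the FAM record.

```python
def census_events(household, date, place, source_xref):
    """Return, per individual ID, the GEDCOM lines for one copy of a CENS event."""
    events = {}
    for xref in household:
        # Each individual gets a full copy, including the source citation.
        events[xref] = [
            "1 CENS",
            f"2 DATE {date}",
            f"2 PLAC {place}",
            f"2 SOUR {source_xref}",
        ]
    return events


# Hypothetical household of three people found in one census entry.
copies = census_events(["@I1@", "@I2@", "@I3@"], "1880", "Springfield", "@S1@")
for xref, lines in copies.items():
    print(xref, lines)
```

If the citation later needs correcting, every copy has to be found and changed, which is the maintenance burden noted above.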
Ambiguity in the specification
The GEDCOM specification was made purposefully flexible to support many ways of encoding data, particularly in the area of sources. This flexibility has led to a great deal of ambiguity, and has produced the side effect that some genealogy programs which import GEDCOM do not import all of the data from a file.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
Ordering of events that do not have dates
The GEDCOM specification does not offer explicit support for keeping a known order of events. In particular, the order of relationships (FAMS) for a person and the order of the children within a relationship (FAM) can be lost. In many cases the sequence of events can be derived from the associated dates, but dates are not always known, particularly when dealing with data from centuries ago. For example, a person may have had two relationships, both with unknown dates, where descriptions nevertheless establish which came second. The order in which these FAMS are recorded in GEDCOM's INDI record will depend on the exporting program. In Aldfaer,<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> for instance, the sequence depends on the ordering of the data by the user (alphabetical, chronological, reference, etc.). The proposed XML GEDCOM standard<ref name="xml gedcom"/> does not address this issue either.
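One way an importing program might cope with this is sketched below: sort relationships by date only when every date is known, and otherwise keep the order in which the FAMS pointers appeared in the file. The function `order_families` and its input shape are assumptions for this example. The fallback trusts the exporting program's ordering, which, as noted, the specification does not guarantee.

```python
from typing import List, Optional, Tuple


def order_families(fams: List[Tuple[str, Optional[int]]]) -> List[str]:
    """Order (family ID, year or None) pairs given in file order.

    Sort by year only if every year is known; otherwise preserve file order.
    """
    if all(year is not None for _, year in fams):
        # sorted() is stable, so equal years keep their file order.
        return [xref for xref, _ in sorted(fams, key=lambda item: item[1])]
    return [xref for xref, _ in fams]


print(order_families([("@F2@", 1850), ("@F1@", 1840)]))
print(order_families([("@F1@", None), ("@F2@", None)]))
```

Neither branch can recover an order that the exporting program did not write down, so a known sequence with unknown dates is still at risk of being lost in transfer.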
Lesser-known features
GEDCOM has many features that are not commonly used. Some software packages do not support all the features that the GEDCOM standard allows.
Multimedia
The GEDCOM standard supports the inclusion of multimedia objects (for example, photos of individuals).<ref>GEDCOM Standard 5.5, p. 28.</ref> Such multimedia objects can be either included in the GEDCOM file itself (called the "embedded form") or in an external file where the name of the external file is specified in the GEDCOM file (called the "linked form"). Embedding multimedia directly in the GEDCOM file makes transmission of data easier, in that all of the information (including the multimedia data) is in one file, but the resulting file can be enormous. Linking multimedia keeps the size of the GEDCOM file under control, but then when transmitting the file, the multimedia objects must either be transmitted separately or archived together with the GEDCOM into one larger file. Support for embedding media directly was dropped in the draft 5.5.1 standard.<ref>Draft GEDCOM Standard 5.5.1, p. 6.</ref>
Conflicting information
The GEDCOM standard allows for the specification of multiple opinions or conflicting data, simply by specifying multiple records of the same type. For example, if an individual's birth date was recorded as 10 January 1800 on the birth certificate, but 11 January 1800 on the death certificate, two BIRT records for that individual would be included, the first with the 10 January 1800 date and giving the birth certificate as the source, and the second with the 11 January 1800 date and giving the death certificate as the source. The preferred record is usually listed first.
This example encoded in GEDCOM might look like this:
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 DATA
4 TEXT Transcription from birth certificate would go here
3 NOTE This birth record is preferred because it comes from the birth certificate
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 DATA
4 TEXT Transcription from death certificate would go here
3 QUAY 2
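A reader that wants to present both opinions could collect every BIRT structure with its date and source, treating the first as preferred. The Python sketch below does this for a shortened copy of the record above; `birth_claims` is a hypothetical helper, and taking the first entry as preferred is only the convention described here, not a rule enforced by the format.

```python
RECORD = """\
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 QUAY 2
"""


def birth_claims(text):
    """Return one {'date', 'source'} dict per BIRT structure, in file order."""
    claims, current = [], None
    for raw in text.splitlines():
        parts = raw.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            current = None
        elif level == 1:
            # A new level-1 line closes any BIRT structure in progress.
            current = {"date": None, "source": None} if tag == "BIRT" else None
            if current is not None:
                claims.append(current)
        elif level == 2 and current is not None:
            if tag == "DATE":
                current["date"] = value
            elif tag == "SOUR":
                current["source"] = value
    return claims


claims = birth_claims(RECORD)
preferred, alternatives = claims[0], claims[1:]
print(preferred)
print(alternatives)
```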
Conflicting data may also be the result of user errors. The standard does not require the contents to be consistent. A birth date like "10 APR 1819" might mistakenly have been recorded as "10 APR 1918", a date long after the person's death. The only way to reveal such inconsistencies is by rigorous validation of the content data.
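A content check of this kind might look like the following Python sketch, which flags a birth date that falls after the death date. The function names and the death date used in the example are assumptions, and the parser handles only exact dates of the form "10 APR 1819"; GEDCOM also allows approximate dates, ranges, and other calendars, which are out of scope here.

```python
from datetime import date

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_exact_date(text: str) -> date:
    """Parse an exact 'DD MON YYYY' date; other GEDCOM date forms raise an error."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month.upper()], int(day))


def birth_after_death(birth: str, death: str) -> bool:
    """Return True when the recorded birth date is later than the death date."""
    return parse_exact_date(birth) > parse_exact_date(death)


# Hypothetical death date; the birth dates are the ones discussed above.
print(birth_after_death("10 APR 1918", "3 MAY 1880"))
print(birth_after_death("10 APR 1819", "3 MAY 1880"))
```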
Internationalization
The GEDCOM standard supports internationalization in several ways. First, newer versions of the standard allow data to be stored in Unicode (or, more recently, UTF-8), so text in any language can be stored.<ref>GEDCOM Standard 5.5, p. 45.</ref> Secondly, in the same way that one can have multiple events on a person, GEDCOM allows one to have multiple names for a person,<ref>GEDCOM Standard 5.5, p. 27.</ref> so names can be stored in multiple languages, although there is no standardized way to indicate which instance is in which language. Finally, in version 5.5.1, the NAME field also supports a phonetic variation (FONE) and a romanized variation (ROMN) of the name.<ref>GEDCOM Draft 5.5.1, p. 38</ref>
GEDCOM X
In February 2012, at the RootsTech 2012 conference, FamilySearch outlined a major new project around genealogical standards called GEDCOM X, and invited collaboration.<ref name="gedcomx">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> The project includes software developed under the Apache open source license, as well as data formats that facilitate basing family trees on sources and records (both physical artifacts and digital artifacts), support for sharing and linking data online, and an API.<ref name="gedcomx"/><ref>Template:Cite news</ref><ref>Template:Cite news</ref>
In August 2012 FamilySearch employee and GEDCOM X project leader Ryan Heaton dropped the claim that GEDCOM X is the new industry standard, and repositioned GEDCOM X as another FamilySearch open source project.<ref>2012-08-31 GEDCOM X: no industry standard, FamilySearch abandons GEDCOM X push, By Tamura Jones, Modern Software Experience.</ref>
After the release of GEDCOM 7, FamilySearch positioned GEDCOM X as useful for interoperation with its FamilySearch Family Tree software.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
Alternatives
Commsoft, the authors of the Roots<ref>CommSoft to Return? Dick Eastman Online 3/14/2001 – Archive – Ancestry.com</ref> series of genealogy software and Ultimate Family Tree, defined a version called Event-Oriented GEDCOM (also known as "Event GEDCOM" and originally called InterGED<ref>RootsWeb: TMG-L [TMG] InterGED/Event GEDCOM Date: Fri, 15 Feb 2002 13:33:18 -0700</ref>),<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> which included events as first-class (zero-level) items. Although it is event based, it is still a model built on assumed reality rather than evidence. Event GEDCOM was more flexible, as it allowed some separation between believed events and the participants. However, Event GEDCOM was not widely adopted by other developers due to its semantic differences.[citation needed] With Roots and Ultimate Family Tree no longer available, very few people today are using Event GEDCOM.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
Gramps XML is an XML-based open format created by the open source genealogy project Gramps and used also by PhpGedView.
The Family History Information Standards Organisation was established in 2012 with the aim of developing international standards for family history and genealogical information.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> One of the standards the organization proposed was Extended Legacy Format (ELF), compatible with GEDCOM 5.5(.1), but including an extensibility mechanism. The organization requested public comment on the proposed standard in 2017.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> It withdrew the proposal because release 7.0 of GEDCOM addressed many of the organization's concerns.<ref name=":1">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
See also
- FamilySearch
- GENDEX – Genealogical index
- Genealogical numbering systems
- GNTP – Genealogy Network Transfer Protocol
- Tiny Tafel Format – encoded "ancestor table"
- List of genealogy databases
Notes
References
External links
- General
- GEDCOM Standard
- FamilySearch GEDCOM Guide
- GEDCOM X Project
- THE GEDCOM STANDARD Release 5.5.1, released 15 November 2019