Record linkage
== Naming conventions ==
"Record linkage" is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with another that describe the same entity. However, many other terms are used for this process. Unfortunately, this profusion of terminology has led to few cross-references between these research communities.<ref>{{Cite web |url=http://datamining.anu.edu.au/linkage.html |title=Cristen, P & T: Febrl - Freely extensible biomedical record linkage (Manual, release 0.3) p.9 |access-date=2006-04-21 |archive-date=2016-03-11 |archive-url=https://web.archive.org/web/20160311044101/http://datamining.anu.edu.au/linkage.html |url-status=dead }}</ref><ref>{{cite journal|last=Elmagarmid|first=Ahmed|author2=Panagiotis G. Ipeirotis|author3=Vassilios Verykios|date=January 2007|title=Duplicate Record Detection: A Survey|url=http://www.cs.purdue.edu/homes/ake/pub/TKDE-0240-0605-1.pdf|journal=IEEE Transactions on Knowledge and Data Engineering|volume=19|issue=1|pages=1–16|doi=10.1109/tkde.2007.250581|s2cid=386036|access-date=2009-03-30}}</ref>

[[computer science|Computer scientists]] often refer to it as "data matching" or as the "object identity problem". Commercial mail and database applications refer to it as "merge/purge processing" or "list washing".
Other names used to describe the same concept include: "coreference/entity/identity/name/record resolution", "entity disambiguation/linking", "fuzzy matching", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration" and "conflation".<ref>{{cite book |last1=Singla |first1=Parag |last2=Domingos |first2=Pedro |title=Sixth International Conference on Data Mining (ICDM'06) |chapter=Entity Resolution with Markov Logic |date=December 2006 |chapter-url=https://homes.cs.washington.edu/~pedrod/papers/icdm06.pdf |pages=572–582 |doi=10.1109/ICDM.2006.65 |isbn=9780769527024 |access-date=1 March 2023 |s2cid=12211870}}</ref>

While they share similar names, record linkage and [[Linked Data|linked data]] are two separate approaches to processing and structuring data. Although both involve identifying matching entities across different data sets, record linkage typically equates "entities" with human individuals; by contrast, Linked Data is based on the possibility of interlinking any [[web resource]] across data sets, using a correspondingly broader concept of identifier, namely a [[Uniform Resource Identifier|URI]].