
==== Semantic labelling techniques ====
Several semantic labelling approaches utilise machine learning techniques. Following the work of Flach,<ref name=":2">{{Cite book |last=Flach |first=Peter |url=https://www.cambridge.org/core/books/machine-learning/621D3E616DF879E494B094CC93ED36A4 |title=Machine Learning: The Art and Science of Algorithms that Make Sense of Data |date=2012 |publisher=Cambridge University Press |isbn=978-1-107-09639-4 |location=Cambridge |doi=10.1017/cbo9780511973000}}</ref><ref name=":5">{{Cite thesis |title=Knowledge-Graph-Based Semantic Labeling of Tabular Data |url=https://oa.upm.es/64068/ |publisher=E.T.S. de Ingenieros Informáticos (UPM) |date=c. 2020 |degree=phd |doi=10.20868/upm.thesis.64068 |first=Ahmad |last=Alobaid}}</ref> these techniques can be categorised as geometric (using lines and planes, e.g., [[Support-vector machine]] and [[Linear regression]]), probabilistic (e.g., [[Conditional random field]]), and logical (e.g., [[Decision tree learning]]); techniques that do not rely on machine learning form a separate, non-ML category (e.g., balancing coverage and specificity<ref name=":02"/>). Note that the geometric, probabilistic, and logical machine learning models are not mutually exclusive.<ref name=":2" />

===== Geometric techniques =====
Pham et al.<ref name=":6">{{Cite book |last1=Pham |first1=Minh |last2=Alse |first2=Suresh |last3=Knoblock |first3=Craig A. |last4=Szekely |first4=Pedro |title=The Semantic Web – ISWC 2016 |chapter=Semantic Labeling: A Domain-Independent Approach |date=2016 |editor-last=Groth |editor-first=Paul |editor2-last=Simperl |editor2-first=Elena |editor3-last=Gray |editor3-first=Alasdair |editor4-last=Sabou |editor4-first=Marta |editor5-last=Krötzsch |editor5-first=Markus |editor6-last=Lecue |editor6-first=Freddy |editor7-last=Flöck |editor7-first=Fabian |editor8-last=Gil |editor8-first=Yolanda |chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-46523-4_27 |series=Lecture Notes in Computer Science |language=en |location=Cham |publisher=Springer International Publishing |volume=9981 |pages=446–462 |doi=10.1007/978-3-319-46523-4_27 |isbn=978-3-319-46523-4|s2cid=37873758 }}</ref> use the [[Jaccard index]] and [[Tf–idf|TF-IDF]] similarity for textual data and the [[Kolmogorov–Smirnov test]] for numeric data. Alobaid and Corcho<ref name="auto2"/> use [[fuzzy clustering]] (fuzzy c-means<ref>{{Citation |title=Fuzzy c-Means Library |date=2022-01-29 |url=https://github.com/oeg-upm/fcm-cpp |publisher=Ontology Engineering Group (UPM) |access-date=2023-01-04}}</ref><ref>{{Citation |title=fuzzy-c-means |date=2022-12-12 |url=https://github.com/oeg-upm/fuzzy-c-means |publisher=Ontology Engineering Group (UPM) |access-date=2023-01-04}}</ref>) to label numeric columns.
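The three similarity measures named above can be computed with a few lines of plain Python. The following is only a minimal sketch of the individual measures, assuming textual columns are given as token lists and numeric columns as lists of numbers; the function names are hypothetical, and the sketch does not reproduce the feature set of Pham et al. or the way they combine these scores.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of values."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)


def tfidf_cosine(doc_a, doc_b, corpus):
    """Cosine similarity of the TF-IDF vectors of two token lists.

    Document frequencies are taken from `corpus` (a list of token lists);
    tokens that never occur in the corpus are ignored.
    """
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))

    def vector(doc):
        tf = Counter(doc)
        return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t]}

    va, vb = vector(doc_a), vector(doc_b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov–Smirnov statistic: the largest vertical gap
    between the empirical CDFs of two numeric samples."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        x = min(xs[i], ys[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < len(xs) and xs[i] <= x:
            i += 1
        while j < len(ys) and ys[j] <= x:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


if __name__ == "__main__":
    print(jaccard({"Paris", "Rome"}, {"Rome", "Oslo"}))  # 0.3333333333333333
    print(ks_statistic([1, 2, 3], [4, 5, 6]))  # 1.0
```

A KS statistic close to 0 suggests that two numeric columns follow similar distributions, whereas a value of 1 means their empirical distributions do not overlap at all.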

===== Probabilistic techniques =====
Limaye et al.<ref name=":7">{{Cite journal |last1=Limaye |first1=Girija |last2=Sarawagi |first2=Sunita |last3=Chakrabarti |first3=Soumen |date=2010-09-01 |title=Annotating and searching web tables using entities, types and relationships |url=https://doi.org/10.14778/1920841.1921005 |journal=Proceedings of the VLDB Endowment |volume=3 |issue=1–2 |pages=1338–1347 |doi=10.14778/1920841.1921005 |issn=2150-8097 |s2cid=9262964}}</ref> use [[Tf–idf|TF-IDF]] similarity and [[graphical model]]s, together with a [[support-vector machine]] to learn the model weights. Venetis et al.<ref name=":8">{{Cite journal |last1=Venetis |first1=Petros |last2=Halevy |first2=Alon |last3=Madhavan |first3=Jayant |last4=Paşca |first4=Marius |last5=Shen |first5=Warren |last6=Wu |first6=Fei |last7=Miao |first7=Gengxin |last8=Wu |first8=Chung |date=2011-06-01 |title=Recovering semantics of tables on the web |url=https://doi.org/10.14778/2002938.2002939 |journal=Proceedings of the VLDB Endowment |volume=4 |issue=9 |pages=528–538 |doi=10.14778/2002938.2002939 |issn=2150-8097 |s2cid=11359711}}</ref> construct an isA database consisting of (instance, class) pairs and then use these pairs to find the maximum-likelihood class label for a column. Alobaid and Corcho<ref>{{Cite journal |last=Alobaid |first=Ahmad |last2=Corcho |first2=Oscar |date=March 2024|title=Linear approximation of the quantile–quantile plot for semantic labelling of numeric columns in tabular data |url=https://linkinghub.elsevier.com/retrieve/pii/S0957417423026544 |journal=Expert Systems with Applications |language=en |volume=238 |pages=122152 |doi=10.1016/j.eswa.2023.122152|url-access=subscription }}</ref> use a linear approximation of the [[Q–Q plot|quantile–quantile (Q–Q) plot]] to predict the properties of numeric columns.
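The maximum-likelihood step described for Venetis et al. can be illustrated with a naive-Bayes-style sketch. It assumes that the cell values of a column are conditionally independent given the class, applies add-alpha smoothing, and skips values with no isA evidence; the isA counts, function names and the <code>alpha</code> parameter are hypothetical, and the exact scoring and smoothing used by Venetis et al. may differ.

```python
import math
from collections import Counter, defaultdict

# Hypothetical isA database: (instance, class) -> number of times the pair
# was extracted. The counts are illustrative only.
ISA_COUNTS = {
    ("paris", "city"): 50,
    ("paris", "person"): 5,
    ("london", "city"): 60,
    ("berlin", "city"): 40,
    ("berlin", "band"): 2,
}


def build_index(isa_counts):
    """Aggregate class counts per instance and over the whole database."""
    per_instance = defaultdict(Counter)
    class_totals = Counter()
    for (instance, cls), count in isa_counts.items():
        per_instance[instance][cls] += count
        class_totals[cls] += count
    return per_instance, class_totals


def column_log_score(values, cls, per_instance, class_totals, alpha=1.0):
    """log Pr[cls | values] up to an additive constant, assuming the values are
    conditionally independent given the class:
        Pr[cls | v1..vn]  ∝  Pr[cls] * prod_i Pr[cls | v_i] / Pr[cls]
    Probabilities use add-alpha smoothing; unknown values are skipped."""
    k = len(class_totals)
    prior = (class_totals[cls] + alpha) / (sum(class_totals.values()) + alpha * k)
    score = math.log(prior)
    for v in values:
        if v not in per_instance:
            continue  # no isA evidence for this cell
        counts = per_instance[v]
        posterior = (counts[cls] + alpha) / (sum(counts.values()) + alpha * k)
        score += math.log(posterior) - math.log(prior)
    return score


def label_column(values, isa_counts, alpha=1.0):
    """Return the class with the highest (log-)likelihood score for a column."""
    per_instance, class_totals = build_index(isa_counts)
    return max(
        class_totals,
        key=lambda cls: column_log_score(values, cls, per_instance, class_totals, alpha),
    )


if __name__ == "__main__":
    print(label_column(["paris", "london", "berlin"], ISA_COUNTS))  # city
```

Working in log space avoids numerical underflow when a column contains many cells, since the product of many small probabilities becomes a sum of logarithms.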

===== Logical techniques =====
Syed et al.<ref name=":4">{{Cite journal |last1=Syed |first1=Zareen |last2=Finin |first2=Tim |last3=Mulwad |first3=Varish |last4=Joshi |first4=Anupam |date=2010-04-26 |title=Exploiting a Web of Semantic Data for Interpreting Tables |url=https://ebiquity.umbc.edu/paper/html/id/474 |journal=Proceedings of the Second Web Science Conference |language=en}}</ref> built Wikitology, which is "a hybrid knowledge base of structured and unstructured information extracted from Wikipedia augmented by RDF data from DBpedia and other Linked Data resources."<ref name=":4" /> In the Wikitology index, they use [[PageRank]] for [[entity linking]], a task often performed as part of semantic labelling. Because they could not query Google for the PageRank of every Wikipedia article, they used a [[Decision tree|decision tree]] to approximate it.<ref name=":4" />
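To illustrate the quantity that Syed et al. approximate, the standard PageRank power iteration can be sketched in Python on a small link graph. This is a generic sketch, not their decision-tree approximation; the damping factor of 0.85, the uniform redistribution of rank from pages without out-links, and the toy graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, max_iter=100, tol=1e-10):
    """PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages without
    out-links (dangling pages) spread their rank uniformly over all pages.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        # Teleportation plus redistributed dangling mass, shared by every page.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical toy link graph between three articles.
    graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
    ranks = pagerank(graph)
    print(max(ranks, key=ranks.get))  # C
```

Each iteration touches every link once, so computing exact scores requires the full link graph of all articles. This is the kind of cost that motivates estimating the scores instead.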

===== Non-ML techniques =====
Alobaid and Corcho<ref name=":02" /> presented an approach to annotate entity columns. The technique first annotates the cells in the entity column with entities from a reference knowledge graph (e.g., [[DBpedia]]). The classes of these entities are then collected, and each class is scored using formulas proposed by the authors that take into account its frequency and its depth in the subclass hierarchy.<ref>{{Cite web |title=OWL Web Ontology Language Reference |url=https://www.w3.org/TR/owl-ref/Overview.html |access-date=2022-09-22 |website=www.w3.org}}</ref>
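The general idea of trading off coverage (how many cells support a class) against specificity (how deep the class sits in the subclass hierarchy) can be sketched in Python. The scoring function below is a hypothetical weighted sum chosen for illustration and is not one of the formulas of Alobaid and Corcho; the toy classes, the <code>PARENT</code> map and the <code>weight</code> parameter are likewise assumptions.

```python
from collections import Counter

# Hypothetical toy input: the classes of the entity linked to each cell of
# an entity column (e.g. retrieved from a knowledge graph such as DBpedia).
CELL_CLASSES = [
    {"Place", "PopulatedPlace", "City"},
    {"Place", "PopulatedPlace", "City"},
    {"Place", "PopulatedPlace", "Country"},
]

# Hypothetical subclass hierarchy: class -> direct superclass (None for a root).
PARENT = {
    "Place": None,
    "PopulatedPlace": "Place",
    "City": "PopulatedPlace",
    "Country": "PopulatedPlace",
}


def depth(cls, parent):
    """Number of subclass steps from `cls` up to a root class."""
    d = 0
    while parent.get(cls) is not None:
        cls = parent[cls]
        d += 1
    return d


def score_classes(cell_classes, parent, weight=0.5):
    """Hypothetical score: weight * coverage + (1 - weight) * specificity, where
    coverage is the fraction of cells supporting the class and specificity is
    its depth normalised by the deepest candidate class."""
    counts = Counter(cls for classes in cell_classes for cls in classes)
    if not counts:
        return {}
    max_depth = max(depth(cls, parent) for cls in counts) or 1
    return {
        cls: weight * count / len(cell_classes)
        + (1 - weight) * depth(cls, parent) / max_depth
        for cls, count in counts.items()
    }


if __name__ == "__main__":
    scores = score_classes(CELL_CLASSES, PARENT)
    print(max(scores, key=scores.get))  # City
```

In this toy column, the root class <code>Place</code> covers every cell but is too general, while <code>Country</code> is specific but covers only one cell; the combined score favours <code>City</code>, which is both deep and supported by most cells.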