
{{Short description|Natural language processing}}
'''Semantic similarity''' is a [[Metric (mathematics)|metric]] defined over a set of documents or terms, in which the distance between items is based on the likeness of their meaning or [[semantics|semantic content]]{{Citation needed|reason=What aspect of semantic content is related to distance?|date=March 2024}} rather than on their [[lexicographical]] similarity. Semantic similarity measures are mathematical tools that estimate the strength of the semantic relationship between units of language, concepts, or instances. They express this strength as a numerical value obtained by comparing the information that supports their meaning or describes their nature.<ref name=harispe2015>{{ cite journal | journal=Synthesis Lectures on Human Language Technologies |author1=Harispe S. |author2=Ranwez S.|author3= Janaqi S.|author4= Montmain J. | year=2015 | title=Semantic Similarity from Natural Language and Ontology Analysis| pages=1–254 | volume=8 |issue=1 | doi=10.2200/S00639ED1V01Y201504HLT027|arxiv=1704.05295 |s2cid=17428739 }}</ref><ref name=Feng2017>{{ cite journal | journal=Knowledge Engineering Review |volume=32 |author1=Feng Y. |author2=Bagheri E. |author3=Ensan F. |author4=Jovanovic J. |year=2017 |title=The state of the art in semantic relatedness: a framework for comparison| pages=1–30 | doi=10.1017/S0269888917000029|s2cid=52172371 }}</ref> Semantic similarity is often confused with semantic relatedness. '''Semantic relatedness''' covers any relation between two terms, whereas semantic similarity covers only [[Is-a|"is a"]] relations.<ref>{{ cite journal | journal=GeoInformatica |author1=A. Ballatore |author2=M. Bertolotto |author3=D.C. Wilson | year=2014 | title=An evaluative baseline for geo-semantic relatedness and similarity| pages=747–767 | volume=18|issue=4 |arxiv=1402.3371 |doi=10.1007/s10707-013-0197-8 |bibcode=2014GInfo..18..747B |s2cid=17474023 }}</ref>
For example, "car" is similar to "bus", but is also related to "road" and "driving".

Computationally, semantic similarity can be estimated by defining a [[topological]] similarity, using [[Ontology (computer science)|ontologies]] to define the distance between terms or concepts. For example, a naive metric for comparing concepts ordered in a [[partially ordered set]] and represented as nodes of a [[directed acyclic graph]] (e.g., a [[Taxonomy (general)|taxonomy]]) would be the length of the shortest path linking the two concept nodes. Based on text analysis, semantic relatedness between units of language (e.g., words, sentences) can also be estimated by statistical means, such as a [[vector space model]] that [[correlation|correlates]] words with their textual contexts in a suitable [[text corpus]]. Proposed semantic similarity and relatedness measures are evaluated in two main ways. The first relies on datasets designed by experts, consisting of word pairs annotated with an estimate of their degree of semantic similarity or relatedness. The second integrates the measures into specific applications, such as information retrieval, recommender systems, and natural language processing.
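The two families of approaches described above can be illustrated with a minimal Python sketch. The taxonomy, corpus, function names, context window size, and the 1 / (1 + path length) mapping are all hypothetical choices made for illustration, not a standard implementation. The first part finds the shortest path between two concepts in a small "is a" graph by breadth-first search. The second builds word co-occurrence vectors from a toy corpus and compares them with cosine similarity, a common choice in vector space models.

```python
import math
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its parent(s) via "is a" edges.
TAXONOMY = {
    "vehicle": [],
    "motor_vehicle": ["vehicle"],
    "car": ["motor_vehicle"],
    "bus": ["motor_vehicle"],
    "bicycle": ["vehicle"],
}

# Hypothetical toy corpus for the distributional (vector space) approach.
CORPUS = [
    "the car drove on the road",
    "the bus drove on the road",
]


def shortest_path_length(taxonomy, a, b):
    """Edges on the shortest path between concepts a and b, treating
    "is a" links as undirected. Returns None if they are not connected."""
    adj = {node: set() for node in taxonomy}
    for child, parents in taxonomy.items():
        for parent in parents:
            adj[child].add(parent)
            adj.setdefault(parent, set()).add(child)
    # Breadth-first search yields the fewest-edge path in an unweighted graph.
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(taxonomy, a, b):
    """Turn a path length into a score in (0, 1]; 0.0 if unconnected."""
    length = shortest_path_length(taxonomy, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)


def context_vectors(sentences, window=2):
    """Count, for each word, how often every vocabulary word occurs
    within `window` tokens of it (a simple co-occurrence matrix)."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for s in sentences:
        tokens = s.split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[tokens[j]]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


if __name__ == "__main__":
    print(shortest_path_length(TAXONOMY, "car", "bus"))
    print(path_similarity(TAXONOMY, "car", "bicycle"))
    vecs = context_vectors(CORPUS)
    print(cosine(vecs["car"], vecs["bus"]), cosine(vecs["car"], vecs["road"]))
```

In this toy taxonomy, "car" and "bus" are two edges apart (through "motor_vehicle"), while "car" and "bicycle" are three edges apart. In the toy corpus, "car" and "bus" occur in identical contexts, so their vectors coincide. "Road" shares only some of those contexts and therefore scores lower.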