Semantic similarity
{{Short description|Natural language processing}}
'''Semantic similarity''' is a [[Metric (mathematics)|metric]] defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or [[semantics|semantic content]]{{Citation needed|reason=What aspect of semantic content is related to distance?|date=March 2024}} as opposed to [[lexicographical]] similarity. Semantic similarity measures are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained by comparing the information that supports their meaning or describes their nature.<ref name=harispe2015>{{ cite journal | journal=Synthesis Lectures on Human Language Technologies |author1=Harispe S. |author2=Ranwez S.|author3= Janaqi S.|author4= Montmain J. | year=2015 | title=Semantic Similarity from Natural Language and Ontology Analysis| pages=1–254 | volume=8 |issue=1 | doi=10.2200/S00639ED1V01Y201504HLT027|arxiv=1704.05295 |s2cid=17428739 }}</ref><ref name=Feng2017>{{ cite journal | journal=Knowledge Engineering Review |volume=32 |author1=Feng Y. |author2=Bagheri E. |author3=Ensan F. |author4=Jovanovic J. |year=2017 |title=The state of the art in semantic relatedness: a framework for comparison| pages=1–30 | doi=10.1017/S0269888917000029|s2cid=52172371 }}</ref>

The term semantic similarity is often confused with semantic relatedness. '''Semantic relatedness''' includes any relation between two terms, while semantic similarity only includes [[Is-a|"is a"]] relations.<ref>{{ cite journal | journal=GeoInformatica |author1=A. Ballatore |author2=M. Bertolotto |author3=D.C. Wilson | year=2014 | title=An evaluative baseline for geo-semantic relatedness and similarity| pages=747–767 | volume=18|issue=4 |arxiv=1402.3371 |doi=10.1007/s10707-013-0197-8 |bibcode=2014GInfo..18..747B |s2cid=17474023 }}</ref> For example, "car" is similar to "bus", but is also related to "road" and "driving".

Computationally, semantic similarity can be estimated by defining a [[topological]] similarity, using [[Ontology (computer science)|ontologies]] to define the distance between terms or concepts. For example, a naive metric for comparing concepts ordered in a [[partially ordered set]] and represented as nodes of a [[directed acyclic graph]] (e.g., a [[Taxonomy (general)|taxonomy]]) would be the length of the shortest path linking the two concept nodes. Based on text analyses, semantic relatedness between units of language (e.g., words, sentences) can also be estimated using statistical means such as a [[vector space model]] to [[correlation|correlate]] words and textual contexts from a suitable [[text corpus]].

Proposed semantic similarity and relatedness measures are evaluated in two main ways. The first uses datasets designed by experts and composed of word pairs with estimated degrees of semantic similarity or relatedness. The second integrates the measures into specific applications such as information retrieval, recommender systems and natural language processing.
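
The two computational approaches described above can be illustrated with a minimal sketch. It is not taken from the cited sources: the toy taxonomy <code>IS_A</code>, the three-sentence corpus <code>CORPUS</code>, and all function names are hypothetical and chosen only for illustration. The first part measures the shortest path between concepts in a small "is a" graph and converts it to a score using 1 / (1 + path length), one common convention among several. The second part builds simple co-occurrence vectors (one possible vector space model) and compares them with cosine similarity.

```python
import math
from collections import Counter, deque

# Hypothetical toy taxonomy: each concept maps to its "is a" parents.
IS_A = {
    "car": ["motor vehicle"],
    "bus": ["motor vehicle"],
    "motor vehicle": ["vehicle"],
    "bicycle": ["vehicle"],
    "vehicle": ["entity"],
    "road": ["way"],
    "way": ["entity"],
}


def shortest_path_length(a, b, is_a=IS_A):
    """Return the number of edges on the shortest path between two concepts,
    treating "is a" edges as undirected, or None if no path exists."""
    neighbours = {}
    for child, parents in is_a.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)
    if a not in neighbours or b not in neighbours:
        return None
    queue, seen = deque([(a, 0)]), {a}
    while queue:  # breadth-first search
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """Map a path length d to a score in (0, 1] using 1 / (1 + d)."""
    d = shortest_path_length(a, b)
    return None if d is None else 1.0 / (1.0 + d)


# Hypothetical toy corpus for the distributional (vector space) estimate.
CORPUS = [
    "the car drives on the road",
    "the bus drives on the road",
    "the cat sleeps on the sofa",
]


def context_vectors(sentences):
    """Represent each word by counts of the other words in its sentences."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            context = tokens[:i] + tokens[i + 1:]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


VECTORS = context_vectors(CORPUS)
```

In this toy setting, <code>shortest_path_length("car", "bus")</code> is 2 while <code>shortest_path_length("car", "road")</code> is 5, so the taxonomy-based score treats "car" and "bus" as more similar. The co-occurrence vectors, by contrast, also place "car" close to "road", because the two words share contexts. This mirrors the distinction between similarity and relatedness drawn in the text. Real systems would use a full ontology and a large corpus, typically with stop-word filtering and weighting.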