== Applications ==

=== In biomedical informatics ===
Semantic similarity measures have been applied and developed in biomedical ontologies.<ref>{{cite journal|last1=Guzzi|first1=Pietro Hiram|first2=Marco |last2=Mina |first3=Mario|last3=Cannataro|first4=Concettina |last4=Guerra|title=Semantic similarity analysis of protein data: assessment with biological features and issues|journal=Briefings in Bioinformatics|year=2012|volume=13|pages=569–585|issue=5|doi=10.1093/bib/bbr066|pmid=22138322|doi-access=free}}</ref><ref name="ReferenceA">{{cite journal |last1=Benabderrahmane|first1=Sidahmed|last2=Smail Tabbone|first2=Malika|last3=Poch|first3=Olivier |last4=Napoli|first4=Amedeo|last5=Devignes|first5=Marie-Dominique |title=IntelliGO: a new vector-based semantic similarity measure including annotation origin |journal=BMC Bioinformatics|volume=11 |pages=588 |year=2010 |pmid=21122125 |doi=10.1186/1471-2105-11-588 |pmc=3098105 |doi-access=free }}</ref> They are mainly used to compare [[genes]] and [[proteins]] based on the similarity of their functions<ref>{{cite journal | last1 = Chicco | first1 = D | last2 = Masseroli | first2 = M | year = 2015 | title = Software suite for gene and protein annotation prediction and similarity search | journal = IEEE/ACM Transactions on Computational Biology and Bioinformatics | volume = 12 | issue = 4 | pages = 837–843 | doi=10.1109/TCBB.2014.2382127 | pmid = 26357324 | hdl = 11311/959408 | s2cid = 14714823 | url = https://doi.org/10.1109/TCBB.2014.2382127 | hdl-access = free }}</ref> rather than on their [[sequence similarity]], but they are also being extended to other bioentities, such as diseases.<ref>{{cite journal |last1=Köhler |first1=S |last2=Schulz |first2=MH |last3=Krawitz |first3=P |last4=Bauer |first4=S |last5=Dolken |first5=S |last6=Ott |first6=CE |last7=Mundlos |first7=C |last8=Horn |first8=D |last9=Mundlos |first9=S |last10=Robinson |first10=Peter N. |title=Clinical diagnostics in human genetics with semantic similarity searches in ontologies |journal=American Journal of Human Genetics |volume=85 |issue=4 |pages=457–64 |year=2009 |pmid=19800049 |pmc=2756558 |doi=10.1016/j.ajhg.2009.09.003|display-authors=8 }}</ref>

These comparisons can be done using tools freely available on the web:
* ProteInOn can be used to find interacting proteins, find assigned GO terms and calculate the functional semantic similarity of [[UniProt]] proteins, and to get the information content and calculate the functional semantic similarity of GO terms.<ref>{{cite web|url=http://xldb.fc.ul.pt/biotools/proteinon/|title=ProteInOn}}</ref>
* CMPSim provides a functional similarity measure between chemical compounds and metabolic pathways using [[ChEBI]]-based semantic similarity measures.<ref>{{cite web|url=http://xldb.di.fc.ul.pt/biotools/cmpsim/|title=CMPSim}}</ref>
* CESSM provides a tool for the automated evaluation of GO-based semantic similarity measures.<ref>{{cite web|url=http://xldb.fc.ul.pt/biotools/cessm/|title=CESSM}}</ref>
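Many GO-based measures used by such tools build on the information content of ontology terms. The following sketch is only an illustration of that idea, not code from any of the tools above: it assumes a hypothetical toy GO-like DAG and hypothetical annotation counts, and computes the Resnik similarity of two terms as the information content of their most informative common ancestor.

<syntaxhighlight lang="python">
import math

# Hypothetical toy ontology: child term -> parent terms (a small GO-like DAG).
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],
    "GO:E": ["GO:C"],
}

# Hypothetical annotation counts per term (gene products annotated with the
# term or any of its descendants); the root term covers the whole corpus.
ANNOTATIONS = {"GO:A": 100, "GO:B": 40, "GO:C": 60, "GO:D": 10, "GO:E": 20}
TOTAL = ANNOTATIONS["GO:A"]

def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    result = {term}
    for parent in PARENTS.get(term, []):
        result |= ancestors(parent)
    return result

def information_content(term):
    """IC(t) = -log p(t), where p(t) is the annotation frequency of t."""
    return -math.log(ANNOTATIONS[term] / TOTAL)

def resnik_similarity(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)

print(resnik_similarity("GO:D", "GO:E"))  # IC of GO:C, their most informative shared ancestor
</syntaxhighlight>

In practice, gene- or protein-level similarity is typically obtained by aggregating such term-level scores (for example with a best-match average) over the GO terms annotating the two gene products.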
=== In geoinformatics ===
Similarity is also applied in [[geoinformatics]] to find similar [[geographic feature]]s or feature types:<ref>{{cite journal|author1=Janowicz, K.|author2=Raubal, M.|author3=Kuhn, W.|title=The semantics of similarity in geographic information retrieval|journal=Journal of Spatial Information Science|volume=2|issue=2|year=2011|pages=29–57|doi=10.5311/josis.2011.2.3|doi-access=free|hdl=20.500.11850/41298|hdl-access=free}}</ref>
* SIM-DL similarity server<ref>{{cite conference | citeseerx = 10.1.1.172.5544 | title =Algorithm, implementation and application of the SIM-DL similarity server | pages = 128–145 | year = 2007 |series=Lecture Notes in Computer Science |number=4853 |conference=Second International Conference on Geospatial Semantics (GEOS 2007)}}</ref> can be used to compute similarities between concepts stored in geographic feature type ontologies.
* Similarity Calculator can be used to compute how well related two geographic concepts are in the Geo-Net-PT ontology.<ref>{{cite web|url=http://xldb.fc.ul.pt/wiki/Geographic_Similarity_calculator_GeoSSM|title=Geo-Net-PT Similarity Calculator}}</ref><ref>{{cite web|url=http://xldb.fc.ul.pt/wiki/Geo-Net-PT_02_in_English|title=Geo-Net-PT}}</ref>
* The OSM<ref>[https://wiki.openstreetmap.org/wiki/OSM_Semantic_Network "OSM Semantic Network"]. OSM Wiki.</ref> [[semantic network]] can be used to compute the semantic similarity of tags in [[OpenStreetMap]].<ref>{{cite journal|title=Geographic Knowledge Extraction and Semantic Similarity in OpenStreetMap|author1=A. Ballatore |author2=D.C. Wilson |author3=M. Bertolotto |journal=Knowledge and Information Systems|pages=61–81|url=http://irserver.ucd.ie/bitstream/handle/10197/3973/2012_-_Geographic_Knowledge_Extraction_and_Semantic_Similarity_in_OpenStreetMap_-_Ballatore_et_al.pdf?sequence=1}}</ref>

=== In computational linguistics ===
Several metrics use [[WordNet]], a manually constructed lexical database of English words. Despite the advantages of human supervision in constructing the database, the words are not learned automatically, so such metrics cannot measure relatedness between multi-word terms and the vocabulary cannot be extended incrementally.<ref name=budanitsky2001 /><ref>{{cite book|author1=Kaur, I. |author2=Hornof, A.J. |title=Proceedings of the SIGCHI Conference on Human Factors in Computing Systems |chapter=A comparison of LSA, wordNet and PMI-IR for predicting user click behavior |name-list-style=amp |date=2005|pages=51–60|doi=10.1145/1054972.1054980|isbn=978-1-58113-998-3|s2cid=14347026 }}</ref>
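WordNet-based measures can be computed with off-the-shelf libraries. The sketch below is only illustrative: it assumes the NLTK WordNet interface with the WordNet corpus installed, and the specific word senses are arbitrary examples.

<syntaxhighlight lang="python">
# A minimal sketch using NLTK's WordNet interface; the WordNet corpus must be
# downloaded first (e.g. with nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')   # a specific noun sense of "dog"
cat = wn.synset('cat.n.01')   # a specific noun sense of "cat"

# Path-based similarity: inverse of the shortest path length between the two
# senses in the hypernym hierarchy.
print(dog.path_similarity(cat))

# Wu-Palmer similarity: based on the depths of the two senses and of their
# least common subsumer in the taxonomy.
print(dog.wup_similarity(cat))
</syntaxhighlight>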
=== In natural language processing ===
[[Natural language processing]] (NLP) is a field of computer science and linguistics. Sentiment analysis, natural language understanding and machine translation (automatically translating text from one human language to another) are a few of the major areas where it is applied. For example, given one information resource on the internet, it is often of immediate interest to find similar resources. The [[Semantic Web]] provides semantic extensions to find similar data by content and not just by arbitrary descriptors.<ref>[http://www.di.uniba.it/~cdamato/PhDThesis_dAmato.pdf Similarity-based Learning Methods for the Semantic Web] (C. d'Amato, PhD Thesis)</ref><ref>{{cite journal|author1=Gracia, J. |author2=Mena, E. |name-list-style=amp |year=2008|url=http://disi.unitn.it/~p2p/RelatedWork/Matching/Gracia_wise08.pdf|title=Web-Based Measure of Semantic Relatedness|journal=Proceedings of the 9th International Conference on Web Information Systems Engineering (WISE '08)|pages=136–150}}</ref><ref>Raveendranathan, P. (2005). [http://www.d.umn.edu/~tpederse/Pubs/prath-thesis.pdf Identifying Sets of Related Words from the World Wide Web]. Master of Science Thesis, University of Minnesota Duluth.</ref><ref>Wubben, S. (2008). [http://ilk.uvt.nl/~swubben/publications/wubben2008-techrep.pdf Using free link structure to calculate semantic relatedness]. In ILK Research Group Technical Report Series, nr. 08-01, 2008.</ref><ref>Juvina, I., van Oostendorp, H., Karbor, P., & Pauw, B. (2005). [https://cloudfront.escholarship.org/dist/prd/content/qt0p7528tp/qt0p7528tp.pdf Towards modeling contextual information in web navigation]. In B. G. Bara & L. Barsalou & M. Bucciarelli (Eds.), 27th Annual Meeting of the Cognitive Science Society, CogSci2005 (pp. 1078–1083). Austin, Tx: The Cognitive Science Society, Inc.</ref><ref>Navigli, R., Lapata, M. (2007). [http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-272.pdf Graph Connectivity Measures for Unsupervised Word Sense Disambiguation], Proc. of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, January 6–12th, 2007, pp. 1683–1688.</ref><ref>{{cite journal|author=Pirolli, P.|year=2005|title=Rational analyses of information foraging on the Web|journal=Cognitive Science|volume=29|issue=3|pages=343–373|doi=10.1207/s15516709cog0000_20|pmid=21702778|doi-access=free}}</ref><ref>{{cite book|author=Pirolli, P.|author2=Fu, W.-T.|name-list-style=amp |year=2003|chapter=SNIF-ACT: A model of information foraging on the World Wide Web|title=Lecture Notes in Computer Science|volume=2702|pages=45–54|doi=10.1007/3-540-44963-9_8|isbn=978-3-540-40381-4|citeseerx=10.1.1.6.1506}}</ref><ref>Turney, P. (2001). [https://arxiv.org/abs/cs/0212033 Mining the Web for Synonyms: PMI versus LSA on TOEFL]. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491–502). Freiburg, Germany.</ref>

[[Deep learning]] methods have become an accurate way to gauge semantic similarity between two text passages, in which each passage is first embedded into a continuous vector representation.<ref>{{Cite book|last1=Reimers|first1=Nils|last2=Gurevych|first2=Iryna|title=Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) |chapter=Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks |date=November 2019|chapter-url=https://www.aclweb.org/anthology/D19-1410|location=Hong Kong, China|publisher=Association for Computational Linguistics|pages=3982–3992|doi=10.18653/v1/D19-1410|arxiv=1908.10084|doi-access=free}}</ref><ref>{{Cite journal|last1=Mueller|first1=Jonas|last2=Thyagarajan|first2=Aditya|date=2016-03-05|title=Siamese Recurrent Architectures for Learning Sentence Similarity|url=https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12195|journal=Thirtieth AAAI Conference on Artificial Intelligence|volume=30 |doi=10.1609/aaai.v30i1.10350 |s2cid=16657628 |language=en|doi-access=free}}</ref><ref>{{Citation|last1=Kiros|first1=Ryan|title=Skip-Thought Vectors|date=2015|url=http://papers.nips.cc/paper/5950-skip-thought-vectors.pdf|work=Advances in Neural Information Processing Systems 28|pages=3294–3302|editor-last=Cortes|editor-first=C.|publisher=Curran Associates, Inc.|access-date=2020-03-13|last2=Zhu|first2=Yukun|last3=Salakhutdinov|first3=Russ R|last4=Zemel|first4=Richard|last5=Urtasun|first5=Raquel|last6=Torralba|first6=Antonio|last7=Fidler|first7=Sanja|editor2-last=Lawrence|editor2-first=N. D.|editor3-last=Lee|editor3-first=D. D.|editor4-last=Sugiyama|editor4-first=M.}}</ref>
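As an illustration of this embedding-based approach, the sketch below assumes the sentence-transformers library (which implements Sentence-BERT) and the publicly available all-MiniLM-L6-v2 model; both are illustrative choices. Each passage is encoded as a dense vector, and the passages are compared by cosine similarity.

<syntaxhighlight lang="python">
# A minimal sketch, assuming the sentence-transformers library and the
# all-MiniLM-L6-v2 model are installed and downloadable.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "A man is playing a guitar on stage.",
    "Someone is performing music in front of an audience.",
]
embeddings = model.encode(passages)  # one dense vector per passage

# Cosine similarity between the two passage embeddings.
a, b = embeddings
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)
</syntaxhighlight>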
=== In ontology matching ===
Semantic similarity plays a crucial role in [[ontology alignment]], which aims to establish correspondences between [[Ontology components|entities]] from different ontologies. It involves quantifying the degree of similarity between concepts or terms using the information present in the ontology for each entity, such as labels, descriptions, and hierarchical relations to other entities. Traditional metrics used in ontology matching are based on a lexical similarity between features of the entities, such as using the Levenshtein distance to measure the edit distance between entity labels.<ref>{{Cite conference|last1=Cheatham |first1=Michelle |last2=Hitzler |first2=Pascal |title=Advanced Information Systems Engineering |chapter=String Similarity Metrics for Ontology Alignment |date=2013 |editor-last=Alani |editor-first=Harith |editor2-last=Kagal |editor2-first=Lalana |editor3-last=Fokoue |editor3-first=Achille |editor4-last=Groth |editor4-first=Paul |editor5-last=Biemann |editor5-first=Chris |editor6-last=Parreira |editor6-first=Josiane Xavier |editor7-last=Aroyo |editor7-first=Lora |editor8-last=Noy |editor8-first=Natasha |editor9-last=Welty |editor9-first=Chris |conference=The Semantic Web – ISWC 2013 |series=Lecture Notes in Computer Science |volume=7908 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=294–309 |doi=10.1007/978-3-642-41338-4_19 |isbn=978-3-642-41338-4|s2cid=18372966 |doi-access=free }}</ref> However, it is difficult to capture the semantic similarity between entities using these metrics. For example, when comparing two ontologies describing conferences, the entities "Contribution" and "Paper" may have high semantic similarity since they share the same meaning. Nonetheless, due to their lexical differences, lexicographical similarity alone cannot establish this alignment.

To capture these semantic similarities, [[Latent space|embeddings]] are being adopted in ontology matching.<ref name=":0">Sousa, G., Lima, R., & Trojahn, C. (2022). An eye on representation learning in ontology matching. ''OM@ISWC''.</ref> By encoding semantic relationships and contextual information, embeddings enable the calculation of similarity scores between entities based on the proximity of their vector representations in the embedding space. This approach allows for efficient and accurate matching of ontologies, since embeddings can model semantic differences in entity naming, such as homonymy, by assigning different embeddings to the same word based on different contexts.<ref name=":0" />
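The embedding-based matching idea can be sketched as follows. The label sets, model, and threshold below are hypothetical and purely illustrative; real matchers also combine such similarity scores with structural and terminological evidence.

<syntaxhighlight lang="python">
# A minimal sketch of embedding-based label matching between two hypothetical
# conference ontologies, reusing the sentence-transformers model shown above.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

labels_a = ["Contribution", "Reviewer", "Program Committee"]   # hypothetical ontology A
labels_b = ["Paper", "Referee", "Organizing Committee"]        # hypothetical ontology B

emb_a = model.encode(labels_a)
emb_b = model.encode(labels_b)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# For each entity of A, propose the most similar entity of B above a threshold.
THRESHOLD = 0.5  # illustrative cut-off; tuned per task in practice
for label, vec in zip(labels_a, emb_a):
    scores = [cosine(vec, other) for other in emb_b]
    best = int(np.argmax(scores))
    if scores[best] >= THRESHOLD:
        print(f"{label} -> {labels_b[best]} (cosine = {scores[best]:.2f})")
</syntaxhighlight>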