== Applications ==

=== In biomedical informatics ===
Semantic similarity measures have been applied and developed in biomedical ontologies.<ref>{{cite journal|last1=Guzzi|first1=Pietro Hiram|first2=Marco |last2=Mina |first3=Mario|last3=Cannataro|first4=Concettina |last4=Guerra|title=Semantic similarity analysis of protein data: assessment with biological features and issues|journal=Briefings in Bioinformatics|year=2012|volume=13|pages=569–585|issue=5|doi=10.1093/bib/bbr066|pmid=22138322|doi-access=free}}</ref><ref name="ReferenceA">{{cite journal |last1=Benabderrahmane|first1=Sidahmed|last2=Smail Tabbone|first2=Malika|last3=Poch|first3=Olivier |last4=Napoli|first4=Amedeo|last5=Devignes|first5=Marie-Dominique |title=IntelliGO: a new vector-based semantic similarity measure including annotation origin |journal=BMC Bioinformatics|volume=11 |pages=588 |year=2010 |pmid=21122125 |doi=10.1186/1471-2105-11-588 |pmc=3098105 |doi-access=free }}</ref> They are mainly used to compare [[genes]] and [[proteins]] based on the similarity of their functions<ref>{{cite journal | last1 = Chicco | first1 = D | last2 = Masseroli | first2 = M | year = 2015 | title = Software suite for gene and protein annotation prediction and similarity search | journal = IEEE/ACM Transactions on Computational Biology and Bioinformatics | volume = 12 | issue = 4 | pages = 837–843 | doi=10.1109/TCBB.2014.2382127 | pmid = 26357324 | hdl = 11311/959408 | s2cid = 14714823 | url = https://doi.org/10.1109/TCBB.2014.2382127 | hdl-access = free }}</ref> rather than on their [[sequence similarity]], but they are also being extended to other bioentities, such as diseases.<ref>{{cite journal |last1=Köhler |first1=S |last2=Schulz |first2=MH |last3=Krawitz |first3=P |last4=Bauer |first4=S |last5=Dolken |first5=S |last6=Ott |first6=CE |last7=Mundlos |first7=C |last8=Horn |first8=D |last9=Mundlos |first9=S |last10=Robinson |first10=Peter N. |title=Clinical diagnostics in human genetics with semantic similarity searches in ontologies |journal=American Journal of Human Genetics |volume=85 |issue=4 |pages=457–64 |year=2009 |pmid=19800049 |pmc=2756558 |doi=10.1016/j.ajhg.2009.09.003|display-authors=8 }}</ref>

These comparisons can be done using tools freely available on the web:
* ProteInOn can be used to find interacting proteins, find assigned GO terms and calculate the functional semantic similarity of [[UniProt]] proteins, and to get the information content and calculate the functional semantic similarity of GO terms.<ref>{{cite web|url=http://xldb.fc.ul.pt/biotools/proteinon/|title=ProteInOn}}</ref>
* CMPSim provides a functional similarity measure between chemical compounds and metabolic pathways using [[ChEBI]]-based semantic similarity measures.<ref>{{cite web|url=http://xldb.di.fc.ul.pt/biotools/cmpsim/|title=CMPSim}}</ref>
* CESSM provides a tool for the automated evaluation of GO-based semantic similarity measures.<ref>{{cite web|url=http://xldb.fc.ul.pt/biotools/cessm/|title=CESSM}}</ref>
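Many GO-based measures used by such tools build on the information content of ontology terms. The following sketch is only an illustration of that idea, not code from any of the tools above: it assumes a hypothetical toy GO-like DAG and hypothetical annotation counts, and computes the Resnik similarity of two terms as the information content of their most informative common ancestor.

<syntaxhighlight lang="python">
import math

# Hypothetical toy ontology: child term -> parent terms (a small GO-like DAG).
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],
    "GO:E": ["GO:C"],
}

# Hypothetical annotation counts per term (gene products annotated with the
# term or any of its descendants); the root term covers the whole corpus.
ANNOTATIONS = {"GO:A": 100, "GO:B": 40, "GO:C": 60, "GO:D": 10, "GO:E": 20}
TOTAL = ANNOTATIONS["GO:A"]

def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    result = {term}
    for parent in PARENTS.get(term, []):
        result |= ancestors(parent)
    return result

def information_content(term):
    """IC(t) = -log p(t), where p(t) is the annotation frequency of t."""
    return -math.log(ANNOTATIONS[term] / TOTAL)

def resnik_similarity(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)

print(resnik_similarity("GO:D", "GO:E"))  # IC of GO:C, their most informative shared ancestor
</syntaxhighlight>

In practice, gene- or protein-level similarity is typically obtained by aggregating such term-level scores (for example with a best-match average) over the GO terms annotating the two gene products.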
=== In geoinformatics ===
Similarity is also applied in [[geoinformatics]] to find similar [[geographic feature]]s or feature types:<ref>{{cite journal|author1=Janowicz, K.|author2=Raubal, M.|author3=Kuhn, W.|title=The semantics of similarity in geographic information retrieval|journal=Journal of Spatial Information Science|volume=2|issue=2|year=2011|pages=29–57|doi=10.5311/josis.2011.2.3|doi-access=free|hdl=20.500.11850/41298|hdl-access=free}}</ref>
* SIM-DL similarity server<ref>{{cite conference | citeseerx = 10.1.1.172.5544 | title =Algorithm, implementation and application of the SIM-DL similarity server | pages = 128–145 | year = 2007 |series=Lecture Notes in Computer Science |number=4853 |conference=Second International Conference on Geospatial Semantics (GEOS 2007)}}</ref> can be used to compute similarities between concepts stored in geographic feature type ontologies.
* Similarity Calculator can be used to compute how well related two geographic concepts are in the Geo-Net-PT ontology.<ref>{{cite web|url=http://xldb.fc.ul.pt/wiki/Geographic_Similarity_calculator_GeoSSM|title=Geo-Net-PT Similarity Calculator}}</ref><ref>{{cite web|url=http://xldb.fc.ul.pt/wiki/Geo-Net-PT_02_in_English|title=Geo-Net-PT}}</ref>
* The OSM<ref>[https://wiki.openstreetmap.org/wiki/OSM_Semantic_Network "OSM Semantic Network"]. OSM Wiki.</ref> [[semantic network]] can be used to compute the semantic similarity of tags in [[OpenStreetMap]].<ref>{{cite journal|title=Geographic Knowledge Extraction and Semantic Similarity in OpenStreetMap|author1=A. Ballatore |author2=D.C. Wilson |author3=M. Bertolotto |journal=Knowledge and Information Systems|pages=61–81|url=http://irserver.ucd.ie/bitstream/handle/10197/3973/2012_-_Geographic_Knowledge_Extraction_and_Semantic_Similarity_in_OpenStreetMap_-_Ballatore_et_al.pdf?sequence=1}}</ref>

=== In computational linguistics ===
Several metrics use [[WordNet]], a manually constructed lexical database of English words. Despite the advantages of human supervision in constructing the database, the words are not learned automatically, so such metrics cannot measure relatedness between multi-word terms and the vocabulary cannot be extended incrementally.<ref name=budanitsky2001 /><ref>{{cite book|author1=Kaur, I. |author2=Hornof, A.J. |title=Proceedings of the SIGCHI Conference on Human Factors in Computing Systems |chapter=A comparison of LSA, wordNet and PMI-IR for predicting user click behavior |name-list-style=amp |date=2005|pages=51–60|doi=10.1145/1054972.1054980|isbn=978-1-58113-998-3|s2cid=14347026 }}</ref>
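WordNet-based measures can be computed with off-the-shelf libraries. The sketch below is only illustrative: it assumes the NLTK WordNet interface with the WordNet corpus installed, and the specific word senses are arbitrary examples.

<syntaxhighlight lang="python">
# A minimal sketch using NLTK's WordNet interface; the WordNet corpus must be
# downloaded first (e.g. with nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')   # a specific noun sense of "dog"
cat = wn.synset('cat.n.01')   # a specific noun sense of "cat"

# Path-based similarity: inverse of the shortest path length between the two
# senses in the hypernym hierarchy.
print(dog.path_similarity(cat))

# Wu-Palmer similarity: based on the depths of the two senses and of their
# least common subsumer in the taxonomy.
print(dog.wup_similarity(cat))
</syntaxhighlight>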
=== In natural language processing ===
[[Natural language processing]] (NLP) is a field of computer science and linguistics. Sentiment analysis, natural language understanding and machine translation (automatically translating text from one human language to another) are a few of the major areas where it is applied. For example, given one information resource on the internet, it is often of immediate interest to find similar resources. The [[Semantic Web]] provides semantic extensions to find similar data by content and not just by arbitrary descriptors.<ref>[http://www.di.uniba.it/~cdamato/PhDThesis_dAmato.pdf Similarity-based Learning Methods for the Semantic Web] (C. d'Amato, PhD Thesis)</ref><ref>{{cite journal|author1=Gracia, J. |author2=Mena, E. |name-list-style=amp |year=2008|url=http://disi.unitn.it/~p2p/RelatedWork/Matching/Gracia_wise08.pdf|title=Web-Based Measure of Semantic Relatedness|journal=Proceedings of the 9th International Conference on Web Information Systems Engineering (WISE '08)|pages=136–150}}</ref><ref>Raveendranathan, P. (2005). [http://www.d.umn.edu/~tpederse/Pubs/prath-thesis.pdf Identifying Sets of Related Words from the World Wide Web]. Master of Science Thesis, University of Minnesota Duluth.</ref><ref>Wubben, S. (2008). [http://ilk.uvt.nl/~swubben/publications/wubben2008-techrep.pdf Using free link structure to calculate semantic relatedness]. In ILK Research Group Technical Report Series, nr. 08-01, 2008.</ref><ref>Juvina, I., van Oostendorp, H., Karbor, P., & Pauw, B. (2005). [https://cloudfront.escholarship.org/dist/prd/content/qt0p7528tp/qt0p7528tp.pdf Towards modeling contextual information in web navigation]. In B. G. Bara & L. Barsalou & M. Bucciarelli (Eds.), 27th Annual Meeting of the Cognitive Science Society, CogSci2005 (pp. 1078–1083). Austin, Tx: The Cognitive Science Society, Inc.</ref><ref>Navigli, R., Lapata, M. (2007). [http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-272.pdf Graph Connectivity Measures for Unsupervised Word Sense Disambiguation], Proc. of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, January 6–12th, 2007, pp. 1683–1688.</ref><ref>{{cite journal|author=Pirolli, P.|year=2005|title=Rational analyses of information foraging on the Web|journal=Cognitive Science|volume=29|issue=3|pages=343–373|doi=10.1207/s15516709cog0000_20|pmid=21702778|doi-access=free}}</ref><ref>{{cite book|author=Pirolli, P.|author2=Fu, W.-T.|name-list-style=amp |year=2003|chapter=SNIF-ACT: A model of information foraging on the World Wide Web|title=Lecture Notes in Computer Science|volume=2702|pages=45–54|doi=10.1007/3-540-44963-9_8|isbn=978-3-540-40381-4|citeseerx=10.1.1.6.1506}}</ref><ref>Turney, P. (2001). [https://arxiv.org/abs/cs/0212033 Mining the Web for Synonyms: PMI versus LSA on TOEFL]. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491–502). Freiburg, Germany.</ref>

[[Deep learning]] methods have become an accurate way to gauge semantic similarity between two text passages, in which each passage is first embedded into a continuous vector representation.<ref>{{Cite book|last1=Reimers|first1=Nils|last2=Gurevych|first2=Iryna|title=Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) |chapter=Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks |date=November 2019|chapter-url=https://www.aclweb.org/anthology/D19-1410|location=Hong Kong, China|publisher=Association for Computational Linguistics|pages=3982–3992|doi=10.18653/v1/D19-1410|arxiv=1908.10084|doi-access=free}}</ref><ref>{{Cite journal|last1=Mueller|first1=Jonas|last2=Thyagarajan|first2=Aditya|date=2016-03-05|title=Siamese Recurrent Architectures for Learning Sentence Similarity|url=https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12195|journal=Thirtieth AAAI Conference on Artificial Intelligence|volume=30 |doi=10.1609/aaai.v30i1.10350 |s2cid=16657628 |language=en|doi-access=free}}</ref><ref>{{Citation|last1=Kiros|first1=Ryan|title=Skip-Thought Vectors|date=2015|url=http://papers.nips.cc/paper/5950-skip-thought-vectors.pdf|work=Advances in Neural Information Processing Systems 28|pages=3294–3302|editor-last=Cortes|editor-first=C.|publisher=Curran Associates, Inc.|access-date=2020-03-13|last2=Zhu|first2=Yukun|last3=Salakhutdinov|first3=Russ R|last4=Zemel|first4=Richard|last5=Urtasun|first5=Raquel|last6=Torralba|first6=Antonio|last7=Fidler|first7=Sanja|editor2-last=Lawrence|editor2-first=N. D.|editor3-last=Lee|editor3-first=D. D.|editor4-last=Sugiyama|editor4-first=M.}}</ref>
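As an illustration of this embedding-based approach, the sketch below assumes the sentence-transformers library (which implements Sentence-BERT) and the publicly available all-MiniLM-L6-v2 model; both are illustrative choices. Each passage is encoded as a dense vector, and the passages are compared by cosine similarity.

<syntaxhighlight lang="python">
# A minimal sketch, assuming the sentence-transformers library and the
# all-MiniLM-L6-v2 model are installed and downloadable.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "A man is playing a guitar on stage.",
    "Someone is performing music in front of an audience.",
]
embeddings = model.encode(passages)  # one dense vector per passage

# Cosine similarity between the two passage embeddings.
a, b = embeddings
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)
</syntaxhighlight>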
=== In ontology matching ===
Semantic similarity plays a crucial role in [[ontology alignment]], which aims to establish correspondences between [[Ontology components|entities]] from different ontologies. It involves quantifying the degree of similarity between concepts or terms using the information present in the ontology for each entity, such as labels, descriptions, and hierarchical relations to other entities. Traditional metrics used in ontology matching are based on a lexical similarity between features of the entities, such as using the Levenshtein distance to measure the edit distance between entity labels.<ref>{{Cite conference|last1=Cheatham |first1=Michelle |last2=Hitzler |first2=Pascal |title=Advanced Information Systems Engineering |chapter=String Similarity Metrics for Ontology Alignment |date=2013 |editor-last=Alani |editor-first=Harith |editor2-last=Kagal |editor2-first=Lalana |editor3-last=Fokoue |editor3-first=Achille |editor4-last=Groth |editor4-first=Paul |editor5-last=Biemann |editor5-first=Chris |editor6-last=Parreira |editor6-first=Josiane Xavier |editor7-last=Aroyo |editor7-first=Lora |editor8-last=Noy |editor8-first=Natasha |editor9-last=Welty |editor9-first=Chris |conference=The Semantic Web – ISWC 2013 |series=Lecture Notes in Computer Science |volume=7908 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=294–309 |doi=10.1007/978-3-642-41338-4_19 |isbn=978-3-642-41338-4|s2cid=18372966 |doi-access=free }}</ref> However, it is difficult to capture the semantic similarity between entities using these metrics. For example, when comparing two ontologies describing conferences, the entities "Contribution" and "Paper" may have high semantic similarity since they share the same meaning. Nonetheless, due to their lexical differences, lexicographical similarity alone cannot establish this alignment.

To capture these semantic similarities, [[Latent space|embeddings]] are being adopted in ontology matching.<ref name=":0">Sousa, G., Lima, R., & Trojahn, C. (2022). An eye on representation learning in ontology matching. ''OM@ISWC''.</ref> By encoding semantic relationships and contextual information, embeddings enable the calculation of similarity scores between entities based on the proximity of their vector representations in the embedding space. This approach allows for efficient and accurate matching of ontologies, since embeddings can model semantic differences in entity naming, such as homonymy, by assigning different embeddings to the same word based on different contexts.<ref name=":0" />
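The embedding-based matching idea can be sketched as follows. The label sets, model, and threshold below are hypothetical and purely illustrative; real matchers also combine such similarity scores with structural and terminological evidence.

<syntaxhighlight lang="python">
# A minimal sketch of embedding-based label matching between two hypothetical
# conference ontologies, reusing the sentence-transformers model shown above.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

labels_a = ["Contribution", "Reviewer", "Program Committee"]   # hypothetical ontology A
labels_b = ["Paper", "Referee", "Organizing Committee"]        # hypothetical ontology B

emb_a = model.encode(labels_a)
emb_b = model.encode(labels_b)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# For each entity of A, propose the most similar entity of B above a threshold.
THRESHOLD = 0.5  # illustrative cut-off; tuned per task in practice
for label, vec in zip(labels_a, emb_a):
    scores = [cosine(vec, other) for other in emb_b]
    best = int(np.argmax(scores))
    if scores[best] >= THRESHOLD:
        print(f"{label} -> {labels_b[best]} (cosine = {scores[best]:.2f})")
</syntaxhighlight>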