Collocation
{{Short description|Frequent occurrence of words next to each other}}
{{About|the corpus linguistics notion||Colocation (disambiguation)}}
In [[corpus linguistics]], a '''collocation''' is a series of words or [[terminology|terms]] that [[co-occurrence|co-occur]] more often than would be expected by chance. In [[phraseology]], a '''collocation''' is a type of [[principle of compositionality|compositional]] [[phraseme]], meaning that it can be understood from the words that make it up. This contrasts with an [[idiom]], where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated.

There are about seven main types of collocations<!-- in english? -->: adjective + noun, noun + noun (such as [[collective nouns]]), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase ([[phrasal verb]]s), and verb + adverb.

[[Collocation extraction]] is a computational technique that finds collocations in a document or corpus, using techniques from [[computational linguistics]] that resemble [[data mining]].

==Expanded definition==
Collocations are partly or fully fixed expressions that become established through repeated context-dependent use. Such terms as ''crystal clear'', ''middle management'', ''nuclear family'', and ''cosmetic surgery'' are examples of collocated pairs of words.

Collocations can be in a [[syntax|syntactic]] relation (such as [[subject–verb–object|verb–object]]: ''make'' and ''decision''), a [[lexicon|lexical]] relation (such as [[antonymy]]), or in no linguistically defined relation. Knowledge of collocations is vital for the competent use of a language: a [[grammar|grammatically]] correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation a common focus for language teaching.
Corpus linguists specify a [[Keyword (linguistics)|key word]] in context ([[key Word in Context|KWIC]]) and identify the words immediately surrounding it, to illustrate the way words are used in practice.

The processing of collocations involves a number of parameters, the most important of which is the ''measure of association'', which evaluates whether the [[co-occurrence]] is purely by chance or statistically [[Statistical significance|significant]]. Due to the non-random nature of language, most collocations are classed as significant, and the association scores are simply used to rank the results. Commonly used measures of association include [[mutual information]], [[Student's t-test|''t''-scores]], and [[log-likelihood]].<ref>Dunning, Ted (1993): "[http://aclweb.org/anthology/J/J93/J93-1003.pdf Accurate methods for the statistics of surprise and coincidence] {{Webarchive|url=https://web.archive.org/web/20120805163029/http://www.aclweb.org/anthology/J/J93/J93-1003.pdf |date=2012-08-05 }}". [[Computational Linguistics (journal)|Computational Linguistics]] 19, 1 (Mar. 1993), 61–74.</ref><ref>{{cite web |url=http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html |title=Surprise and Coincidence |author=Dunning, Ted |date=2008-03-21 |publisher=blogspot.com |access-date=2012-04-09 |archive-date=2012-01-20 |archive-url=https://web.archive.org/web/20120120140321/http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html |url-status=live }}</ref>

Rather than select a single definition, Gledhill<ref>Gledhill C.
(2000): [https://books.google.com/books?id=U8FlfunUIOEC Collocations in Science Writing] {{Webarchive|url=https://web.archive.org/web/20230629054533/https://books.google.com/books?id=U8FlfunUIOEC |date=2023-06-29 }}, Narr, Tübingen</ref> proposes that collocation involves at least three different perspectives: co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates;<ref>Firth J.R. (1957): Papers in Linguistics 1934–1951. Oxford: Oxford University Press.</ref><ref>Sinclair J. (1996): "The Search for Units of Meaning", in Textus, IX, 75–106. </ref><ref>Smadja F. A & McKeown, K. R. (1990): "[http://aclweb.org/anthology/P/P90/P90-1032.pdf Automatically extracting and representing collocations for language generation] {{Webarchive|url=https://web.archive.org/web/20150906073636/http://www.aclweb.org/anthology/P/P90/P90-1032.pdf |date=2015-09-06 }}", Proceedings of ACL'90, 252–259, Pittsburgh, Pennsylvania.</ref> construction, which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern,<ref>Hunston S. & Francis G. (2000): [https://books.google.com/books?id=nqqh46Q0uVMC&q=collocation Pattern Grammar — A Corpus-Driven Approach to the Lexical Grammar of English] {{Webarchive|url=https://web.archive.org/web/20230629054534/https://books.google.com/books?id=nqqh46Q0uVMC&q=collocation |date=2023-06-29 }}, Amsterdam, John Benjamins</ref> or as a relation between a base and its collocative partners;<ref>Hausmann F. J. (1989): Le dictionnaire de collocations. In Hausmann F.J., Reichmann O., Wiegand H.E., Zgusta L.(eds), Wörterbücher : ein internationales Handbuch zur Lexikographie. Dictionaries. Dictionnaires. Berlin/New-York : De Gruyter. 1010–1019.</ref> and expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form.<ref> Moon R. (1998): Fixed Expressions and Idioms, a Corpus-Based Approach. 
Oxford, Oxford University Press.</ref><ref>Frath P. & Gledhill C. (2005): "[http://www.academia.edu/download/28949432/Frath__Pierre___Gledhill__Christopher_2005a._Free-Range_Clusters_or_Frozen_Chunks.pdf Free-Range Clusters or Frozen Chunks? Reference as a Defining Criterion for Linguistic Units]{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}", in Recherches anglaises et Nord-américaines, vol. 38: 25–43</ref>

These different perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally speaking, collocation is explained in terms of all three perspectives at once, in a continuum:
:Free combination ↔ bound collocation ↔ frozen idiom

==In dictionaries==
In 1933, [[Harold E. Palmer|Harold Palmer]]'s ''Second Interim Report on English Collocations'' highlighted the importance of collocation as a key to producing natural-sounding language, for anyone learning a [[foreign language]].<ref>Cowie, A.P., English Dictionaries for Foreign Learners, Oxford University Press 1999:54–56</ref> Thus from the 1940s onwards, information about recurrent word combinations became a standard feature of [[Monolingual learner's dictionary|monolingual learner's dictionaries]]. As these dictionaries became "less word-centred and more phrase-centred",<ref>Bejoint, H., The Lexicography of English, Oxford University Press 2010: 318</ref> more attention was paid to collocation. This trend was supported, from the beginning of the 21st century, by the availability of large text [[Corpus linguistics|corpora]] and intelligent [[Text mining|corpus-querying software]], making it possible to provide a more systematic account of collocation in dictionaries.
Using these tools, dictionaries such as the ''[[Macmillan English Dictionary for Advanced Learners|Macmillan English Dictionary]]'' and the ''[[Longman Dictionary of Contemporary English]]'' included boxes or panels with lists of frequent collocations.<ref>{{cite web|url=http://www.macmillandictionaries.com/about/med/key-features-of-the-macmillan-english-dictionary-second-edition/#7|title=MED Second Edition – Key features – Macmillan|work=macmillandictionaries.com|access-date=2011-08-24|archive-date=2020-09-28|archive-url=https://web.archive.org/web/20200928035907/http://www.macmillandictionaries.com/about/med/key-features-of-the-macmillan-english-dictionary-second-edition/#7|url-status=dead}}</ref>

There are also a number of [[Specialized dictionary|specialized dictionaries]] devoted to describing the frequent collocations in a language.<ref>Herbst, T. and Klotz, M. 'Syntagmatic and Phraseological Dictionaries' in Cowie, A.P. (Ed.) The Oxford History of English Lexicography, 2009: part 2, 234–243</ref> These include (for Spanish) ''Redes: Diccionario combinatorio del español contemporaneo'' (2004), (for French) ''Le Robert: Dictionnaire des combinaisons de mots'' (2007), and (for English) the ''LTP Dictionary of Selected Collocations'' (1997) and the ''Macmillan Collocations Dictionary'' (2010).<ref>{{cite web|url=http://www.macmillandictionaries.com/features/how-dictionaries-are-written/macmillan-collocations-dictionary/|title=Macmillan Collocation Dictionary – How it was written - Macmillan|work=macmillandictionaries.com|access-date=2011-08-24|archive-date=2018-12-21|archive-url=https://web.archive.org/web/20181221182544/http://www.macmillandictionaries.com/features/how-dictionaries-are-written/macmillan-collocations-dictionary/|url-status=dead}}</ref>

== Statistically significant collocation ==
[[Student's t-test|Student's ''t''-test]] can be used to determine whether the occurrence of a collocation in a corpus is statistically significant.<ref>{{Cite book|title=Foundations of Statistical Natural Language Processing|url=https://archive.org/details/foundationsstati00mann_118|url-access=limited|last1=Manning|first1=Chris|last2=Schütze|first2=Hinrich|publisher=MIT Press|year=1999|isbn=0262133601|location=Cambridge, MA|pages=[https://archive.org/details/foundationsstati00mann_118/page/n202 163]–166}}</ref> For a [[bigram]] <math>w_1w_2</math>, let <math>P(w_1) = \frac{\#w_1}{N}</math> be the unconditional probability of occurrence of <math>w_1</math> in a corpus with size <math>N</math>, and let <math>P(w_2) = \frac{\#w_2}{N}</math> be the unconditional probability of occurrence of <math>w_2</math> in the corpus. The ''t''-score for the bigram <math>w_1w_2</math> is calculated as:
: <math>t = \frac{\bar{x} - \mu}{\sqrt{\frac{s^2}{N}}}, </math>
where <math>\bar{x} = \frac{\#w_1w_2}{N}</math> is the sample mean of the occurrence of <math>w_1w_2</math>, <math>\#w_1w_2</math> is the number of occurrences of <math>w_1w_2</math>, <math>\mu = P(w_1)P(w_2)</math> is the probability of <math>w_1w_2</math> under the null hypothesis that <math>w_1</math> and <math>w_2</math> appear independently in the text, and <math>s^2 = \bar{x}(1-\bar{x}) \approx \bar{x}</math> is the sample variance. With a large <math>N</math>, the ''t''-test is equivalent to a [[z-test|''Z''-test]].
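
As a rough illustration of the ''t''-score calculation described above, the following Python sketch computes the score from raw counts and from a token list. The function names <code>t_score</code> and <code>t_score_from_tokens</code> and the eight-word toy corpus are hypothetical, chosen only for this example. The sketch assumes the approximation <math>s^2 \approx \bar{x}</math> and uses the token count as the corpus size <math>N</math>.

```python
import math
from collections import Counter


def t_score(count_w1, count_w2, count_bigram, n):
    """Return the t-score of a bigram from raw counts.

    Uses the approximation s^2 ~= x_bar (assumed here; it is reasonable
    when the bigram is rare relative to the corpus size n).
    """
    if count_bigram == 0:
        # With x_bar = 0 the variance estimate is 0, so t is undefined.
        raise ValueError("t-score is undefined when the bigram never occurs")
    x_bar = count_bigram / n              # observed probability of w1 w2
    mu = (count_w1 / n) * (count_w2 / n)  # expected probability under independence
    return (x_bar - mu) / math.sqrt(x_bar / n)


def t_score_from_tokens(tokens, w1, w2):
    """Count unigrams and adjacent pairs in a token list, then score (w1, w2)."""
    n = len(tokens)                       # corpus size N, as in the formula
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return t_score(unigrams[w1], unigrams[w2], bigrams[(w1, w2)], n)


# Hypothetical toy corpus, far too small for real inference.
tokens = "strong tea and strong coffee and strong tea".split()
print(round(t_score_from_tokens(tokens, "strong", "tea"), 3))  # prints 0.884
```

In practice the counts would come from a large corpus, and the resulting scores are typically used to rank candidate bigrams rather than to accept or reject each one individually.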
==See also==
{{Portal|Linguistics}}
{{div col|colwidth=22em}}
*[[English collocations]]
*[[Agreement (linguistics)]]
*[[Cliché]]
*[[Collocational restriction]]
*[[Collostructional analysis]]
*[[Compound noun, adjective and verb]]
*[[Government (linguistics)]]
*[[Idiom (language structure)]]
*[[Irreversible binomial]]
*[[Isocolon]]
*[[Lexical item]]
*[[N-gram]]
*[[Phrasal verb]]
*[[Phraseology]]
*[[Phraseme#Collocations|Phraseme]]
*[[Sketch Engine]]
*[[Statistically improbable phrase]]
*[[Word sketch]]
{{div col end}}

==References==
{{Reflist}}

==External links==
{{Wiktionary|collocation}}
* [http://ozdic.com/ Ozdic Collocation Dictionary]
* [https://doi.org/10.1007%2F978-3-540-24630-5_30 A Small System Storing Spanish Collocations] (Igor A. Bolshakov & Sabino Miranda-Jiménez)
* [https://web.archive.org/web/20120317004247/http://www.cic.ipn.mx/posgrados/images/sources/cic/tesis/B001123.pdf Morphological characterization of collocations and semantic relationships in Spanish] (Sabino Miranda-Jiménez & Igor A. Bolshakov)
* [https://wordassociations.net/en/words-associated-with/surgery?button=Search Example of collocations for the word "Surgery"] at ''wordassociations.net''

{{Authority control}}

[[Category:Lexical units]]
[[Category:Language education]]
[[Category:Corpus linguistics]]
[[Category:Semantic relations]]