Trigram search
Template:Short description Trigram search is a method of searching for text when the exact syntax or spelling of the target object is not precisely known<ref>Template:Cite journal Note: This article discusses trigram search as a way of efficiently encoding certain kinds of economic data, and finds that this technique is particularly useful when the users of the system have little context for the structure of the data.</ref> or when queries may be regular expressions.<ref name=":2" /> It finds objects which match the maximum number of three consecutive character strings (i.e. trigrams) in the entered search terms, which are generally near matches.<ref>Template:Cite book</ref> Two strings with many shared trigrams can be expected to be very similar.<ref name=":0" /> Trigrams also allow for efficiently creating search engine indexes for searches that are regular expressions or match the text inexactly. Indexes can significantly accelerate searches.<ref name=":1" /><ref>Template:Cite journal Note: This research paper does not use the term "trigram search" but does seem to be the first instance in the literature of using n-grams as indices, and is cited in the Russ Cox article as the first instance of the structure of a reverse index based on trigrams. The paper also cited successful performance results from using this style of search.</ref> A threshold for number of trigram matches can be specified as a cutoff point, after a result is unmatched.<ref name=":0" />
Using trigrams for accelerating searches is a technique used in some systems for code searching, in situations in which queries that are regular expressions may be useful,<ref name=":1">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref><ref name=":2">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref><ref>{{#invoke:citation/CS1|citation |CitationClass=web }} Also called BigGrep. Uses n-grams (not always 3),</ref> in search engines such as Elasticsearch,<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> as well as in databases such as PostgreSQL.<ref name=":0">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>
ExamplesEdit
Consider the string "alice". The trigrams of the string would be "ali", "lic", and "ice", not including spaces.<ref name=":1" /> Searching for this string in a database with a trigram-based index would involve finding which objects contain as many of the three trigrams as possible.
As a concrete example of using trigram search to search for a regular expression query, consider searching for the string ab[cd]e
, where the brackets denote that the third character in the string being searched for could be c
or d
. In this situation, one could query the index for objects that have the two trigrams abc
and bce
or the two trigrams abd
and bde
. Thus, finding this query would involve no string matching, and could just query the index directly, which can be faster in practice.<ref name=":2" />
See alsoEdit
ReferencesEdit
<references />