Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Full-text search
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Indexing== When dealing with a small number of documents, it is possible for the full-text-search engine to directly scan the contents of the documents with each [[Information retrieval|query]], a strategy called "[[Serial memory processing|serial scanning]]". This is what some tools, such as [[grep]], do when searching. However, when the number of documents to search is potentially large, or the quantity of search queries to perform is substantial, the problem of full-text search is often divided into two tasks: indexing and searching. The indexing stage will scan the text of all the documents and build a list of search terms (often called an [[Search index|index]], but more correctly named a [[concordance (publishing)|concordance]]). In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original documents.<ref name="Capabilities of Full Text Search System ">{{Cite web|url=http://www.lucidimagination.com/full-text-search|archiveurl=https://web.archive.org/web/20101223192214/http://www.lucidimagination.com/full-text-search|url-status=dead|title=Capabilities of Full Text Search System|archivedate=December 23, 2010}}</ref> The indexer will make an entry in the index for each term or word found in a document, and possibly note its relative position within the document. Usually the indexer will ignore [[stop words]] (such as "the" and "and") that are both common and insufficiently meaningful to be useful in searching. Some indexers also employ language-specific [[stemming]] on the words being indexed. For example, the words "drives", "drove", and "driven" will be recorded in the index under the single concept word "drive".
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)