
==How search engines work==
Search engines provide an [[interface (computer science)|interface]] to a group of items that enables users to specify criteria about an item of interest and have the engine find the matching items. The criteria are referred to as a [[Web search query|search query]]. In the case of text search engines, the search query is typically expressed as a set of words that identify the desired [[concept]] that one or more [[document]]s may contain.<ref>{{cite web|url= https://paulandre.com/understanding-search-queries/ |title= Understanding Search Queries: How Search Engines Match Your Words to Relevant Documents| website= paulandre.com }}</ref> There are several styles of search query [[syntax]] that vary in strictness. Whereas some text search engines require users to enter two or three words separated by [[Whitespace (computer science)|white space]], other search engines may enable users to specify entire documents, pictures, sounds, and various forms of [[natural language]]. Some search engines modify search queries to increase the likelihood of returning a high-quality set of items, a process known as [[query expansion]]. [[Query understanding]] methods can also be used to translate a free-form query into a more standardized form.
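Query expansion can be illustrated with a minimal sketch, assuming a hand-written synonym table. The table `SYNONYMS` and the function `expand_query` are hypothetical names for illustration; production systems typically derive expansions from thesauri, query logs, or learned models rather than a fixed dictionary.

```python
# Hypothetical synonym table used only for this illustration.
SYNONYMS: dict[str, list[str]] = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> list[str]:
    """Split a whitespace-separated query and add known synonyms.

    Each original term is followed by its synonyms, and duplicates are
    dropped, so the expanded query always contains what the user typed.
    """
    expanded: list[str] = []
    for term in query.lower().split():
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("fast car"))
```

Keeping the original terms in the expanded list means expansion can only broaden the set of candidate matches; a ranking step can still favour items that match the user's exact words.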

[[Image:search-engine-diagram-en.svg|right|thumb|Index-based search engine]]

The list of items that meet the criteria specified by the query is typically sorted, or ranked. Ranking items by relevance (from highest to lowest) reduces the time required to find the desired information. [[probability|Probabilistic]] search engines rank items based on measures of [[String metric|similarity]] between each item and the query (typically on a scale from 0 to 1, with 1 being most similar) and sometimes on [[popularity]] or [[authority]] (see [[Bibliometrics]]), or they use [[relevance feedback]]. [[Boolean logic|Boolean]] search engines typically return only items that match exactly, without regard to order, although the term ''Boolean search engine'' may simply refer to the use of Boolean-style syntax (the operators [[Logical conjunction|AND]], [[Logical disjunction|OR]], NOT, and [[Exclusive or|XOR]]) in a probabilistic context.
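Similarity-based ranking can be sketched with cosine similarity over word counts, one common similarity measure; the choice of measure, the bag-of-words model, and the names `cosine_similarity` and `rank` are assumptions for illustration. Because word counts are never negative, the score falls between 0 and 1, with 1 meaning the item and the query have proportional word counts.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors.

    With non-negative counts the result lies between 0 and 1.
    Missing terms count as 0 (Counter returns 0 for absent keys).
    """
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Score every document against the query and sort by similarity."""
    q = Counter(query.lower().split())
    scored = [
        (doc_id, cosine_similarity(q, Counter(text.lower().split())))
        for doc_id, text in documents.items()
    ]
    # Highest similarity first; ties broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For example, `rank("search engine", {"d1": "search engine ranking", "d2": "engine repair manual"})` places `d1` first because it shares both query words. Real engines usually weight terms (for example with TF-IDF) rather than using raw counts.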

To return a sorted set of matching items quickly, a search engine typically collects [[metadata]] about the group of items under consideration beforehand, through a process referred to as [[Index (search engine)|indexing]]. The index usually requires less [[computer storage]] than the items themselves, which is why some search engines store only the indexed information, not the full content of each item, and instead provide a way to navigate to the items from the [[search engine results page|search engine result page]]. Alternatively, the search engine may store a copy of each item in a [[cache (computing)|cache]], so that users can see the item as it was when it was indexed, for archival purposes, or to make repeated processing more efficient.<ref>{{Cite web |title=Internet Basics: Using Search Engines |url=https://edu.gcfglobal.org/en/internetbasics/using-search-engines/1/ |access-date=2022-07-11 |website=GCFGlobal.org |language=en}}</ref>
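One common form of index is an inverted index, which maps each word to the items containing it. A minimal sketch follows, assuming items are plain strings split on white space; the function names are hypothetical. Once the index is built, the Boolean operators AND and OR become set intersection and union over these lists of item identifiers, so a query never has to rescan the full items.

```python
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def boolean_and(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Ids of documents containing every term (Boolean AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    # set.intersection returns a new set, so the index is not modified.
    return set.intersection(*postings)


def boolean_or(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Ids of documents containing at least one term (Boolean OR)."""
    result: set[str] = set()
    for t in terms:
        result |= index.get(t.lower(), set())
    return result
```

With `docs = {"d1": "web search engine", "d2": "search query syntax"}`, `boolean_and(build_index(docs), ["web", "search"])` yields only `{"d1"}`.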

Other types of search engines do not store an index. [[Web crawler|Crawler]]- or spider-type search engines (also known as real-time search engines) may collect and assess items at the time of the search query, dynamically considering additional items based on the contents of a starting item (known as a seed, or seed URL in the case of an Internet crawler). [[Meta search engine]]s store neither an index nor a cache; instead, they reuse the index or results of one or more other search engines to provide an aggregated, final set of results.
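A real-time search can be sketched as a breadth-first crawl from a seed URL that checks each page against the query as it is visited. The in-memory `PAGES` table below is a hypothetical stand-in for the web: a real crawler would fetch pages over HTTP and extract links from HTML. The `max_pages` limit is also an assumption, added to bound the work done per query.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES: dict[str, tuple[str, list[str]]] = {
    "https://example.com/": ("welcome page", ["https://example.com/a", "https://example.com/b"]),
    "https://example.com/a": ("search engine notes", ["https://example.com/c"]),
    "https://example.com/b": ("cooking tips", ["https://example.com/"]),
    "https://example.com/c": ("more search tips", []),
}


def crawl_search(seed: str, term: str, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from a seed URL, returning pages that contain term.

    Pages are assessed at query time rather than looked up in a prebuilt
    index; the visited set stops the crawl from looping on cyclic links.
    """
    matches: list[str] = []
    visited: set[str] = set()
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited or url not in PAGES:
            continue
        visited.add(url)
        text, links = PAGES[url]
        if term.lower() in text.lower().split():
            matches.append(url)
        # Consider additional pages based on the contents of this one.
        frontier.extend(link for link in links if link not in visited)
    return matches
```

Starting from `https://example.com/`, a search for `"search"` visits the seed first and then follows its links, returning the two pages whose text contains the word.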

Database size, which had been a significant marketing feature through the early 2000s, was displaced by an emphasis on relevancy ranking, the methods by which search engines attempt to sort the best results first. Relevancy ranking first became a major issue {{circa|1996}}, when it became apparent that reviewing full lists of results was impractical. Consequently, [[Algorithm|algorithms]] for relevancy ranking have continuously improved. Google's [[PageRank]] method for ordering results has received the most press, but all major search engines continually refine their ranking methods to improve the ordering of results. By 2006, search engine rankings had become so important that an industry ("[[Search engine optimization|search engine optimizers]]", or "SEO") had developed to help web developers improve their search ranking, and an entire body of [[case law]] had developed around matters that affect search engine rankings, such as the use of [[trademarks]] in [[metatags]]. The sale of search rankings by some search engines has also created controversy among librarians and consumer advocates.<ref>{{cite book |last=Stross |first=Randall |url=https://books.google.com/books?id=xOk3EIUW9VgC |title=Planet Google: One Company's Audacious Plan to Organize Everything We Know |date=22 September 2009 |publisher=Simon and Schuster |isbn=978-1-4165-4696-2 |access-date=9 December 2012}}</ref>
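The core idea behind PageRank, that a page is important if important pages link to it, can be sketched with power iteration over a small link graph. This is a simplified textbook-style formulation, not Google's production ranking, which combines many undisclosed signals. The damping factor of 0.85, the even redistribution of score from pages without outgoing links, and the function name `pagerank` are assumptions for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    links maps each page to the pages it links to. Pages with no outgoing
    links (dangling pages) spread their score evenly over all pages, so
    the scores always sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "random jump" share plus a share of dangling score.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its score in equal parts along its outgoing links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank
```

In the graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C receives links from both A and B and ends up with the highest score, while B, linked only from half of A's outgoing links, scores lowest.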
[[File:Google_Knowledge_Panel.png|thumb|Google's "Knowledge Panel", which presents information from the Knowledge Graph to users]]
The search experience for users continues to be enhanced. Google's addition of the [[Knowledge Graph|Google Knowledge Graph]] has had wider ramifications for the Internet, possibly even limiting traffic to certain websites, such as Wikipedia. Some argue that by pulling information from other sites and presenting it on Google's own page, the Knowledge Graph can negatively affect those sites. However, no major concerns have been raised.<ref>{{Cite web |date=2014-01-08 |title=What do we make of Wikipedia's falling traffic? |url=https://www.dailydot.com/unclick/wikipedia-falling-traffic-meaning/ |access-date=2020-11-01 |website=The Daily Dot |language=en-US}}</ref>