==In information systems==

Information retrieval systems, such as [[database]]s and [[web search engine]]s, are evaluated by [[Evaluation measures (information retrieval)|many different metrics]]. Some of these are derived from the [[confusion matrix]], which divides results into true positives (documents correctly retrieved), true negatives (documents correctly not retrieved), false positives (documents incorrectly retrieved), and false negatives (documents incorrectly not retrieved). Correctness is judged against a set of [[ground truth]] relevant results selected by humans. Commonly used metrics include [[precision and recall]]. In this context, precision is the fraction of retrieved documents that are relevant (true positives divided by the sum of true positives and false positives). Recall is the fraction of relevant documents that are retrieved (true positives divided by the sum of true positives and false negatives). Less commonly, accuracy is used; it is the fraction of all documents that are correctly classified (the sum of true positives and true negatives divided by the sum of true positives, true negatives, false positives, and false negatives).
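
The three definitions can be illustrated with a minimal Python sketch. The function name <code>retrieval_metrics</code>, its arguments, and the sample document identifiers are hypothetical and chosen only for illustration. The sketch assumes that the retrieved and relevant sets are both subsets of the collection, and it returns 0.0 when a denominator is zero, which is a convention adopted here rather than part of the definitions.

```python
def retrieval_metrics(retrieved, relevant, collection):
    """Return (precision, recall, accuracy) for a single query.

    All three arguments are sets of document identifiers; `relevant`
    plays the role of the human-selected ground truth.
    """
    tp = len(retrieved & relevant)               # correctly retrieved
    fp = len(retrieved - relevant)               # incorrectly retrieved
    fn = len(relevant - retrieved)               # incorrectly not retrieved
    tn = len(collection - retrieved - relevant)  # correctly not retrieved

    # Assumption of this sketch: an empty denominator yields 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


collection = set(range(1, 11))  # ten documents with ids 1..10
relevant = {1, 2, 3, 4}         # ground truth for the query
retrieved = {1, 2, 5}           # what the system returned

precision, recall, accuracy = retrieval_metrics(retrieved, relevant, collection)
```

In this example there are two true positives, one false positive, two false negatives, and five true negatives, so precision is 2/3, recall is 2/4, and accuracy is 7/10.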

None of these metrics takes into account the ranking of results. Ranking is very important for web search engines because readers seldom go past the first page of results, and there are too many documents on the web to manually classify all of them as to whether they should be included or excluded from a given search. Adding a cutoff at a particular number of results takes ranking into account to some degree. [[Precision at k]], for example, is precision computed over only the top ''k'' search results, such as the top ten (''k'' = 10). More sophisticated metrics, such as [[discounted cumulative gain]], take into account the rank of each individual result, and are more commonly used where ranking is important.
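
The difference between a cutoff-based metric and a rank-sensitive one can be sketched in Python as follows. The function names and sample data are hypothetical. The sketch divides by ''k'' even when fewer than ''k'' results are returned, and it uses one common formulation of discounted cumulative gain, in which the gain at rank ''i'' is divided by log<sub>2</sub>(''i'' + 1); other formulations exist.

```python
import math


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    # Assumption of this sketch: divide by k even if fewer results exist.
    return hits / k


def dcg(gains):
    """Discounted cumulative gain for gains listed in rank order.

    Uses the formulation sum(gain_i / log2(i + 1)) with ranks from 1.
    """
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


ranked = ["d3", "d7", "d1", "d9"]  # system output, best first
relevant = {"d3", "d1"}            # ground truth for the query

p_at_2 = precision_at_k(ranked, relevant, 2)
p_at_4 = precision_at_k(ranked, relevant, 4)

# Same two relevant results, placed at the top or at the bottom.
dcg_relevant_first = dcg([1, 1, 0, 0])
dcg_relevant_last = dcg([0, 0, 1, 1])
```

Both orderings in the last two lines have the same precision at ''k'' = 4, but the ordering that places the relevant results first receives the higher discounted cumulative gain.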