{{Short description|Filename used to indicate portions for web crawling}}
{{Lowercase}}
{{Selfref|For Wikipedia's robots.txt file, see https://en.wikipedia.org/robots.txt.}}
{{Pp-pc1}}
{{Pp-pc|small=yes}}
{{Infobox technology standard
| title = robots.txt
| long_name = Robots Exclusion Protocol
| image = Robots txt.svg
| image_size =
| alt =
| caption = Example of a simple robots.txt file, indicating that a user-agent called "Mallorybot" is not allowed to crawl any of the website's pages, and that other user-agents cannot crawl more than one page every 20 seconds and are not allowed to crawl the "secret" folder
| abbreviation =
| native_name = <!-- Name in local language. If more than one, separate using {{plain list}} -->
| native_name_lang = <!-- ISO 639-1 code e.g. "fr" for French. If more than one, use {{lang}} inside native_name items instead -->
| status = Proposed Standard
| year_started = <!-- {{Start date|YYYY|MM|DD|df=y}} -->
| first_published = Published in 1994, formally standardized in 2022
| version =
| version_date =
| preview =
| preview_date =
| organization =
| committee =
| series =
| editors =
| authors = {{plain list|
* Martijn Koster (original author)
* Gary Illyes, Henner Zeller, Lizzi Sassman (IETF contributors)
}}
| base_standards =
| related_standards =
| predecessor =
| successor =
| domain =
| license =
| copyright =
| website = {{URL|https://robotstxt.org}}, {{URL|https://datatracker.ietf.org/doc/html/rfc9309|RFC 9309}}
}}

'''robots.txt''' is the [[filename]] used for implementing the '''Robots Exclusion Protocol''', a standard used by [[website]]s to indicate to visiting [[web crawler]]s and other [[Internet bot|web robots]] which portions of the website they are allowed to visit. The standard, developed in 1994, relies on [[voluntary compliance]]. Malicious bots can use the file as a directory of which pages to visit, though standards bodies discourage countering this with [[security through obscurity]].
Some archival sites ignore robots.txt. The standard was used in the 1990s to mitigate [[server (computing)|server]] overload. In the 2020s, websites began denying bots that collect information for [[generative artificial intelligence]]. The robots.txt file can be used in conjunction with [[sitemaps]], a robot inclusion standard for websites.
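A file matching the description in the infobox caption might look like the following (an illustrative sketch; note that <code>Crawl-delay</code> is a widely used but non-standard extension, not part of RFC 9309):

<pre>
User-agent: Mallorybot
Disallow: /

User-agent: *
Crawl-delay: 20
Disallow: /secret/
</pre>

Each record begins with one or more <code>User-agent</code> lines naming the crawlers it applies to (<code>*</code> matches any crawler), followed by directives such as <code>Disallow</code>, which lists path prefixes the named crawlers should not fetch.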