
{{short description|Web crawler used by Google}}
{{Multiple issues|{{More citations needed|date=October 2019}}
{{Update|date=March 2020}}}}
{{Infobox software
|name                       = Googlebot
|screenshot                  =
|caption                    =
|logo                       = Google 2015 logo.svg

|author                     = [[Google]]
|developer                  =
|released                   = <!-- {{Start date|YYYY|MM|DD}} -->
|discontinued               =
|latest release version     =
|latest release date        = <!-- {{Start date and age|YYYY|MM|DD}} -->
|latest preview version     =
|latest preview date        = <!-- {{Start date and age|YYYY|MM|DD}} -->
|programming language       =
|operating system           =
|platform                   =
|size                       =
|language                   =
|genre                      = [[Web crawler]]
|license                    =
|website                    = [https://developers.google.com/search/docs/crawling-indexing/googlebot Googlebot FAQ]
}}

'''Googlebot''' is the [[web crawler]] software used by [[Google]] that collects documents from the [[World Wide Web|web]] to build a searchable index for the [[Google Search]] engine. The name refers to two types of web crawler: a desktop crawler, which simulates a desktop user, and a mobile crawler, which simulates a mobile user.<ref>{{Cite web|url=https://support.google.com/webmasters/answer/182072?hl=en|title=Googlebot|date=2019-03-11|website=Google|access-date=2019-03-11}}</ref>

== Behavior ==

A website will probably be crawled by both Googlebot Desktop and Googlebot Mobile. However, starting from September 2020, all sites were switched to mobile-first indexing, meaning Google crawls the web using a smartphone Googlebot.<ref>{{Cite web|url=https://developers.google.com/search/blog/2020/03/announcing-mobile-first-indexing-for|title=Announcing mobile first indexing for the whole web|website=Google Developers|access-date=2021-03-17}}</ref> The subtype of Googlebot can be identified by looking at the user agent string in the request. However, both crawler types obey the same product token (user agent token) in robots.txt, so a developer cannot selectively target either Googlebot Mobile or Googlebot Desktop using robots.txt.
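As a minimal sketch of the shared product token, the following Python snippet parses a hypothetical robots.txt file with the standard library's <code>urllib.robotparser</code>. The file contents, the <code>example.com</code> URLs and the <code>ExampleBot</code> name are assumptions made for illustration, and Python's parser is only a stand-in: Google's own matching rules may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a single group addressed to the shared
# "Googlebot" token. There is no separate token for the desktop or
# mobile crawler, so this rule applies to both subtypes.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The rule blocks the Googlebot token from /private/ ...
googlebot_private = parser.can_fetch(
    "Googlebot", "https://example.com/private/report.html"
)
# ... but leaves the rest of the site open to it.
googlebot_public = parser.can_fetch("Googlebot", "https://example.com/index.html")
# A crawler with a different token (hypothetical name) is not affected.
other_private = parser.can_fetch(
    "ExampleBot", "https://example.com/private/report.html"
)

print(googlebot_private, googlebot_public, other_private)
```

Because the group is keyed on the token rather than on the full user agent string, the same rule covers whichever Googlebot subtype makes the request.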

Google provides various methods that enable website owners to manage the content displayed in Google's search results. If a [[webmaster]] chooses to restrict the information on their site available to Googlebot or another [[Web spider|spider]], they can do so with the appropriate directives in a [[robots.txt]] file,<ref name="tools">{{cite web|url=https://search.google.com/search-console/about|title=Google Search Console|website=Google.com}}</ref> or by adding the [[Meta element|meta tag]] <code><nowiki><meta name="Googlebot" content="nofollow" /></nowiki></code> to the web page.<ref>{{Cite web|url=https://search.google.com/search-console/about|title=Google Search Console|website=search.google.com|access-date=2019-03-11}}</ref> Googlebot requests to [[web server]]s are identifiable by a [[user-agent]] string containing "Googlebot" and a host address containing "googlebot.com".<ref>{{Cite web| url=https://developers.google.com/search/docs/advanced/crawling/googlebot|date=May 2022|title=What is Googlebot &#124; Google Search Central &#124; Documentation }}</ref>
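The two identification signals described above can be combined in a short check. The Python sketch below is illustrative only: the function names are hypothetical, the hostname is obtained through a reverse DNS lookup of the requesting IP address, and the host test uses a suffix match (a stricter choice than "containing") so that names such as <code>googlebot.com.attacker.example</code> are rejected. A production check would normally also confirm the hostname with a forward lookup.

```python
import socket
from typing import Callable


def reverse_dns(ip: str) -> str:
    """Return the primary hostname for an IP address via reverse DNS."""
    return socket.gethostbyaddr(ip)[0]


def looks_like_googlebot(
    user_agent: str,
    ip: str,
    resolve: Callable[[str], str] = reverse_dns,
) -> bool:
    """Heuristic check: user agent mentions Googlebot AND host is under googlebot.com.

    `resolve` is injectable so the check can be exercised without network access.
    """
    if "googlebot" not in user_agent.lower():
        return False
    try:
        host = resolve(ip).lower().rstrip(".")
    except OSError:
        # Lookup failed (socket.herror and socket.gaierror are OSError subclasses).
        return False
    return host == "googlebot.com" or host.endswith(".googlebot.com")


# Usage with a stubbed resolver; the IP and hostnames are illustrative values.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
genuine = looks_like_googlebot(UA, "192.0.2.1", resolve=lambda ip: "crawl-1.googlebot.com")
spoofed = looks_like_googlebot(UA, "192.0.2.2", resolve=lambda ip: "googlebot.com.attacker.example")
print(genuine, spoofed)
```

Checking the user agent alone is not sufficient, since any client can send an arbitrary user agent string; the host check is what ties the request to the crawler's network.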

Googlebot follows [[HREF]] links and SRC links.<ref name="tools"/> There is increasing evidence that Googlebot can also execute JavaScript and parse content generated by [[Ajax (programming)|Ajax]] calls.<ref>{{Cite web|title=Understand the JavaScript SEO basics {{!}} Search for Developers|url=https://developers.google.com/search/docs/guides/javascript-seo-basics|access-date=2020-07-26|website=Google Developers|language=en}}</ref> There are many theories about how advanced Googlebot's JavaScript processing is, with some holding that it has only a minimal ability derived from custom interpreters.<ref>{{cite web|url=https://www.youtube.com/watch?v=LXF8bM4g-J4 |archive-url=https://ghostarchive.org/varchive/youtube/20211212/LXF8bM4g-J4| archive-date=2021-12-12 |url-status=live|title=How Google Search indexes JavaScript sites - JavaScript SEO|last=Splitt|first=Martin|website=YouTube|date=28 February 2019 }}{{cbignore}}</ref> Googlebot uses a web rendering service (WRS) that is based on the Chromium rendering engine (version 74 as of 7 May 2019).<ref name="evergreen">{{Cite web|url=https://webmasters.googleblog.com/2019/05/the-new-evergreen-googlebot.html|title=The new evergreen Googlebot|website=Official Google Webmaster Central Blog|language=en|access-date=2019-06-07}}</ref> Googlebot discovers pages by harvesting every link on every page that it can find. Unless prohibited by a [[nofollow]] tag, it then follows these links to other web pages. To be crawled and indexed, new web pages must either be linked to from other known pages on the web or be manually submitted by the webmaster.
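The link-harvesting step can be illustrated with a small Python sketch built on the standard library's <code>html.parser</code>. It is not Googlebot's implementation: the class and function names are hypothetical, only <code>href</code> and <code>src</code> attributes are collected, and nofollow is handled solely as a <code>rel="nofollow"</code> attribute on individual anchors, which is an assumption made to keep the example short.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collect absolute URLs from href and src attributes, skipping nofollow anchors."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if tag == "a" and "nofollow" in rel_tokens:
            return  # the page asked crawlers not to follow this link
        for name in ("href", "src"):
            value = attr.get(name)
            if value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def harvest_links(html: str, base_url: str) -> list[str]:
    """Return the followable links found in one HTML document."""
    harvester = LinkHarvester(base_url)
    harvester.feed(html)
    harvester.close()
    return harvester.links


# Illustrative page with one normal link, one nofollow link and one image.
PAGE = (
    '<a href="/about">About</a>'
    '<a rel="nofollow" href="/login">Log in</a>'
    '<img src="img/logo.png">'
)
found = harvest_links(PAGE, "https://example.com/docs/index.html")
print(found)
```

A crawler would add each harvested URL to its queue of pages to visit, which is why a new page that no known page links to is not discovered this way.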

Webmasters with low-bandwidth [[web hosting]] plans{{citation needed|date=May 2019}} have often noted that Googlebot consumes a large amount of bandwidth.{{Citation needed|date=March 2011}} This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for [[Web mirror|mirror]] sites which host many [[gigabyte]]s of data. Google provides "[[Google Search Console|Search Console]]", which allows website owners to throttle the crawl rate.<ref>{{cite web|url=https://www.google.com/webmasters/|title=Google - Webmasters|access-date=2012-12-15}}</ref>

How often Googlebot will crawl a site depends on the crawl budget. Crawl budget is an estimate of how frequently a website is updated.{{Citation needed|date=May 2018}} Technically, Googlebot's development team (the Crawling and Indexing team) uses several defined terms internally to cover what "crawl budget" stands for.<ref>{{Cite news|url=https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html|title=What Crawl Budget Means for Googlebot|work=Official Google Webmaster Central Blog|access-date=2018-07-04|language=en-US}}</ref> Since May 2019, Googlebot has used the latest [[Chromium (web browser)|Chromium]] rendering engine, which supports [[ECMAScript 6]] features. This makes the bot more "evergreen" and ensures that it does not rely on a rendering engine that is outdated compared with current browser capabilities.<ref name="evergreen" />

== Mediabot ==
''Mediabot'' is the [[web crawler]] that [[Google]] uses to analyze page content so that [[AdSense|Google AdSense]] can serve [[contextual advertising|contextually relevant]] advertising on a web page. Mediabot identifies itself with the [[user agent]] string "Mediapartners-Google/2.1".

Unlike other crawlers, Mediabot does not follow links to discover new crawlable URLs; instead, it visits only URLs that include the AdSense code.<ref>{{cite web|url=https://support.google.com/adsense/answer/99376?hl=en&ref_topic=1348129|title = About the AdSense Crawler}}</ref> Where that content resides behind a login, the crawler can be given login credentials so that it is able to crawl protected content.<ref>{{cite web|url=https://support.google.com/adsense/answer/161351|title = Display ads on login-protected pages}}</ref>

== Inspection Tool Crawlers ==
''InspectionTool'' is the crawler used by Search testing tools such as the Rich Results Test and URL inspection in [[Google Search Console]]. Apart from the user agent and user agent token, it mimics Googlebot.<ref>{{cite web|url=https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers|title = Google Crawler (User Agent) Overview}}</ref>

A guide to the crawlers was independently published.<ref>{{cite web|url=https://strategicmarketinghouse.com/the-ultimate-guide-to-the-new-inspectiontool-crawlers/|title = The Ultimate Guide to the New InspectionTool Crawlers}}</ref> It details four distinct crawler agents based on [[Web server directory index]] data: one non-Chrome crawler and three Chrome crawlers.

==References==
{{reflist}}

==External links==
*[https://developers.google.com/search/docs/crawling-indexing/googlebot Google's official Googlebot FAQ]

{{Google LLC}}
{{Web crawlers}}

[[Category:Google software]]
[[Category:Web crawlers]]
[[Category:Internet bots]]
[[Category:Google Search]]