Web crawler
=== In-house web crawlers ===
<!-- PLEASE RESPECT ALPHABETICAL ORDER -->
* Applebot is [[Apple (company)|Apple]]'s web crawler. It supports [[Siri]] and other products.<ref>{{cite web |title=About Applebot |url=https://support.apple.com/en-us/HT204683 |publisher=Apple Inc |access-date=18 October 2021}}</ref>
* Baiduspider is [[Baidu]]'s web crawler.
* [[Bingbot]] is the name of Microsoft's [[Bing (search engine)|Bing]] web crawler. It replaced ''[[Msnbot]]''.
* DuckDuckBot is [[DuckDuckGo]]'s web crawler.
* [[Googlebot]] is described in some detail, but only for an early version of its architecture, which was written in C++ and [[Python (programming language)|Python]]. The crawler was integrated with the indexing process, because text parsing was done both for full-text indexing and for URL extraction. A URL server sent lists of URLs to be fetched by several crawling processes. During parsing, the URLs found were passed to a URL server that checked whether each URL had been seen before. If not, the URL was added to the URL server's queue.
* [[WebCrawler]] was used to build the first publicly available full-text index of a subset of the Web. It used [[Libwww|lib-WWW]] to download pages, and another program to parse and order URLs for breadth-first exploration of the Web graph. It also included a real-time crawler that followed links based on the similarity of the anchor text to the provided query.
* [[WebFountain]] is a distributed, modular crawler similar to Mercator but written in C++.
* [[Xenon (program)|Xenon]] is a web crawler used by government tax authorities to detect fraud.{{r|Norton-2007-01-25}}{{r|Canada-2017-04-11}}
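
The URL-server arrangement described for the early Googlebot, together with the breadth-first ordering mentioned for WebCrawler, can be illustrated with a minimal sketch. It is not the code of either crawler. The names <code>UrlServer</code>, <code>LinkExtractor</code>, <code>crawl</code> and the <code>fetch</code> callback are hypothetical, and the sketch assumes a single process, an in-memory set of seen URLs, and a caller-supplied function that returns a page's HTML or <code>None</code>.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags during parsing."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


class UrlServer:
    """Hypothetical URL server: keeps a seen-set and a FIFO queue of URLs."""

    def __init__(self, seeds):
        self.seen = set()
        self.queue = deque()
        for url in seeds:
            self.submit(url)

    def submit(self, url):
        """Queue a URL only if it has not been seen before."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queue.append(url)
        return True

    def next_batch(self, size):
        """Hand out up to `size` URLs for a crawling process to fetch."""
        batch = []
        while self.queue and len(batch) < size:
            batch.append(self.queue.popleft())
        return batch


def crawl(seeds, fetch, batch_size=2):
    """Breadth-first crawl; `fetch(url)` is an assumed callback returning HTML or None."""
    server = UrlServer(seeds)
    visited = []
    while True:
        batch = server.next_batch(batch_size)
        if not batch:
            break
        for url in batch:
            html = fetch(url)
            if html is None:
                continue
            visited.append(url)
            parser = LinkExtractor(url)
            parser.feed(html)
            for link in parser.links:
                server.submit(link)  # duplicates are rejected by the seen-set
    return visited
```

Because the queue is first-in, first-out, pages are visited in breadth-first order from the seeds. A production crawler would also need politeness delays, <code>robots.txt</code> handling and persistent storage, all of which this sketch omits.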