
== Design ==
[[File:GnutellaQuery.JPG|thumb|upright=1.5|alt=A diagram of Gnutella nodes and their connections.|The Gnutella search and retrieval protocol]]
To envision how Gnutella originally worked, imagine a large circle of users (called ''nodes''), each of whom has Gnutella client software. On initial startup, the client software must [[Bootstrapping#Computing|bootstrap]] and find at least one other node. Various methods have been used for this, including a pre-existing address list of possibly working nodes shipped with the software, updated web caches of known nodes (called ''Gnutella Web Caches''), UDP host caches and, rarely, even [[Internet Relay Chat|IRC]]. Once connected, the client requests a list of working addresses. The client tries to connect to the nodes it was shipped with, as well as to nodes it learns about from other clients, until it reaches a certain quota. It connects to only that many nodes, locally caching the addresses it has not yet tried and discarding the addresses it tried and found to be invalid.<ref>{{Cite web |title=How Auto Discovery Works - Amazon ElastiCache |url=https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.HowAutoDiscoveryWorks.html |access-date=2023-06-08 |website=docs.aws.amazon.com |archive-date=2023-03-30 |archive-url=https://web.archive.org/web/20230330091345/https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.HowAutoDiscoveryWorks.html |url-status=live }}</ref><!--try http://gnufu.net (primary source)-->
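
The bootstrap loop described above can be sketched as follows. This is a minimal illustration rather than the code of any actual client: the function <code>bootstrap</code>, the callback <code>try_connect</code> and the toy <code>network</code> table are all hypothetical names introduced here, and real clients exchange addresses through protocol messages rather than a Python callback.

```python
from collections import deque


def bootstrap(seeds, try_connect, quota):
    """Connect to up to `quota` nodes, starting from the addresses in `seeds`.

    `try_connect(addr)` is a hypothetical callback: it returns the list of
    further addresses offered by that peer, or None if the address is dead.
    """
    pending = deque(seeds)   # addresses not yet tried
    seen = set(seeds)        # every address ever queued, to avoid retries
    connected = []
    while pending and len(connected) < quota:
        addr = pending.popleft()
        learned = try_connect(addr)
        if learned is None:
            continue         # invalid address: discard it
        connected.append(addr)
        for peer in learned:
            if peer not in seen:
                seen.add(peer)
                pending.append(peer)
    # Addresses still pending were never tried and stay in the local cache.
    return connected, list(pending)


# Toy network: address -> addresses that node hands out (None = unreachable).
network = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": None,
    "d": ["e", "f"],
    "e": [],
    "f": [],
}
connected, cached = bootstrap(["a", "c"], lambda addr: network.get(addr), quota=3)
print(connected, cached)
```

In this toy run the client ends up connected to <code>a</code>, <code>b</code> and <code>d</code>, drops the dead address <code>c</code>, and keeps <code>e</code> and <code>f</code> cached for later.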

When the user wants to do a search, the client sends the request to each actively connected node. In version 0.4 of the protocol, the number of actively connected nodes for a client was quite small (around 5). Each node forwards the request to all its actively connected nodes, which, in turn, forward the request again. This continues until the packet has travelled a predetermined number of ''hops'' from the sender (maximum 7).<ref>{{cite book |last1=Moon |first1=Jongbae |last2=Cho |first2=Yongyun |title=Computational Science and Its Applications - ICCSA 2011 Proceedings |date=2011 |publisher=Springer |isbn=978-3-642-21897-2 |chapter-url=https://books.google.com/books?id=3ivd-NwNCN8C&pg=PA464 |language=en |page=464 |chapter=A point-based inventive system to prevent free-riding on p2p network environments |access-date=2022-03-10 |archive-date=2023-01-17 |archive-url=https://web.archive.org/web/20230117212316/https://books.google.com/books?id=3ivd-NwNCN8C&pg=PA464 |url-status=live }}</ref>
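
The hop-limited flooding described above behaves like a breadth-first traversal that stops at a fixed depth. The sketch below illustrates this under simplifying assumptions: the overlay is given as a plain adjacency dictionary, and a visited set stands in for the way nodes recognise and drop a query they have already seen. The names <code>flood</code> and <code>chain</code> are hypothetical.

```python
from collections import deque


def flood(graph, origin, max_hops=7):
    """Return {node: hops} for every node a flooded query reaches."""
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        hops = reached[node]
        if hops == max_hops:
            continue  # hop budget spent: this node does not forward further
        for neighbour in graph[node]:
            if neighbour not in reached:  # a node ignores a query it already saw
                reached[neighbour] = hops + 1
                queue.append(neighbour)
    return reached


# A chain of ten nodes: 0 - 1 - 2 - ... - 9
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
reached = flood(chain, origin=0, max_hops=7)
print(sorted(reached))
```

On the ten-node chain the query from node 0 reaches nodes 0 through 7 and never arrives at nodes 8 and 9, which lie beyond the seven-hop limit.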

Since version 0.6 (2002<ref>{{Cite web|url=http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html|title=Gnutella Protocol Development|website=rfc-gnutella.sourceforge.net|access-date=2017-04-13|archive-date=2017-05-12|archive-url=https://web.archive.org/web/20170512103209/http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html|url-status=live}}</ref>), Gnutella has been a composite network made of leaf nodes and ultra nodes (also called ultrapeers). Each leaf node is connected to a small number of ultrapeers (typically 3), while each ultrapeer is connected to more than 32 other ultrapeers. With this higher [[outdegree]], the maximum number of ''hops'' a query can travel was lowered to 4.
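
A rough calculation shows why a higher outdegree allows a lower hop limit. The sketch below computes an idealised upper bound on how many nodes a flooded query can reach, assuming that no two paths ever lead to the same node and that every node forwards to all neighbours except the one the query came from. The function <code>max_reach</code> is a hypothetical helper, and the ultrapeer case uses exactly 32 connections even though the text says "more than 32".

```python
def max_reach(degree, max_hops):
    """Upper bound on nodes reached by flooding, ignoring overlapping paths.

    The origin sends to `degree` neighbours; each later node forwards to its
    other `degree - 1` neighbours, for `max_hops` hops in total.
    """
    return sum(degree * (degree - 1) ** hop for hop in range(max_hops))


flat = max_reach(degree=5, max_hops=7)     # version 0.4 figures from the text
tiered = max_reach(degree=32, max_hops=4)  # ultrapeer figures from the text
print(flat, tiered)
```

Under these assumptions the bound is 27,305 nodes for 5 connections and 7 hops, against 985,088 ultrapeers for 32 connections and 4 hops. Real overlays contain many redundant paths, so actual reach is lower in both cases.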

Leaves and ultrapeers use the Query Routing Protocol to exchange a Query Routing Table (QRT), a table of 64 [[Binary prefix#IEC standard prefixes|Ki]] slots and up to 2 [[Binary prefix#IEC standard prefixes|Mi]] slots consisting of hashed keywords. A leaf node sends its QRT to each of the ultrapeers to which it is connected, and ultrapeers merge the QRTs of all their leaves (downsized to 128 [[binary prefix#IEC standard prefixes|Ki]] slots) plus their own QRT (if they share files) and exchange the result with their own neighbors. Query routing is then done by hashing the words of the query and checking whether all of them match in the QRT. Ultrapeers perform that check before forwarding a query to a leaf node, and also before forwarding the query to a neighboring ultrapeer, provided this is the last hop the query can travel.{{fact|date=April 2025}}
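
The keyword-table idea can be illustrated with a small sketch. It is not the actual Query Routing Protocol: the hash below (CRC-32 modulo the table size) is a stand-in chosen only because it is available in the Python standard library, the word splitting is simplistic, and table resizing is left out. All function names are hypothetical.

```python
import zlib

TABLE_SLOTS = 64 * 1024  # 64 Ki slots, the smaller table size mentioned above


def slot(word, slots=TABLE_SLOTS):
    # Stand-in hash for illustration only; not the protocol's own hash.
    return zlib.crc32(word.lower().encode("utf-8")) % slots


def build_qrt(filenames, slots=TABLE_SLOTS):
    """Mark the slot of every keyword found in the shared file names."""
    table = bytearray(slots)  # one byte per slot: 0 = empty, 1 = set
    for name in filenames:
        for word in name.replace(".", " ").replace("_", " ").split():
            table[slot(word, slots)] = 1
    return table


def merge(tables):
    """Combine equally sized tables: a slot is set if any input has it set."""
    merged = bytearray(len(tables[0]))
    for table in tables:
        for i, bit in enumerate(table):
            merged[i] |= bit
    return merged


def may_match(table, query):
    """True only if every word of the query hashes to a set slot."""
    return all(table[slot(word, len(table))] for word in query.split())


leaf_a = build_qrt(["free software song.ogg"])
leaf_b = build_qrt(["linux_kernel.tar"])
ultrapeer_view = merge([leaf_a, leaf_b])
print(may_match(leaf_a, "software song"), may_match(ultrapeer_view, "linux kernel"))
```

A positive result means only that the node may have a matching file, since unrelated words can hash to the same slot. A negative result is definite, which is what lets an ultrapeer skip forwarding the query.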

If a search request turns up a result, the node that has the result contacts the searcher. In the classic Gnutella protocol, response messages were sent back along the route taken by the query, as the query itself did not contain identifying information for the searching node. This scheme was later revised to deliver search results over [[User Datagram Protocol|UDP]] directly to the node that initiated the search, usually an ultrapeer of the searching node. Thus, in the current protocol, queries carry the [[IP address]] and port number of either the searching node or its ultrapeer. This lowers the amount of traffic routed through the Gnutella network, making it significantly more scalable.<ref>{{Cite report |last1=Ripeanu |first1=Matei |last2=Nakai |first2=Yugo |title=Topology of Gnutella Network: Discovery and Analysis |url=https://www.academia.edu/2893724 |access-date=2023-06-08 |archive-date=2023-11-03 |archive-url=https://web.archive.org/web/20231103002639/https://www.academia.edu/2893724 |url-status=live }}{{self-published inline|date=April 2025}}</ref>
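
The classic reverse-path delivery can be pictured as each node remembering which neighbor a query arrived from, so that a response can retrace those steps without knowing the searcher's address. The sketch below is a simplified model with hypothetical names (<code>route_back</code>, <code>came_from</code>); it keeps one global table for a single query instead of per-node state.

```python
def route_back(came_from, responder):
    """Follow the 'came from' records from the responder back to the searcher.

    `came_from` maps each node to the neighbour that handed it the query,
    with None marking the node that originated the search.
    """
    path = [responder]
    while came_from[path[-1]] is not None:
        path.append(came_from[path[-1]])
    return path


# Records left behind by one query that travelled searcher -> n1 -> n2 -> holder.
came_from = {"searcher": None, "n1": "searcher", "n2": "n1", "holder": "n2"}
path = route_back(came_from, "holder")
print(path)
```

Here the response crosses every intermediate node again (<code>holder</code>, <code>n2</code>, <code>n1</code>, <code>searcher</code>). Direct UDP delivery replaces that chain with a single message, which is the traffic saving described above.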

If the user decides to download the file, the two nodes negotiate the [[file transfer]]. If the node which has the requested file is not [[Firewall (computing)|firewalled]], the querying node can connect to it directly. However, if the node is firewalled and therefore cannot receive incoming connections, the client wanting to download the file sends it a so-called ''push request'', asking the remote client to initiate the connection instead (to ''push'' the file). At first, these push requests were routed along the original chain used to send the query. This was rather unreliable because routes would often break and routed packets are always subject to flow control. ''Push proxies'' were introduced to address this problem. These are usually the ultrapeers of a leaf node and they are announced in search results. The client connects to one of these push proxies using an HTTP request and the proxy sends a push request to the leaf on behalf of the client. Normally, it is also possible to send a push request over UDP to the push proxy, which is more efficient than using TCP. Push proxies have two advantages: first, ultrapeer-leaf connections are more stable than routes, which makes push requests much more reliable; second, they reduce the amount of traffic routed through the Gnutella network.<ref>{{Cite web |date=2022-05-09 |title=Gnutella clients that still work |url=https://questhalo.amebaownd.com/posts/34235509 |access-date=2023-06-08 |website=apreasnisuf1984's Ownd |language=ja |archive-date=2023-11-03 |archive-url=https://web.archive.org/web/20231103002640/https://questhalo.amebaownd.com/posts/34235509 |url-status=live }}</ref><!--try http://gnufu.net (primary source)-->
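
The choice between a direct connection, a push proxy and a routed push request can be summarised as a small decision function. The sketch below is only an illustration of the order of preference described above; the <code>SearchResult</code> fields and the returned labels are hypothetical and do not correspond to actual protocol message fields.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    address: str                  # address of the node holding the file
    firewalled: bool              # True if it cannot accept incoming connections
    push_proxies: list = field(default_factory=list)  # proxies named in the result


def plan_download(result):
    """Pick how to start the transfer, following the order described above."""
    if not result.firewalled:
        return ("direct", result.address)
    if result.push_proxies:
        # Ask a push proxy to relay the push request to the firewalled leaf.
        return ("push_via_proxy", result.push_proxies[0])
    # Fall back to the older scheme: route the push along the query path.
    return ("push_via_query_route", None)


open_node = SearchResult("198.51.100.7:6346", firewalled=False)
hidden_node = SearchResult("10.0.0.5:6346", firewalled=True,
                           push_proxies=["203.0.113.9:6346"])
print(plan_download(open_node), plan_download(hidden_node))
```

The addresses are taken from documentation and private ranges and are placeholders only.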

Finally, when a user disconnects, the client software saves a list of known nodes.
This contains the nodes to which the client was connected and the nodes learned from pong packets.
The client uses that list as its seed list when it next starts, thus becoming independent of bootstrap services.<ref>{{cite journal |last1=Franzoni |first1=Federico |last2=Daza |first2=Vanesa |title=SoK: Network-Level Attacks on the Bitcoin P2P Network |journal=IEEE Access |date=2022 |volume=10 |pages=94924–94962 |doi=10.1109/ACCESS.2022.3204387 |bibcode=2022IEEEA..1094924F |hdl=10230/55353 |hdl-access=free }}</ref><!--try http://gnufu.net (primary source)-->

In practice, this method of searching on the Gnutella network was often unreliable. Each node is run by a regular computer user; as such, nodes are constantly connecting and disconnecting, so the network is never completely stable. Also, the bandwidth cost of searching on Gnutella grew exponentially with the number of connected users,<ref>[http://www.darkridge.com/~jpr5/doc/gnutella.html Why Gnutella Can't Scale. No, Really.] {{Webarchive|url=https://web.archive.org/web/20170806013417/http://www.darkridge.com/~jpr5/doc/gnutella.html |date=2017-08-06 }} February 2001.{{self-published inline|date=April 2025}}</ref> often saturating connections and rendering slower nodes useless. Therefore, search requests would often be dropped, and most queries reached only a very small part of the network. This observation identified the Gnutella network as an [[Scalability|unscalable]] distributed system, and inspired the development of [[distributed hash table]]s, which are much more scalable but support only exact-match, rather than keyword, search.<ref>{{cite conference |last1=Bischofs |first1=Ludger |last2=Hasselbring |first2=Wilhelm |title=A Hierarchical Super Peer Network for Distributed Software Development |conference=Proceedings of CSSE 2004 Workshop on Cooperative Support for Distributed Software Engineering Processes |date=September 2004 |pages=99–106 |url=https://oceanrep.geomar.de/id/eprint/14574/ |publisher=Austrian Computer Society }}</ref><!--try http://gnufu.net (primary source)-->

To address the problems of [[bottleneck (engineering)|bottleneck]]s, Gnutella developers implemented a tiered system of ''ultrapeers'' and ''leaves''. Instead of all nodes being considered equal, nodes entering the network were kept at the 'edge' of the network, as leaves. Leaves do not provide routing. Nodes which are capable of routing messages are promoted to ultrapeers. Ultrapeers accept leaf connections and route searches and network maintenance messages. This allows searches to propagate further through the network and allows for numerous alterations in topology, which greatly improved efficiency and scalability.{{fact|date=April 2025}}

Additionally, Gnutella adopted a number of other techniques to reduce traffic overhead and make searches more efficient. Most notable are the Query Routing Protocol (QRP) and Dynamic Querying (DQ). With QRP, a search reaches only those clients which are likely to have the files, so searches for rare files become far more efficient. With DQ, the search stops as soon as the program has acquired enough search results. This vastly reduces the amount of traffic caused by popular searches.<ref>{{Cite web |title=DQ in Arabic - English-Arabic Dictionary {{!}} Glosbe |url=https://glosbe.com/en/ar/DQ |access-date=2023-06-08 |website=glosbe.com |language=en |archive-date=2024-04-09 |archive-url=https://web.archive.org/web/20240409044806/https://glosbe.com/en/ar/DQ |url-status=live }}</ref><!--try http://gnufu.net (primary source)-->
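
The stopping rule behind Dynamic Querying can be sketched as probing neighbors one at a time instead of all at once. This is a deliberately simplified model with hypothetical names (<code>dynamic_query</code>, <code>search</code>, <code>hits</code>); actual implementations may also adjust how far each probe travels, which is omitted here.

```python
def dynamic_query(neighbours, search, wanted):
    """Probe neighbours one by one and stop once `wanted` results are in.

    `search(neighbour)` is a hypothetical callback returning the list of
    hits obtained through that neighbour.
    """
    results, probed = [], 0
    for neighbour in neighbours:
        if len(results) >= wanted:
            break  # enough results: the remaining neighbours are never asked
        results.extend(search(neighbour))
        probed += 1
    return results, probed


# Toy data: hits available through each of four ultrapeers.
hits = {"u1": ["a", "b"], "u2": ["c"], "u3": ["d", "e"], "u4": []}
popular, probed_popular = dynamic_query(list(hits), hits.get, wanted=3)
rare, probed_rare = dynamic_query(list(hits), hits.get, wanted=10)
print(probed_popular, probed_rare)
```

In the toy data, the search that needs three results stops after two probes, while the one that never collects enough results ends up asking all four neighbors.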

One benefit of Gnutella's decentralization is that the network is very difficult to shut down and that its users alone decide which content is available. Unlike [[Napster]], where the entire network relied on a central server, Gnutella cannot be shut down by shutting down any one node. A decentralized network also prevents bad actors from taking control of the network's contents or manipulating data by controlling a central server.<ref>{{cite web |url=https://www.berkes.ca/archive/berkes_gnutella_freenet.pdf |title=Decentralized Peer-to-Peer Network Architecture: Gnutella and Freenet |last=Berkes |first=Jem |publisher=University of Manitoba |date=April 9, 2003 |website=berkes.ca/ |access-date=October 26, 2019 |archive-url=https://web.archive.org/web/20170808170634/http://www.berkes.ca/archive/berkes_gnutella_freenet.pdf |archive-date=August 8, 2017 |url-status=dead }}</ref>