Border Gateway Protocol
== Operation ==
BGP neighbors, called peers, are established by manual configuration among [[Router (computing)|router]]s to create a [[Transmission Control Protocol|TCP]] session on [[port (computer networking)|port]] 179. A BGP speaker sends 19-byte keep-alive messages every 30 seconds (protocol default value, tunable) to maintain the connection.<ref>RFC 4274</ref> Among routing protocols, BGP is unique in using TCP as its transport protocol.

When BGP runs between two peers in the same [[autonomous system (Internet)|autonomous system]] (AS), it is referred to as ''Internal BGP'' (''iBGP'' or ''Interior Border Gateway Protocol''). When it runs between different autonomous systems, it is called ''External BGP'' (''eBGP'' or ''Exterior Border Gateway Protocol''). Routers on the boundary of one AS exchanging information with another AS are called ''border'' or ''edge routers'' or simply ''eBGP peers'' and are typically connected directly, while ''iBGP peers'' can be interconnected through other intermediate routers. Other deployment [[Network topology|topologies]] are also possible, such as running eBGP [[peering]] inside a [[VPN]] tunnel, allowing two remote sites to exchange routing information in a secure and isolated manner.

The main difference between iBGP and eBGP peering is in the way routes that were received from one peer are typically propagated by default to other peers:
* New routes learned from an eBGP peer are re-advertised to all iBGP and eBGP peers.
* New routes learned from an iBGP peer are re-advertised to all eBGP peers only.

These route-propagation rules effectively require that all iBGP peers inside an AS are interconnected in a full mesh with iBGP sessions.

How routes are propagated can be controlled in detail via the ''route-maps'' mechanism. This mechanism consists of a set of rules. Each rule describes, for routes matching some given criteria, what action should be taken.
The action could be to drop the route, or it could be to modify some attributes of the route before inserting it in the routing table.

=== Extensions negotiation ===
During the peering handshake, when OPEN messages are exchanged, BGP speakers can negotiate optional capabilities of the session,<ref>{{cite IETF |rfc=2842 |title=Capabilities Advertisement with BGP-4 |author1=R. Chandra |author2=J. Scudder |date=May 2000}}</ref> including [[Multiprotocol BGP|multiprotocol extensions]]<ref>{{cite IETF |rfc=2858 |title=Multiprotocol Extensions for BGP-4 |author=T. Bates |display-authors=etal |date=June 2000}}</ref> and various recovery modes. If the multiprotocol extensions to BGP are negotiated when the session is created, the BGP speaker can prefix the Network Layer Reachability Information (NLRI) it advertises with an address family prefix. These families include IPv4 (the default), IPv6, IPv4/IPv6 Virtual Private Networks and multicast BGP. Increasingly, BGP is used as a generalized signaling protocol to carry information about routes that may not be part of the global Internet, such as VPNs.<ref>{{cite IETF |rfc=2547 |title=BGP/MPLS VPNs |author1=E. Rosen |author2=Y. Rekhter |date=March 1999}}</ref>

In order to make decisions in its operations with peers, a BGP peer uses a simple [[finite-state machine]] (FSM) that consists of six states: Idle, Connect, Active, OpenSent, OpenConfirm, and Established. For each peer-to-peer session, a BGP implementation maintains a state variable that tracks which of these six states the session is in. BGP defines the messages that each peer should exchange in order to change the session from one state to another.

The first state is the Idle state. In the Idle state, BGP initializes all resources, refuses all inbound BGP connection attempts and initiates a TCP connection to the peer. The second state is Connect.
In the Connect state, the router waits for the TCP connection to complete and transitions to the OpenSent state if successful. If unsuccessful, it starts the ConnectRetry timer and transitions to the Active state upon expiration. In the Active state, the router resets the ConnectRetry timer to zero and returns to the Connect state. In the OpenSent state, the router sends an Open message and waits for one in return in order to transition to the OpenConfirm state. Keepalive messages are exchanged and, upon successful receipt, the router is placed into the Established state. In the Established state, the router can send and receive Keepalive, Update, and Notification messages to and from its peer.

* '''Idle State''':
** Refuses all incoming BGP connections.
** Starts the initialization of event triggers.
** Initiates a TCP connection with its configured BGP peer.
** Listens for a TCP connection from its peer.
** Changes its state to Connect.
** If an error occurs at any state of the FSM process, the BGP session is terminated immediately and returned to the Idle state. Some of the reasons why a router does not progress from the Idle state are:
*** TCP port 179 is not open.
*** A random TCP port over 1023 is not open.
*** Peer address configured incorrectly on either router.
*** AS number configured incorrectly on either router.
* '''Connect State''':
** Waits for successful TCP negotiation with peer.
** BGP does not spend much time in this state if the TCP session has been successfully established.
** Sends Open message to peer and changes state to OpenSent.
** If an error occurs, BGP moves to the Active state. Some reasons for the error are:
*** TCP port 179 is not open.
*** A random TCP port over 1023 is not open.
*** Peer address configured incorrectly on either router.
*** AS number configured incorrectly on either router.
* '''Active State''':
** If the router was unable to establish a successful TCP session, then it ends up in the Active state.
** The BGP FSM tries to start another TCP session with the peer and, if successful, sends an Open message to the peer.
** If it is unsuccessful again, the FSM is reset to the Idle state.
** Repeated failures may result in a router cycling between the Idle and Active states. Some of the reasons for this include:
*** TCP port 179 is not open.
*** A random TCP port over 1023 is not open.
*** BGP configuration error.
*** Network congestion.
*** Flapping network interface.
* '''OpenSent State''':
** The BGP FSM listens for an Open message from its peer.
** Once the message has been received, the router checks the validity of the Open message.
** If there is an error, it is because one of the fields in the Open message does not match between the peers, e.g., a BGP version mismatch or the peering router expecting a different My AS. The router then sends a Notification message to the peer indicating why the error occurred.
** If there is no error, a Keepalive message is sent, various timers are set and the state is changed to OpenConfirm.
* '''OpenConfirm State''':
** The router listens for a Keepalive message from its peer.
** If a Keepalive message is received and no timer has expired before reception of the Keepalive, BGP transitions to the Established state.
** If a timer expires before a Keepalive message is received, or if an error condition occurs, the router transitions back to the Idle state.
* '''Established State''':
** In this state, the peers send Update messages to exchange information about each route being advertised to the BGP peer.
** If there is any error in the Update message, a Notification message is sent to the peer, and BGP transitions back to the Idle state.

=== Router connectivity and learning routes ===
{{technical|section|date=April 2021}}
In the simplest arrangement, all routers within a single AS and participating in BGP routing must be configured in a full mesh: each router must be configured as a peer to every other router.
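As a rough illustration of the full-mesh requirement, a mesh of ''n'' routers needs one session per pair of routers, i.e. ''n''(''n'' − 1)/2 sessions. The helper name <code>full_mesh_sessions</code> below is hypothetical and only sketches that arithmetic:

```python
def full_mesh_sessions(routers: int) -> int:
    """Return the number of iBGP sessions needed so that every router
    peers with every other router (one session per unordered pair)."""
    if routers < 0:
        raise ValueError("router count must be non-negative")
    return routers * (routers - 1) // 2


# Illustrative router counts; the session count grows roughly with n squared.
for n in (4, 10, 100):
    print(n, "routers ->", full_mesh_sessions(n), "sessions")
```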
This causes scaling problems, since the number of required connections [[quadratic growth|grows quadratically]] with the number of routers involved. To alleviate the problem, BGP implements two options: [[route reflector]]s (RFC 4456) and [[BGP confederation]]s (RFC 5065). The following discussion of basic update processing assumes a full iBGP mesh.

A given BGP router may accept [[#Update Packet|network-layer reachability information (NLRI) updates]] from multiple neighbors and advertise NLRI to the same, or a different set, of neighbors. The BGP process maintains several [[routing information base]]s:
* <code>RIB</code>: the router's main routing information base table.
* <code>Loc-RIB</code>: the local routing information base. BGP maintains its own master routing table, separate from the main routing table of the router.
* <code>Adj-RIB-In</code>: For each neighbor, the BGP process maintains a conceptual ''adjacent routing information base, incoming'', containing the NLRI received from the neighbor.
* <code>Adj-RIB-Out</code>: For each neighbor, the BGP process maintains a conceptual ''adjacent routing information base, outgoing'', containing the NLRI sent to the neighbor.

The physical storage and structure of these conceptual tables are decided by the implementer of the BGP code. Their structure is not visible to other BGP routers, although they usually can be interrogated with management commands on the local router. It is quite common, for example, to store the <code>Adj-RIB-In</code>, <code>Adj-RIB-Out</code> and the <code>Loc-RIB</code> together in the same data structure, with additional information attached to the RIB entries.
The additional information tells the BGP process such things as whether individual entries belong in the <code>Adj-RIBs</code> for specific neighbors, whether the per-neighbor route selection process made received routes eligible for the <code>Loc-RIB</code>, and whether <code>Loc-RIB</code> entries are eligible to be submitted to the local router's routing table management process.

BGP submits the routes that it considers best to the main routing table process. Depending on the implementation of that process, the BGP route is not necessarily selected. For example, a directly connected prefix, learned from the router's own hardware, is usually most preferred. As long as that directly connected route's interface is active, the BGP route to the destination will not be put into the routing table. Once the interface goes down, and there are no more preferred routes, the Loc-RIB route would be installed in the main routing table.

BGP carries the information with which rules inside BGP-speaking routers can make policy decisions. Some of the information carried that is explicitly intended to be used in policy decisions includes:
* [[#Communities|Communities]]
* [[#Multi-exit discriminators|Multi-exit discriminators]] (MED)
* [[autonomous system (Internet)|Autonomous systems]] (AS)

=== Route selection process ===
The BGP standard specifies a number of decision factors, more than the ones that are used by any other common routing process, for selecting NLRI to go into the Loc-RIB. The first decision point for evaluating NLRI is that its next-hop attribute must be reachable (or resolvable). Another way of saying the next-hop must be reachable is that there must be an active route, already in the main routing table of the router, to the prefix in which the next-hop address is reachable.

Next, for each neighbor, the BGP process applies various standard and implementation-dependent criteria to decide which routes conceptually should go into the Adj-RIB-In.
The neighbor could send several possible routes to a destination, but the first level of preference is at the neighbor level. Only one route to each destination will be installed in the conceptual Adj-RIB-In. This process will also delete, from the Adj-RIB-In, any routes that are withdrawn by the neighbor.

Whenever a conceptual Adj-RIB-In changes, the main BGP process decides if any of the neighbor's new routes are preferred to routes already in the Loc-RIB. If so, it replaces them. If a given route is withdrawn by a neighbor, and there is no other route to that destination, the route is removed from the Loc-RIB and no longer sent by BGP to the main routing table manager. If the router does not have a route to that destination from any non-BGP source, the withdrawn route will be removed from the main routing table.

As long as candidate routes remain tied, the route selection process moves to the next [[tiebreaker]] step.

{| class="wikitable"
|+ Steps to determine best path, in order of [[tiebreaker]]:<ref>{{cite web |url=https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/13753-25.html |title=BGP Best Path Selection Algorithm |website=[[Cisco]].com}}</ref><ref>{{cite web |url=https://www.juniper.net/documentation/us/en/software/junos/vpn-l2/bgp/topics/concept/routing-protocols-address-representation.html |title=Understanding BGP Path Selection |website=[[Juniper]].com}}</ref>
! Step !! Scope !! Name !! Default !! Preferred !! BGP field !! Note
|-
| 1 || Local to router || Local weight || {{notnom|"Off"}} || Higher || || Cisco-specific parameter
|-
| 2 || rowspan="2" |Internal to AS || Local preference || {{notnom|"Off", all set to 100.}} || Higher || LOCAL_PREF || If there are several iBGP routes from the neighbor, the one with the highest local preference is selected unless there are several routes with the same local preference.
|-
| 3 || Accumulated Interior Gateway Protocol (AIGP) || {{notnom|"Off"}} || Lowest || AIGP || RFC 7311
|-
| 4 || rowspan="3" |External to AS || Autonomous system (AS) jumps || {{good|"On", skipped if ignored in configuration}} || Lowest || AS-path || AS jumps is the number of AS numbers that must be traversed to reach the advertised destination. AS1–AS2–AS3 is a shorter path with fewer jumps than AS4–AS5–AS6–AS7.
|-
| 5 || Origin type || "IGP" || Lowest || ORIGIN || 0 = IGP<br />1 = EGP<br />2 = Incomplete
|-
| 6 || Multi-exit discriminator (MED) || {{good|"on", imported from IGP}} || Lowest || MULTI_EXIT_DISC || By default, MEDs are compared only between routes from the same autonomous system (AS). This can be configured so that the AS is ignored.<br />By default, the internal IGP metric is not added. This can be configured so that the IGP metric is added.<br />Before the most recent edition of the BGP standard, if an update had no MED value, several implementations created a MED with the highest possible value. The current standard specifies that missing MEDs are treated as the lowest possible value. Since the current rule may cause different behavior than the vendor interpretations, BGP implementations that used the nonstandard default value have a configuration feature that allows the old or standard rule to be selected.
|-
| 7 || rowspan="6" |Local to router (Loc-RIB) || eBGP over iBGP paths || "on" || || || Directly connected paths are preferred over indirectly connected ones
|-
| 8 || IGP metric to BGP next hop || "on", imported from IGP || Lowest || || Continue, even if the best path is already selected. Prefer the route with the lowest interior cost to the next hop, according to the main routing table. If two neighbors advertised the same route, but one neighbor is reachable via a low-bitrate link and the other by a high-bitrate link, and the [[interior routing protocol]] calculates lowest cost based on highest bitrate, the route through the high-bitrate link would be preferred and other routes dropped.
|-
| 9 || Path that was received first || "on" || Oldest || || Used to ignore changes on steps 10 and later
|-
| 10 || Router ID || "on" || Lowest || ||
|-
| 11 || Cluster list length || "on" || Lowest || ||
|-
| 12 || Neighbor address || "on" || Lowest
|}

The local preference, weight, and other criteria can be manipulated by local configuration and software capabilities. Such manipulation, although commonly used, is outside the scope of the standard. For example, the ''community'' attribute (see below) is not directly used by the BGP selection process. The BGP neighbor process can, however, have a manually programmed rule that sets local preference or another factor when the community value matches some pattern-matching criterion. If the route was learned from an external peer, the per-neighbor BGP process computes a local preference value from local policy rules and then compares the local preference of all routes from the neighbor.

=== Communities ===
BGP communities are attribute tags that can be applied to incoming or outgoing prefixes to achieve some common goal.<ref>{{IETF RFC|1997}}</ref> While it is common to say that BGP allows an administrator to set policies on how prefixes are handled by ISPs, this is generally not possible, strictly speaking. For instance, BGP natively has no concept to allow one AS to tell another AS to restrict advertisement of a prefix to only North American peering customers. Instead, an ISP generally publishes a list of well-known or proprietary communities with a description for each one, which essentially becomes an agreement of how prefixes are to be treated.

{| class="wikitable"
|+ Well-known BGP communities<ref>{{Cite web |title=Border Gateway Protocol (BGP) Well-known Communities |url=https://www.iana.org/assignments/bgp-well-known-communities/bgp-well-known-communities.xhtml |access-date=2022-12-04 |website=www.iana.org}}</ref>
! Attribute value !! Attribute !! Description !!
Reference
|-
| 0x00000000–0x0000FFFF || Reserved || || {{IETF RFC|1997}}
|-
| 0x00010000–0xFFFEFFFF || Reserved for private use || || {{IETF RFC|1997}}
|-
| 0xFFFF0000 || GRACEFUL_SHUTDOWN || Asks the neighbor AS to lower LOCAL_PREF so that traffic is routed away from the source. || {{IETF RFC|8326}}
|-
| 0xFFFF0001 || ACCEPT_OWN || Used to modify how a route originated within one VRF is imported into other VRFs || {{IETF RFC|7611}}
|-
| 0xFFFF0002 || ROUTE_FILTER_TRANSLATED_v4 || || RFC draft-l3vpn-legacy-rtc
|-
| 0xFFFF0003 || ROUTE_FILTER_v4 || || RFC draft-l3vpn-legacy-rtc
|-
| 0xFFFF0004 || ROUTE_FILTER_TRANSLATED_v6 || || RFC draft-l3vpn-legacy-rtc
|-
| 0xFFFF0005 || ROUTE_FILTER_v6 || || RFC draft-l3vpn-legacy-rtc
|-
| 0xFFFF0006 || LLGR_STALE || Stale routes are retained for longer after a session failure || {{IETF RFC|9494}}
|-
| 0xFFFF0007 || NO_LLGR || LLGR capability should not apply || {{IETF RFC|9494}}
|-
| 0xFFFF0008 || accept-own-nexthop || || RFC draft-agrewal-idr-accept-own-nexthop
|-
| 0xFFFF0009 || Standby PE || Allow for faster recovery of connectivity on different types of failures, with multicast in BGP/MPLS VPNs. || {{IETF RFC|9026}}
|-
| 0xFFFF029A || BLACKHOLE || To temporarily protect against [[denial-of-service attack]] by asking the neighbour AS to discard all traffic to the prefix (blackholing) || {{IETF RFC|7999}}
|-
| 0xFFFFFF01 || NO_EXPORT || Limit to a BGP confederation boundary || {{IETF RFC|1997}}
|-
| 0xFFFFFF02 || NO_ADVERTISE || Limit to a BGP peer || {{IETF RFC|1997}}
|-
| 0xFFFFFF03 || NO_EXPORT_SUBCONFED || Limit to an AS || {{IETF RFC|1997}}
|-
| 0xFFFFFF04 || NOPEER || "No need" to advertise over a peer link || {{IETF RFC|3765}}
|}

Examples of common communities include:
* local preference adjustments
* geographic restrictions
* peer type restrictions
* [[denial-of-service attack]] identification
* AS prepending options
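The following is a minimal sketch, not taken from any router implementation, of how a standard community can be handled as a 32-bit value written in the conventional <code>ASN:value</code> notation (two 16-bit halves), with a few of the well-known values from the table above mapped to their names. The function names <code>encode_community</code> and <code>describe_community</code> are hypothetical, and <code>3491:100</code> is used purely as an example value.

```python
# A subset of the well-known communities listed in the table above.
WELL_KNOWN = {
    0xFFFF0000: "GRACEFUL_SHUTDOWN",
    0xFFFF029A: "BLACKHOLE",
    0xFFFFFF01: "NO_EXPORT",
    0xFFFFFF02: "NO_ADVERTISE",
    0xFFFFFF03: "NO_EXPORT_SUBCONFED",
    0xFFFFFF04: "NOPEER",
}


def encode_community(text: str) -> int:
    """Convert 'ASN:value' notation into the 32-bit community value."""
    high_text, low_text = text.split(":")
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half of a standard community is 16 bits")
    return (high << 16) | low


def describe_community(value: int) -> str:
    """Return the well-known name if there is one, else 'ASN:value'."""
    if value in WELL_KNOWN:
        return WELL_KNOWN[value]
    return f"{value >> 16}:{value & 0xFFFF}"


print(hex(encode_community("3491:100")))
print(describe_community(0xFFFFFF01))
print(describe_community(encode_community("3491:200")))
```

Because only 16 bits are available for the AS number in this format, an AS with a 32-bit number cannot be represented this way.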
An ISP might publish communities such as the following examples for routes received from customers:
* To Customers North America (East Coast) 3491:100
* To Customers North America (West Coast) 3491:200

The customer simply adjusts their configuration to include the correct community or communities for each route, and the ISP is responsible for controlling who the prefix is advertised to. The end user has no technical ability to enforce correct actions being taken by the ISP, though problems in this area are generally rare and accidental.<ref>{{Cite web |title=BGP Community Support {{!}} iFog GmbH |url=https://ifog.ch/en/blog/bgp-community-support |access-date=2022-12-04 |website=ifog.ch}}</ref><ref>{{Cite web |title=BGP communities |url=https://retn.net/bgp-communities |access-date=2022-12-04 |website=retn.net |language=en}}</ref>

It is a common tactic for end customers to use BGP communities (usually ASN:70,80,90,100) to control the local preference the ISP assigns to advertised routes instead of using MED (the effect is similar). The community attribute is transitive, but communities applied by the customer are very rarely propagated outside the next-hop AS. Not all ISPs give out their communities to the public.<ref>{{cite web|title=BGP Community Guides|url=http://www.onesc.net/communities/|access-date=13 April 2015}}</ref>

==== BGP Extended Community Attribute ====
The BGP Extended Community Attribute was added in 2006,<ref>{{IETF RFC|4360}}</ref> in order to extend the range of such attributes and to provide a community attribute structuring by means of a type field. The extended format consists of one or two octets for the type field followed by seven or six octets for the respective community attribute content. The definition of this Extended Community Attribute is documented in RFC 4360.
The IANA administers the registry for BGP Extended Communities Types.<ref>{{Cite web |title=Border Gateway Protocol (BGP) Extended Communities |url=https://www.iana.org/assignments/bgp-extended-communities/bgp-extended-communities.xhtml |access-date=2022-12-04 |website=www.iana.org}}</ref> The Extended Communities Attribute itself is a transitive optional BGP attribute. A bit in the type field within the attribute decides whether the encoded extended community is of a transitive or non-transitive nature. The IANA registry therefore provides different number ranges for the attribute types. Due to the extended attribute range, its usage can be manifold. RFC 4360 defines, as examples, the "Two-Octet AS Specific Extended Community", the "IPv4 Address Specific Extended Community", the "Opaque Extended Community", the "Route Target Community", and the "Route Origin Community". A number of BGP QoS drafts also use this Extended Community Attribute structure for inter-domain QoS signalling.<ref>[http://www.bgp-qos.org/forum/viewforum.php?f=6 IETF drafts on BGP signalled QoS] {{webarchive|url=https://web.archive.org/web/20090223214439/http://www.bgp-qos.org/forum/viewforum.php?f=6 |date=2009-02-23 }}, Thomas Knoll, 2008</ref>

With the introduction of 32-bit AS numbers, some issues were immediately obvious with the community attribute, which only defines a 16-bit ASN field and therefore prevents matching between this field and the real ASN value. Since RFC 7153, extended communities are compatible with 32-bit ASNs. RFC 8092 and RFC 8195 introduce a Large Community attribute of 12 bytes, divided into three fields of 4 bytes each (AS:function:parameter).<ref>{{cite web |url=http://largebgpcommunities.net/ |title=Large BGP Communities |access-date=2021-11-27}}</ref>

=== Multi-exit discriminators ===
MEDs, defined in the main BGP standard, were originally intended to show a neighboring AS which of several links the advertising AS prefers for inbound traffic.
Another application of MEDs is for multiple ASs that have a presence at an [[IXP]] to advertise the cost, typically based on delay, that they impose for sending traffic to some destination.

Some routers (like Juniper) will use the metric from OSPF to set the MED.

'''Examples of MED used with BGP when exported to BGP on Juniper SRX'''
<syntaxhighlight lang="console">
# run show ospf route
Topology default Route Table:

Prefix             Path  Route      NH       Metric NextHop       Nexthop
                   Type  Type       Type            Interface     Address/LSP
10.32.37.0/24      Inter Discard    IP     16777215
10.32.37.0/26      Intra Network    IP          101 ge-0/0/1.0    10.32.37.241
10.32.37.64/26     Intra Network    IP          102 ge-0/0/1.0    10.32.37.241
10.32.37.128/26    Intra Network    IP          101 ge-0/0/1.0    10.32.37.241

# show route advertising-protocol bgp 10.32.94.169
  Prefix                  Nexthop              MED     Lclpref    AS path
* 10.32.37.0/24           Self                 16777215           I
* 10.32.37.0/26           Self                 101                I
* 10.32.37.64/26          Self                 102                I
* 10.32.37.128/26         Self                 101                I
</syntaxhighlight>
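On the receiving side, the MED step of route selection can be sketched roughly as follows. This is a simplified illustration of that single step only, under the rules stated in this article: lower MED is preferred, a missing MED is treated as the lowest possible value, and by default MEDs are compared only between routes from the same neighboring AS. The <code>Route</code> class, the <code>med_survivors</code> function, the <code>always_compare</code> flag and the AS numbers 64500 and 64501 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    neighbor_as: int
    med: Optional[int] = None  # None models an update that carried no MED


def effective_med(route: Route) -> int:
    """Treat a missing MED as the lowest possible value (0)."""
    return 0 if route.med is None else route.med


def med_survivors(routes: List[Route], always_compare: bool = False) -> List[Route]:
    """Keep, per comparison group, only the routes with the lowest MED.

    By default routes are grouped by neighboring AS, so MEDs from
    different ASs are never compared with each other. With
    always_compare=True all routes form a single group.
    """
    groups: Dict[Optional[int], List[Route]] = {}
    for route in routes:
        key = None if always_compare else route.neighbor_as
        groups.setdefault(key, []).append(route)

    survivors: List[Route] = []
    for group in groups.values():
        best = min(effective_med(r) for r in group)
        survivors.extend(r for r in group if effective_med(r) == best)
    return survivors


candidates = [
    Route("10.32.37.0/26", neighbor_as=64500, med=101),
    Route("10.32.37.0/26", neighbor_as=64500, med=102),
    Route("10.32.37.0/26", neighbor_as=64501, med=500),
]
print(med_survivors(candidates))
print(med_survivors(candidates, always_compare=True))
```

With the default grouping, the MED 500 route survives because it is the only route from its AS; when all routes are compared together, only the MED 101 route remains. Routes that survive this step would still be subject to the later tiebreakers.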