BGP module type declarations
Contains a pointer to allocated memory and the current processing state
within the buffer. Used to store input data received from a particular
peer. Part of the bgpPeer and bgpProtoPeer structures. Management
routines are found in bgp_buffer*().
The corresponding data structure for holding data to be
sent to a peer. Part of the bgpPeer structure only.
Management routines are found in bgp_buffer*().
Structure to hold peer and group configuration information.
Part of the bgpPeer and bgpGroup structures.
These structures are instantiated in the parser depending on
the values specified in the configuration information.
A per-next-hop metrics structure containing one or more of a
multi-exit discriminator (MED), a local preference or a route tag.
In addition, a byte field indicates which of these values are present.
Metrics structures are associated with routes advertised by a BGP
instance (bgp_adv_entry) and with the outgoing route send list
(bgp_rto_entry), and are stored in a Patricia tree.
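The presence byte is naturally a set of flag bits, one per optional metric, and comparing two such structures only needs to look at the fields marked present. A minimal sketch, assuming hypothetical names (metrics_sketch, MET_HAS_*) and 32-bit fields rather than GateD's actual layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for the "which metrics are present" byte. */
#define MET_HAS_MED       0x01u
#define MET_HAS_LOCALPREF 0x02u
#define MET_HAS_TAG       0x04u

struct metrics_sketch {
    uint8_t  present;    /* bitwise OR of MET_HAS_* flags */
    uint32_t med;        /* valid only if MET_HAS_MED is set */
    uint32_t localpref;  /* valid only if MET_HAS_LOCALPREF is set */
    uint32_t tag;        /* valid only if MET_HAS_TAG is set */
};

static void metrics_set_med(struct metrics_sketch *m, uint32_t med)
{
    m->med = med;
    m->present |= MET_HAS_MED;
}

/* Equal if the same fields are present and every present field matches;
 * values of absent fields are ignored. */
static int metrics_equal(const struct metrics_sketch *a,
                         const struct metrics_sketch *b)
{
    if (a->present != b->present)
        return 0;
    if ((a->present & MET_HAS_MED) && a->med != b->med)
        return 0;
    if ((a->present & MET_HAS_LOCALPREF) && a->localpref != b->localpref)
        return 0;
    if ((a->present & MET_HAS_TAG) && a->tag != b->tag)
        return 0;
    return 1;
}
```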
An internal node of the metrics Patricia tree; see the metrics
manipulation macros.
Queue of bgp_adv_entry structures: the list of routes advertised
to each peer. Part of each bgpPeer structure, where it is
used for external peers only. Also part of the bgpPeerGroup
structure, where it is used for internal groups. Most manipulations
of this structure are done by the BGP_ADV* set of macros.
An entry in a bgp_adv_queue. This contains a pointer to the
route entry and the set of metrics associated with the route.
Manipulated by the BGP_ADV* set of macros.
Structure containing incoming route advertisements from a peer. Part
of a doubly-linked list of such advertisements. This linked list is
maintained in the bgpPeer structure. The BGP_RTI* macros are used
to manipulate this linked list.
As far as I can tell, nothing is ever stored in a peer's incoming route
list (all routes are immediately dumped into the GateD routing database
after policy checking).
Contains a list of routes to be sent in update messages to a peer (in
contrast, the bgp_adv_queue contains the Adj-RIB-Out for the peer).
Each entry is doubly linked into a chain of bgp_rto_entry structures.
The queue head of each such list is stored in the bgp_asp_list
structure. Each bgpPeer struct, in turn, contains the head of a
bgp_rt_queue. This queue is threaded through the bgp_rt_queue struct
embedded in each bgp_asp_list; that is how a bgp_rto_entry is
associated with a bgpPeer struct. The BGP_RTO* macros are used to
manipulate this linked list.
This structure is similar to the bgp_rto_entry, but is used in
bgpPeerGroup structures instead. Unlike the bgp_rto_entry, it contains
a linked list of bgp_rtinfo_entry structures, which store the
previously advertised metrics and the peers in the group to which the
update should be sent. The BGP_GRTO* macros are used to manipulate
this linked list. The BRT_INFO* macros are used to allocate and free
these entries.
This structure is part of the bgpg_rto_entry and contains
the last metric and AS path sent out for a given advertisement.
In addition, it contains a bit vector used to determine which peers
in non-external groups receive this announcement.
This structure simply contains a pointer to an AS path struct.
bgp_rt_queue structures form doubly linked lists that are used both in
the bgpPeer structure (for storing routes that should be advertised to
an external peer) and in the bgpPeerGroup structure (for storing routes
to be advertised to internal peers). Both of these lists actually
thread through the bgp_rt_queue struct contained in the body of a
bgp_asp_list struct, which contains a list of bgp(g)_rto_entry
structures. Conceptually, this structure also forms the head of a
queue of AS paths (and of the routes that share each AS path). The head
of the queue contains a pointer to a hash table of the AS paths in the
list, for easy retrieval.
This contains a single AS path (embedded in the bgpl_rt_queue field)
and a pointer to a list of bgp(g)_rto_entry structures, each of which
contains a route to be advertised. This forms the superstructure of
the lists of routes to be advertised to peers in the bgpPeer and
bgpPeerGroup structures.
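The threading described above is easier to see with an intrusive link: a peer's queue links together AS-path lists, and each AS-path list carries the routes that share that path. The sketch below is a simplification, not GateD's layout: rt_queue, asp_list and rto_entry are hypothetical stand-ins, each AS-path list has a single link (so it can sit on only one queue), and a string and an integer replace the real AS path and route pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly-linked queue link. */
struct rt_queue {
    struct rt_queue *next, *prev;
};

/* A route waiting to be advertised (stand-in for bgp_rto_entry). */
struct rto_entry {
    struct rto_entry *next;
    int prefix_id;            /* stand-in for the route pointer */
};

/* One AS path plus the routes sharing it (stand-in for bgp_asp_list). */
struct asp_list {
    struct rt_queue link;     /* threads this AS path onto a peer's queue */
    const char *aspath;       /* stand-in for the AS path pointer */
    struct rto_entry *routes; /* routes to advertise with this AS path */
};

/* Recover the enclosing asp_list from its embedded link. */
#define ASP_FROM_LINK(l) \
    ((struct asp_list *)((char *)(l) - offsetof(struct asp_list, link)))

static void rtq_init(struct rt_queue *head)
{
    head->next = head->prev = head;
}

static void rtq_append(struct rt_queue *head, struct rt_queue *elem)
{
    elem->prev = head->prev;
    elem->next = head;
    head->prev->next = elem;
    head->prev = elem;
}

/* Walk a peer's queue: every AS path, then every route under it. */
static int count_pending_routes(const struct rt_queue *head)
{
    int n = 0;
    for (const struct rt_queue *l = head->next; l != head; l = l->next) {
        const struct asp_list *asp = ASP_FROM_LINK(l);
        for (const struct rto_entry *r = asp->routes; r; r = r->next)
            n++;
    }
    return n;
}
```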
Found only in bgpPeerGroup, this is the list of local interfaces
on which peers in this group are running.
Basic state block for maintaining state info about a BGP peer.
Such state includes the associated task, configuration info
(metrics, timers and suchlike), protocol processing state,
buffers and incoming routes and outgoing routes. In addition,
this contains a back pointer to the associated peer group.
Basic state block for maintaining state info about gated peer
groups. This includes peer list, different counters/bit maps,
outgoing route queues for internal groups and so on.
Temporary state block for a peer from which a connection
has been received, but no OPEN message has been received.
Contains the task associated with the peer and some buffer
space. Replaced by the bgpPeer structure after a successful open.
BGP module macros
Defines and macros related to staggering connect() attempts
to peers. Every 256-second cycle is split into 64 time slots,
each 4 seconds wide. During a slot, a number of peer connection
requests may be initiated; gated attempts to spread the cost of
connecting over a number of slots. The number of connect() calls to
be attempted during any slot is stored in the bgp_connect_slots[]
array. Used mostly in the bgp_*_connect_timer() functions.
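The slot arithmetic amounts to a division followed by a modulus. A sketch with hypothetical macro and array names, assuming time is measured in whole, non-negative seconds:

```c
#include <assert.h>
#include <time.h>

#define CONNECT_SLOT_SECONDS 4   /* width of one slot */
#define CONNECT_NSLOTS       64  /* slots per cycle: 64 * 4 = 256 s */

/* Map an absolute time (seconds) onto its slot in the 256-second cycle. */
#define CONNECT_SLOT(t) \
    ((int)(((t) / CONNECT_SLOT_SECONDS) % CONNECT_NSLOTS))

/* Per-slot count of connect() attempts (stand-in for bgp_connect_slots[]). */
static int connect_slots[CONNECT_NSLOTS];

/* Record one more connect() attempt in the slot covering `when`. */
static void connect_slot_add(time_t when)
{
    connect_slots[CONNECT_SLOT(when)]++;
}
```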
Macros to initialize, test the presence of elements, get
the next element, insert and delete elements from a bgp_adv_queue.
Standard doubly-linked list operations.
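A minimal sketch of such macros, assuming a circular list with a sentinel head; the ADVQ_* and adv_entry names are hypothetical, not the actual BGP_ADV* definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for bgp_adv_entry: a route (here just an id) plus links. */
struct adv_entry {
    struct adv_entry *next, *prev;
    int route_id;
};

/* The queue head is a sentinel entry; an empty queue points at itself. */
#define ADVQ_INIT(h)     ((h)->next = (h)->prev = (h))
#define ADVQ_EMPTY(h)    ((h)->next == (h))
#define ADVQ_FIRST(h)    (ADVQ_EMPTY(h) ? NULL : (h)->next)
#define ADVQ_NEXT(h, e)  ((e)->next == (h) ? NULL : (e)->next)

/* Insert e immediately after position p (p may be the head). */
#define ADVQ_INSERT_AFTER(p, e) do {    \
        (e)->prev = (p);                \
        (e)->next = (p)->next;          \
        (p)->next->prev = (e);          \
        (p)->next = (e);                \
    } while (0)

/* Unlink e; no need to know which queue it is on. */
#define ADVQ_REMOVE(e) do {             \
        (e)->prev->next = (e)->next;    \
        (e)->next->prev = (e)->prev;    \
        (e)->next = (e)->prev = NULL;   \
    } while (0)
```

The sentinel head means insertion and removal never special-case an empty queue or the ends of the list, which is also why removal works without knowing which queue the entry is on.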
Removing an element from one of the bgp_[g]rt[oi]_entry lists:
standard doubly-linked list removal.
A set of macros to initialize, test for empty queues, and insert
and delete elements in both the bgp_[g]rt[oi]_entry lists and
the bgp_rt_queue structures in the bgp_asp_list structure.
Recall that this structure is doubly chained: through the AS paths
and through the rto lists. Most of these are standard list operations.
Each non-external group maintains a bitmask of some number of
words. Each peer within the group is assigned a position in this
bitmask, indicated by the bgp_group_bit field in the bgpPeer
structure. This set of macros helps manipulate these bitmasks
(e.g. computing which word, and which bit within that word, holds a
peer's bit; checking whether a peer's bit is set; and checking whether
two bitmasks are equal by comparing them word by word).
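A sketch of this kind of bitmask arithmetic, assuming unsigned int words and a fixed, hypothetical mask length of four words; the MASK_* names are illustrative, not GateD's:

```c
#include <assert.h>
#include <limits.h>

typedef unsigned int mask_word;

#define MASK_WORD_BITS   (sizeof(mask_word) * CHAR_BIT)
#define MASK_WORDS       4   /* hypothetical fixed mask length */

/* Which word, and which bit in that word, a peer's position falls in. */
#define MASK_WORD(bit)   ((bit) / MASK_WORD_BITS)
#define MASK_BIT(bit)    ((mask_word)1 << ((bit) % MASK_WORD_BITS))

#define MASK_SET(m, bit)   ((m)[MASK_WORD(bit)] |= MASK_BIT(bit))
#define MASK_CLEAR(m, bit) ((m)[MASK_WORD(bit)] &= ~MASK_BIT(bit))
#define MASK_TEST(m, bit)  (((m)[MASK_WORD(bit)] & MASK_BIT(bit)) != 0)

/* Compare two bitmasks word by word. */
static int mask_equal(const mask_word *a, const mask_word *b)
{
    for (int i = 0; i < MASK_WORDS; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```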
List operations for the bgp_rti_entry queues.
Macros to read quantities of different sizes (bytes, shorts, etc.)
from the TCP byte stream, and macros to extract individual fields
from BGP messages.
The write analogs of the BGP_GET* macros.
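BGP fields are carried in network byte order (big-endian). As an illustration, cursor-advancing read and write helpers might look like the following; they are written as functions rather than macros, and the names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Read a big-endian 16-bit value and advance the cursor. */
static uint16_t get_short(const unsigned char **cp)
{
    const unsigned char *p = *cp;
    *cp += 2;
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value and advance the cursor. */
static uint32_t get_long(const unsigned char **cp)
{
    const unsigned char *p = *cp;
    *cp += 4;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write analogs: store in network order and advance the cursor. */
static void put_short(unsigned char **cp, uint16_t v)
{
    (*cp)[0] = (unsigned char)(v >> 8);
    (*cp)[1] = (unsigned char)v;
    *cp += 2;
}

static void put_long(unsigned char **cp, uint32_t v)
{
    (*cp)[0] = (unsigned char)(v >> 24);
    (*cp)[1] = (unsigned char)(v >> 16);
    (*cp)[2] = (unsigned char)(v >> 8);
    (*cp)[3] = (unsigned char)v;
    *cp += 4;
}
```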
BGP module function definitions
Indicate a BGP FSM state transition caused by a "BGP event".
The new state, the event that caused it, and the previous state
and event are stored in the bgpPeer structure. An event is one of
the BGPEVENT_* values (typical events include BGP message receipts,
timer expirations and TCP connection status changes). A state is one
of the BGPSTATE_* values.
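A sketch of what recording a transition involves, using hypothetical, abbreviated state and event enumerations in place of the real BGPSTATE_* and BGPEVENT_* values:

```c
#include <assert.h>

/* Hypothetical subsets of the BGPSTATE_* and BGPEVENT_* values. */
enum bgp_state { ST_IDLE, ST_CONNECT, ST_ACTIVE, ST_OPENSENT,
                 ST_OPENCONFIRM, ST_ESTABLISHED };
enum bgp_event { EV_START, EV_RECVOPEN, EV_RECVKEEPALIVE, EV_HOLDTIME };

struct peer_fsm {
    enum bgp_state state, last_state;
    enum bgp_event event, last_event;
};

/* Remember the previous state and event, then store the new state
 * and the event that caused it. */
static void fsm_transition(struct peer_fsm *p, enum bgp_event ev,
                           enum bgp_state next)
{
    p->last_state = p->state;
    p->last_event = p->event;
    p->state = next;
    p->event = ev;
}
```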
Called when an open/notify/keepalive/update message is to be sent.
We get a completely formed BGP packet here, and all we need to do is
send it out of the socket. If the send fails because the write would
have blocked, we spool the unsent part of the message in the
outbuffer associated with the bgpPeer structure and schedule the
remainder of the packet to be sent later. On other errors, we retry
three times before giving up. If the message was only partially sent,
we repeatedly try to send the rest of the message.
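The send-or-spool logic can be sketched as follows. To keep it testable without a socket, the write goes through a hypothetical write_fn hook, and fake_write simulates a socket that accepts a few bytes and then would block. The buffer size and names are assumptions; the three-failure limit follows the description above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical write hook; behaves like write(2): bytes written,
 * or -1 with errno set. */
typedef ssize_t (*write_fn)(void *ctx, const unsigned char *buf, size_t len);

struct outbuf {
    unsigned char data[4096];
    size_t len;                   /* bytes spooled, awaiting a later flush */
};

/* Send a complete message. Partial writes are continued; on EWOULDBLOCK
 * the unsent remainder is spooled; other failures are retried, giving up
 * after three. Returns 0 if sent or spooled, -1 on failure. */
static int send_message(write_fn wr, void *ctx, struct outbuf *ob,
                        const unsigned char *msg, size_t len)
{
    size_t sent = 0;
    int failures = 0;

    while (sent < len) {
        ssize_t n = wr(ctx, msg + sent, len - sent);
        if (n > 0) {
            sent += (size_t)n;              /* partial send: keep going */
        } else if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
            size_t rest = len - sent;
            if (rest > sizeof ob->data - ob->len)
                return -1;                  /* no room to spool */
            memcpy(ob->data + ob->len, msg + sent, rest);
            ob->len += rest;
            return 0;                       /* caller schedules a flush */
        } else if (++failures == 3) {
            return -1;
        }
    }
    return 0;
}

/* Fake "socket": accepts at most `budget` bytes, then would block. */
struct fake_sock { size_t budget; size_t taken; };

static ssize_t fake_write(void *ctx, const unsigned char *buf, size_t len)
{
    struct fake_sock *s = ctx;
    (void)buf;
    if (s->budget == 0) {
        errno = EWOULDBLOCK;
        return -1;
    }
    size_t n = len < 3 ? len : 3;           /* short writes of <= 3 bytes */
    if (n > s->budget)
        n = s->budget;
    s->budget -= n;
    s->taken += n;
    return (ssize_t)n;
}
```

With an 8-byte message and a socket that takes only 5 bytes, the first 5 bytes go out in two short writes and the last 3 end up in the spool buffer.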
Called either when a BGP module initiates an open or when an
open message is received from a peer. In the latter case, the
caller specifies the remote peer's version. Simply figure
out what to put into the different header fields, form a
complete packet and hand off to bgp_send().
Send a keepalive message to a peer. As above, form a complete
BGP message and hand off to bgp_send().
Different flavors of a notification send. Formulate a complete
notification message and call bgp_send().
Send an error message to a proto-peer. Recall that a protoPeer struct
is a state block for those peers which have not yet reached
OpenConfirm state (?). This function is called when there is
an error in processing an incoming message on such a peer.
Before we send this error message, we need to send an open
(in case the peer hasn't yet instantiated any state about us).
This function makes a best guess about the parameters (e.g. version)
of such an open and then sends a notify message.
Different flavors of the bgp_pp_notify() function for different
sized notification data to be sent.
This routine reads a message (or part of it) directly from
a socket into the peer's buffer. This is the lowest level
function in the read chain, and is called from bgp_read_message()
and other functions. It first tries to make space in the input
buffer for the specified amount of data, and then reads directly
from the socket. It complains if a hard error occurred while
reading. Funnily enough, in all cases bgp_recv is called with
maxread = 0, with the caller adjusting the buffers to make sure
enough can be read.
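A sketch of the "make space, then read" step, assuming a hypothetical inbuf descriptor in which unread data lives in [start, end) and making space means sliding that data to the front. It treats end-of-file as a hard error, which is an assumption about how a closed BGP connection should be handled.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical input-buffer descriptor: data in [start, end) is unread. */
struct inbuf {
    unsigned char data[4096];
    size_t start;   /* first unprocessed byte */
    size_t end;     /* one past the last byte read from the socket */
};

/* Make room for `want` more bytes by sliding unread data to the front. */
static int inbuf_make_space(struct inbuf *b, size_t want)
{
    if (sizeof b->data - b->end >= want)
        return 0;                                  /* already room */
    memmove(b->data, b->data + b->start, b->end - b->start);
    b->end -= b->start;
    b->start = 0;
    return sizeof b->data - b->end >= want ? 0 : -1;
}

/* Read up to `want` bytes from fd. Returns bytes read, 0 if the read
 * would block, or -1 on a hard error or when the peer closed. */
static ssize_t inbuf_recv(int fd, struct inbuf *b, size_t want)
{
    if (inbuf_make_space(b, want) < 0)
        return -1;
    ssize_t n = read(fd, b->data + b->end, want);
    if (n == 0)
        return -1;                                 /* connection closed */
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
               ? 0 : -1;
    b->end += (size_t)n;
    return n;
}
```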
This function is the next level up in the read chain.
In different states, a BGP peer expects to read different
kinds of messages. Functions like bgp_recv_open() and
bgp_pp_recv() handle these cases, and they all call
bgp_read_message(). This function calls bgp_recv() to read the
entire message from the socket and checks that the length of data
read is at least that expected for the type of message. A caller
may also specify up to two types of messages it expects to read;
if so, this function checks whether the type read matches either.
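As an illustration, the length and type checks might look like this. The sketch assumes the BGP-4 header layout and minimum message lengths from RFC 4271 (16-byte marker, 2-byte length, 1-byte type); GateD also handled earlier BGP versions, and the marker is not checked here. The function name and the "0 means don't care" convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_LEN     19     /* 16-byte marker + 2-byte length + 1-byte type */
#define MAX_MSG_LEN 4096

enum { MSG_OPEN = 1, MSG_UPDATE = 2, MSG_NOTIFY = 3, MSG_KEEPALIVE = 4 };

/* Minimum total length for each message type (BGP-4 values). */
static size_t min_len(int type)
{
    switch (type) {
    case MSG_OPEN:      return 29;
    case MSG_UPDATE:    return 23;
    case MSG_NOTIFY:    return 21;
    case MSG_KEEPALIVE: return 19;
    default:            return 0;   /* unknown type */
    }
}

/* Validate a complete message of `got` bytes. want1/want2 are the
 * acceptable types (0 = don't care). Returns the type, or -1 on error. */
static int check_message(const unsigned char *msg, size_t got,
                         int want1, int want2)
{
    if (got < HDR_LEN)
        return -1;
    size_t len = ((size_t)msg[16] << 8) | msg[17];
    int type = msg[18];
    size_t need = min_len(type);
    if (len != got || len > MAX_MSG_LEN || need == 0 || len < need)
        return -1;
    if ((want1 || want2) && type != want1 && type != want2)
        return -1;
    return type;
}
```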
This function parses an open message. It expects the entire
message to be in the peer's input buffer. This function is called
from two places: bgp_recv_open() and bgp_pp_recv().
It merely checks the version, holdtime, identifier and authentication
fields for correctness. If the received version number is not known
to us, it also initiates version negotiation with the peer by sending
a notification carrying the highest version it knows.
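A sketch of the field checks on a BGP-4 style OPEN body (version, AS, hold time, identifier), assuming a hypothetical supported version range of 2 to 4. The authentication field used by older BGP versions is omitted, and the hold-time rule (zero, or at least 3 seconds) is the one from RFC 4271.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_VERSION 2     /* hypothetical supported range */
#define MAX_VERSION 4

struct open_info {
    int version;
    uint16_t as;
    uint16_t holdtime;
    uint32_t id;
};

enum open_result { OPEN_OK, OPEN_BAD_VERSION, OPEN_BAD_HOLDTIME, OPEN_BAD_ID };

/* Parse the bytes after the 19-byte header. On OPEN_BAD_VERSION the
 * caller would send a notification carrying MAX_VERSION so the peer
 * can retry with a version we know. */
static enum open_result parse_open(const unsigned char *p, struct open_info *o)
{
    o->version  = p[0];
    o->as       = (uint16_t)((p[1] << 8) | p[2]);
    o->holdtime = (uint16_t)((p[3] << 8) | p[4]);
    o->id       = ((uint32_t)p[5] << 24) | ((uint32_t)p[6] << 16) |
                  ((uint32_t)p[7] << 8)  |  (uint32_t)p[8];

    if (o->version < MIN_VERSION || o->version > MAX_VERSION)
        return OPEN_BAD_VERSION;
    if (o->holdtime != 0 && o->holdtime < 3)
        return OPEN_BAD_HOLDTIME;
    if (o->id == 0)
        return OPEN_BAD_ID;
    return OPEN_OK;
}
```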
This function gets called directly from the task module when
a BGP module is in the OpenSent state or the OpenConfirm state.
In the former case, it is waiting to receive an Open message
from a peer (see bgp_peer_connected() or bgp_pp_recv()). In the
latter, it is waiting to receive a keepalive or a notification
(respectively an ack or a nack from the peer for the Open message
that it sent). The code for these two cases should be separated out,
for cleanliness.
If the peer is in OpenSent state, we read the open message,
do the initial sanity checking, then do a version negotiation
for the case when we support his version number but we also
support a higher version. We also check if his AS number matches
the one we were configured with. Finally, we do an authentication
check before doing a state transition to OpenConfirm.
If the peer is in the OpenConfirm state, simply check that either
a notify or a keepalive was received. In either case, check that
there is a version match; otherwise we have to re-initiate
version negotiation.
This function gets called directly from a task module when a
BGP module has gotten a TCP connection from a peer and is
waiting for an Open message to arrive. This function first
parses the Open message and does the initial sanity checks. It
then extracts the bgpPeerGroup structure corresponding to the
address, AS number and authentication information in the Open
message.
In GateD, a peer may be explicitly configured within a
group through a peer clause, or implicitly through the allow
clause by using a network address/mask pair. In the latter
case, we may have a bgpPeerGroup structure for the peer, but no
corresponding peer structure. If so, we check whether the group is
in one of the allowed states and whether the peer's version is
correct, initiating version renegotiation otherwise. Else, we
establish a bgpPeer structure for the peer. Finally, we send
an Open message to the peer and transition our state to
OpenConfirm.
Functions to generate character-string names for BGP peers
and groups, using pre-defined generation rules.
GateD's BGP module keeps a free list of bgpPeer structures
in bgp_free_list. This routine allocates an element from
this free list; if none is available, it malloc()s some
appropriately sized memory.
Adds a bgpPeer struct to the free list.
free()s all the bgpPeer structs on the free list.
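A minimal free-list sketch; struct peer stands in for bgpPeer, the function names are hypothetical, and zeroing the structure on allocation is an assumption:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for bgpPeer; only the free-list link matters here. */
struct peer {
    struct peer *free_next;
    char name[32];
};

static struct peer *peer_free_list;   /* stand-in for bgp_free_list */

/* Take a structure from the free list, or malloc() one if it is empty. */
static struct peer *peer_alloc(void)
{
    struct peer *p = peer_free_list;
    if (p)
        peer_free_list = p->free_next;
    else if (!(p = malloc(sizeof *p)))
        return NULL;
    memset(p, 0, sizeof *p);
    return p;
}

/* Return a structure to the free list for later reuse. */
static void peer_free(struct peer *p)
{
    p->free_next = peer_free_list;
    peer_free_list = p;
}

/* free() everything held on the free list. */
static void peer_free_list_release(void)
{
    while (peer_free_list) {
        struct peer *p = peer_free_list;
        peer_free_list = p->free_next;
        free(p);
    }
}
```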
Add a bgpPeer structure to the list of peers in a
bgpPeerGroup. This list contains "normal" peers, followed
by "unconfigured" peers (i.e. those which are "allowed" but not
explicitly configured), followed by peers which have been deleted but
are awaiting delete processing. This function adds the peer to the
appropriate part of this list, according to which of these categories
the peer belongs to. This function is called: (i) when a peer structure
has to be added after receiving an Open from a proto-peer, (ii) from
the parser when an explicit peer clause is encountered, and (iii) from
bgp_init() when BGP is restarted after reconfiguration.
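One way to keep the three categories in order is to insert each peer after the last peer of the same or an earlier category. A sketch with hypothetical types standing in for bgpPeer and the group's peer list:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical peer categories, in list order. */
enum peer_kind { PEER_NORMAL = 0, PEER_UNCONFIGURED = 1, PEER_DELETED = 2 };

struct gpeer {
    struct gpeer *next;
    enum peer_kind kind;
    int id;
};

/* Insert p after the last peer whose category is <= p's category, so the
 * list stays ordered: normal, then unconfigured, then deleted. */
static void group_peer_add(struct gpeer **head, struct gpeer *p)
{
    struct gpeer **pp = head;
    while (*pp && (*pp)->kind <= p->kind)
        pp = &(*pp)->next;
    p->next = *pp;
    *pp = p;
}
```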
Walk down the peer list in a bgpPeerGroup and remove the entry
from the list. The caller frees the memory. This is called either
when we are reconfiguring a running BGP or when we are closing
down a peer connection.
Similar to bgp_peer_alloc(), but for bgpPeerGroup structures.
Add the specified bgpPeerGroup structure to the end of the
global list of groups, bgp_groups. This function is called
from bgp_conf_group_add() to link a group declared in the
configuration file.
Allocate an input buffer and link it to a buffer descriptor
block (that is, a bgpBuffer structure). Initialize the descriptor
block.
Free the memory associated with the data buffer. This does
not free the descriptor block.
Copy only the buffer descriptor. Make sure that the "to"
descriptor does not have an allocated buffer.
The bgpOutBuffer is allocated differently from the input buffer:
the descriptor is contiguous with, and precedes, the data area.
In addition, BGP maintains a single-element cache of freed output
buffers. This function checks if there is something in the cache,
else calls the task memory allocator. It then initializes the
other fields of the descriptor structure.
This fills the cache. If a buffer already exists in the cache,
the function zeroes the entire buffer and calls the memory
deallocator.
Frees the cache element.
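A sketch of an output buffer whose descriptor precedes its data area in a single allocation (a C99 flexible array member), together with a single-element cache of freed buffers. The fixed buffer size and the names are assumptions, and plain malloc()/free() stand in for the task memory allocator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Descriptor and data area in one allocation. */
struct outbuffer {
    size_t size;            /* capacity of data[] */
    size_t len;             /* bytes currently queued */
    unsigned char data[];
};

#define OUTBUF_SIZE 4096    /* hypothetical fixed buffer size */

static struct outbuffer *outbuf_cache;   /* single-element cache */

/* Reuse the cached buffer if there is one, else allocate a new one. */
static struct outbuffer *outbuf_alloc(void)
{
    struct outbuffer *b = outbuf_cache;
    if (b)
        outbuf_cache = NULL;
    else if (!(b = malloc(sizeof *b + OUTBUF_SIZE)))
        return NULL;
    b->size = OUTBUF_SIZE;
    b->len = 0;
    return b;
}

/* Put a buffer in the cache; if the cache is full, zero and free it. */
static void outbuf_free(struct outbuffer *b)
{
    if (!outbuf_cache) {
        outbuf_cache = b;
        return;
    }
    memset(b, 0, sizeof *b + b->size);
    free(b);
}

/* Free the cache element. */
static void outbuf_cache_release(void)
{
    free(outbuf_cache);
    outbuf_cache = NULL;
}
```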
Called from bgp_listen_accept() when we receive a TCP connection
request on our well-known socket from a potential peer. This
function allocates and initializes a bgpProtoPeer structure,
allocates an input buffer for the proto-peer and links it into
the list of global proto-peers (bgp_protopeers).
Called when any error happens in reading data from a protopeer
or when any error is encountered in processing Open messages
from the protopeer. The function undoes any pending timers,
frees the input buffer associated with a proto-peer and unlinks
the bgpProtoPeer data structure from the global list of
proto-peers.
The task structure associated with a peer contains pointers
to send/recv/accept etc. functions that should be used for that
peer. This function is called to reset the corresponding send
routine. Called from bgp_write_ready().
A protocol's flash routine initiates an immediate transfer
of routing updates to a specified peer. Its new_policy routine
is called whenever BGP reconfigures itself or in the initialization
phase. This function sets these two routines.
The complement of the previous function.
Similar routines to the above, but for the associated reinit functions.
Set send/receive buffers, and socket options on the associated task.
(Re)initialize the socket associated with the peer's task. Set
the read routine appropriately. Initialize the buffers and
other socket options using the bgp_recv_*() functions above.
Close the socket associated with the peer's task.
function bgp_iflist_add():
Each bgpPeer structure contains the peer's interface. For some
groups (e.g. Internal and Test), we need to add this interface
to the interface list kept in the group structure. We do this
either by creating a new bgp_ifap_list structure or, if one
already exists, just incrementing its reference count. Called when
a peer reaches the established state.
The opposite of the add function. Find the peer's interface
in the group list and decrement the reference count. If the count
has reached zero, unlink the bgp_ifap_list structure and
free the corresponding storage. Called when closing a peer.
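The reference-counting scheme of these two functions can be sketched as below; ifap_list, the integer interface index and the function names are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for bgp_ifap_list: one entry per interface. */
struct ifap_list {
    struct ifap_list *next;
    int ifindex;       /* stand-in for the interface pointer */
    int refcount;      /* number of established peers using it */
};

/* Add a reference for `ifindex`, creating the entry if needed. */
static int iflist_add(struct ifap_list **head, int ifindex)
{
    for (struct ifap_list *e = *head; e; e = e->next)
        if (e->ifindex == ifindex) {
            e->refcount++;
            return 0;
        }
    struct ifap_list *e = malloc(sizeof *e);
    if (!e)
        return -1;
    e->ifindex = ifindex;
    e->refcount = 1;
    e->next = *head;
    *head = e;
    return 0;
}

/* Drop a reference; unlink and free the entry when it reaches zero. */
static void iflist_free(struct ifap_list **head, int ifindex)
{
    for (struct ifap_list **pp = head; *pp; pp = &(*pp)->next) {
        struct ifap_list *e = *pp;
        if (e->ifindex != ifindex)
            continue;
        if (--e->refcount == 0) {
            *pp = e->next;
            free(e);
        }
        return;
    }
}
```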
Called when the system is initializing or if we have reconfigured
gated. Simply check if the specified interface is up or down.
If up, add the interface to the group list (policy
permitting), otherwise delete the interface from the group's
list (XXX).
bgp_send() is used by the protocol processing routines to
send Open/Notify etc. messages. If for some reason (e.g. the write
would block) these messages are not completely sent synchronously,
they are spooled in the task's output buffer for a later send.
bgp_write_flush(), called by bgp_write_ready(), does the task of
flushing out the spooled data. Its structure is identical to
bgp_send(): it attempts to write the entire spooled data, returns
immediately on soft errors (e.g. EWOULDBLOCK), and fails on hard errors.
Called when we want to send a notify just before closing a peer's
connection. If the peer's outbuffer contains a full message, this
function discards it. Otherwise, it tries to bgp_write_flush() the
message and discards the message if that fails as well. We can afford
to be that cavalier since we will be closing the connection anyway.
This routine is called from the main task loop if something can be
written to the socket. The routine first tries to flush out any
data already in the buffers, then informs the routing table module that
it can now write routing updates to the socket (so that any routes
queued up for a peer can be flushed). If, at the end of that, the
output buffer is empty, we reset the write routine.
This function is called from bgp_send() and is used to spool
an unsent message (or part thereof) into a peer's output buffer.
It allocates the buffer space, copies the unsent message and
sets the task's write routine so that the write is completed
later during execution of the task's main loop.
While the previous routine sets the task write routine to
bgp_write_ready() after queueing up data, this one sets the task
write routine without checking whether any data is queued.
As far as I can see, in the couple of places it is used, it
needn't be...
To avoid initiating too many connection requests all at once,
BGP maintains a 64 slot event wheel. Each slot is 4 seconds long.
A connection request for a peer is pseudo-randomly (actually, as a
function of the address of the peer structure) set to occur in
some slot N on the wheel. BGP tries to schedule the request
into the least crowded slot in the interval (N, N+5).
These functions deal with setting, resetting or deleting connection
timers.
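A sketch of the slot choice, assuming the candidate window is the five slots N through N+4 (wrapping around the wheel) and that ties go to the earliest slot. The exact bounds of the (N, N+5) interval, the way the peer address is hashed, and all names here are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS  64   /* 64 slots of 4 seconds: a 256-second wheel */
#define WINDOW  5    /* hypothetical: candidates are slots N .. N+4 */

/* Stand-in for bgp_connect_slots[]: connects scheduled per slot. */
static int slots[NSLOTS];

/* Pseudo-random home slot derived from the peer structure's address;
 * dropping the low 4 bits is an assumption about allocation alignment. */
static int home_slot(const void *peer)
{
    return (int)(((uintptr_t)peer >> 4) % NSLOTS);
}

/* Pick the least-crowded slot among the WINDOW slots starting at n,
 * wrapping around the wheel, and count the new request in it. */
static int pick_slot(int n)
{
    int best = n % NSLOTS;
    for (int i = 1; i < WINDOW; i++) {
        int s = (n + i) % NSLOTS;
        if (slots[s] < slots[best])
            best = s;
    }
    slots[best]++;
    return best;
}
```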
This function is invoked after a connection request to a peer
succeeds (see below for how connection requests are initiated).
This function initializes the local address of the BGP connection,
the input buffer and sets up the send and receive buffer sizes
on the TCP connection. It then sets up a timer for sending
an Open message.
This function is called when a connect attempt (by
bgp_connect_start()) is delayed (because the socket
is non-blocking). In the task main loop, when we can write
to the socket, we try to get the remote address. If this
is available, then we know we have succeeded and we can
proceed to call bgp_peer_connected(). Otherwise, we simply
wait to re-connect to the peer.
This is called when a connect timer expires, or when a connect
has failed and we need to retry the connection. This function
first allocates a socket with the appropriate options (non-blocking).
It then finds out what address to use to bind the socket (this
address can be specified in the config file through the config clause
or through the interface specified to talk to the particular peer
in internal or test groups). If the bind succeeds, the function
then attempts a connect, which may return before the connection is
complete because the socket is non-blocking. If so, this function
arranges for bgp_connect_complete() to be called at the completion of
the connect.
This function is called when the connect timer expires. If a peer
has not already connected to us in the interim, this function
calls bgp_connect_start(); otherwise it simply resets
the connection timer.
The traffic timer monitors the state of a connection: at every
expiration of the traffic timer, we check the state
of the connection. The traffic timer is initially set to the
holdtime if the connection isn't established; otherwise it
is set to the lesser of our holdtime and a third of the peer's.
These functions set, delete and initialize the traffic timer.
This function is called when a traffic timer has expired. After
processing, it reschedules itself. The function checks whether
the peer has been silent for more than his hold time; if so, it
closes the connection. It also checks whether we have been
silent for longer than a third of our hold time, and sends a
keepalive if so.
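The interval and expiry checks described above can be sketched as follows. The structure and names are hypothetical, and the comparisons follow the wording above literally (the peer silent for longer than his hold time; us silent for longer than a third of ours):

```c
#include <assert.h>
#include <time.h>

struct traffic_state {
    int established;
    int our_holdtime;      /* seconds */
    int peer_holdtime;     /* seconds, from the peer's OPEN */
    time_t last_recv;      /* when we last heard from the peer */
    time_t last_send;      /* when we last sent anything */
};

/* Interval until the next traffic-timer expiry. */
static int traffic_interval(const struct traffic_state *s)
{
    if (!s->established)
        return s->our_holdtime;
    int third = s->peer_holdtime / 3;
    return third < s->our_holdtime ? third : s->our_holdtime;
}

enum traffic_action { TRAFFIC_NONE, TRAFFIC_KEEPALIVE, TRAFFIC_CLOSE };

/* On expiry: close if the peer has been silent past its hold time;
 * send a keepalive if we have been silent for a third of ours. */
static enum traffic_action traffic_check(const struct traffic_state *s,
                                         time_t now)
{
    if (now - s->last_recv > s->peer_holdtime)
        return TRAFFIC_CLOSE;
    if (now - s->last_send > s->our_holdtime / 3)
        return TRAFFIC_KEEPALIVE;
    return TRAFFIC_NONE;
}
```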
To dampen route fluctuations, BGP may be configured to hold
down route advertisements to external peers for a period of time.
This one-shot timer is used to send peer updates after the specified
hold-down period. These functions set and delete the timer.
To dampen route fluctuations, BGP may be configured to hold
down route advertisements to non-external peers for a period of time.
This one-shot timer is used to send peer updates after the specified
hold-down period. These functions set and delete the timer.
Called when a connection request is made on the BGP well-known
socket. The function performs the accept, creates a task structure
for the remote end, then initializes the local address and the
task structure routines, creates a protoPeer structure to serve
until we get an open from the peer. Finally, it sets a timer on the
protoPeer.
Set up the well-known BGP listen socket (bound to the BGP listen
task) to listen for incoming connections. This is called at
the expiration of bgp_listen_timer.
Allocate the bgp_listen_task and set the bgp_listen_timer
to go off after a small timeout.
Close the BGP listen task.
A peer belonging to a gated external BGP group or internal
(i.e. one peering with members on an SMDS type network) BGP
group requires an interface pointer. Any other peer belonging
to a non-test group can use an interface pointer (in the absence
of other information about which interface routes are carried
by the IGP). This function, called when peers are initialized,
initializes the interface pointers. We simply find the interface
with the same address as the peer's (or the next hop towards the peer)
and then set the peer's interface to that.
Called when transitioning from a protoPeer to a bgpPeer
structure. Given the addresses of the endpoints
of a connection to a peer, the AS numbers of the endpoints, and some
authentication information, find the group to which the peer should
belong. Since peers can be specified either explicitly or through the
"allow" clause, first check whether an explicit peer matches the
specified address. Otherwise, return the group whose allow clause
covers the specified peer.
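A simplified sketch of this lookup order, matching only on the remote IPv4 address (as a host-order integer). The real function also takes AS numbers and authentication information into account, and the types and names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct allow { uint32_t net, mask; };

/* Simplified group: explicit peers plus "allow" prefixes. */
struct group {
    struct group *next;
    const uint32_t *peers;      size_t npeers;
    const struct allow *allows; size_t nallows;
};

static int explicit_match(const struct group *g, uint32_t addr)
{
    for (size_t i = 0; i < g->npeers; i++)
        if (g->peers[i] == addr)
            return 1;
    return 0;
}

static int allow_match(const struct group *g, uint32_t addr)
{
    for (size_t i = 0; i < g->nallows; i++)
        if ((addr & g->allows[i].mask) == g->allows[i].net)
            return 1;
    return 0;
}

/* Prefer a group that lists the peer explicitly; otherwise fall back to
 * the first group whose allow clause covers the address. */
static struct group *find_group(struct group *groups, uint32_t addr)
{
    for (struct group *g = groups; g; g = g->next)
        if (explicit_match(g, addr))
            return g;
    for (struct group *g = groups; g; g = g->next)
        if (allow_match(g, addr))
            return g;
    return NULL;
}
```

Searching all groups for an explicit match before consulting any allow clause means an explicitly configured peer is never captured by a broader allow prefix in an earlier group.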
Given a remote address, find the peer with a matching address.
Used in conjunction with the find_group() function.
Needed when an error message is received on a protopeer and we
don't know anything about the peer's AS and authentication information.
Logic is similar to bgp_find_group().
function bgp_pp_timeout():
Called when a proto-peer times out (that is, no open has been
received for the timeout period since the peer connected).
Delete the proto-peer structure.
Called either from the parser to create a configured peer, or
when an open is received on a proto-peer for a peer that matches
an allow clause. Allocate a peer structure, copy the group config
information and initialize the gateway entry.
Called when an open is received on a proto-peer for a peer that
matches an allow clause. Unlike configured peers, we need to
be able to fill in the appropriate fields of the bgpPeer structure
from different places. Create a peer structure, initialize its
address and policy information, steal the socket from the protoPeer,
delete the protoPeer structure and we are well on our way.
The previous function is called when a proto-peer is unconfigured
(i.e. matched through the allow clause); this one is called when the
peer turns out to be configured. We delete the peer's connect timer,
steal the socket from the proto-peer, check if our local interface
address is OK, do a state transition and set a traffic timer to
monitor the status of the connection.
In gated, a BGP connection can be configured to be passive (i.e. the
module does not initiate a connection, but waits instead for the
peer to do so). For non-passive peers, if no start interval is
specified, we attempt to connect right away *and* set a timer for
connecting later on. Otherwise, we set a connect timer.
Called when a fatal error is observed in a peer connection (e.g.
timeouts occur or open didn't succeed and so on).
Causes the session to transition to an Idle state and releases
all associated resources (timers, buffers etc.). In some cases,
the peer might need to be restarted (e.g. if it is a configured peer,
which hasn't been explicitly deleted).
Called when a keep-alive is received on a session in OpenConfirm
state. Mainly send out our initial routes to the session's peer.
Process an interface change detected during a change in configuration
or during startup. The comments preceding this function are
descriptive enough.
Called when gated receives a SIGTERM. Clean up all the tasks
associated with each peer in each group (being careful to rescan
the peer list from the beginning each time, since a peer's position
in the group list may change when it is closed). Stop listening
on the socket.
Called from bgp_init(). Simply create a task associated
with the group.
During configuration, gated first creates a skeletal peer structure
for each peer. Here, in a function called from bgp_peer_init(), we
create a task for the peer, set the peer interface, and initialize
the peer into the idle state.
This is called when, after a reconfiguration, the peer is no
longer configured, so we want to completely remove the resources
allocated to him (as opposed to just close()-ing him). Set the
peer's delete flag and then call bgp_peer_close(). This ensures
that the peer's task and associated resources are removed.
The group analog of the above. Delete each peer in the group,
delete the task associated with the group and any other
resources allocated.
Called before the configuration file is re-read. Run through
all groups and all peers marking them for deletion. Free up
interface lists, policy lists as well.
Allocate group and peer structures when the corresponding
statements are encountered in the parser.
Given a group structure do a number of sanity checks to see
if the different kinds of groups have the right kinds of
configured information. If the group doesn't already exist
(e.g. because we are starting anew), then we add the group
to the global list. Otherwise, we copy the configuration options
from the new structure to the existing structure and delete
the structure we are given.
Similar to the above code in structure. The checks to be done
if a deleted peer already exists are slightly different (e.g.
we need to check if some data is already queued in the existing
peer structure and so on).
Called when gated is initializing after startup or after a
reconfiguration. This is called after the configuration file
has been parsed and the group and peer structures have already
been created. We also have to be careful to see if in our current
incarnation, we have been configured to turn BGP off. Clean up
any deleted or unconfigured peers from a previous incarnation.
Then start up tasks for each peer and each group that our
current incarnation is configured with.