GateD provides a common substrate, which includes abstractions such as tasks, timers and jobs, for implementing these primitive operations efficiently. A task represents a separately schedulable thread of protocol computation. A timer represents an event scheduled for a future instant, which causes some non-preemptable computation to be performed at that instant. This computation is associated with a specific task. A job (which can either be foreground or background) represents some non-preemptable protocol computation that can be scheduled anytime in the future.
A more detailed description of the task module's data types, macros, and function definitions may be found in the task module source.
Different protocol implementations use tasks differently. The simplest way to use tasks is to allocate a single task to all computation that happens within a protocol. OSPF and RIP have a single task in whose context all protocol computation is performed. BGP, which maintains a transport connection to each individual peer, assigns a task to all protocol activity (message processing, route computation etc.) associated with the peer (or a peer group).
In the current GateD implementation, tasks do not correspond to separate UNIX processes (with the exception of the system trace dump task).
(In BGP, for example, a pointer to the peer's task is kept in the bgpPeer struct associated with that peer.) A task is usually associated with a UNIX socket; the task structure contains the socket descriptor.
In addition, the task structure contains pointers to protocol-specific routines for performing the different primitive operations that the task module supports: reading from the socket, writing to the socket, handling socket exceptions, and responding to various notifications.
For example, when it is ready to read update messages from a peer, the BGP protocol implementation registers a pointer to bgp_recv_v4_update() with the corresponding peer's task. Then, when data actually arrives on the socket, the task module simply calls this handler to perform the protocol-specific read.
To notify protocols of GateD reconfiguration, tasks also maintain pointers to protocol-specific notification handlers (e.g., the handlers to be called before reconfiguration, after reconfiguration, and so on).
GateD individually prioritizes each of these operations. At any given instant, pending timers have the highest priority, followed by foreground jobs, reads from sockets, protocol-specific exception handling, writes to sockets, and finally background jobs. GateD maintains global lists of tasks (task_read_queue, task_write_queue, etc.) waiting to perform these operations; the task structure also contains linked-list pointers for placement on each of these queues. (Jobs and timers are actually queued in separate data structures, because a task may have more than one timer or job pending.)
A task is created by first allocating a task structure (using task_alloc()), instantiating the various fields of the task structure, and then calling task_create() to insert the task structure into a global list of tasks. This insertion takes place in priority order (tasks are assigned a static priority). task_create() also creates a socket for use by the task for reads and writes.
Operations on the socket associated with the task include task_close() (which closes the socket and removes the task from the global read and write queues), task_set_socket(), and task_reset_socket().
Storage associated with the task data structure is released when task_delete() is called. This also frees any other resources (timers, jobs) associated with the task and removes the task from any other lists to which it may belong.
Some task module functions are used to notify each individual task of events of global interest; some of these are processed in task priority order. Thus, task_reconfigure() is called to force all protocol instances to re-read the configuration files. Similarly, task_ifachange() and task_iflchange() are used to notify protocol instances of changes in the status of one or more network interfaces. task_terminate() is used to notify a protocol instance of GateD's intention to wind up operations (e.g., in response to operator action). task_newpolicy() is used to inform interested protocol modules of routing policy configuration changes. Finally, task_flash() calls each protocol instance's flash handler whenever any change happens to GateD's global route database.
Once a task has been created, individual protocol instances may schedule reads on the task socket using task_set_recv() (note the naming inconsistency), and schedule blocked writes to the socket using task_set_write(): when a protocol instance has data to send on a socket, it attempts to write it directly; if that blocks, it calls this function. No protocol implementation appears to use the socket exception feature.
To actually read data from, or write data to, a socket, the task module provides task_receive_packet() and task_send_packet() respectively. While well suited to protocols that send routing messages over unreliable datagrams, these functions do not suffice for protocols that use stream-based transports; BGP prefers instead to access the socket directly.
From the operating system, task reads go directly into a global task_recv_buffer (GateD can do this because tasks are non-preemptive). It is then the protocol instance's responsibility to consume all the data in the buffer. Similarly, in preparing to write data, protocol instances formulate messages in a single global task_send_buffer; if the write blocks, it is the protocol instance's responsibility to spool the data elsewhere.
All of this computation is scheduled from GateD's main() loop. As described above, the main loop prioritizes different computations and repeatedly loops to discover which computations have been enabled since the last time the loop was traversed.
Basically, the main loop does a select() on all socket descriptors used for reading, writing, or exception handling. It falls into the select() when no timers are scheduled and the job queues are empty. GateD stays in the select() until the next scheduled timer or 8 seconds, whichever is earlier. After coming out of the select(), it processes "high priority" timer events (see the timer section). If any operator actions occurred (in the form of UNIX signals), it processes these (from task_process_signals()) and causes any queued foreground jobs to run before returning to the select(). Otherwise, it performs socket reads and writes (from task_process_sockets()), pending timers, and background jobs, in that order.
Thus, a protocol's message read and write operations, as well as its timers and jobs, are scheduled from GateD's main loop. Only the flash routines run outside this sequence: they are invoked whenever (as a result of a read) GateD's routing database gets updated.
GateD allows protocols to specify higher-priority timers (these have a greater probability of expiring on time compared to "normal" timers). GateD also allows specification of drifting timers: depending on how the timer is initialized, the next expiry is calculated from when the timer actually expired last, plus the interval. It is also possible to configure a timer to be non-drifting: expiry is scheduled at exactly some integral multiple of the interval after the timer was first scheduled.
Most protocols use timers for periodic tasks such as monitoring connection status, timing out connection opens and so on.
In the current GateD implementation, timer expiry is checked by polling all active timers in the main loop; operating system facilities for timer expiry notification are not used.
Each timer is described by a task_timer structure. This includes timer parameters (interval, start time) as well as state describing the action to perform when the timer expires.
These task_timer data structures are queued into one of three global queues maintained by the task module: a list of high-priority timers, a list of active timers, and a list of inactive timers (the last is used when protocols temporarily need to idle a timer). The first two lists are sorted in increasing order of next expiry, to ease processing.
High priority timers are used by some protocols like OSPF for timing their hello deliveries. BGP uses "normal" timers for all its timing needs: keepalives, connection retries and so on.
task_timer_create() creates a timer with the specified parameters and inserts it into the appropriate global list of timers. Conversely, task_timer_delete() removes a timer from its queue and frees any resources that the timer holds.
Also, task_timer_reset() is used to idle a timer (i.e., move it onto the inactive list), while task_timer_uset() is used to reschedule an idle timer or a timer already in progress.
Timer expiry is processed by the task_timer_dispatch() and task_timer_hiprio_dispatch() routines. These simply traverse the list of timers, call the appropriate handlers, recompute the next interval at which each timer should fire, and reschedule it accordingly. Both are called from the main loop.
For example, BGP uses a foreground job to close a peer connection if an error is observed when flushing routes to that peer; this cannot be performed correctly from the task's write routine, since there may be reads queued up or timers scheduled for the task. Similarly, the OSPF implementation sets a timer after an SPF computation to ensure that no SPF computation is done within that time period; at the expiry of the timer, it sets up a background job to run the SPF computation.
Each job is described by a task_job structure, which contains pointers to the task for which the job is scheduled, the routine encompassing the computation to be performed, and a single argument to pass to that routine.
The GateD module maintains a global list of foreground jobs (ordered FIFO) and a list of background jobs (ordered by priority). In addition, there exists a single array of pointers to the locations in the background queue marking the ends of blocks of task_job structs with the same priority.
Protocol instances call task_job_create() to allocate a task_job structure and insert it into the appropriate list, and task_job_delete() to do the opposite. Foreground jobs are automatically deleted after completion; background jobs must be explicitly deleted.
The GateD main loop calls task_job_fg_dispatch() to schedule computation of foreground jobs; this runs all foreground jobs to completion. When possible, the main loop calls task_job_bg_dispatch() to schedule computation of background jobs. Unlike the former, this function simply removes the first, highest-priority background job from the list of background jobs, schedules its execution, and returns.
Protocol modules (and other support modules as well) allocate memory in GateD in two distinctly different ways. For dynamic allocation of fixed-size descriptors, GateD uses the memory allocator described below. For variable-sized blocks (e.g., the BGP attribute descriptor structure as_path), GateD uses malloc(). (Actually, GateD optimizes this latter allocation as well, using malloc() only for the uncommon case and allocating fixed-size blocks for the common case.)
The fixed block allocation model is optimized for the case when a module
repeatedly requests the same sized block. The module first contacts
the memory manager to obtain a descriptor block (called a task_block; see below) which indicates the
size of blocks it will request in the future. To later allocate a
block, it calls the memory manager with this descriptor block;
the memory manager returns a pointer to the appropriately sized block.
The memory manager maintains two kinds of descriptor: a task_block and a task_size_block.
The task_block, which is the descriptor block referred to above, contains the size of each allocation request and some bookkeeping data. A task_size_block is a descriptor for block allocations of a particular size. It contains a linked list of task_blocks for blocks of that size; in addition, it contains a pointer to a free list of blocks of that particular size.
A module registers with the memory manager by calling task_block_init(), specifying the size of the allocations it expects to make with this registration. The macros task_block_alloc() and task_block_free() respectively allocate and free a block of the requested size.
The memory allocation strategy is explained below.
Some protocol modules allocate data in units of pages directly; for example, the task module allocates its send and receive buffers this way. These modules may explicitly free such pages by calling task_block_reclaim(). The pages go into a global free list, out of which the memory manager may allocate data for its block allocator.
Suppose, for example, that a module registers for 16-byte blocks. The memory manager hands it a task_block descriptor, after linking that descriptor to the task_size_block descriptor for 16-byte blocks. The latter descriptor maintains a linked list of free 16-byte blocks.
Freed blocks are linked in a singly-linked list (even though the pointer to a free block is cast into the head of a doubly-linked list, only the forward pointer is used in the free list). Whenever the free list for a task_size_block is empty, the memory manager allocates a page, then carves up the page into a linked list of appropriately sized blocks (the fragment of the page that doesn't fit into one block, called a runt, is linked into a free list corresponding to its size). Allocations are done out of this free list; when a protocol module frees a block, the freed block goes back on the free list.