Overview
Gated provides a single-threaded, event-driven environment for
implementing routing protocols. Events are generated by sockets being
ready for read, write or exceptions, interval timer expiration and
signals requesting re-configuration and shutdown. Threads are
non-interruptible and therefore must take steps to avoid excessive
processing time when possible.
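
The event model described above can be sketched as one pass of a
select() loop. This is only an illustration, not gated's dispatcher:
ev_handler, ev_dispatch_once, ev_count_byte and ev_bytes_seen are
hypothetical names, and only read events are shown.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical per-socket record; in gated this role is played by a task. */
struct ev_handler {
    int fd;                     /* descriptor to watch, -1 if unused */
    void (*on_read)(int fd);    /* run when fd is readable */
};

/* One pass of the loop: wait up to timeout_sec seconds, then run the read
 * callback of every ready descriptor to completion, one after another.
 * Returns the number of callbacks run, or -1 if select() failed. */
int ev_dispatch_once(struct ev_handler *h, size_t n, long timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = -1, ran = 0;
    size_t i;

    FD_ZERO(&rfds);
    for (i = 0; i < n; i++) {
        if (h[i].fd >= 0) {
            FD_SET(h[i].fd, &rfds);
            if (h[i].fd > maxfd)
                maxfd = h[i].fd;
        }
    }
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (h[i].fd >= 0 && FD_ISSET(h[i].fd, &rfds)) {
            h[i].on_read(h[i].fd);  /* not interrupted; must return promptly */
            ran++;
        }
    }
    return ran;
}

/* Example callback: consume one byte and count it. */
int ev_bytes_seen;
void ev_count_byte(int fd)
{
    char c;
    if (read(fd, &c, 1) == 1)
        ev_bytes_seen++;
}
```

Because each callback runs to completion before the next descriptor is
examined, a slow callback delays every other protocol, which is why
handlers are asked to keep their processing time short.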
Implementing ISO in gated
Most of the support routines (tracing, tasks and timers) are
relatively protocol independent. The major areas needing changes are
the interface code, routing table and the parser.
Socket addresses are already supported as a typedef sockaddr_un which
is a union of all relevant address family socket types. The tracing
code prints sockaddr's by pointer selecting on protocol family.
The interface code must be updated to obtain the necessary information
about ISO interfaces from the kernel. The interface support routines
will need to be updated to locate interfaces within the ISO address
family and the if_rt* routines will need to be updated to support the
ISO address family.
The routing table is currently set up as a hash table. [I'd like to
convert to Patricia, but like everyone else I'm waiting for Van.] I
don't know enough about ISO addresses to know if this method can be
extended to ISO or not. Various AF_INET dependencies need to be
generalized to support AF_ISO at various places in the ISO code. The
actual ISO routing table should probably be separate from the AF_INET
one.
The parser will be the most complicated. The parser will need to be
updated to recognize ISO addresses. Various support code in the
parser will have to be expanded to support additional address
families. The current protocols will need to be updated to reject
non-AF_INET addresses. The route control code will need to be updated
to prevent mixing of address families.
Tracing
All logging and tracing is done by two routines. Tracing is usually
done to a file, but may be done to stdout depending on how gated is
started. Stdout and stderr are normally closed so printf() should not
be used.
Tracing is currently controlled by a global flag set from the
configuration file specifying the levels of tracing. [This will
hopefully be re-written to provide task specific tracing flags in the
future, allowing tracing of packets from one peer and not the others,
for example]
tracef(fmt, args...)
trace(level, priority, fmt, args...)
Tracing is done on a line-by-line basis; newlines should not be
included in the format string because of the use of timestamps.
Timestamps may be disabled for an individual line by including
TR_NOSTAMP in the level.
Both trace() and tracef() use the fmt and args format of printf() with
the following additions:
%A    Expects a pointer to a sockaddr and formats the
      address for the specified family. Currently only
      AF_INET is defined. If the # modifier is specified,
      the port number is appended to the address.
%m    Inserts the text associated with the current value of
      errno.
%T    Prints the passed time_t value in hh:mm:ss
      format. Currently does not support the date or a
      number of days.
The trace buffer is filled by tracef() calls. A trace() call fills
the buffer and specifies the disposition. If any of the logging level
flags specified on the trace() call match those specified in the
configuration file, the line is logged to the trace file. If the
specified priority is non-zero, the message is also syslogged with the
specified priority. The buffer is then cleared. If trace() is called
with a fmt of NULL, no data is appended before logging.
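
The buffer behaviour described above can be sketched as follows. This
is a simplified illustration and not the gated source: the tr_* names
are hypothetical, only standard printf() formats are handled (no %A,
%m or %T), and a string stands in for the trace file.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TR_BUFSZ 256

char tr_buf[TR_BUFSZ];      /* pending line, built up by tr_tracef() */
unsigned tr_enabled;        /* levels enabled by the configuration */
char tr_last[TR_BUFSZ];     /* stand-in for the trace file: last line logged */

/* Append formatted text to the pending line. */
static void tr_vappend(const char *fmt, va_list ap)
{
    size_t used = strlen(tr_buf);
    vsnprintf(tr_buf + used, sizeof tr_buf - used, fmt, ap);
}

/* Like tracef(): only adds to the buffer. */
void tr_tracef(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    tr_vappend(fmt, ap);
    va_end(ap);
}

/* Like trace(): append (unless fmt is NULL), log the line if any of the
 * requested levels is enabled, then clear the buffer. */
void tr_trace(unsigned level, int priority, const char *fmt, ...)
{
    if (fmt != NULL) {
        va_list ap;
        va_start(ap, fmt);
        tr_vappend(fmt, ap);
        va_end(ap);
    }
    if (level & tr_enabled)
        strcpy(tr_last, tr_buf);    /* real code would write a timestamped line */
    (void) priority;                /* the syslog step is omitted in this sketch */
    tr_buf[0] = '\0';
}
```

A line is typically assembled with several tr_tracef() calls and then
finished with one tr_trace() call, which decides whether it is kept.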
The logging file is specified on the command line or in the
configuration file.
Tasks
A TASK is generally associated with a socket, but may be around just
to co-ordinate timers, or for cleanup for reparsing.
The fields task_socket and task_timer should not be modified directly.
Task routines are called with:
task_routine(task *tp)
A task is created by first allocating a task structure by calling task_alloc:
task *task_alloc(char *name);
The applicable fields should then be filled in followed by a call to
task_create:
int task_create(task *tp, int maxpacket);
Task_create returns TRUE or FALSE; if FALSE, an immediate quit(EINVAL)
is in order. Maxpacket is the specification of the largest packet
size to be received. This allows a common receive buffer to be shared
among all protocols.
A task is deleted by calling task_delete which will delete all timers
associated with this task, close the socket and finally delete the
task:
void task_delete(task *tp);
If a socket has been opened before a task has been created,
task_socket should be set before calling task_create. If it is opened
after task creation, task_set_socket() should be used to indicate the
association. Existing sockets should first be disassociated from the
task by calling task_reset_socket() after closing the socket.
void task_set_socket(task *tp, int socket);
void task_reset_socket(task *tp);
When printing the task name, task_name() should be used which appends
the address (if non-zero) and port number to the task name. It
returns the pointer to a static string.
char *task_name(task *tp);
Tasks may be deleted and allocated at any time. They are kept in
random order on a doubly linked list. Task to socket mapping is
stored to allow quick access to a task if a select succeeds on its
socket.
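
One simple way to keep such a mapping is a table indexed by file
descriptor. The sketch below is an assumption about how this could be
done, not a description of gated's code; sk_task is a reduced stand-in
for the task structure and all sk_* names and SK_MAXFD are hypothetical.

```c
#include <stddef.h>

#define SK_MAXFD 64                 /* hypothetical limit for this sketch */

struct sk_task {                    /* reduced stand-in for struct _task */
    const char *task_name;
    int task_socket;                /* -1 when no socket is open */
};

static struct sk_task *sk_by_fd[SK_MAXFD];  /* fd -> owning task */

/* Record the association, in the spirit of task_set_socket().
 * Returns 1 on success, 0 if fd is out of range or already owned. */
int sk_set_socket(struct sk_task *tp, int fd)
{
    if (fd < 0 || fd >= SK_MAXFD || sk_by_fd[fd] != NULL)
        return 0;
    tp->task_socket = fd;
    sk_by_fd[fd] = tp;
    return 1;
}

/* Forget the association, in the spirit of task_reset_socket(). */
void sk_reset_socket(struct sk_task *tp)
{
    if (tp->task_socket >= 0 && tp->task_socket < SK_MAXFD)
        sk_by_fd[tp->task_socket] = NULL;
    tp->task_socket = -1;
}

/* Constant-time lookup used after select() reports fd as ready. */
struct sk_task *sk_task_for_fd(int fd)
{
    return (fd >= 0 && fd < SK_MAXFD) ? sk_by_fd[fd] : NULL;
}
```

The linked list can then stay in arbitrary order, since the dispatcher
never has to walk it to find the task for a ready socket.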
The task structure has the form:
struct _task {
    struct _task *task_forw;
    struct _task *task_back;
    char *task_name;
    flag_t task_flags;
    int task_proto;
    int task_socket;
    proto_t task_rtproto;
    u_long task_rtrevision;
    void (*task_recv) ();
    void (*task_accept) ();
    void (*task_write) ();
    void (*task_connect) ();
    void (*task_except) ();
    void (*task_terminate) ();
    void (*task_flash) ();
    void (*task_cleanup) ();
    void (*task_reinit) ();
    void (*task_ifchange) ();
    sockaddr_un task_addr;
    caddr_t task_data;
    struct _timer *task_timer[TASK_TIMERS];
};
typedef struct _task task;
#define TASKF_ACCEPT    0x01 /* This socket is waiting for accepts, not reads */
#define TASKF_CONNECT   0x02 /* This socket is waiting for connects, not writes */
#define TASKF_IPHEADER  0x04 /* Received packets have IP header to be received */
Field descriptions:
task_name is a pointer to a static character
string specifying the printable name
of this task.
task_flags Task-specific flags:
    TASKF_ACCEPT    Indicates that a socket is waiting on an accept
                    instead of a read, so task_accept() should be
                    called instead of task_recv().
    TASKF_CONNECT   Indicates that a socket is waiting on a connect
                    instead of a write, so task_connect() should be
                    called instead of task_write().
    TASKF_IPHEADER  Indicates that packets read from this connection
                    have an IP header which should be stored in the
                    first element of the iovec. The rest of the data
                    packet is stored in the second element of the
                    iovec.
task_proto The IP protocol being used for this
socket if it is directly on the IP
layer. Mainly for human consumption.
task_socket The fd assigned to this socket, should
be -1 if no socket is open and reset
to -1 if the socket is closed.
task_rtproto The routing table protocol being used
by this task. Specified here to avoid
use of a constant and also for human
consumption.
task_rtrevision The routing table revision that has
been flashed to this
protocol/neighbor/interface ... Used
to ensure propagation of changed routes.
task_recv Routine to call when there is data to
be read on this socket.
task_accept Routine to call when there is an
incoming connection on this socket.
task_write Routine to call when this socket is
ready to accept more data.
task_connect Routine to call when a connect has
completed on this socket.
task_except Routine to call when there is an
exception pending on this socket.
task_terminate Routine to call when a SIGTERM has
been received and gated has commenced
a graceful shutdown. This routine
does not have to terminate a task, but
should initiate the state change that
will lead to an eventual shutdown.
The default value is task_delete().
task_flash Routine to call when changes have been
made to the routing table and flash
updates should be generated. Does not
have to do the flash update, but may
schedule it at a future date by
creating a timer.
task_cleanup Routine called when a SIGHUP has been
received and gated is about to re-read
its configuration file.
Policy lists owned by this task should
be freed and steps should be taken to
allow later determination if this task
has been removed from the config file
and should be terminated.
task_reinit Routine called after the configuration
file has been re-read. If this task
is no longer in the config file, it
should be terminated.
task_ifchange Called when an interface status has
changed. The second argument is a
pointer to the if_entry control block.
task_addr The address family, address and port
selector for this task if applicable.
task_data Task specific data which should be
cast to a (caddr_t).
task_timer[TASK_TIMERS] Array of pointers to timers owned by
this task. This allows timers to be
referenced by a define and deletion of
all timers when a task is deleted.
Timers
Timers may be created, deleted, reset and cleared at any time. A timer
causes a routine to be run at the specified time. Provisions are
available to compensate for system load and processing time. Timer
resolution is in seconds.
Timers by default refire every timer_interval seconds, with the
re-fire specified to occur timer_interval seconds from the last time
the timer was supposed to fire. If the system is loaded and a timer
is late by more than two intervals, the timer only fires once.
The TIMERF_ABSOLUTE flag causes the timer to fire timer_interval
seconds from when it last fired, regardless of system load or
processing time.
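
The two re-fire rules can be compared with a small calculation. This is
a sketch under stated assumptions, not gated's timer code: tm_next_fire
is a hypothetical helper, and it collapses every missed slot into a
single firing, which is only one reading of the late-timer rule above.

```c
#include <time.h>

/* Next firing time for a repeating timer that was due at `due` and is
 * actually being run at `now` (now >= due). `absolute` is nonzero when
 * TIMERF_ABSOLUTE is set. */
time_t tm_next_fire(time_t due, time_t now, time_t interval, int absolute)
{
    time_t next;

    if (absolute || interval <= 0)
        return now + interval;  /* measured from the actual firing time */

    next = due + interval;      /* measured from the scheduled time */
    while (next <= now)         /* missed slots are skipped, not replayed */
        next += interval;
    return next;
}
```

For example, a 30 second timer due at t=100 but run at t=110 is next due
at t=130 by default, and at t=140 with TIMERF_ABSOLUTE.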
Timers automatically repeat unless TIMERF_DELETE is specified.
TIMERF_DELETE creates a one-shot timer which is deleted after it
fires.
Timers are kept in two queues, the active queue and the inactive
queue. The inactive queue is kept in random order, the active queue
is kept in time order so the complete timer queue does not have to be
scanned when the interval timer fires.
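
Keeping the active queue in time order amounts to a sorted insert. The
sketch below assumes a circular doubly linked list with a sentinel head;
the real queue layout may differ, and the tq_* names are hypothetical.

```c
#include <stddef.h>
#include <time.h>

struct tq_timer {                   /* reduced stand-in for struct _timer */
    struct tq_timer *forw, *back;
    time_t next_time;
};

/* Insert tip into the circular list headed by `head`, keeping the list
 * ordered by next_time. Timers with equal times keep insertion order. */
void tq_active_insert(struct tq_timer *head, struct tq_timer *tip)
{
    struct tq_timer *p = head->forw;

    while (p != head && p->next_time <= tip->next_time)
        p = p->forw;
    tip->forw = p;                  /* link tip in just before p */
    tip->back = p->back;
    p->back->forw = tip;
    p->back = tip;
}

/* Earliest pending timer, or NULL if the queue is empty. */
struct tq_timer *tq_active_first(struct tq_timer *head)
{
    return head->forw == head ? NULL : head->forw;
}
```

When the interval timer fires, only the front of this list has to be
examined until an entry that is not yet due is found.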
Timer fields should not be modified directly, except for timer_job and
timer_flags.
The timer control block contains the following fields:
struct _timer {
    struct _timer *timer_forw;
    struct _timer *timer_back;
    char *timer_name;
    flag_t timer_flags;
    time_t timer_next_time;
    time_t timer_last_time;
    time_t timer_interval;
    void (*timer_job) ();
    task *timer_task;
    int timer_index;
};
typedef struct _timer timer;
timer_name A printable name for this timer for
human consumption.
timer_flags Timer-specific flags:
    TIMERF_ABSOLUTE Specifies that this timer should fire the
                    specified interval from when it was set, not
                    from when it last fired.
    TIMERF_DELETE   Specifies that this timer should be deleted as
                    soon as it is finished. This flag may be set at
                    any time.
timer_next_time The Unix format timestamp indicating
when this timer is scheduled to fire
again.
timer_last_time The Unix format timestamp indicating
when this timer last fired.
timer_interval The Unix format time interval of this
timer. If TIMERF_DELETE is not specified,
the timer repeats at this interval.
timer_job The routine to be called when a timer
fires. It is called as:
void timer_job(timer *tip,
time_t interval);
timer_task Pointer to the task associated with
this timer.
timer_index The index into task_timer[] which
points to this timer.
Routines:
timer *timer_create(task *tp,
int index,
char *name,
flag_t flags,
time_t interval,
void (*job) ());
Creates a timer. If no task is specified, task should
be (task *) 0. If this timer is initially inactive,
the interval should be specified as (time_t) 0.
void timer_delete(timer *tip);
Deletes the timer. If a task is associated with this
timer, its pointer to this timer is cleared.
void timer_reset(timer *tip);
This timer is reset and put in the inactive queue.
void timer_set(timer *tip, time_t interval);
Sets the timer to fire interval seconds from now.
void timer_interval(timer *tip, time_t interval);
Sets the timer to fire interval seconds from when it
last fired.
char *timer_name(timer *tip);
Returns a pointer to a static area containing the task
name followed by the timer name.
Interfaces
A structure is maintained for each address on each interface (BSD 4.4
allows multiple addresses per interface). All references to interface
addresses are resolved to pointers to interface structures at
configuration time.
At initialization gated finds all active interfaces and creates
interface structures for them. Interfaces which have not been
configured are currently ignored. Every minute these interfaces are
checked for a change in status, but new interfaces are not detected.
It is my intention to eventually scan for new interfaces.
Interfaces are checked for failure not noticed by a change in the
IFF_UP flag by routing packets addressed to myself on P2P lines and
monitoring for the reception of routing packets. If no packets are
received, the routes to an interface will time out and be deleted.
This can be disabled on a per-interface basis.
typedef struct _if_entry {
    struct _if_entry *int_next;
    sockaddr_un int_addr;
    union {
        sockaddr_un _intu_broadaddr;
        sockaddr_un _intu_dstaddr;
    } _int_intu;
#define int_broadaddr _int_intu._intu_broadaddr
#define int_dstaddr _int_intu._intu_dstaddr
    sockaddr_un int_net;
    sockaddr_un int_netmask;
    sockaddr_un int_subnet;
    sockaddr_un int_subnetmask;
    int int_metric;
    flag_t int_state;
    int int_ipackets;
    int int_opackets;
    char *int_name;
    u_short int_transitions;
    int int_index;
    pref_t int_preference;
} if_entry;
int_addr Is the address assigned to this interface.
Address family is contained in the sockaddr.
int_broadaddr The broadcast address of this interface if
appropriate.
int_dstaddr The destination address of this interface if
appropriate.
int_net The natural net of this interface.
int_netmask The natural netmask of this interface.
int_subnet The subnet specified on this interface.
Same as int_net if subnetting is not used.
int_subnetmask The subnet mask specified on this interface.
Same as int_netmask if subnetting is not used.
int_metric The configured (ifconfig) or gated specified
metric for this interface.
int_state Flags for this interface.
int_ipackets Not used, I just now realized it existed.
int_opackets Not used, I just now realized it existed.
int_name The kernel's name for this interface.
int_transitions Number of up->down transitions of this
interface.
int_index The order this interface appears in the kernel
file.
int_preference The preference to be used for the route to
this interface.
Flags:
IFS_UP
IFS_BROADCAST
IFS_POINTOPOINT
IFS_REMOTE
IFS_LOOPBACK
IFS_INTERFACE
Set from the kernel's IFF_ flags with the same name.
IFF_LOOPBACK is emulated on 4.2 systems.
IFS_SUBNET
This interface has specified a non-natural subnet
mask.
IFS_NOAGE
Routing packets should not be used to determine the
status of this interface.
IFS_NORIPOUT
IFS_NORIPIN
IFS_NOHELLOOUT
IFS_NOHELLOIN
IFS_NOICMPIN
Global disabling of routing protocols on an interface
basis. Sort of gross to put them here, but a protocol
specific control block is too much work at the moment.
IFS_METRICSET
The value int_metric was set in the gated config file.
This does not cause the kernel's idea of the metric to
be updated.
IFS_MULTICAST
This interface supports IP multicasting.
Routines:
if_entry *if_withdst(sockaddr_un *dstaddr);
Returns a pointer to the interface structure of the
interface with the given address. Note that
POINTOPOINT interfaces are always referred to by their
destination address.
if_entry *if_withaddr(sockaddr_un *dstaddr);
Returns a pointer to the interface structure of the
interface on the given directly attached network.
if_entry *if_withname(char *name);
Returns a pointer to the interface structure of the
interface with the given name. Note that in BSD 4.4
an interface can have multiple addresses, so this
won't work.
u_long if_subnetmask(struct in_addr addr);
Returns the IP subnet mask of a given address if there
is an interface to a subnet of that network. Will
need updating for BSD 4.4 where subnet masks are variable.
Routing table
A destination is a host or network route and associated mask.
The routing table allows multiple routes per destination as well as a
provision for protocol dependent data (BGP currently uses it for
maintaining an AS path and HELLO uses it for a sliding window of
metrics).
Which of multiple next hops is used is determined by preference. A
default preference is specified for each routing protocol and is
overridable down to the destination and source level. A tie is
resolved by using the next hop with the lower address in the interest
of being deterministic.
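
The selection rule can be written as a comparison function. This is a
sketch only: pf_route and pf_route_better are hypothetical names, the
next hop is reduced to an IPv4 address in host byte order, and the
sketch ASSUMES that a numerically lower preference value wins, which
this section does not state.

```c
#include <stdint.h>

struct pf_route {               /* reduced stand-in for a route entry */
    int preference;
    uint32_t router;            /* next hop, host byte order */
};

/* Returns nonzero if route a should be used in favour of route b.
 * ASSUMPTION: a lower preference value is better. Ties are broken by
 * the lower next hop address so the result is deterministic. */
int pf_route_better(const struct pf_route *a, const struct pf_route *b)
{
    if (a->preference != b->preference)
        return a->preference < b->preference;
    return a->router < b->router;
}
```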
Routes in the routing table are aged unless the RTS_NOAGE flag is
specified. When rt_timer reaches rt_timer_max, the route is put in
holddown, rt_timer is reset and rt_timer_max is set to RT_T_HOLDDOWN.
When a HOLDDOWN expires, the route is deleted.
Routes added with the RTS_NOAGE flag should be deleted with
rt_delete(). Routes added without the RTS_NOAGE flag should use
rt_unreach() to delete a route, which puts it into holddown for
RT_T_HOLDDOWN seconds.
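
The aging rules above form a small state machine, sketched here. The
ag_* names are hypothetical, AG_RTS_HOLDDOWN is an assumed state flag
that does not appear in this document, and the value chosen for
AG_T_HOLDDOWN is arbitrary.

```c
#include <time.h>

#define AG_RTS_NOAGE    0x01    /* stands in for RTS_NOAGE */
#define AG_RTS_HOLDDOWN 0x02    /* assumed flag marking a held-down route */
#define AG_T_HOLDDOWN   120     /* arbitrary stand-in for RT_T_HOLDDOWN */

struct ag_rt {                  /* reduced stand-in for struct _rt_entry */
    unsigned state;
    time_t timer;               /* current age in seconds */
    time_t timer_max;           /* age at which the next transition happens */
};

/* Advance the age by `elapsed` seconds.
 * Returns 1 when the route should now be deleted, 0 otherwise. */
int ag_rt_age(struct ag_rt *rt, time_t elapsed)
{
    if (rt->state & AG_RTS_NOAGE)
        return 0;                   /* never aged; removed explicitly */
    rt->timer += elapsed;
    if (rt->timer < rt->timer_max)
        return 0;
    if (rt->state & AG_RTS_HOLDDOWN)
        return 1;                   /* holddown expired */
    rt->state |= AG_RTS_HOLDDOWN;   /* first expiry: enter holddown */
    rt->timer = 0;
    rt->timer_max = AG_T_HOLDDOWN;
    return 0;
}
```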
Modifications to the routing table are started by opening the table
with rt_open() and finished with rt_close(). Attempts to change the
table when it is not open result in a fatal error.
The propagation of routes to the various protocols is controlled by
the revision number. When the table is opened, the global revision
number is incremented. If no changes have been made when it is
closed, the number is decremented. Routes that are changed get their
rt_revision set to the global value. Each task that modifies the
routing table has its own revision. When changes are made and
task_flash() is called to cause the flash update tasks to be executed,
a protocol can determine changes by comparing its revision number
with that of the route in question. When finished processing a flash
update, a protocol sets its task_rtrevision to the current global
value.
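
The revision bookkeeping can be illustrated with a reduced model. This
is a sketch of the scheme as described, not the real rt_open() and
rt_close(); every rv_* name is hypothetical.

```c
struct rv_table {               /* reduced stand-in for the routing table */
    unsigned long revision;     /* global revision number */
    int open;
    int changes;                /* changes made while open */
};

struct rv_route {
    unsigned long rt_revision;  /* revision at which the route last changed */
};

/* Opening the table starts a new revision. */
void rv_open(struct rv_table *t)
{
    t->open = 1;
    t->changes = 0;
    t->revision++;
}

/* Any change stamps the route with the current revision. */
void rv_touch(struct rv_table *t, struct rv_route *rt)
{
    rt->rt_revision = t->revision;
    t->changes++;
}

/* Closing without changes gives the revision number back. */
void rv_close(struct rv_table *t)
{
    if (t->changes == 0)
        t->revision--;
    t->open = 0;
}

/* A route needs flashing to a task that has only seen task_rtrevision. */
int rv_needs_flash(const struct rv_route *rt, unsigned long task_rtrevision)
{
    return rt->rt_revision > task_rtrevision;
}
```

After handling a flash update, a task would copy the table's revision
into its own task_rtrevision so the same routes are not sent again.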
The existing protocols are not good examples of the work required to
update the routing table with SPF algorithms. The suggested method is
to run the Dijkstra algorithm and generate at most RT_N_MULTIPATH
multipath routes. The rt_add() and rt_change() routines will be
modified to receive a pointer to a list of next hop gateways
terminated by a null entry. [RT_N_MULTIPATH will always be one on
Unix, but ports of gated to routers will require support of multiple
next hops.] Differing routes, such as non-multipath routes to the
same external network should be added as separate routes.
Each route added has a pointer to a gw_entry for the gateway this
route was learned from. For ICMP, RIP and HELLO these gw_entry
control blocks are allocated dynamically each time a gateway is
learned from. For BGP and EGP these control blocks are part of the
peer structure and are used to delete all routes to a particular
gateway when the connection is broken. For SPF routes this could be
one common gw_entry, or could be the address of the link originating
this route. The first would make it easy to delete all SPF routes if
the protocol is disabled at run-time, the second would make it easy to
delete routes when a link goes down.
struct _rt_entry {
    ...
#define rt_dest rt_head->rth_dest
#define rt_dest_mask rt_head->rth_dest_mask
#define rt_parent rt_head->rth_parent
    sockaddr_un rt_router;
    if_entry *rt_ifp;
    gw_entry *rt_sourcegw;
    task *rt_task;
    time_t rt_timer;
    time_t rt_timer_max;
    metric_t rt_metric;
    flag_t rt_state;
    proto_t rt_proto;
    pref_t rt_preference;
    u_long rt_revision;
    as_t rt_as;
    flag_t rt_flags;
    rt_data *rt_data;
};
rt_dest Is a sockaddr_un specifying the address family
and destination for this route.
rt_dest_mask Is a sockaddr_un specifying the mask for this
destination.
rt_parent Is the parent of this route, i.e. a route that
has a smaller netmask.
rt_router Is the next hop gateway for this route.
rt_ifp Is a pointer to the interface used to reach
the next hop.
rt_sourcegw Is the address of the source_gw for this
route. For SPF algorithms a single global
gw_entry should be used.
rt_task Pointer to the task that installed this route.
rt_timer The age of this route in seconds.
rt_timer_max The maximum age of this route.
rt_metric The metric for this route. This is not
translated between protocols.
rt_state Flags for this route.
rt_proto The protocol this route was learned from.
rt_preference The preference for this route.
rt_revision The revision of the routing table at which
this route was modified.
rt_as The AS of this route. Zero is allowed if no
exterior protocols are in use.
rt_flags Emulation of kernel flags for this route.
rt_data Pointer to protocol specific data block
(rtd_data).
Routines:
rt_open Obtain update permission on the routing table
rt_close Release control of the routing table
rt_add Adds a route to the routing table.
rt_change Change the next-hop, metric or preference of
the route.
rt_unreach Put the specified route into holddown.
rt_delete Delete the specified route.
rt_refresh Indicate that this route has been heard again
from the same gateway with the same metric and
its age should be reset to zero.
rt_gwunreach The specified gateway has become unreachable.
An rt_delete will be issued to all routes
installed by this gateway.
rt_locate Locate a route given flags, destination and
protocol.
rt_locate_gw Locate a route given flags, destination,
protocol and gw_entry.
Route specific data:
The rt_data pointer in the rt_entry structure allows the
manipulation of protocol specific data for each route.
Rt_data should point to the following structure:
/* Prefix of protocol independent data */
typedef struct _rt_data {
    struct _rt_data *rtd_forw;
    struct _rt_data *rtd_back;
    int rtd_refcount;
    u_int rtd_length;
    void (*rtd_dump)();
    caddr_t rtd_data;
} rt_data;
Where rtd_data points to the protocol-specific data area.
There are basically two types of route-specific data, data
that is unique for each route and data that can be common to a
group of routes. The HELLO protocol is an example of the use
of unique data and the BGP AS paths (attributes in version 2)
are an example of shared data.
Unique data should be allocated with rtd_alloc() and is
automatically freed when a route is deleted.
Shared data should be allocated with rtd_locate() which will
return a pointer to a new data area containing the desired
data, or a pointer to an existing data area. Each reference
is counted, this count is decremented each time a route is
deleted and the area is freed when the last reference is
deleted.
Shared data requires the protocol set up a queue head pointer
for maintenance of the list of protocol-specific data. The
RTDATA_LIST and RTDATA_LIST_END macros are available to scan
this list.
Shared data can also be manipulated with rtd_alloc() and
rtd_insert() which allow manipulation of rt_data structures
instead of pointers and length of data areas.
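
The reference counting used for shared data can be sketched as below.
This is not the rtd_locate() implementation: the sd_* names are
hypothetical, a singly linked list replaces the rtd_forw/rtd_back queue,
and the data is stored inline rather than behind a pointer.

```c
#include <stdlib.h>
#include <string.h>

struct sd_rtd {                 /* reduced stand-in for rt_data */
    struct sd_rtd *next;
    int refcount;
    size_t length;
    unsigned char data[];       /* `length` bytes of protocol data */
};

/* Return a shared block holding a copy of `data`: an existing block with
 * identical contents if there is one, otherwise a newly allocated block.
 * Each call counts as one reference. Returns NULL if malloc() fails. */
struct sd_rtd *sd_locate(struct sd_rtd **head, const void *data, size_t length)
{
    struct sd_rtd *p;

    for (p = *head; p != NULL; p = p->next) {
        if (p->length == length && memcmp(p->data, data, length) == 0) {
            p->refcount++;
            return p;
        }
    }
    p = malloc(sizeof *p + length);
    if (p == NULL)
        return NULL;
    p->refcount = 1;
    p->length = length;
    memcpy(p->data, data, length);
    p->next = *head;            /* the protocol's queue head owns the list */
    *head = p;
    return p;
}

/* Drop one reference, as happens when a route is deleted. The block is
 * unlinked and freed when the last reference goes away. */
void sd_release(struct sd_rtd **head, struct sd_rtd *rtd)
{
    struct sd_rtd **pp;

    if (--rtd->refcount > 0)
        return;
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == rtd) {
            *pp = rtd->next;
            break;
        }
    }
    free(rtd);
}
```

Two routes carrying the same AS path would thus share one block, with
the path stored once and freed only after both routes are gone.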
AF_INET support routines:
gd_inet_makeaddr() Build an address from a network
number, a host number and a flag
indicating if subnets should be
considered when taking the network
part of the network number supplied.
gd_inet_netof() Return the subnet of a sockaddr_in.
gd_inet_wholenetof() Return the natural net of a
sockaddr_in.
gd_inet_class() Return the class (A = 1, B = 2, C = 3)
of the first byte of a network number
(used mainly by EGP).
gd_inet_checkhost() Mostly used by RIP. Verifies that a
network is class A, B or C, that
sin_port is zero and that the reserved
fields are zero.
gd_inet_hash() Calculates the routing table hash
value for a sockaddr_in.
gd_inet_cksum() Calculates the Internet checksum given
an iovec.
inet_ntoa() Returns a pointer to a static string
containing the ASCII representation of
the IP network number. Don't use
this, use the %A format of trace and
*printf().
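
As an illustration of what a checksum routine over an iovec has to do,
here is a sketch of the standard one's-complement Internet checksum
(RFC 1071). It is not the gd_inet_cksum() source; ck_inet_cksum is a
hypothetical name and the return convention (a host-order value whose
high byte is the first checksum byte on the wire) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* One's-complement checksum over the concatenation of all iovec
 * elements. Bytes are paired across element boundaries, so a packet
 * gives the same result however it is split. */
uint16_t ck_inet_cksum(const struct iovec *iov, int iovcnt)
{
    uint32_t sum = 0;
    int odd = 0;    /* nonzero when the next byte is the low half of a word */
    int i;

    for (i = 0; i < iovcnt; i++) {
        const uint8_t *p = iov[i].iov_base;
        size_t n = iov[i].iov_len;

        while (n-- > 0) {
            sum += odd ? (uint32_t) *p : (uint32_t) *p << 8;
            p++;
            odd = !odd;
            if (sum & 0x80000000u)      /* fold early to avoid overflow */
                sum = (sum & 0xffffu) + (sum >> 16);
        }
    }
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t) ~sum;
}
```

Tracking the odd/even position across elements matters when the IP
header and the rest of the packet sit in separate iovec elements and
the first one has an odd length.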
Other routines
quit() Terminate gated. Passed an errno
value which is logged.
Implementing a protocol
The following sections of gated need to be updated to add a new protocol:
defs.c:
#define PROTO_protocol
if.h define protocol specific interface flags. Also add to IFF_KEEPMASK
if.c add above flags to if_flag_bits structure.
main.c:
#include "protocol.h"
main():
Code to call protocol initialization routine.
nmi.c:
Add code to return the correct value for ipRouteProto.
parse.c
Add keywords to keywords table.
Add code to parse_metric_check.
parse.h
Add metric limits and other value limits.
parser.y
Add code to parse protocol-specific configuration information
as well as updating the propagation restrictions for this
protocol.
rt_table.h
Define RTPROTO_protocol and RTPREF_protocol
rt_table.c
Define printable versions of above.
snmp.c:
Add code to return the correct value for ipRouteProto.
task.c:
task_reinit():
Code to call protocol init routine after reparse.
trace.h:
Define TR_protocol and optionally IF_protocolUPD
trace.c:
Define text values for above and specify command-line flags
for enabling tracing of this protocol.
Call protocol_dump().