			Overview

Gated provides a single-threaded, event-driven environment for
implementing routing protocols.  Events are generated by sockets being
ready for read, write or exceptions, by interval timer expiration and
by signals requesting re-configuration and shutdown.  Threads are
non-interruptible and therefore must take steps to avoid excessive
processing time when possible.
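
As an illustration of the event model described above, one pass of such
a loop might look like the following sketch.  It is not taken from
gated: fd_handlers, dispatch_once and drain_fd are names invented here,
only read readiness is handled, and timers and signals are left out.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_FDS 64                       /* hypothetical limit for this sketch */

typedef void (*handler_fn)(int fd);      /* hypothetical handler type */

static handler_fn fd_handlers[MAX_FDS];  /* read handler per descriptor, or NULL */
static int bytes_seen;                   /* demo state updated by drain_fd */

/* Example handler: consume whatever is readable and count the bytes. */
void drain_fd(int fd)
{
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n > 0)
        bytes_seen += (int)n;
}

/* Wait up to timeout_sec seconds, then run the handler of every readable
 * descriptor.  Handlers run to completion one after another; nothing
 * preempts them, so a slow handler delays all the others.
 * Returns the number of handlers called, or -1 if select() failed. */
int dispatch_once(int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int fd, maxfd = -1, called = 0;

    FD_ZERO(&rfds);
    for (fd = 0; fd < MAX_FDS; fd++) {
        if (fd_handlers[fd] != NULL) {
            FD_SET(fd, &rfds);
            maxfd = fd;
        }
    }
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;

    for (fd = 0; fd <= maxfd; fd++) {
        if (FD_ISSET(fd, &rfds) && fd_handlers[fd] != NULL) {
            fd_handlers[fd](fd);
            called++;
        }
    }
    return called;
}
```

A caller would register a handler with fd_handlers[fd] = drain_fd and
then call dispatch_once() repeatedly.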


		Implementing ISO in gated

Most of the support routines (tracing, tasks and timers) are
relatively protocol independent.  The major areas needing changes are
the interface code, routing table and the parser.

Socket addresses are already supported as a typedef sockaddr_un which
is a union of all relevant address family socket types.  The tracing
code prints sockaddrs by pointer, selecting on protocol family.
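
A minimal sketch of the union-plus-family-switch idea follows.  The
type is called addr_un here, a made-up name chosen so that it does not
clash with struct sockaddr_un from <sys/un.h>; addr_format is likewise
hypothetical and is not gated's tracing code.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical stand-in for the union described in the text. */
typedef union {
    struct sockaddr    a;    /* generic view, used to read the family */
    struct sockaddr_in in;   /* AF_INET view */
} addr_un;

/* Format an address into buf, selecting on the address family.
 * Returns buf for convenience. */
const char *addr_format(const addr_un *ap, char *buf, size_t len)
{
    switch (ap->a.sa_family) {
    case AF_INET:
        if (inet_ntop(AF_INET, &ap->in.sin_addr, buf, (socklen_t)len) == NULL)
            snprintf(buf, len, "<bad inet address>");
        break;
    default:
        /* A new family such as AF_ISO would get its own case here. */
        snprintf(buf, len, "<family %d>", (int)ap->a.sa_family);
        break;
    }
    return buf;
}
```

Adding a family then means adding one union member and one case,
which is the kind of generalization the ISO work would need.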

The interface code must be updated to obtain the necessary information
about ISO interfaces from the kernel.  The interface support routines
will need to be updated to locate interfaces within the ISO address
family and the if_rt* routines will need to be updated to support the
ISO address family.

The routing table is currently set up as a hash table.  [I'd like to
convert to Patricia, but like everyone else I'm waiting for Van.]  I
don't know enough about ISO addresses to know if this method can be
extended to ISO or not.  Various AF_INET dependencies need to be
generalized to support AF_ISO at various places in the ISO code.  The
actual ISO routing table should probably be separate from the AF_INET
one.

The parser will be the most complicated.  The parser will need to be
updated to recognize ISO addresses.  Various support code in the
parser will have to be expanded to support additional address
families.  The current protocols will need to be updated to reject
non-AF_INET addresses.  The route control code will need to be updated
to prevent mixing of address families.


			Tracing

All logging and tracing is done by two routines.  Tracing is usually
done to a file, but may be done to stdout depending on how gated is
started.  Stdout and stderr are normally closed so printf() should not
be used.

Tracing is currently controlled by a global flag set from the
configuration file specifying the levels of tracing.  [This will
hopefully be re-written to provide task specific tracing flags in the
future, allowing tracing of packets from one peer and not the others,
for example]

	tracef(fmt, args....)

	trace(level, priority, fmt, args...)

Tracing is done on a line-by-line basis; because each line is
timestamped, newlines should not be included in fmt.
Timestamps may be disabled for an individual line by including
TR_NOSTAMP in the level.

Both trace() and tracef() use the fmt and args format of printf() with
the following additions:

	%A	Expects a pointer to a sockaddr and formats the
		address for the specified family.  Currently only
		AF_INET is defined.  If the # modifier is specified,
		the port number is appended to the address.

	%m	Inserts the text associated with the current value of
		errno.

	%T	Prints the passed time_t value in hh:mm:ss
		format.  Currently does not support the date or a
		number of days.

The trace buffer is filled by tracef() calls.  A trace() call fills
the buffer and specifies the disposition.  If any of the logging level
flags specified on the trace() call match those specified in the
configuration file, the line is logged to the trace file.  If the
specified priority is non-zero, the message is also syslogged with the
specified priority.  The buffer is then cleared.  If trace() is called
with a fmt of NULL, no data is appended before logging.
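
The buffer-then-dispose behaviour can be sketched as below.  This is an
illustration under assumptions, not gated's code: the demo_ names, the
TR_DEMO_ flag values and the trace_last buffer (standing in for the
trace file) are invented, and the %A, %m and %T extensions, timestamps
and syslog are omitted.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TR_DEMO_RIP 0x01          /* hypothetical level flags */
#define TR_DEMO_EGP 0x02

static unsigned trace_enabled = TR_DEMO_RIP;  /* stands in for the configured levels */
static char trace_buf[256];                   /* pending text, not yet logged */
static char trace_last[256];                  /* stands in for the trace file */

/* Append formatted text to the end of the pending buffer. */
static void trace_append(const char *fmt, va_list ap)
{
    size_t used = strlen(trace_buf);

    vsnprintf(trace_buf + used, sizeof trace_buf - used, fmt, ap);
}

/* Append formatted text to the buffer without logging it. */
void demo_tracef(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    trace_append(fmt, ap);
    va_end(ap);
}

/* Append (unless fmt is NULL), log the line if any requested level is
 * enabled, then clear the buffer.  A non-zero priority would also be
 * passed to syslog(); that part is left out of this sketch. */
void demo_trace(unsigned level, int priority, const char *fmt, ...)
{
    (void)priority;
    if (fmt != NULL) {
        va_list ap;

        va_start(ap, fmt);
        trace_append(fmt, ap);
        va_end(ap);
    }
    if (level & trace_enabled)
        strcpy(trace_last, trace_buf);
    trace_buf[0] = '\0';
}
```

Note that the buffer is cleared even when the line is not logged, so a
disabled level never leaks text into a later line.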

The logging file is specified on the command line or in the
configuration file.


			Tasks


A TASK is generally associated with a socket, but may be around just
to co-ordinate timers, or for cleanup for reparsing.

The fields task_socket and task_timer should not be modified directly.

Task routines are called with:

	task_routine(task *tp)

A task is created by first allocating a task structure by calling task_alloc:

	task *task_alloc(char *name);

The applicable fields should then be filled in followed by a call to
task_create:

	int task_create(task *tp, int maxpacket)

Task_create returns TRUE or FALSE; if FALSE, an immediate quit(EINVAL)
is in order.  Maxpacket is the specification of the largest packet
size to be received.  This allows a common receive buffer to be shared
among all protocols.
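
One way the maxpacket argument could size a shared receive buffer is
sketched below.  The demo_task type and the demo_ functions are
reduced, hypothetical stand-ins; how gated actually allocates its
buffer is not described in this document.

```c
#include <stdlib.h>

typedef struct demo_task {       /* reduced, hypothetical task structure */
    const char *name;
    int socket;                  /* -1 while no socket is attached */
} demo_task;

static char  *recv_buf;          /* one receive buffer shared by every task */
static size_t recv_buf_len;

/* Allocate a zeroed task with no socket attached. */
demo_task *demo_task_alloc(const char *name)
{
    demo_task *tp = calloc(1, sizeof *tp);

    if (tp != NULL) {
        tp->name = name;
        tp->socket = -1;
    }
    return tp;
}

/* Register the task and grow the shared buffer so that it can hold the
 * largest packet any task has asked for.
 * Returns 1 (TRUE) on success and 0 (FALSE) on failure. */
int demo_task_create(demo_task *tp, int maxpacket)
{
    if (tp == NULL || maxpacket < 0)
        return 0;
    if ((size_t)maxpacket > recv_buf_len) {
        char *nb = realloc(recv_buf, (size_t)maxpacket);

        if (nb == NULL)
            return 0;
        recv_buf = nb;
        recv_buf_len = (size_t)maxpacket;
    }
    return 1;
}
```

Because tasks never run concurrently, a single buffer of the maximum
requested size is enough for all of them.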



A task is deleted by calling task_delete which will delete all timers
associated with this task, close the socket and finally delete the
task:

	void task_delete(task *tp);


If a socket has been opened before a task has been created,
task_socket should be set before calling task_create.  If it is opened
after task creation, task_set_socket() should be used to indicate the
association.  Existing sockets should first be disassociated from the
task by calling task_reset_socket() after closing the socket.

	void task_set_socket(task *tp, int socket);

	void task_reset_socket(task *tp);


When printing the task name, task_name() should be used which appends
the address (if non-zero) and port number to the task name.  It
returns the pointer to a static string.

	char *task_name(task *tp);


Tasks may be deleted and allocated at any time.  They are kept in
random order on a doubly linked list.  Task to socket mapping is
stored to allow quick access to a task if a select succeeds on its
socket.
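
A descriptor-indexed table is one simple way to get that quick access.
The sketch below assumes such a table; the document does not say how
gated stores the mapping, and all demo_ names and DEMO_MAX_FD are
invented for the example.

```c
#include <stddef.h>

#define DEMO_MAX_FD 256                      /* hypothetical table size */

typedef struct demo_task {
    const char *name;
    int socket;                              /* -1 when no socket is attached */
} demo_task;

static demo_task *task_by_fd[DEMO_MAX_FD];   /* fd -> owning task, or NULL */

/* Record that fd belongs to tp.
 * Returns 0 on success, -1 if fd is out of range. */
int demo_set_socket(demo_task *tp, int fd)
{
    if (fd < 0 || fd >= DEMO_MAX_FD)
        return -1;
    tp->socket = fd;
    task_by_fd[fd] = tp;
    return 0;
}

/* Forget the association; the caller has already closed the descriptor. */
void demo_reset_socket(demo_task *tp)
{
    if (tp->socket >= 0 && tp->socket < DEMO_MAX_FD)
        task_by_fd[tp->socket] = NULL;
    tp->socket = -1;
}

/* Constant-time lookup used after select() reports fd as ready. */
demo_task *demo_task_for_fd(int fd)
{
    return (fd >= 0 && fd < DEMO_MAX_FD) ? task_by_fd[fd] : NULL;
}
```

Keeping the table updated only through the set and reset routines is
the reason the socket field should not be modified directly.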


The task structure has the form:

struct _task {
    struct _task *task_forw;
    struct _task *task_back;
    char   *task_name;
    flag_t  task_flags;
    int     task_proto;
    int     task_socket;
    proto_t task_rtproto;
    u_long  task_rtrevision;
    void    (*task_recv) ();
    void    (*task_accept) ();
    void    (*task_write) ();
    void    (*task_connect) ();
    void    (*task_except) ();
    void    (*task_terminate) ();
    void    (*task_flash) ();
    void    (*task_cleanup) ();
    void    (*task_reinit) ();
    void    (*task_ifchange) ();
    sockaddr_un task_addr;
    caddr_t task_data;
    struct _timer *task_timer[TASK_TIMERS];
};
typedef struct _task task;

#define	TASKF_ACCEPT		0x01   /* This socket is waiting for accepts, not reads */
#define	TASKF_CONNECT		0x02   /* This socket is waiting for connects, not writes */
#define	TASKF_IPHEADER		0x04   /* Received packets have IP header to be received */


Field descriptions:

	task_name		is a pointer to a static character
				string specifying the printable name
				of this task.

	task_flags		Are task specific flags.

				TASKF_ACCEPT	indicates that a
						socket is waiting on
						an accept instead of a
						read and so
						task_accept() should
						be called instead of
						task_recv().

				TASKF_CONNECT	indicates that a
						socket is waiting on a
						connect instead of a
						write so task_connect()
						should be called
						instead of
						task_write().

				TASKF_IPHEADER	Indicates that packets
						read from this
						connection have an IP
						header which should be
						stored in the first
						element of the iovec.
						The rest of the data
						packet is stored in
						the second element of
						the iovec.

	task_proto		The IP protocol being used for this
				socket if it is directly on the IP
				layer.  Mainly for human consumption.

	task_socket		The fd assigned to this socket, should
				be -1 if no socket is open and reset
				to -1 if the socket is closed.

	task_rtproto		The routing table protocol being used
				by this task.  Specified here to avoid
				use of a constant and also for human
				consumption.

	task_rtrevision		The routing table revision that has
				been flashed to this
				protocol/neighbor/interface ...  Used
				to ensure propagation of changed routes.

	task_recv		Routine to call when there is data to
				be read on this socket.

	task_accept		Routine to call when there is an
				incoming connection on this socket.

	task_write		Routine to call when this socket is
				ready to accept more data.

	task_connect		Routine to call when a connect has
				completed on this socket.

	task_except		Routine to call when there is an
				exception pending on this socket.

	task_terminate		Routine to call when a SIGTERM has
				been received and gated has commenced
				a graceful shutdown.  This routine
				does not have to terminate a task, but
				should initiate the state change that
				will lead to an eventual shutdown.
				The default value is task_delete().
				
	task_flash		Routine to call when changes have been
				made to the routing table and flash
				updates should be generated.  Does not
				have to do the flash update, but may
				schedule it at a future date by
				creating a timer.
				
	task_cleanup		Routine called when a SIGHUP has been
				received and gated is about to re-read
				its configuration file.

				Policy lists owned by this task should
				be freed and steps should be taken to
				allow later determination if this task
				has been removed from the config file
				and should be terminated.

	task_reinit		Routine called after the configuration
				file has been re-read.  If this task
				is no longer in the config file, it
				should be terminated.

	task_ifchange		Called when an interface status has
				changed. The second argument is a
				pointer to the if_entry control block.
				
	task_addr		The address family, address and port
				selector for this task if applicable.

	task_data		Task specific data which should be
				cast to a (caddr_t).

	task_timer[TASK_TIMERS]	Array of pointers to timers owned by
				this task.  This allows timers to be
				referenced by a define and deletion of
				all timers when a task is deleted.
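
The effect of TASKF_ACCEPT and TASKF_CONNECT on which routine runs can
be sketched as follows.  The structure is a reduced, hypothetical
version of the task structure (field names recv_fn and so on are made
up), and the demo_mark_ handlers exist only to make the choice visible.

```c
#include <stddef.h>

#define TASKF_ACCEPT  0x01   /* values as given in the listing above */
#define TASKF_CONNECT 0x02

typedef struct demo_task demo_task;
struct demo_task {                    /* reduced, hypothetical task structure */
    unsigned flags;
    void (*recv_fn)(demo_task *);
    void (*accept_fn)(demo_task *);
    void (*write_fn)(demo_task *);
    void (*connect_fn)(demo_task *);
    int last_event;                   /* demo only: which routine ran last */
};

/* Socket became readable: a listening socket gets the accept routine,
 * any other socket gets the receive routine. */
void demo_on_readable(demo_task *tp)
{
    void (*fn)(demo_task *) =
        (tp->flags & TASKF_ACCEPT) ? tp->accept_fn : tp->recv_fn;

    if (fn != NULL)
        fn(tp);
}

/* Socket became writable: a pending connect gets the connect routine,
 * any other socket gets the write routine. */
void demo_on_writable(demo_task *tp)
{
    void (*fn)(demo_task *) =
        (tp->flags & TASKF_CONNECT) ? tp->connect_fn : tp->write_fn;

    if (fn != NULL)
        fn(tp);
}

/* Demo handlers that only record which one was called. */
void demo_mark_recv(demo_task *tp)    { tp->last_event = 1; }
void demo_mark_accept(demo_task *tp)  { tp->last_event = 2; }
void demo_mark_write(demo_task *tp)   { tp->last_event = 3; }
void demo_mark_connect(demo_task *tp) { tp->last_event = 4; }
```

A protocol such as BGP would typically set the connect flag while an
outgoing connection is in progress and clear it once the connect
routine has run; that usage is an assumption, not something this
document states.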




			Timers


Timers may be created, deleted, reset and cleared at any time.  A timer
causes a routine to be run at the specified time.  Provisions are
available to compensate for system load and processing time.  Timer
resolution is in seconds.

Timers by default refire every timer_interval seconds, with the
re-fire specified to occur timer_interval seconds from the last time
the timer was supposed to fire.  If the system is loaded and a timer
is late by more than two intervals, the timer only fires once.

The TIMERF_ABSOLUTE flag causes the timer to fire timer_interval
seconds from when it last fired, regardless of system load or
processing time.

Timers automatically repeat unless TIMERF_DELETE is specified.
TIMERF_DELETE creates a one-shot timer which is deleted after it
fires.
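
The re-fire rules can be illustrated with a small function that
computes the next firing time.  This is a sketch under assumptions: the
flag values are invented, and collapsing every missed interval into a
single firing is one reading of the text, not a statement of what
gated does.

```c
#include <time.h>

#define DEMO_TIMERF_ABSOLUTE 0x01   /* hypothetical values; the text gives none */
#define DEMO_TIMERF_DELETE   0x02

/* Return the next firing time of a timer that has just run, or 0 when
 * the timer is a one-shot and should be deleted instead.
 * scheduled: when the timer was supposed to fire
 * now:       when it actually ran */
time_t demo_next_fire(unsigned flags, time_t scheduled, time_t now,
                      time_t interval)
{
    time_t next;

    if ((flags & DEMO_TIMERF_DELETE) || interval <= 0)
        return 0;
    if (flags & DEMO_TIMERF_ABSOLUTE)
        return now + interval;          /* measured from the actual firing */

    next = scheduled + interval;        /* measured from the intended firing */
    while (next <= now)                 /* running late: skip missed firings */
        next += interval;
    return next;
}
```

With a 30 second interval, a timer due at 100 that runs at 101 is next
due at 130, so the one second of lateness does not accumulate.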

Timers are kept in two queues, the active queue and the inactive
queue.  The inactive queue is kept in random order, the active queue
is kept in time order so the complete timer queue does not have to be
scanned when the interval timer fires.
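
Keeping the active queue in time order amounts to a sorted insert into
a doubly linked list, sketched below with a circular list and a
sentinel head.  The demo_timer type and function names are invented
for the example.

```c
#include <stddef.h>
#include <time.h>

typedef struct demo_timer {          /* reduced, hypothetical timer block */
    struct demo_timer *forw;
    struct demo_timer *back;
    time_t next_time;
} demo_timer;

/* Make head an empty circular queue; head itself carries no timer. */
void demo_queue_init(demo_timer *head)
{
    head->forw = head->back = head;
}

/* Insert tip so that the queue stays ordered by next_time.  Timers with
 * equal times keep their insertion order. */
void demo_queue_insert(demo_timer *head, demo_timer *tip)
{
    demo_timer *pos = head->forw;

    while (pos != head && pos->next_time <= tip->next_time)
        pos = pos->forw;

    tip->forw = pos;                 /* link tip in front of pos */
    tip->back = pos->back;
    pos->back->forw = tip;
    pos->back = tip;
}

/* Earliest pending timer, or NULL when the queue is empty. */
demo_timer *demo_queue_first(demo_timer *head)
{
    return head->forw == head ? NULL : head->forw;
}
```

When the interval timer fires, only the front of the queue has to be
examined until a timer with a later time is reached.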

Timer fields should not be modified directly, except for timer_job and
timer_flags.

The timer control block contains the following fields:

struct _timer {
    struct _timer *timer_forw;
    struct _timer *timer_back;
    char   *timer_name;
    flag_t  timer_flags;
    time_t  timer_next_time;
    time_t  timer_last_time;
    time_t  timer_interval;
    void    (*timer_job) ();
    task   *timer_task;
    int     timer_index;
};
typedef struct _timer timer;


	timer_name		A printable name for this timer for
				human consumption.

	timer_flags		TIMERF_ABSOLUTE	specifies that this
						timer should fire the
						specified interval
						from when it was set,
						not from when it last
						fired.

				TIMERF_DELETE	specifies that this
						timer should be
						deleted as soon as it
						is finished.  This
						flag may be set at any
						time.

	timer_next_time		The Unix format timestamp indicating
				when this timer is scheduled to fire
				again.

	timer_last_time		The Unix format timestamp indicating
				when this timer last fired.

	timer_interval		The Unix format time interval of this
				timer.  If TIMERF_DELETE is not specified,
				the timer re-fires at this interval.

	timer_job		The routine to be called when a timer
				fires.  It is called as:

					void timer_job(timer *tip,
						time_t interval);

	timer_task		Pointer to the task associated with
				this timer.

	timer_index		The index into task_timer[] which
				points to this timer.


Routines:

	timer *timer_create(task *tp,
			   int index,
			   char *name,
			   flag_t flags,
			   time_t interval,
			   void    (*job) ());

		Creates a timer.  If no task is specified, task should
		be (task *) 0.  If this timer is initially inactive,
		the interval should be specified as (time_t) 0.


	void timer_delete(timer *tip);

		Deletes the timer.  If a task is associated with this
		timer, its pointer to this timer is cleared.

	void timer_reset(timer *tip);

		This timer is reset and put in the inactive queue.

	void timer_set(timer *tip, time_t interval);

		Sets the timer to fire interval seconds from now.

	void timer_interval(timer *tip, time_t interval);

		Sets the timer to fire interval seconds from when it
		last fired.

	char *timer_name(timer *tip);

		Returns a pointer to a static area containing the task
		name followed by the timer name.



				Interfaces
	
A structure is maintained for each address on each interface (BSD 4.4
allows multiple addresses per interface).  All references to interface
addresses are resolved to pointers to interface structures at
configuration time.

At initialization gated finds all active interfaces and creates
interface structures for them.  Interfaces which have not been
configured are currently ignored.  Every minute these interfaces are
checked for a change in status, but new interfaces are not detected.

It is my intention to eventually scan for new interfaces.

Interfaces are checked for failures not reflected by a change in the
IFF_UP flag by routing packets addressed to myself on P2P lines and
monitoring for the reception of routing packets.  If no packets are
received, the routes to an interface will time out and be deleted.
This can be disabled on a per-interface basis.

typedef struct _if_entry {
    struct _if_entry *int_next;
    sockaddr_un int_addr;
    union {
	sockaddr_un	_intu_broadaddr;
	sockaddr_un	_intu_dstaddr;
    }       _int_intu;
#define	int_broadaddr	_int_intu._intu_broadaddr
#define	int_dstaddr	_int_intu._intu_dstaddr
    sockaddr_un int_net;
    sockaddr_un int_netmask;
    sockaddr_un int_subnet;
    sockaddr_un int_subnetmask;
    int     int_metric;
    flag_t  int_state;
    int     int_ipackets;
    int     int_opackets;
    char   *int_name;
    u_short int_transitions;
    int     int_index;
    pref_t  int_preference;
}       if_entry;

	int_addr	Is the address assigned to this interface.
			Address family is contained in the sockaddr.

	int_broadaddr	The broadcast address of this interface if
			appropriate.

	int_dstaddr	The destination address of this interface if
			appropriate. 

	int_net		The natural net of this interface.

	int_netmask	The natural netmask of this interface.

	int_subnet	The subnet specified on this interface.
			Same as int_net if subnetting is not used. 

	int_subnetmask	The subnet mask specified on this interface.
			Same as int_netmask if subnetting is not used.

	int_metric	The configured (ifconfig) or gated specified
			metric for this interface.

	int_state	Flags for this interface.

	int_ipackets	Not used, I just now realized it existed.
	int_opackets	Not used, I just now realized it existed.

	int_name	The kernel's name for this interface.

	int_transitions	Number of up->down transitions of this
			interface. 

	int_index	The order this interface appears in the kernel
			file. 

	int_preference	The preference to be used for the route to
			this interface.


	Flags:

	IFS_UP
	IFS_BROADCAST
	IFS_POINTOPOINT
	IFS_REMOTE
	IFS_LOOPBACK
	IFS_INTERFACE

		Set from the kernel's IFF_ flags with the same name.
		IFF_LOOPBACK is emulated on 4.2 systems.
		
	IFS_SUBNET

		This interface has specified a non-natural subnet
		mask.

	IFS_NOAGE

		Routing packets should not be used to determine the
		status of this interface.

	IFS_NORIPOUT
	IFS_NORIPIN
	IFS_NOHELLOOUT
	IFS_NOHELLOIN
	IFS_NOICMPIN

		Global disabling of routing protocols on an interface
		basis.  Sort of gross to put them here, but a protocol
		specific control block is too much work at the moment.
		
	IFS_METRICSET

		The value int_metric was set in the gated config file.
		This does not cause the kernel's idea of the metric to
		be updated.

	IFS_MULTICAST

		This interface supports IP multicasting.
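
The distinction between the natural netmask and a configured subnet
mask, which is what IFS_SUBNET records, can be sketched for classful
IPv4 as follows.  The function names are invented and addresses are
taken in host byte order for simplicity.

```c
#include <stdint.h>

/* Natural (classful) netmask for an IPv4 address in host byte order.
 * Returns 0 for class D and E addresses, which have no natural network. */
uint32_t demo_natural_mask(uint32_t addr)
{
    uint8_t first = (uint8_t)(addr >> 24);

    if (first < 128)
        return 0xff000000u;          /* class A */
    if (first < 192)
        return 0xffff0000u;          /* class B */
    if (first < 224)
        return 0xffffff00u;          /* class C */
    return 0;
}

/* Non-zero when the configured mask differs from the natural one, the
 * condition the text associates with IFS_SUBNET. */
int demo_is_subnetted(uint32_t addr, uint32_t mask)
{
    return mask != demo_natural_mask(addr);
}
```

For an interface at 128.1.2.3 with mask 255.255.255.0, int_net would be
128.1.0.0 and int_subnet 128.1.2.0; with mask 255.255.0.0 the two are
the same.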


Routines:


	if_entry *if_withdst(sockaddr_un *dstaddr);

		Returns a pointer to the interface structure of the
		interface with the given address.  Note that
		POINTOPOINT interfaces are always referred to by their
		destination address.
		
	if_entry *if_withaddr(sockaddr_un *dstaddr);

		Returns a pointer to the interface structure of the
		interface on the given directly attached network.
		
	if_entry *if_withname(char *name);

		Returns a pointer to the interface structure of the
		interface with the given name.  Note that in BSD 4.4
		an interface can have multiple addresses, so this
		won't work.
		
	u_long if_subnetmask(struct in_addr addr);

		Returns the IP subnet mask of a given address if there
		is an interface to a subnet of that network.  Will
		need updating for BSD 4.4 where subnet masks are variable.
		


				Routing table


A destination is a host or network route and associated mask.

The routing table allows multiple routes per destination as well as a
provision for protocol dependent data (BGP currently uses it for
maintaining an AS path and HELLO uses it for a sliding window of
metrics).

Which of multiple next hops is used is determined by preference.  A
default preference is specified for each routing protocol and is
overridable down to the destination and source level.  A tie is
resolved by using the next hop with the lower address in the interest
of being deterministic.
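
The selection rule can be sketched as a comparison function.  Note the
assumption: the text does not say whether a smaller or a larger
preference value wins, and this example assumes smaller is better.  The
demo_route type is a reduced, hypothetical stand-in for rt_entry.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct demo_route {
    int      preference;   /* assumption: smaller values are preferred */
    uint32_t router;       /* next hop, IPv4 in host byte order */
} demo_route;

/* Non-zero when a should be used in preference to b.  Ties on
 * preference are broken by the lower next hop address so that the
 * outcome does not depend on the order routes were learned. */
int demo_route_better(const demo_route *a, const demo_route *b)
{
    if (a->preference != b->preference)
        return a->preference < b->preference;
    return a->router < b->router;
}

/* Pick the route to use among n candidates for one destination.
 * Returns NULL when there are no candidates. */
const demo_route *demo_route_select(const demo_route *routes, size_t n)
{
    const demo_route *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (best == NULL || demo_route_better(&routes[i], best))
            best = &routes[i];
    }
    return best;
}
```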

Routes in the routing table are aged unless the RTS_NOAGE flag is
specified.  When rt_timer reaches rt_timer_max, the route is put in
holddown, rt_timer is reset and rt_timer_max is set to RT_T_HOLDDOWN.
When a HOLDDOWN expires, the route is deleted.
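
The aging and holddown sequence can be sketched as a small state
machine.  The value of DEMO_RT_T_HOLDDOWN, the flag values and the
DEMO_RTS_HOLDDOWN flag itself are assumptions made for the example;
only RTS_NOAGE and RT_T_HOLDDOWN are named in this document.

```c
#include <time.h>

#define DEMO_RT_T_HOLDDOWN 120      /* assumed value; the text gives none */
#define DEMO_RTS_NOAGE     0x01     /* hypothetical flag values */
#define DEMO_RTS_HOLDDOWN  0x02

typedef struct demo_route {         /* reduced, hypothetical rt_entry */
    unsigned state;
    time_t   timer;                 /* current age in seconds */
    time_t   timer_max;             /* age at which the route expires */
} demo_route;

/* Age a route by elapsed seconds.
 * Returns 1 when the route should now be deleted, 0 otherwise. */
int demo_rt_age(demo_route *rt, time_t elapsed)
{
    if (rt->state & DEMO_RTS_NOAGE)
        return 0;                   /* never aged */

    rt->timer += elapsed;
    if (rt->timer < rt->timer_max)
        return 0;                   /* still fresh */

    if (rt->state & DEMO_RTS_HOLDDOWN)
        return 1;                   /* holddown expired: delete */

    rt->state |= DEMO_RTS_HOLDDOWN; /* first expiry: enter holddown */
    rt->timer = 0;
    rt->timer_max = DEMO_RT_T_HOLDDOWN;
    return 0;
}
```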

Routes added with the RTS_NOAGE flag should be deleted with
rt_delete().  Routes added without the RTS_NOAGE flag should use
rt_unreach() to delete a route, which puts it into holddown for
RT_T_HOLDDOWN seconds.

Modifications to the routing table are started by opening the table
with rt_open() and finished with rt_close().  Attempts to change the
table when it is not open result in a fatal error.

The propagation of routes to the various protocols is controlled by
the revision number.  When the table is opened, the global revision
number is incremented.  If no changes have been made when it is
closed, the number is decremented.  Routes that are changed get their
rt_revision set to the global value.  Each task that modifies the
routing table has its own revision.  When changes are made and
task_flash() is called to cause the flash update tasks to be executed,
a protocol can determine changes by comparing its revision number
with that of the route in question.  When finished processing a flash
update, a protocol sets its task_rtrevision to the current global
value.
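
The revision bookkeeping can be sketched as follows.  The demo_ names
and the demo_rt_touch helper are invented; in gated the stamping would
happen inside the routines that add or change routes, and an attempt to
change a closed table is fatal rather than an error return.

```c
typedef struct demo_route {              /* reduced, hypothetical rt_entry */
    unsigned long revision;
} demo_route;

static unsigned long rt_global_revision; /* revision of the whole table */
static int rt_is_open;
static int rt_changes;                   /* changes made since the last open */

/* Open the table: every change made until close carries a new revision. */
void demo_rt_open(void)
{
    rt_is_open = 1;
    rt_changes = 0;
    rt_global_revision++;
}

/* Stamp a changed route with the current revision.
 * Returns 0 on success, -1 if the table is not open. */
int demo_rt_touch(demo_route *rt)
{
    if (!rt_is_open)
        return -1;
    rt->revision = rt_global_revision;
    rt_changes++;
    return 0;
}

/* Close the table; an open with no changes does not consume a revision.
 * Returns the number of changes that were made. */
int demo_rt_close(void)
{
    int changes = rt_changes;

    if (changes == 0)
        rt_global_revision--;
    rt_is_open = 0;
    return changes;
}

/* Flash side: a route needs announcing if it changed after the revision
 * the task last processed. */
int demo_needs_flash(const demo_route *rt, unsigned long task_revision)
{
    return rt->revision > task_revision;
}
```

After handling a flash update, a task would store the global revision
as its own, so that the same routes are not announced twice.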

The existing protocols are not good examples of the work required to
update the routing table with SPF algorithms.  The suggested method is
to run the Dijkstra algorithm and generate at most RT_N_MULTIPATH
multipath routes.  The rt_add() and rt_change() routines will be
modified to receive a pointer to a list of next hop gateways
terminated by a null entry.  [RT_N_MULTIPATH will always be one on
Unix, but ports of gated to routers will require support of multiple
next hops.]  Differing routes, such as non-multipath routes to the
same external network, should be added as separate routes.

Each route added has a pointer to a gw_entry for the gateway this
route was learned from.  For ICMP, RIP and HELLO these gw_entry
control blocks are allocated dynamically each time a gateway is
learned from.  For BGP and EGP these control blocks are part of the
peer structure and are used to delete all routes to a particular
gateway when the connection is broken.  For SPF routes this could be
one common gw_entry, or could be the address of the link originating
this route.  The first would make it easy to delete all SPF routes if
the protocol is disabled at run-time, the second would make it easy to
delete routes when a link goes down.

struct _rt_entry {
	...
#define	rt_dest		rt_head->rth_dest
#define	rt_dest_mask	rt_head->rth_dest_mask
#define	rt_parent	rt_head->rth_parent
    sockaddr_un	rt_router;
    if_entry *rt_ifp;
    gw_entry *rt_sourcegw;
    task   *rt_task;
    time_t  rt_timer;
    time_t  rt_timer_max;
    metric_t rt_metric;
    flag_t  rt_state;
    proto_t rt_proto;
    pref_t  rt_preference;
    u_long  rt_revision;
    as_t    rt_as;
    flag_t  rt_flags;
    rt_data *rt_data;
};


	rt_dest		Is a sockaddr_un specifying the address family
			and destination for this route.

	rt_dest_mask	Is a sockaddr_un specifying the mask for this
			destination. 

	rt_parent	Is the parent of this route, i.e. a route that
			has a smaller netmask.

	rt_router	Is the next hop gateway for this route.

	rt_ifp		Is a pointer to the interface used to reach
			the next hop.

	rt_sourcegw	Is the address of the source_gw for this
			route.  For SPF algorithms a single global
			gw_entry should be used.

	rt_task		Pointer to the task that installed this route.

	rt_timer	The age of this route in seconds.

	rt_timer_max	The maximum age of this route.

	rt_metric	The metric for this route.  This is not
			translated between protocols.

	rt_state	Flags for this route.

	rt_proto	The protocol this route was learned from.

	rt_preference	The preference for this route.

	rt_revision	The revision of the routing table at which
			this route was modified.

	rt_as		The AS of this route.  Zero is allowed if no
			exterior protocols are in use.

	rt_flags	Emulation of kernel flags for this route.

	rt_data		Pointer to protocol specific data block
			(rtd_data). 



Routines:

	rt_open		Obtain update permission on the routing table

	rt_close	Release control of the routing table

	rt_add		Adds a route to the routing table.

	rt_change	Change the next-hop, metric or preference of
			the route.

	rt_unreach	Put the specified route into holddown.

	rt_delete	Delete the specified route.

	rt_refresh	Indicate that this route has been heard again
			from the same gateway with the same metric and
			its age should be reset to zero.

	rt_gwunreach	The specified gateway has become unreachable.
			An rt_delete will be issued to all routes
			installed by this gateway.

	rt_locate	Locate a route given flags, destination and
			protocol.

	rt_locate_gw	Locate a route given flags, destination,
			protocol and gw_entry.


	Route specific data:

	The rt_data pointer in the rt_entry structure allows the
	manipulation of protocol specific data for each route.

	Rt_data should point to the following structure:

	/* Prefix of protocol independent data */
	typedef struct _rt_data {
	    struct _rt_data *rtd_forw;	
	    struct _rt_data *rtd_back;	
	    int    rtd_refcount;	
	    u_int  rtd_length;		
	    void   (*rtd_dump)();
	    caddr_t rtd_data;
	} rt_data;

	Where rtd_data is the protocol-specific data area.

	There are basically two types of route-specific data, data
	that is unique for each route and data that can be common to a
	group of routes.  The HELLO protocol is an example of the use
	of unique data and the BGP AS paths (attributes in version 2)
	are an example of shared data.

	Unique data should be allocated with rtd_alloc() and is
	automatically freed when a route is deleted.

	Shared data should be allocated with rtd_locate() which will
	return a pointer to a new data area containing the desired
	data, or a pointer to an existing data area.  Each reference
	is counted, this count is decremented each time a route is
	deleted and the area is freed when the last reference is
	deleted.
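
	The locate-or-create behaviour with reference counting can be
	sketched as below.  The demo_rtd type is a simplified, singly
	linked stand-in for rt_data, and demo_rtd_locate and
	demo_rtd_release are invented names; the real rtd_locate()
	signature is not given in this document.

```c
#include <stdlib.h>
#include <string.h>

typedef struct demo_rtd {            /* simplified stand-in for rt_data */
    struct demo_rtd *next;
    int              refcount;
    size_t           length;
    unsigned char   *data;           /* points just past this header */
} demo_rtd;

/* Return an entry holding a copy of data[0..length), reusing an existing
 * entry when one with identical contents is already on the list.  Each
 * call adds one reference.  Returns NULL if allocation fails. */
demo_rtd *demo_rtd_locate(demo_rtd **list, const void *data, size_t length)
{
    demo_rtd *p;

    for (p = *list; p != NULL; p = p->next) {
        if (p->length == length && memcmp(p->data, data, length) == 0) {
            p->refcount++;
            return p;
        }
    }
    p = malloc(sizeof *p + length);
    if (p == NULL)
        return NULL;
    p->data = (unsigned char *)(p + 1);
    memcpy(p->data, data, length);
    p->length = length;
    p->refcount = 1;
    p->next = *list;
    *list = p;
    return p;
}

/* Drop one reference, as happens when a route using the data is deleted.
 * The entry is unlinked and freed when the last reference goes away.
 * Returns the number of references that remain. */
int demo_rtd_release(demo_rtd **list, demo_rtd *rtd)
{
    demo_rtd **pp;

    if (--rtd->refcount > 0)
        return rtd->refcount;
    for (pp = list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == rtd) {
            *pp = rtd->next;
            break;
        }
    }
    free(rtd);
    return 0;
}
```

	Two routes that carry the same AS path would then share one
	entry with a reference count of two.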

	Shared data requires the protocol set up a queue head pointer
	for maintenance of the list of protocol-specific data.  The
	RTDATA_LIST and RTDATA_LIST_END macros are available to scan
	this list.

	Shared data can also be manipulated with rtd_alloc() and
	rtd_insert() which allow manipulation of rt_data structures
	instead of pointers and length of data areas.


	AF_INET support routines:

	gd_inet_makeaddr()	Build an address from a network
				number, a host number and a flag
				indicating if subnets should be
				considered when taking the network
				part of the network number supplied.

	gd_inet_netof()		Return the subnet of a sockaddr_in. 

	gd_inet_wholenetof()	Return the natural net of a
				sockaddr_in. 

	gd_inet_class()		Return the class (A = 1, B = 2, C = 3)
				of the first byte of a network number
				(used mainly by EGP).

	gd_inet_checkhost()	Mostly used by RIP.  Verifies that a
				network is class A, B or C, that
				sin_port is zero and that the reserved
				fields are zero.

	gd_inet_hash()		Calculates the routing table hash
				value for a sockaddr_in.

	gd_inet_cksum()		Calculates the Internet checksum given
				an iovec.

	inet_ntoa()		Returns a pointer to a static string
				containing the ASCII representation of
				the IP network number.  Don't use
				this, use the %A format of trace and
				*printf(). 
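
	As an illustration of what a checksum routine that takes an
	iovec has to deal with, the sketch below computes the standard
	Internet checksum (RFC 1071) across several buffers, including
	buffers of odd length.  It is not gated's gd_inet_cksum(); the
	name demo_inet_cksum and its signature are invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* One's complement checksum over the concatenation of all buffers.
 * Bytes are paired into 16-bit big-endian words across buffer
 * boundaries, so a buffer of odd length does not disturb the pairing.
 * The result is returned as a host integer whose high byte is the first
 * checksum byte on the wire. */
uint16_t demo_inet_cksum(const struct iovec *iov, int iovcnt)
{
    uint64_t sum = 0;
    int low = 0;              /* 1 when the next byte is the low half of a word */
    int i;

    for (i = 0; i < iovcnt; i++) {
        const uint8_t *p = iov[i].iov_base;
        size_t n;

        for (n = 0; n < iov[i].iov_len; n++) {
            sum += low ? (uint64_t)p[n] : (uint64_t)p[n] << 8;
            low = !low;
        }
    }
    while (sum >> 16)         /* fold the carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

	Processing a byte at a time is slow but keeps the handling of
	odd boundaries obvious; a production routine would sum whole
	words.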


		Other routines

	quit()			Terminate gated.  Passed an errno
				value which is logged.

	

		Implementing a protocol

	The following sections of the source need to be updated to add a new protocol:

	defs.c:
		#define	PROTO_protocol

	if.h	define protocol specific interface flags.  Also add to IFF_KEEPMASK

	if.c	add above flags to if_flag_bits structure.

	main.c:
		include "protocol.h"

		main():
			Code to call protocol initialization routine.

	nmi.c:
		Add code to return the correct value for ipRouteProto.

	parse.c
		Add keywords to keywords table.

		Add code to parse_metric_check.

	parse.h
		Add metric limits and other value limits.

	parser.y
		Add code to parse protocol-specific configuration information
		as well as updating the propagation restrictions for this
		protocol.

	rt_table.h
		Define RTPROTO_protocol and RTPREF_protocol

	rt_table.c
		Define printable versions of above.

	snmp.c:
		Add code to return the correct value for ipRouteProto.

	task.c:
		task_reinit():
			Code to call protocol init routine after reparse.

	trace.h:
		Define TR_protocol and optionally IF_protocolUPD

	trace.c:
		Define text values for above and specify command-line flags
		for enabling tracing of this protocol.


		Call protocol_dump().
