Overview

Gated provides a single-threaded, event-driven environment for implementing routing protocols. Events are generated by sockets being ready for read, write or exceptions, by interval timer expiration, and by signals requesting re-configuration and shutdown. Threads are non-interruptible and therefore must take steps to avoid excessive processing time when possible.

Implementing ISO in gated

Most of the support routines (tracing, tasks and timers) are relatively protocol independent. The major areas needing changes are the interface code, the routing table and the parser.

Socket addresses are already supported as a typedef, sockaddr_un, which is a union of all relevant address family socket types. The tracing code prints sockaddrs by pointer, selecting on protocol family.

The interface code must be updated to obtain the necessary information about ISO interfaces from the kernel. The interface support routines will need to be updated to locate interfaces within the ISO address family, and the if_rt* routines will need to be updated to support the ISO address family.

The routing table is currently set up as a hash table. [I'd like to convert to Patricia, but like everyone else I'm waiting for Van.] I don't know enough about ISO addresses to know if this method can be extended to ISO or not. Various AF_INET dependencies need to be generalized to support AF_ISO at various places in the ISO code. The actual ISO routing table should probably be separate from the AF_INET one.

The parser will be the most complicated. It will need to be updated to recognize ISO addresses. Various support code in the parser will have to be expanded to support additional address families. The current protocols will need to be updated to reject non-AF_INET addresses. The route control code will need to be updated to prevent mixing of address families.

Tracing

All logging and tracing is done by two routines.
Tracing is usually done to a file, but may be done to stdout depending on how gated is started. Stdout and stderr are normally closed, so printf() should not be used. Tracing is currently controlled by a global flag, set from the configuration file, specifying the levels of tracing. [This will hopefully be re-written to provide task specific tracing flags in the future, allowing tracing of packets from one peer and not the others, for example.]

    tracef(fmt, args...)
    trace(level, priority, fmt, args...)

Tracing is done on a line-by-line basis; newlines should not be included in fmt because of the use of timestamps. Timestamps may be disabled for an individual line by including TR_NOSTAMP in the level.

Both trace() and tracef() use the fmt and args format of printf() with the following additions:

%A
    Expects a pointer to a sockaddr and formats the address for the specified family. Currently only AF_INET is defined. If the # modifier is specified, the port number is appended to the address.
%m
    Inserts the text associated with the current value of errno.
%T
    Prints the passed time_t value in hh:mm:ss format. Currently does not support the date or a number of days.

The trace buffer is filled by tracef() calls. A trace() call appends to the buffer and specifies its disposition. If any of the logging level flags specified on the trace() call match those specified in the configuration file, the line is logged to the trace file. If the specified priority is non-zero, the message is also syslogged with the specified priority. The buffer is then cleared. If trace() is called with a fmt of NULL, no data is appended before logging.

The logging file is specified on the command line or in the configuration file.

Tasks

A task is generally associated with a socket, but may exist just to co-ordinate timers, or for cleanup for reparsing. The fields task_socket and task_timer should not be modified directly.
Task routines are called with:

    task_routine(task *tp)

A task is created by first allocating a task structure by calling task_alloc:

    task *task_alloc(char *name);

The applicable fields should then be filled in, followed by a call to task_create:

    int task_create(task *tp, int maxpacket)

Task_create returns TRUE or FALSE; if FALSE, an immediate quit(EINVAL) is in order. Maxpacket is the specification of the largest packet size to be received. This allows a common receive buffer to be shared among all protocols.

A task is deleted by calling task_delete, which will delete all timers associated with this task, close the socket and finally delete the task:

    void task_delete(task *tp);

If a socket has been opened before a task has been created, task_socket should be set before calling task_create. If it is opened after task creation, task_set_socket() should be used to indicate the association. An existing socket should first be disassociated from the task by calling task_reset_socket() after closing the socket.

    void task_set_socket(task *tp, int socket);
    void task_reset_socket(task *tp);

When printing the task name, task_name() should be used; it appends the address (if non-zero) and port number to the task name. It returns a pointer to a static string.

    char *task_name(task *tp);

Tasks may be deleted and allocated at any time. They are kept in random order on a doubly linked list. The task to socket mapping is stored to allow quick access to a task if a select succeeds on its socket.
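
The allocate, fill in, create, delete sequence described above can be sketched in miniature. This is not gated's code: every sk_ name below is a hypothetical stand-in, the task structure is cut down to a few fields, and the fixed-size fd table is an assumption made only to keep the example short. It shows the three ideas from the text: tasks live on a doubly linked list, a socket to task map gives quick lookup after select, and the largest maxpacket seen sizes the shared receive buffer.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_MAXFD 64                     /* assumption: small fixed fd table */

typedef struct sk_task {
    struct sk_task *forw, *back;        /* doubly linked list links */
    const char *name;                   /* printable task name */
    int socket;                         /* -1 when no socket is open */
} sk_task;

/* Empty circular list: the head points at itself. */
static sk_task sk_head = { &sk_head, &sk_head, "head", -1 };
static sk_task *sk_by_fd[SK_MAXFD];     /* socket -> task map */
static size_t sk_recv_size;             /* size needed by the shared receive buffer */

/* Allocate a task; the caller fills in fields before sk_task_create(). */
sk_task *sk_task_alloc(const char *name)
{
    sk_task *tp = calloc(1, sizeof *tp);

    if (tp != NULL) {
        tp->name = name;
        tp->socket = -1;
    }
    return tp;
}

/* Register the task. Returns 1 (TRUE) on success, 0 (FALSE) on failure. */
int sk_task_create(sk_task *tp, size_t maxpacket)
{
    if (tp == NULL || tp->socket >= SK_MAXFD)
        return 0;
    if (tp->socket >= 0) {
        if (sk_by_fd[tp->socket] != NULL)
            return 0;                   /* socket already owned by a task */
        sk_by_fd[tp->socket] = tp;
    }
    if (maxpacket > sk_recv_size)
        sk_recv_size = maxpacket;       /* one buffer serves all protocols */
    tp->forw = &sk_head;                /* append at the tail of the list */
    tp->back = sk_head.back;
    sk_head.back->forw = tp;
    sk_head.back = tp;
    return 1;
}

/* Unregister and free a task that was registered with sk_task_create(). */
void sk_task_delete(sk_task *tp)
{
    assert(tp != NULL && tp->forw != NULL && tp->back != NULL);
    if (tp->socket >= 0)
        sk_by_fd[tp->socket] = NULL;    /* real code would also close() it */
    tp->back->forw = tp->forw;
    tp->forw->back = tp->back;
    free(tp);
}

/* Quick lookup used after select reports a ready descriptor. */
sk_task *sk_task_for_socket(int fd)
{
    return (fd >= 0 && fd < SK_MAXFD) ? sk_by_fd[fd] : NULL;
}
```

A caller would set tp->socket after sk_task_alloc() and before sk_task_create(), mirroring the rule above that task_socket is set before task_create when the socket is already open.
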
The task structure has the form:

    struct _task {
        struct _task *task_forw;
        struct _task *task_back;
        char *task_name;
        flag_t task_flags;
        int task_proto;
        int task_socket;
        proto_t task_rtproto;
        u_long task_rtrevision;
        void (*task_recv) ();
        void (*task_accept) ();
        void (*task_write) ();
        void (*task_connect) ();
        void (*task_except) ();
        void (*task_terminate) ();
        void (*task_flash) ();
        void (*task_cleanup) ();
        void (*task_reinit) ();
        void (*task_ifchange) ();
        sockaddr_un task_addr;
        caddr_t task_data;
        struct _timer *task_timer[TASK_TIMERS];
    };
    typedef struct _task task;

    #define TASKF_ACCEPT   0x01 /* This socket is waiting for accepts, not reads */
    #define TASKF_CONNECT  0x02 /* This socket is waiting for connects, not writes */
    #define TASKF_IPHEADER 0x04 /* Received packets have IP header to be received */

Field descriptions:

task_name
    A pointer to a static character string specifying the printable name of this task.
task_flags
    Task specific flags. TASKF_ACCEPT indicates that a socket is waiting on an accept instead of a read, so task_accept() should be called instead of task_recv(). TASKF_CONNECT indicates that a socket is waiting on a connect instead of a write, so task_connect() should be called instead of task_write(). TASKF_IPHEADER indicates that packets read from this connection have an IP header which should be stored in the first element of the iovec. The rest of the data packet is stored in the second element of the iovec.
task_proto
    The IP protocol being used for this socket if it is directly on the IP layer. Mainly for human consumption.
task_socket
    The fd assigned to this socket. Should be -1 if no socket is open, and reset to -1 if the socket is closed.
task_rtproto
    The routing table protocol being used by this task. Specified here to avoid use of a constant and also for human consumption.
task_rtrevision
    The routing table revision that has been flashed to this protocol/neighbor/interface ... Used to ensure propagation of changed routes.
task_recv
    Routine to call when there is data to be read on this socket.
task_accept
    Routine to call when there is an incoming connection on this socket.
task_write
    Routine to call when this socket is ready to accept more data.
task_connect
    Routine to call when a connect has completed on this socket.
task_except
    Routine to call when there is an exception pending on this socket.
task_terminate
    Routine to call when a SIGTERM has been received and gated has commenced a graceful shutdown. This routine does not have to terminate a task, but should initiate the state change that will lead to an eventual shutdown. The default value is task_delete().
task_flash
    Routine to call when changes have been made to the routing table and flash updates should be generated. Does not have to do the flash update, but may schedule it at a future date by creating a timer.
task_cleanup
    Routine called when a SIGHUP has been received and gated is about to re-read its configuration file. Policy lists owned by this task should be freed, and steps should be taken to allow later determination of whether this task has been removed from the config file and should be terminated.
task_reinit
    Routine called after the configuration file has been re-read. If this task is no longer in the config file, it should be terminated.
task_ifchange
    Called when an interface status has changed. The second argument is a pointer to the if_entry control block.
task_addr
    The address family, address and port selector for this task, if applicable.
task_data
    Task specific data, which should be cast to a (caddr_t).
task_timer[TASK_TIMERS]
    Array of pointers to timers owned by this task. This allows timers to be referenced by a define, and allows deletion of all timers when a task is deleted.

Timers

Timers may be created, deleted, reset and cleared at any time. A timer causes a routine to be run at the specified time. Provisions are available to compensate for system load and processing time. Timer resolution is in seconds.
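
One way to read the load compensation mentioned above is that a repeating timer is rescheduled from the time it was due rather than from the time its routine actually ran, so a late firing does not push every later firing back. The sketch below illustrates that reading only. The function name, the flag value and the choice to skip every missed interval are assumptions for illustration, not gated's implementation, and the absolute case assumes the "interval from when it actually fired" interpretation.

```c
#include <assert.h>
#include <time.h>

#define SK_TIMERF_ABSOLUTE 0x01         /* hypothetical flag value */

/*
 * Compute the next firing time for a repeating timer.
 *   due      - when the timer was supposed to fire
 *   now      - when its routine actually ran (now >= due)
 *   interval - refire interval in seconds (must be positive)
 */
time_t sk_timer_next(time_t due, time_t now, time_t interval, int flags)
{
    time_t next;

    assert(interval > 0);
    if (flags & SK_TIMERF_ABSOLUTE)
        return now + interval;          /* ignore lateness: count from now */

    next = due + interval;              /* drift-free: count from due time */
    while (next <= now)
        next += interval;               /* very late: skip missed firings */
    return next;
}
```

With a 30 second interval, a timer due at 100 that runs at 101 is next due at 130, not 131; the one second of processing delay is absorbed instead of accumulating.
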
Timers by default refire every timer_interval seconds, with the refire specified to occur timer_interval seconds from the last time the timer was supposed to fire. If the system is loaded and a timer is late by more than two intervals, the timer only fires once. The TIMERF_ABSOLUTE flag causes the timer to fire timer_interval seconds from when it last fired, regardless of system load or processing time.

Timers automatically repeat unless TIMERF_DELETE is specified. TIMERF_DELETE creates a one-shot timer which is deleted after it fires.

Timers are kept in two queues, the active queue and the inactive queue. The inactive queue is kept in random order; the active queue is kept in time order so the complete timer queue does not have to be scanned when the interval timer fires.

Timer fields should not be modified directly, except for timer_job and timer_flags. The timer control block contains the following fields:

    struct _timer {
        struct _timer *timer_forw;
        struct _timer *timer_back;
        char *timer_name;
        flag_t timer_flags;
        time_t timer_next_time;
        time_t timer_last_time;
        time_t timer_interval;
        void (*timer_job) ();
        task *timer_task;
        int timer_index;
    };
    typedef struct _timer timer;

timer_name
    A printable name for this timer, for human consumption.
timer_flags
    TIMERF_ABSOLUTE specifies that this timer should fire the specified interval from when it was set, not from when it last fired. TIMERF_DELETE specifies that this timer should be deleted as soon as it is finished. This flag may be set at any time.
timer_next_time
    The Unix format timestamp indicating when this timer is scheduled to fire again.
timer_last_time
    The Unix format timestamp indicating when this timer last fired.
timer_interval
    The Unix format time interval of this timer. If TIMERF_DELETE is not specified, the timer refires every timer_interval seconds.
timer_job
    The routine to be called when a timer fires. It is called as:

        void timer_job(timer *tip, time_t interval);
timer_task
    Pointer to the task associated with this timer.
timer_index
    The index into task_timer[] which points to this timer.

Routines:

timer *timer_create(task *tp, int index, char *name, flag_t flags, time_t interval, void (*job) ());
    Creates a timer. If no task is specified, tp should be (task *) 0. If this timer is initially inactive, the interval should be specified as (time_t) 0.
void timer_delete(timer *tip);
    Deletes the timer. If a task is associated with this timer, its pointer to this timer is cleared.
void timer_reset(timer *tip);
    The timer is reset and put in the inactive queue.
void timer_set(timer *tip, time_t interval);
    Sets the timer to fire interval seconds from now.
void timer_interval(timer *tip, time_t interval);
    Sets the timer to fire interval seconds from when it last fired.
char *timer_name(timer *tip);
    Returns a pointer to a static area containing the task name followed by the timer name.

Interfaces

A structure is maintained for each address on each interface (BSD 4.4 allows multiple addresses per interface). All references to interface addresses are resolved to pointers to interface structures at configuration time.

At initialization gated finds all active interfaces and creates interface structures for them. Interfaces which have not been configured are currently ignored. Every minute these interfaces are checked for a change in status, but new interfaces are not detected. It is my intention to eventually scan for new interfaces.

Interfaces are checked for failures not reflected by a change in the IFF_UP flag by routing packets addressed to myself on P2P lines and monitoring for the reception of routing packets. If no packets are received, the routes to an interface will time out and be deleted. This can be disabled on a per-interface basis.
    typedef struct _if_entry {
        struct _if_entry *int_next;
        sockaddr_un int_addr;
        union {
            sockaddr_un _intu_broadaddr;
            sockaddr_un _intu_dstaddr;
        } _int_intu;
    #define int_broadaddr _int_intu._intu_broadaddr
    #define int_dstaddr   _int_intu._intu_dstaddr
        sockaddr_un int_net;
        sockaddr_un int_netmask;
        sockaddr_un int_subnet;
        sockaddr_un int_subnetmask;
        int int_metric;
        flag_t int_state;
        int int_ipackets;
        int int_opackets;
        char *int_name;
        u_short int_transitions;
        int int_index;
        pref_t int_preference;
    } if_entry;

int_addr
    The address assigned to this interface. The address family is contained in the sockaddr.
int_broadaddr
    The broadcast address of this interface, if appropriate.
int_dstaddr
    The destination address of this interface, if appropriate.
int_net
    The natural net of this interface.
int_netmask
    The natural netmask of this interface.
int_subnet
    The subnet specified on this interface. Same as int_net if subnetting is not used.
int_subnetmask
    The subnet mask specified on this interface. Same as int_netmask if subnetting is not used.
int_metric
    The configured (ifconfig) or gated specified metric for this interface.
int_state
    Flags for this interface.
int_ipackets
    Not used, I just now realized it existed.
int_opackets
    Not used, I just now realized it existed.
int_name
    The kernel's name for this interface.
int_transitions
    Number of up->down transitions of this interface.
int_index
    The order this interface appears in the kernel file.
int_preference
    The preference to be used for the route to this interface.

Flags:

IFS_UP IFS_BROADCAST IFS_POINTOPOINT IFS_REMOTE IFS_LOOPBACK IFS_INTERFACE
    Set from the kernel's IFF_ flags with the same name. IFF_LOOPBACK is emulated on 4.2 systems.
IFS_SUBNET
    This interface has specified a non-natural subnet mask.
IFS_NOAGE
    Routing packets should not be used to determine the status of this interface.
IFS_NORIPOUT IFS_NORIPIN IFS_NOHELLOOUT IFS_NOHELLOIN IFS_NOICMPIN
    Global disabling of routing protocols on a per-interface basis.
    Sort of gross to put them here, but a protocol specific control block is too much work at the moment.
IFS_METRICSET
    The value int_metric was set in the gated config file. This does not cause the kernel's idea of the metric to be updated.
IFS_MULTICAST
    This interface supports IP multicasting.

Routines:

if_entry *if_withaddr(sockaddr_un *dstaddr);
    Returns a pointer to the interface structure of the interface with the given address. Note that POINTOPOINT interfaces are always referred to by their destination address.
if_entry *if_withdst(sockaddr_un *dstaddr);
    Returns a pointer to the interface structure of the interface on the given directly attached network.
if_entry *if_withname(char *name);
    Returns a pointer to the interface structure of the interface with the given name. Note that in BSD 4.4 an interface can have multiple addresses, so this won't work.
u_long if_subnetmask(struct in_addr addr);
    Returns the IP subnet mask of a given address if there is an interface to a subnet of that network. Will need updating for BSD 4.4, where subnet masks are variable.

Routing table

A destination is a host or network route and its associated mask. The routing table allows multiple routes per destination as well as a provision for protocol dependent data (BGP currently uses it for maintaining an AS path and HELLO uses it for a sliding window of metrics).

Which of the multiple next hops is used is determined by preference. A default preference is specified for each routing protocol and is overridable down to the destination and source level. A tie is resolved by using the next hop with the lower address, in the interest of being deterministic.

Routes in the routing table are aged unless the RTS_NOAGE flag is specified. When rt_timer reaches rt_timer_max, the route is put in holddown, rt_timer is reset and rt_timer_max is set to RT_T_HOLDDOWN. When a holddown expires, the route is deleted. Routes added with the RTS_NOAGE flag should be deleted with rt_delete().
Routes added without the RTS_NOAGE flag should be removed with rt_unreach(), which puts the route into holddown for RT_T_HOLDDOWN seconds.

Modifications to the routing table are started by opening the table with rt_open() and finished with rt_close(). Attempts to change the table when it is not open result in a fatal error.

The propagation of routes to the various protocols is controlled by the revision number. When the table is opened, the global revision number is incremented. If no changes have been made when it is closed, the number is decremented. Routes that are changed get their rt_revision set to the global value. Each task that modifies the routing table has its own revision. When changes are made and task_flash() is called to cause the flash update tasks to be executed, a protocol can determine changes by comparing its revision number with that of the route in question. When finished processing a flash update, a protocol sets its task_rtrevision to the current global value.

The existing protocols are not good examples of the work required to update the routing table with SPF algorithms. The suggested method is to run the Dijkstra algorithm and generate at most RT_N_MULTIPATH multipath routes. The rt_add() and rt_change() routines will be modified to receive a pointer to a list of next hop gateways terminated by a null entry. [RT_N_MULTIPATH will always be one on Unix, but ports of gated to routers will require support of multiple next hops.] Differing routes, such as non-multipath routes to the same external network, should be added as separate routes.

Each route added has a pointer to a gw_entry for the gateway this route was learned from. For ICMP, RIP and HELLO these gw_entry control blocks are allocated dynamically each time a gateway is learned from. For BGP and EGP these control blocks are part of the peer structure and are used to delete all routes to a particular gateway when the connection is broken.
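
The revision-number bookkeeping described above (increment on open, undo on an unchanged close, stamp changed routes, compare during a flash) can be sketched as follows. This is a toy model under stated assumptions: the sk_ names, the one-field route and protocol structures, and the helper sk_rt_touch() are hypothetical and stand in for gated's real rt_open(), rt_close() and route-changing routines.

```c
#include <assert.h>

typedef struct { unsigned long rt_revision; } sk_route;      /* one route */
typedef struct { unsigned long task_rtrevision; } sk_proto;  /* one protocol task */

static unsigned long sk_rt_revision;    /* global routing table revision */
static int sk_rt_is_open;               /* non-zero between open and close */
static int sk_rt_changes;               /* changes made since the last open */

/* Open the table for modification: bump the global revision. */
void sk_rt_open(void)
{
    assert(!sk_rt_is_open);
    sk_rt_is_open = 1;
    sk_rt_changes = 0;
    sk_rt_revision++;
}

/* Record a change to a route: stamp it with the current revision. */
void sk_rt_touch(sk_route *rt)
{
    assert(sk_rt_is_open);              /* changing a closed table is fatal */
    rt->rt_revision = sk_rt_revision;
    sk_rt_changes++;
}

/* Close the table; undo the bump if nothing changed. Returns change count. */
int sk_rt_close(void)
{
    assert(sk_rt_is_open);
    sk_rt_is_open = 0;
    if (sk_rt_changes == 0)
        sk_rt_revision--;
    return sk_rt_changes;
}

/* Flash update: count routes newer than what this protocol has seen,
 * then mark the protocol as caught up with the global revision. */
int sk_flash(sk_proto *p, const sk_route *routes, int n)
{
    int i, pending = 0;

    for (i = 0; i < n; i++)
        if (routes[i].rt_revision > p->task_rtrevision)
            pending++;                  /* real code would announce the route */
    p->task_rtrevision = sk_rt_revision;
    return pending;
}
```

Because an unchanged open/close pair leaves the global revision where it was, a protocol that is already caught up sees nothing to send on the next flash.
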
For SPF routes the gw_entry could be one common entry, or could be the address of the link originating this route. The first would make it easy to delete all SPF routes if the protocol is disabled at run-time; the second would make it easy to delete routes when a link goes down.

    struct _rt_entry {
        ...
    #define rt_dest      rt_head->rth_dest
    #define rt_dest_mask rt_head->rth_dest_mask
    #define rt_parent    rt_head->rth_parent
        sockaddr_un rt_router;
        if_entry *rt_ifp;
        gw_entry *rt_sourcegw;
        task *rt_task;
        time_t rt_timer;
        time_t rt_timer_max;
        metric_t rt_metric;
        flag_t rt_state;
        proto_t rt_proto;
        pref_t rt_preference;
        u_long rt_revision;
        as_t rt_as;
        flag_t rt_flags;
        rt_data *rt_data;
    };

rt_dest
    A sockaddr_un specifying the address family and destination for this route.
rt_dest_mask
    A sockaddr_un specifying the mask for this destination.
rt_parent
    The parent of this route, i.e. a route that has a smaller netmask.
rt_router
    The next hop gateway for this route.
rt_ifp
    A pointer to the interface used to reach the next hop.
rt_sourcegw
    The address of the source_gw for this route. For SPF algorithms a single global gw_entry should be used.
rt_task
    Pointer to the task that installed this route.
rt_timer
    The age of this route in seconds.
rt_timer_max
    The maximum age of this route.
rt_metric
    The metric for this route. This is not translated between protocols.
rt_state
    Flags for this route.
rt_proto
    The protocol this route was learned from.
rt_preference
    The preference for this route.
rt_revision
    The revision of the routing table at which this route was modified.
rt_as
    The AS of this route. Zero is allowed if no exterior protocols are in use.
rt_flags
    Emulation of kernel flags for this route.
rt_data
    Pointer to protocol specific data block (rtd_data).

Routines:

rt_open
    Obtain update permission on the routing table.
rt_close
    Release control of the routing table.
rt_add
    Adds a route to the routing table.
rt_change
    Change the next hop, metric or preference of the route.
rt_unreach
    Put the specified route into holddown.
rt_delete
    Delete the specified route.
rt_refresh
    Indicate that this route has been heard again from the same gateway with the same metric and its age should be reset to zero.
rt_gwunreach
    The specified gateway has become unreachable. An rt_delete will be issued to all routes installed by this gateway.
rt_locate
    Locate a route given flags, destination and protocol.
rt_locate_gw
    Locate a route given flags, destination, protocol and gw_entry.

Route specific data:

The rt_data pointer in the rt_entry structure allows the manipulation of protocol specific data for each route. Rt_data should point to the following structure:

    /* Prefix of protocol independent data */
    typedef struct _rt_data {
        struct _rt_data *rtd_forw;
        struct _rt_data *rtd_back;
        int rtd_refcount;
        u_int rtd_length;
        void (*rtd_dump)();
        caddr_t rtd_data;
    } rt_data;

where rtd_data points to the protocol-specific data area.

There are basically two types of route-specific data: data that is unique for each route and data that can be common to a group of routes. The HELLO protocol is an example of the use of unique data, and the BGP AS paths (attributes in version 2) are an example of shared data.

Unique data should be allocated with rtd_alloc() and is automatically freed when a route is deleted.

Shared data should be allocated with rtd_locate(), which will return a pointer to a new data area containing the desired data, or a pointer to an existing data area. Each reference is counted; this count is decremented each time a route is deleted, and the area is freed when the last reference is deleted. Shared data requires that the protocol set up a queue head pointer for maintenance of the list of protocol-specific data. The RTDATA_LIST and RTDATA_LIST_END macros are available to scan this list. Shared data can also be manipulated with rtd_alloc() and rtd_insert(), which allow manipulation of rt_data structures instead of pointers and lengths of data areas.
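
The locate-or-create behaviour described for shared data can be illustrated with a small reference-counted list. The sketch below is an assumption-laden model, not gated's rtd_locate(): the sk_ names and signatures are hypothetical, the structure omits rtd_dump, and matching is a plain byte comparison. It shows identical data being shared between callers and freed only when the last reference is released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct sk_rt_data {
    struct sk_rt_data *forw, *back;     /* queue links */
    int refcount;                       /* number of routes sharing this area */
    size_t length;                      /* length of data in bytes */
    unsigned char *data;                /* protocol-specific bytes */
} sk_rt_data;

/* Initialise a per-protocol queue head as an empty circular list. */
void sk_rtd_init(sk_rt_data *head)
{
    head->forw = head->back = head;
    head->refcount = 0;
    head->length = 0;
    head->data = NULL;
}

/* Return an existing area holding identical bytes (adding a reference),
 * or allocate a new one, copy the bytes in and queue it. */
sk_rt_data *sk_rtd_locate(sk_rt_data *head, const void *data, size_t length)
{
    sk_rt_data *rtd;

    assert(head != NULL && data != NULL);
    for (rtd = head->forw; rtd != head; rtd = rtd->forw) {
        if (rtd->length == length && memcmp(rtd->data, data, length) == 0) {
            rtd->refcount++;
            return rtd;
        }
    }
    rtd = malloc(sizeof *rtd + length); /* header and data in one block */
    if (rtd == NULL)
        return NULL;
    rtd->data = (unsigned char *)(rtd + 1);
    memcpy(rtd->data, data, length);
    rtd->length = length;
    rtd->refcount = 1;
    rtd->forw = head;                   /* append at the tail of the queue */
    rtd->back = head->back;
    head->back->forw = rtd;
    head->back = rtd;
    return rtd;
}

/* Drop one reference; unlink and free on the last one.
 * Returns the number of references remaining. */
int sk_rtd_release(sk_rt_data *rtd)
{
    assert(rtd != NULL && rtd->refcount > 0);
    if (--rtd->refcount > 0)
        return rtd->refcount;
    rtd->back->forw = rtd->forw;
    rtd->forw->back = rtd->back;
    free(rtd);
    return 0;
}
```

Two routes carrying the same AS path would receive the same pointer from sk_rtd_locate(), so the path is stored once however many routes refer to it.
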
AF_INET support routines:

gd_inet_makeaddr()
    Build an address from a network number, a host number and a flag indicating if subnets should be considered when taking the network part of the network number supplied.
gd_inet_netof()
    Return the subnet of a sockaddr_in.
gd_inet_wholenetof()
    Return the natural net of a sockaddr_in.
gd_inet_class()
    Return the class (A = 1, B = 2, C = 3) of the first byte of a network number (used mainly by EGP).
gd_inet_checkhost()
    Mostly used by RIP. Verifies that a network is class A, B or C, that sin_port is zero and that the reserved fields are zero.
gd_inet_hash()
    Calculates the routing table hash value for a sockaddr_in.
gd_inet_cksum()
    Calculates the Internet checksum given an iovec.
inet_ntoa()
    Returns a pointer to a static string containing the ASCII representation of the IP network number. Don't use this, use the %A format of trace and *printf().

Other routines

quit()
    Terminate gated. Passed an errno value which is logged.

Implementing a protocol

The following sections of gated need to be updated to add a new protocol:

defs.c:
    #define PROTO_protocol
if.h
    Define protocol specific interface flags. Also add to IFF_KEEPMASK.
if.c
    Add above flags to if_flag_bits structure.
main.c:
    include "protocol.h"
    main(): Code to call protocol initialization routine.
nmi.c:
    Add code to return the correct value for ipRouteProto.
parse.c
    Add keywords to keywords table. Add code to parse_metric_check.
parse.h
    Add metric limits and other value limits.
parser.y
    Add code to parse protocol-specific configuration information as well as updating the propagation restrictions for this protocol.
rt_table.h
    Define RTPROTO_protocol and RTPREF_protocol.
rt_table.c
    Define printable versions of above.
snmp.c:
    Add code to return the correct value for ipRouteProto.
task.c:
    task_reinit(): Code to call protocol init routine after reparse.
trace.h:
    Define TR_protocol and optionally IF_protocolUPD.
trace.c:
    Define text values for above and specify command-line flags for enabling tracing of this protocol. Call protocol_dump().