/*
 * PCN Abstract Machine Emulator
 * Authors:     Steve Tuecke and Ian Foster
 *              Argonne National Laboratory
 *
 * Please see the DISCLAIMER file in the top level directory of the
 * distribution regarding the provisions under which this software
 * is distributed.
 *
 * sr_doc.h  -  Documentation for all of the sr_*.c Send/Receive routines
 */

/*

Each SR (send/receive) module consists of two files:

    sr_*.c - All of the code that implements initialization, sends,
             and receives.
    sr_*.h - Any header information that is needed by other parts of
             the emulator.

The functions that the SR module should implement for the emulator are:

    _p_sr_get_argdesc()
    _p_sr_init_node()
    _p_sr_node_initialized()
    _p_destroy_nodes()
    _p_abort_nodes()
    _p_alloc_msg_buffer()
    _p_msg_send()
    _p_msg_receive()

This file contains general documentation describing the use of the
SR module as a whole, as well as descriptions of each procedure that
the SR module must implement.


Argument parsing
================

_p_sr_get_argdesc() is called immediately before command line
arguments are parsed.  It is passed argv and argc, in case something
is needed directly out of them -- for example, sr_bsdipc.c saves a
pointer to argv[0] (the program name) so that it can use it later.

It should fill in its argument argdescp with a pointer to an argument
description array that contains the arguments needed by this SR
module.  n_argdescp should be set to the number of arguments held in
this array.


Initialization
==============

The parallel emulator is started up by calling _p_sr_init_node(),
which should in turn call _p_init_node(), which will in turn call
_p_sr_node_initialized().

It is the responsibility of _p_sr_init_node() to make sure all the
nodes are created and their send/receive primitives are initialized.
The last thing _p_sr_init_node() should do on all nodes is call
_p_init_node(), which will take care of all of the general emulator
initialization.
When _p_sr_init_node() makes this call, all send/receive operations
should be fully functional.

At the end of initialization, _p_init_node() will call
_p_sr_node_initialized().  This is basically just a debugging hook,
though it can also be used to verify initialization.  It need not do
anything.  However, it is very useful when debugging a new SR module,
because it is called after all initialization, immediately before the
main emulator loop is entered.  It provides a good place to check out
initialization and test out the SR primitives.  It can also be used
to verify that all the nodes have actually initialized correctly, and
if not then it can shut things down.

There is one other function, sr_fatal_error(), that is used by
_p_sr_init_node(), but that should not be exported to the rest of the
emulator.  Once the emulator has been completely initialized,
_p_fatal_error() (in boot.c) should be used to kill the emulator in
the case of a fatal error.  But _p_fatal_error() should not be called
until all of the SR routines are initialized and functional.  If
there is an error in _p_init_node(), then _p_fatal_error() cannot be
used.  Therefore, sr_fatal_error() should be used during SR
initialization to kill everything in the case of an initialization
error.  It should try to kill off all nodes by whatever method
possible.


Global variables
================

_p_sr_init_node() is responsible for setting the following global
variables:

    _p_my_id
    _p_host_id
    _p_nodes
    _p_host
    _p_default_msg_buffer_size

Every node of a parallel emulator run is given a unique integer.  If
there are N nodes in the system, they must be numbered 0..N-1, where
the host is always node 0.  The first four variables listed above
must be set to reflect the parallel architecture:

    _p_nodes   : The number of nodes (N) in the emulator on this run.
    _p_my_id   : The node number (from 0..N-1) for my node.
    _p_host_id : The node number for the host (always 0).
    _p_host    : A boolean variable that should be set to TRUE if
                 this is the host (_p_my_id == _p_host_id), otherwise
                 it should be set to FALSE.

    _p_default_msg_buffer_size :
                 The default message buffer size (in cells) for
                 message buffers.  This size should not include any
                 header information that the SR code might tack onto
                 the message.  Thus, if 4096 bytes is a good default
                 message size, cells are 4 bytes each, and 4 cells
                 are needed for header information, then
                 _p_default_msg_buffer_size should be set to
                 1020 (4096/4 - 4).

So what is a good value for _p_default_msg_buffer_size?  That's a
good question -- and one that doesn't have a pat answer.

It is used when the emulator does not know exactly what size buffer
should be allocated before it starts packing data into that buffer.
For example, if a tuple needs to be sent in the message, how much
space should be allocated?  Just enough to allow the first level of
the tuple to be copied?  Or do you allow additional space in case the
tuple contains other tuples (for example, it is a list), so that you
can pack more of the contents of the tuple into the message?

The emulator will always allocate enough space for the top level of
the tuple.  But if the top level requires less than
_p_default_msg_buffer_size cells, then it allocates a buffer of size
_p_default_msg_buffer_size, so that it can pack additional levels of
the tuple into the message, if those additional levels exist.

Finally, there is one last factor in determining a value for this
variable.  As mentioned, the emulator does not know how much space it
needs to allocate for the message before packing the message into the
buffer.  However, after the message is packed into the buffer, it
knows exactly how many cells from the buffer it actually used.  And
it is this value (the number of cells actually used) that is passed
to the _p_msg_send() routine.  Therefore, the _p_msg_send() routine
need not send the entire allocated buffer.  It only needs to send the
part that is used.
So it is ok to allocate considerably more space than you actually
send.

So, in general, this value should probably be at least 100-200 cells.
That way, at least a few levels of a tuple (such as a list) can be
packed into a single message.  But if memory is available, and your
send/receive routines allow allocation of buffers that are larger
than what is actually sent, then _p_default_msg_buffer_size should be
made considerably bigger.  What is "considerably bigger"?  At least
1000 cells, and perhaps even more.


Sending messages
================

Messages are sent using the code:

    _p_alloc_msg_buffer(...);
    <Fill in the message buffer>
    _p_msg_send(...);

_p_alloc_msg_buffer() allocates a message buffer of the appropriate
size.  _p_msg_send() sends the message in that message buffer to a
node and frees the message buffer.


Receiving a message
===================

The function _p_msg_receive() is used to receive messages.  It places
the received message onto the heap starting at _p_heap_ptr.  (It will
first check that there is enough space left on the heap for the
message, and if not it will call the garbage collector.)

_p_msg_receive() has several different modes of operation, depending
on the receive type:

    RCV_BLOCK   : Blocking receive of any type
    RCV_NOBLOCK : Non-blocking receive of any type
    RCV_PARAMS  : Only receive a MSG_PARAMS (parameter) message, or
                  a MSG_EXIT or MSG_INITIATE_EXIT.  Queue up messages
                  of other types.  This is a blocking receive.
    RCV_GAUGE   : Only receive a MSG_GAUGE (gauge) message, or a
                  MSG_EXIT or MSG_INITIATE_EXIT.  Queue up messages
                  of other types.  This is a blocking receive.
    RCV_COLLECT : Only receive a MSG_READ, MSG_CANCEL, MSG_EXIT, or
                  MSG_INITIATE_EXIT message.  This type will be used
                  if the heap space fills up in a parallel run, and
                  we're waiting for space to be freed up.  This is a
                  blocking receive.
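
The receive types listed above amount to a filter on message types:
some types are delivered to the caller, the rest are queued.  The
following is a minimal sketch of that filter, not code from the
emulator.  The enum declarations and the helpers rcv_accepts() and
rcv_blocks() are hypothetical; the real constants are defined
elsewhere in the emulator and their values are not given here.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical declarations, for illustration only.
enum rcv_type { RCV_BLOCK, RCV_NOBLOCK, RCV_PARAMS, RCV_GAUGE, RCV_COLLECT };
enum msg_type { MSG_PARAMS, MSG_GAUGE, MSG_READ, MSG_CANCEL,
                MSG_EXIT, MSG_INITIATE_EXIT, MSG_DEFINE, MSG_VALUE };

// Returns true if a message of type 'm' is delivered to the caller under
// receive type 'r'.  A false result means the message would be queued.
bool rcv_accepts(enum rcv_type r, enum msg_type m)
{
    // Every receive type listed above lets the exit messages through.
    if (m == MSG_EXIT || m == MSG_INITIATE_EXIT)
        return true;

    switch (r) {
    case RCV_BLOCK:
    case RCV_NOBLOCK:
        return true;                       // any type
    case RCV_PARAMS:
        return m == MSG_PARAMS;
    case RCV_GAUGE:
        return m == MSG_GAUGE;
    case RCV_COLLECT:
        return m == MSG_READ || m == MSG_CANCEL;
    }
    assert(!"unknown receive type");
    return false;
}

// Every receive type except RCV_NOBLOCK is a blocking receive.
bool rcv_blocks(enum rcv_type r)
{
    return r != RCV_NOBLOCK;
}
```

With these definitions, rcv_accepts(RCV_PARAMS, MSG_GAUGE) is false
(the gauge message would be queued), while rcv_accepts(RCV_GAUGE,
MSG_EXIT) is true.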


Normal termination of the emulator
==================================

A normal termination will occur if any node runs the exit PAM
instruction, or runs one of the exit_from_*() procedures (in
utils.c).

The guts of the exit routines are in parallel.c, in
_p_host_handle_exit() and _p_node_handle_exit().

If a node initiates the exit, it will send a MSG_INITIATE_EXIT to the
host and then wait for the normal exit protocol to occur.

If the host initiates the exit, or receives a MSG_INITIATE_EXIT
message from a node, then it will run the exit protocol:

    1) The host will sync up with the nodes by sending a MSG_EXIT
       message to each node, and then waiting for a return MSG_EXIT
       message from each node.  Upon receipt of a MSG_EXIT message,
       a node will simply return a MSG_EXIT message to the host.

    2) The host will initiate the Gauge profile dump.  MSG_GAUGE
       type messages will be used within this chunk of code to
       control the dump.

    3) All nodes will dump their Upshot logs to files.

    4) The host will sync up with the nodes again, as described in
       step #1.

    5) _p_destroy_nodes() will be called on the host and all nodes.

    6) a) The host will send a MSG_EXIT to each node and then call
          _p_shutdown_pcn().
       b) The nodes will wait for a MSG_EXIT from the host and then
          call _p_shutdown_pcn().

    7) The host and all nodes will call exit().

The _p_destroy_nodes() that is called during step 5 on the host and
the nodes need not do anything.  If it does not do anything, then
everyone will proceed to call exit() normally.  However, if normal
shutdown must be done in some manner other than having all nodes
call exit(), this can be implemented in _p_destroy_nodes().


Aborting the emulator
=====================

If the emulator encounters a fatal error during its execution (a
signal, corrupt heap, etc), it will call the _p_fatal_error()
function.  That function will try to cleanly shut all nodes of the
emulator down.  (As opposed to leaving stray processes hanging
around, etc.)
But it will not go through the normal exit protocol described above.
Along the way it will call _p_abort_nodes().

If a method exists for killing all nodes of the emulator, then
_p_abort_nodes() should use it.  For example, the Sequent Symmetry
version uses a killpg() to kill the entire process group, which
consists of all the nodes.

In other SR modules (sr_machipc), _p_abort_nodes() sends a special
abort message to all other nodes before exiting.  In that case, the
_p_msg_receive() routine watches for an abort message and calls
_p_fatal_error() if it receives one.

In general, the goal of _p_abort_nodes() is to do everything possible
to kill all nodes of the emulator, so that under abortive
circumstances some nodes aren't left hanging around while others have
terminated.

If a fatal error occurs in the emulator after it has been completely
initialized, there are two procedures (in boot.c) that should be
used:

    _p_fatal_error("Error string");
and
    _p_malloc_error();

Neither of these procedures returns.  They will kill the node and
hopefully all other nodes as well.

Note: There is a separate _p_malloc_error() procedure because on some
machines the fprintf's used by _p_fatal_error() will call malloc and
fail and cause a real mess.


sr_*.h
======

At a minimum, the following needs to be defined in sr_*.h:

    #undef PARALLEL
    #define PARALLEL

    #undef ASYNC_MSG
    #define ASYNC_MSG 0

The PARALLEL definition causes all of the parallel emulator code to
be compiled into the emulator.  Without this definition, the emulator
only has the code to run a 1 node emulator.

The ASYNC_MSG definition causes the proper message handling code to
get linked into the emulator.  It signals whether this SR module uses
synchronous (polled) message handling (ASYNC_MSG==0) or asynchronous
message handling (ASYNC_MSG==1).


Asynchronous message handling
=============================

When ASYNC_MSG is set to 0 (synchronous message handling), the
emulator will occasionally poll for new messages.
It does this by calling _p_msg_receive(...,RCV_NOBLOCK) -- a
non-blocking receive.  Unfortunately, this can be a relatively
expensive operation.

However, some systems can be set up so that when a message arrives,
the emulator can be asynchronously notified of this fact.  In this
situation, the emulator need not call
_p_msg_receive(...,RCV_NOBLOCK) in order to find out if there are
messages.  Rather, the asynchronous notification can set a variable
that the emulator can check, instead of having to call
_p_msg_receive() each time.

When ASYNC_MSG is set to 1, this asynchronous notification is
enabled.  Instead of calling _p_msg_receive(...,RCV_NOBLOCK) to check
for new messages, the emulator just checks the _p_msg_avail variable.
Thus, if an SR module uses asynchronous messaging, then it must set
_p_msg_avail to TRUE when a message arrives.

Only when the emulator finds that _p_msg_avail has been set to TRUE
will it call _p_msg_receive(...,RCV_NOBLOCK).  So, once
_p_msg_receive() handles all available messages, it should reset
_p_msg_avail to FALSE.

*/


/*****************************************************************
******************************************************************
**                                                              **
**                    PROCEDURE DESCRIPTIONS                    **
**                                                              **
******************************************************************
*****************************************************************/


/***********************************************************

void _p_sr_get_argdesc(int argc, char **argv,
                       argdesc_t **argdescp, int *n_argdescp)

Called by boot.c to get a pointer to the argument description table.

If the SR code needs something from argc and argv directly (for
example, argv[0] has the name of this executable), it can get it
here.

We can also initialize SR variables here if they might be modified
during argument handling.

********************* _p_sr_get_argdesc() ******************/


/***********************************************************

void sr_fatal_error(char *msg)

Used by _p_sr_init_node() to deal with fatal errors during the worker
creation process.  _p_fatal_error() cannot be called until everything
is up and running.  So sr_fatal_error() fills in until then.

********************* sr_fatal_error() *********************/


/***********************************************************

void _p_sr_init_node()

This procedure is responsible for setting up and initializing the SR
module on all nodes.  It is the first thing called.

The last thing it should do is call _p_init_node() on all nodes
(including the host).  When it makes this call, the SR module should
be fully functional.

This procedure usually works in one of two ways:

    1) The host process must spawn off all the nodes (using fork, or
       rsh, or some such means), initialize itself, and then call
       _p_init_node().  Then, when the node processes hit this
       routine (by way of the fork, or the rsh, or whatever), they
       initialize themselves and call _p_init_node().

    2) On some parallel machines, the OS takes care of loading the
       executable onto all nodes simultaneously.  In this case, the
       procedure must figure out how to initialize the SR module for
       the node it is running on, get everything set up so that it
       can communicate with other nodes, and then call
       _p_init_node().

********************* _p_sr_init_node() ********************/


/***********************************************************

void _p_sr_node_initialized()

This function is called after the node has been completely
initialized.  It need not do anything.  However, it can be useful
for two things:

    1) SR module debugging code can be put here.  For example, I
       often put a simple ring test in here, just to see if the
       proper connections are being made.

    2) It can make a final check to make sure all the other nodes
       came up ok.  If they didn't, then it can shut down.

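The ring test mentioned in item 1 is not spelled out in this file.
As a rough sketch of the node arithmetic such a test might rely on,
assuming the 0..N-1 node numbering with the host at node 0, each node
can work out its ring neighbours as below.  The helper names
ring_next(), ring_prev() and ring_hops() are hypothetical and are not
part of the emulator.

```c
#include <assert.h>

// Hypothetical helper: the node that 'my_id' would send to in the ring.
int ring_next(int my_id, int nodes)
{
    assert(nodes > 0 && my_id >= 0 && my_id < nodes);
    return (my_id + 1) % nodes;
}

// Hypothetical helper: the node that 'my_id' would receive from in the ring.
int ring_prev(int my_id, int nodes)
{
    assert(nodes > 0 && my_id >= 0 && my_id < nodes);
    return (my_id + nodes - 1) % nodes;
}

// Walks a token once around the ring, starting at the host (node 0), and
// returns the number of hops taken before it is back at the host.
int ring_hops(int nodes)
{
    int hops = 0;
    int at = 0;

    do {
        at = ring_next(at, nodes);
        hops++;
    } while (at != 0);

    return hops;
}
```

In an actual test, each node would presumably _p_msg_send() to
ring_next(_p_my_id, _p_nodes) and do a blocking _p_msg_receive()
for the message coming from ring_prev(_p_my_id, _p_nodes); which
message type to use for this is not specified here.
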
********************* _p_sr_node_initialized() *************/


/***********************************************************

void _p_destroy_nodes()

This procedure is described above under the "Normal termination of
the emulator" section.

To recap, it is called on every node during normal termination of the
emulator.  It can kill all of the nodes.  Or it can do nothing, in
which case all nodes will proceed to execute an exit(0).

********************* _p_destroy_nodes() *******************/


/***********************************************************

void _p_abort_nodes()

This procedure is called from _p_fatal_error() -- when we encounter a
fatal error situation.  It should do what it can to kill off all of
the nodes.

Some typical ways in which this is done:

    1) A special (machine specific) procedure is called which will
       kill off all the nodes.  For example, on the Sequent
       Symmetry, killpg() is called to kill all the nodes.

    2) An abort message is sent to the host.  When the
       _p_msg_receive() routine on the host receives this abort
       message, it calls some special procedure to kill all the
       nodes.

    3) An abort message is sent to all other nodes.  When those
       other nodes do a _p_msg_receive() and see the abort message,
       they will shut down using _p_fatal_error().

********************* _p_abort_nodes() *********************/


/***********************************************************

cell_t *_p_alloc_msg_buffer(int size)

Allocate a message buffer that will later be used by _p_msg_send().
The 'size' argument specifies how many cells (NOT bytes) the message
buffer should contain.

Note: If this procedure uses malloc(), and the malloc fails, then it
should call _p_malloc_error(), not _p_fatal_error().  The difference
is that _p_malloc_error() does not use fprintf().  On many machines,
once a malloc fails, it will fail from then on.  Unfortunately,
fprintf() usually uses malloc() for temporary space, so it fails
after a malloc error.
Therefore, _p_malloc_error() does not use fprintf().

Return: A pointer to a message buffer with 'size' cells.

********************* _p_alloc_msg_buffer() ****************/


/***********************************************************

void _p_msg_send(cell_t *buf, int node, int size, int type)

Sends the message that is pointed to by 'buf' to 'node'.  Only the
first 'size' cells of the buffer need to be sent.  The message has
the specified 'type'.

If buf==NULL (and size==0), then an empty message of the specified
type is sent.

After the send is completed, free the message buffer.

This send will block until the message can be delivered (though not
necessarily until it has been received, if there is buffering in
transit).

If an error occurs, _p_fatal_error() or _p_malloc_error() should be
called to abort the program.

********************* _p_msg_send() ************************/


/***********************************************************

bool_t _p_msg_receive(int *node, int *size, int *type, int rcv_type)

Receive a message from ANY node.  Place the message onto the heap.
(And make sure there is room for it on the heap.)

Valid 'rcv_type' arguments are:

    RCV_NOBLOCK   Do not block if no messages are waiting.
    RCV_BLOCK     Block until a message is received.
    RCV_PARAMS    Block until a MSG_PARAMS message arrives.
    RCV_GAUGE     Block until a MSG_GAUGE message arrives.
    RCV_COLLECT   When called from _p_garbage_collect().  Ignore
                  MSG_COLLECT messages, and queue up MSG_DEFINE and
                  MSG_VALUE.  Block until we get a MSG_CANCEL or
                  MSG_READ.

See above for more detailed info on receiving messages.

Return: TRUE if we read a message, otherwise FALSE.  The node, size
and type arguments are return values.

********************* _p_msg_receive() *********************/
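

/*

The interaction between the RCV_NOBLOCK return value and the
_p_msg_avail flag described under "Asynchronous message handling"
can be sketched as follows.  This is an illustration under stated
assumptions, not the emulator's polling code: msg_avail stands in
for _p_msg_avail, and pending, receive_calls, stub_message_arrived(),
stub_receive_noblock() and poll_messages() are hypothetical names.
The stub delivers one message per call, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define ASYNC_MSG 1             // assumed setting for this sketch

bool msg_avail = false;         // stand-in for _p_msg_avail
int  pending = 0;               // hypothetical count of undelivered messages
int  receive_calls = 0;         // how often the stub receive was called

// Hypothetical notification hook: what an asynchronous SR module would do
// when a message arrives.
void stub_message_arrived(void)
{
    pending++;
    msg_avail = true;
}

// Stub standing in for _p_msg_receive(...,RCV_NOBLOCK).  Returns true if a
// message was read.  Resets the flag once nothing is left to deliver.
bool stub_receive_noblock(void)
{
    bool got_one = false;

    receive_calls++;
    if (pending > 0) {
        pending--;
        got_one = true;
    }
    if (pending == 0)
        msg_avail = false;
    return got_one;
}

// One polling step of a hypothetical emulator loop.  Returns the number of
// messages handled.
int poll_messages(void)
{
    int handled = 0;

#if ASYNC_MSG
    // Asynchronous: only call the receive routine while the flag is set.
    while (msg_avail) {
        if (stub_receive_noblock())
            handled++;
    }
#else
    // Synchronous: poll until the non-blocking receive finds nothing.
    while (stub_receive_noblock())
        handled++;
#endif
    return handled;
}
```

With ASYNC_MSG set to 1, a poll with no pending messages never calls
the receive routine at all, which is the saving that asynchronous
notification is meant to provide.

*/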