/*
 * PCN Abstract Machine Emulator
 * Authors:     Steve Tuecke and Ian Foster
 *              Argonne National Laboratory
 *
 * Please see the DISCLAIMER file in the top level directory of the
 * distribution regarding the provisions under which this software
 * is distributed.
 *
 * sr_doc_stream.h  - Documentation for stream versions
 *                    of the sr_*.c Send/Receive routines.
 *
 * Please use the instructions in sr_doc.h for implementing your
 * own send/receive module.
 *
 * The stream versions were partially implemented in sr_bsdipc.c,
 * but never completed.  This was some of the doc for what is
 * in the #ifdef STREAMS in that file.
 */

/*

Each SR (send/receive) module consists of two files:

    sr_*.c  - All of the code that implements initialization,
              sends, and receives.
    sr_*.h  - Any header information that is needed by other
              parts of the emulator.

The functions that the SR module should implement for the emulator are:

    _p_sr_get_argdesc()
    _p_sr_init_node()
    _p_sr_node_initialized()
    _p_destroy_nodes()
    _p_abort_nodes()
    _p_alloc_msg_buffer()
    _p_msg_send()
    _p_msg_receive()
    _p_alloc_stream()
    _p_free_stream()
    _p_stream_send()
    _p_enable_stream_receive()

This file contains general documentation describing the use of the
SR module as a whole, as well as descriptions of each procedure that
the SR module must implement.


Argument parsing
================

_p_sr_get_argdesc() is called immediately before command line
arguments are parsed.  It is passed argv and argc, in case something
is needed directly out of them -- for example, sr_bsdipc.c saves a
pointer to argv[0] (the program name) so that it can use it later.

It should fill in its argument argdescp with a pointer to an argument
description array that contains the arguments needed by this SR
module, and n_argdescp should be set to the number of arguments held
in this array.


Initialization
==============

The parallel emulator is started up using the following sequence
of calls:

    _p_sr_init_node(...);
    <Do initialization stuff on each node>
    _p_sr_node_initialized();

It is the responsibility of _p_sr_init_node() to make sure all the
nodes are created and their send/receive primitives are initialized.
When _p_sr_init_node() returns, all send/receive operations should
be fully functional.

_p_sr_node_initialized() is basically just a debugging hook, though it
can also be used to verify initialization.  It need not do anything.
However, it is very useful when debugging a new SR module, because it
is called after all initialization, immediately before the main
emulator loop is entered.  It provides a good place to check
initialization and to test the SR primitives.  It can also be used to
verify that all the nodes have actually initialized correctly, and to
shut things down if they have not.

There is one other function, sr_fatal_error(), that is used by
_p_sr_init_node(), but that should not be exported to the rest of the
emulator.  Once the emulator has been completely initialized,
_p_fatal_error() (in boot.c) should be used to kill the emulator in
the case of a fatal error.  However, _p_fatal_error() must not be
called until all of the SR routines are initialized and functional,
so it cannot be used if an error occurs inside _p_sr_init_node().
Therefore, sr_fatal_error() should be used during SR initialization
to kill everything in the case of an initialization error.  It should
try to kill off all nodes by whatever method possible.


Global variables
================

_p_sr_init_node() is responsible for setting the following
global variables:

    _p_my_id
    _p_host_id
    _p_nodes
    _p_host
    _p_usehost
    _p_default_msg_buffer_size

Each node of a parallel emulator run is given a unique integer.  If
there are N nodes in the system, they must be numbered 0..N-1, where
the host is always node N-1.

In addition, the system needs to be told if it should use the host
node (N-1) when mapping work to nodes.  In an implementation like the
Cosmic Environment, where there is a host machine that is separate
from all the other nodes, you would generally not want to use the
host for general work.  But in an implementation like the Sequent
Symmetry, where all nodes are the same, you would want to use the
host.

Thus, the first five variables listed above must be set to reflect
the parallel architecture:

    _p_nodes   : The number of nodes (N) in the emulator on this run.
    _p_my_id   : The node number (from 0..N-1) for my node.
    _p_host_id : The node number for the host (always _p_nodes - 1).
    _p_host    : A boolean variable that should be set to TRUE if
                 this is the host (_p_my_id == _p_host_id), otherwise
                 it should be set to FALSE.
    _p_usehost : A boolean variable that should be set to TRUE if the
                 host should be used when mapping work to nodes,
                 otherwise FALSE.

    _p_default_msg_buffer_size :
                 The default message buffer size (in cells) for
                 message buffers.  This size should not include any
                 header information that the SR code might tack onto
                 the message.  Thus, if 4096 bytes is a good default
                 message size, cells are 4 bytes each, and 4 cells
                 are needed for header information, then
                 _p_default_msg_buffer_size should be set to 1020
                 (4096/4 - 4).

So what is a good value for _p_default_msg_buffer_size?  That's a
good question -- and one that doesn't have a pat answer.  It is used
when the emulator does not know exactly what size buffer should be
allocated before it starts packing data into that buffer.  For
example, if a tuple needs to be sent in the message, how much space
should be allocated?  Just enough to allow the first level of the
tuple to be copied?  Or do you allow additional space in case the
tuple contains other tuples (for example, it is a list), so that you
can pack more of the contents of the tuple into the message?

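As a concrete illustration of the variables described above, here is
a minimal sketch (not the emulator's actual code) of how an SR module
might derive them.  The struct, the ex_* function names, and the use
of bool are assumptions made for this example only; in the emulator
these values live in the _p_* globals themselves.  Only the rules
(host is node N-1, _p_host is _p_my_id == _p_host_id) and the
arithmetic 4096/4 - 4 = 1020 come from the text.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical container standing in for the _p_* globals.
typedef struct {
    int  nodes;                    // _p_nodes
    int  my_id;                    // _p_my_id
    int  host_id;                  // _p_host_id
    bool host;                     // _p_host
    bool usehost;                  // _p_usehost
    int  default_msg_buffer_size;  // _p_default_msg_buffer_size (cells)
} ex_sr_globals;

// Cells left for payload when a transport message of msg_bytes bytes
// must also carry header_cells cells of SR header information.
int ex_payload_cells(int msg_bytes, int cell_bytes, int header_cells)
{
    assert(cell_bytes > 0);
    assert(msg_bytes / cell_bytes >= header_cells);
    return msg_bytes / cell_bytes - header_cells;
}

// Fill in the globals for node my_id out of n_nodes nodes.
ex_sr_globals ex_init_globals(int n_nodes, int my_id, bool use_host)
{
    ex_sr_globals g;

    assert(n_nodes >= 1 && my_id >= 0 && my_id < n_nodes);
    g.nodes   = n_nodes;
    g.my_id   = my_id;
    g.host_id = n_nodes - 1;            // the host is always node N-1
    g.host    = (my_id == g.host_id);
    g.usehost = use_host;
    // The example figures from the text: 4096-byte messages,
    // 4-byte cells, 4 header cells.
    g.default_msg_buffer_size = ex_payload_cells(4096, 4, 4);
    return g;
}
```

For instance, ex_init_globals(8, 7, false) would describe the host of
an 8-node run on a machine where the host does no general work.
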
The emulator will always allocate enough space for the top level of
the tuple.  But if the top level requires less than
_p_default_msg_buffer_size cells, then it allocates a buffer of size
_p_default_msg_buffer_size, so that it can pack additional levels of
the tuple into the message, if those additional levels exist.

There is one last factor in determining a value for this variable.
As mentioned, the emulator does not know how much space it needs to
allocate for the message before packing the message into the buffer.
However, after the message is packed into the buffer, it knows
exactly how many cells from the buffer it actually used.  It is this
value (the number of cells actually used) that is passed to the
_p_msg_send() routine.  Therefore, the _p_msg_send() routine need not
send the entire allocated buffer.  It only needs to send the part
that is used.  So it is ok to allocate considerably more space than
you actually send.

So, in general, this value should probably be at least 100-200 cells.
That way, at least a few levels of a tuple (such as a list) can be
packed into a single message.  But if memory is available, and your
send/receive routines allow allocation of buffers that are larger
than what is actually sent, then _p_default_msg_buffer_size should be
made considerably bigger.  What is "considerably bigger"?  At least
1000 cells, and perhaps even more.


Sending messages
================

There are essentially two categories of messages that are sent
by the emulator:

    normal messages - The normal communication done between
                      emulator nodes.
    stream messages - Communication done using the special
                      stream primitives.

Each category of messages has its own way in which messages are sent.


Sending a normal message
------------------------

Normal messages are sent using the code:

    _p_alloc_msg_buffer(...);
    <Fill in the message buffer>
    _p_msg_send(...);

_p_alloc_msg_buffer() allocates a message buffer of the appropriate
size.
_p_msg_send() sends the message in that message buffer to a node and
frees the message buffer.


Sending a stream message
------------------------

Stream messages are sent using the _p_stream_send() function.  No
message buffers are allocated because the arguments to
_p_stream_send() tell it exactly where on the heap it should grab its
data from.


Receiving a message
===================

There is only one function for receiving messages --
_p_msg_receive().  It must handle both normal and stream messages.
However, it must work in conjunction with _p_enable_stream_receive()
to handle stream messages properly.

These functions will be described in more detail below.


Normal termination of the emulator
==================================

The only way the emulator will normally exit is if the host node
reaches a PCN exit instruction, which will cause the main emulator
loop to exit.  In this situation, the host will send MSG_EXIT
messages to all other nodes.  Upon receipt of a MSG_EXIT message, a
node will return a MSG_EXIT message to the host and then call
_p_destroy_nodes().  Once the host receives a MSG_EXIT message back
from each node, it will also call _p_destroy_nodes().

_p_destroy_nodes() need not do anything.  If it does not do anything,
then the host will send a second MSG_EXIT to each node, followed by
an exit(0).  Each node, upon receipt of the second MSG_EXIT, will
also do an exit(0).  Thus, under normal circumstances, every node
will execute an exit(0) to shut itself down.  If this is not the
proper way for the nodes to exit on a particular machine, the proper
method should be implemented in _p_destroy_nodes().


Aborting the emulator
=====================

If the emulator encounters a fatal error during its execution (a
signal, corrupt heap, etc), it will call the _p_fatal_error()
function.  That function will try to cleanly shut all nodes of the
emulator down.  Along the way it will call _p_abort_nodes().

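For contrast with the abort path, the ordering of the MSG_EXIT
handshake from the "Normal termination of the emulator" section can
be walked through in a small single-process simulation.  This is only
a sketch: it assumes a _p_destroy_nodes() that does nothing, it moves
no real messages, and every ex_* name is hypothetical and exists only
for this illustration.

```c
#include <assert.h>

// Per-node bookkeeping for the simulated shutdown.
typedef struct {
    int replied;    // node answered the first MSG_EXIT
    int destroyed;  // node called (a do-nothing) _p_destroy_nodes()
    int exited;     // node reached exit(0)
} ex_node_state;

// Walk through the handshake for n_nodes nodes, where the host is
// node n_nodes-1.  Returns the total number of MSG_EXIT messages.
int ex_run_exit_handshake(ex_node_state *nodes, int n_nodes)
{
    int host, sent = 0, replies = 0, i;

    assert(n_nodes >= 1);
    host = n_nodes - 1;
    for (i = 0; i < n_nodes; i++)
        nodes[i] = (ex_node_state){0, 0, 0};

    // Step 1: the host sends MSG_EXIT to every other node.  Each node
    // returns a MSG_EXIT to the host, then calls _p_destroy_nodes().
    for (i = 0; i < host; i++) {
        sent++;                  // host -> node i
        nodes[i].replied = 1;
        sent++;                  // node i -> host
        replies++;
        nodes[i].destroyed = 1;
    }

    // Step 2: only after a reply from each node does the host call
    // _p_destroy_nodes() itself.
    assert(replies == host);
    nodes[host].destroyed = 1;

    // Step 3: _p_destroy_nodes() did nothing, so the host sends a
    // second MSG_EXIT to each node and exits; each node exits when
    // the second MSG_EXIT arrives.
    for (i = 0; i < host; i++) {
        sent++;                  // host -> node i (second MSG_EXIT)
        nodes[i].exited = 1;
    }
    nodes[host].exited = 1;
    return sent;
}
```

With N nodes this walk-through counts 3*(N-1) MSG_EXIT messages: two
from the host to each other node and one reply from each.
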
If a method exists for killing all nodes of the emulator, then
_p_abort_nodes() should use it.  For example, the Sequent Symmetry
version uses a killpg() to kill the entire process group, which
consists of all the nodes.

In other SR modules (sr_machipc), _p_abort_nodes() sends a special
abort message to all other nodes before exiting.  In that case, the
_p_msg_receive() routine watches for an abort message and calls
_p_fatal_error() if it receives one.

In general, the goal of _p_abort_nodes() is to do everything possible
to kill all nodes of the emulator, so that under abortive
circumstances some nodes aren't left hanging around while others have
terminated.

If a fatal error occurs in the emulator after it has been completely
initialized, there are two procedures (in boot.c) that should be
used:

    _p_fatal_error("Error string");

and

    _p_malloc_error();

Neither of these procedures returns.  They will kill the node and
hopefully all other nodes as well.


sr_*.h
======

At a minimum, the following needs to be defined in sr_*.h:

    #undef PARALLEL
    #define PARALLEL
    #undef ASYNC_MSG
    #define ASYNC_MSG 0

The PARALLEL definition causes all of the parallel emulator code to
be compiled into the emulator.  Without this definition, the emulator
only has the code to run a 1-node emulator.

The ASYNC_MSG definition causes the proper message handling code to
get linked into the emulator.  It signals whether this SR module uses
synchronous (polled) message handling (ASYNC_MSG==0) or asynchronous
message handling (ASYNC_MSG==1).


Asynchronous message handling
=============================

When ASYNC_MSG is set to 0, the emulator will occasionally poll for
new messages.  It does this by calling
_p_msg_receive(...,RCV_NOBLOCK) -- a non-blocking receive.
Unfortunately, this can be a relatively expensive operation.

However, some systems can be set up so that when a message arrives,
the emulator can be asynchronously notified of this fact.
In this situation, the emulator need not call
_p_msg_receive(...,RCV_NOBLOCK) in order to find out if there are
messages.  Rather, the asynchronous notification can set a variable
that the emulator can check, instead of having to call
_p_msg_receive() each time.

When ASYNC_MSG is set to 1, this asynchronous notification is
enabled.  Instead of calling _p_msg_receive(...,RCV_NOBLOCK) to check
for new messages, the emulator just checks the _p_msg_avail variable.
Thus, if an SR module uses asynchronous messaging, then it must set
_p_msg_avail to TRUE when a message arrives.  Only when the emulator
finds that _p_msg_avail has been set to TRUE will it call
_p_msg_receive(...,RCV_NOBLOCK).  So, once _p_msg_receive() handles
all available messages, it should reset _p_msg_avail to FALSE.


Aborting from the emulator
==========================

*/


/*****************************************************************
******************************************************************
**                                                              **
**                   PROCEDURE DESCRIPTIONS                     **
**                                                              **
******************************************************************
*****************************************************************/


/***********************************************************
void _p_sr_get_argdesc(argdesc_t **argdescp, int *n_argdesc)

Called by boot.c to get a pointer to argument description table.

For ease of mind's sake, we can initialize the values here since
they will be filled in soon after this call.
********************* _p_sr_get_argdesc() ******************/


/***********************************************************
void sr_fatal_error(char *msg)

Used by _p_sr_init_node() to deal with fatal errors during the
worker creation process.

_p_fatal_error() cannot be called until everything is up and
running, so sr_fatal_error() fills in until then.
********************* sr_fatal_error() *********************/


/***********************************************************
void _p_sr_init_node()

This procedure is responsible for setting up and initializing
the SR module on all nodes.  It is the first thing called.
When it returns, the SR module should be fully functional.

This procedure usually works in one of two ways:

  1) The host process must spawn off all the nodes (using fork,
     or rsh, or some such means), and then initialize itself.

  2) On some parallel machines, the OS takes care of loading the
     executable onto all nodes simultaneously.  In this case, the
     procedure must figure out how to initialize the SR module for
     the node it is running on, and get everything set up so that
     it can communicate with other nodes.
********************* _p_sr_init_node() ********************/


/***********************************************************
void _p_sr_node_initialized()

This function is called after the node has been completely
initialized.  It need not do anything.  However, it can be
useful for two things:

  1) SR module debugging code can be put here.  For example,
     I often put a simple ring test in here, just to see if
     the proper connections are being made.

  2) It can make a final check to make sure all the other
     nodes came up ok.  If they did not, it can shut things down.
********************* _p_sr_node_initialized() *************/


/***********************************************************
void _p_destroy_nodes()

This procedure is described above under the "Normal termination
of the emulator" section.

To recap, it is called on every node during normal termination
of the emulator.  It can kill all of the nodes.  Or it can do
nothing, in which case all nodes will proceed to execute an
exit(0).
********************* _p_destroy_nodes() *******************/


/***********************************************************
void _p_abort_nodes()

This procedure is called from _p_fatal_error() -- when we
encounter a fatal error.
It should do what it can to kill off all of the nodes.

Some typical ways in which this is done:

  1) A special (machine specific) procedure is called which will
     kill off all the nodes.  For example, on the Sequent Symmetry,
     killpg() is called to kill all the nodes.

  2) An abort message is sent to the host.  When the
     _p_msg_receive() routine on the host receives this abort
     message, it calls some special procedure to kill all the nodes.

  3) An abort message is sent to all other nodes.  When those other
     nodes do a _p_msg_receive() and see the abort message, they
     will shut down using _p_fatal_error().
********************* _p_abort_nodes() *********************/


/***********************************************************
cell_t *_p_alloc_msg_buffer(int size)

Allocate a message buffer that will later be used by _p_msg_send().
The 'size' argument specifies how many cells (NOT bytes) the
message buffer should contain.

Note: If this procedure uses malloc(), and the malloc fails, then
it should call _p_malloc_error(), not _p_fatal_error().  The
difference is that _p_malloc_error() does not use fprintf().  On
many machines, once a malloc fails, it will fail from then on.
Unfortunately, fprintf() usually uses malloc() for temporary space,
so it too fails after a malloc error.  Therefore, _p_malloc_error()
does not use fprintf().

Return: A pointer to a message buffer with 'size' cells.
********************* _p_alloc_msg_buffer() ****************/


/***********************************************************
void _p_msg_send(cell_t *buf, int node, int size, int type)

Sends the message that is pointed to by 'buf' to 'node'.  Only
the first 'size' cells of the buffer need to be sent.  The message
has the specified 'type'.

If buf==NULL (and size==0), then an empty message of the specified
type is sent.

After the send is completed, free the message buffer.

This send will block until the message can be delivered.
If an error occurs, _p_fatal_error() or _p_malloc_error() should be
called to abort the program.
********************* _p_msg_send() ************************/


/***********************************************************
bool_t _p_msg_receive(int *node, int *size, int *type, int rcv_type)

Receive a message from ANY node.  Place the message onto the heap.
(And make sure there is room for it on the heap.)

Valid 'rcv_type' arguments are:

    RCV_NOBLOCK   Do not block if no messages are waiting.
    RCV_BLOCK     Block until a message is received.
    RCV_PARAMS    Block until a MSG_PARAMS message arrives.
    RCV_COLLECT   When called from _p_garbage_collect().
                  Ignore MSG_COLLECT messages, and queue up
                  MSG_DEFINE and MSG_VALUE.  Block until we
                  get a MSG_CANCEL or MSG_READ.

Return: TRUE if we read a message, otherwise FALSE.
The node, size and type arguments are return values.
********************* _p_msg_receive() *********************/
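
/*
The difference between polled (ASYNC_MSG==0) and asynchronous
(ASYNC_MSG==1) use of _p_msg_receive(...,RCV_NOBLOCK) can be sketched
with a fake transport.  This is a minimal sketch under stated
assumptions, not the emulator's code: every ex_* name is a
hypothetical stand-in (ex_msg_avail for _p_msg_avail, ex_msg_receive()
for _p_msg_receive(), EX_RCV_NOBLOCK for RCV_NOBLOCK), and it assumes
that one receive call consumes one message.

```c
#include <assert.h>
#include <stdbool.h>

enum { EX_RCV_NOBLOCK = 0 };   // stand-in for RCV_NOBLOCK

int  ex_pending = 0;           // messages waiting in the fake transport
bool ex_msg_avail = false;     // stand-in for _p_msg_avail
int  ex_receive_calls = 0;     // how often the "expensive" call was made

// Stand-in for the asynchronous arrival notification.
void ex_message_arrived(void)
{
    ex_pending++;
    ex_msg_avail = true;
}

// Stand-in for _p_msg_receive(): consumes at most one message and
// returns true if one was read.  Resets the flag once nothing is left.
bool ex_msg_receive(int *node, int *size, int *type, int rcv_type)
{
    (void)rcv_type;            // only the non-blocking case is modelled
    ex_receive_calls++;
    if (ex_pending == 0) {
        ex_msg_avail = false;
        return false;
    }
    ex_pending--;
    *node = 0;
    *size = 0;
    *type = 0;
    if (ex_pending == 0)
        ex_msg_avail = false;
    return true;
}

// One "check for messages" step of the main loop.  Returns the number
// of messages handled.  With async_msg set, the receive call is
// skipped entirely unless the flag says something has arrived.
int ex_poll_step(bool async_msg)
{
    int node, size, type, handled = 0;

    if (async_msg && !ex_msg_avail)
        return 0;              // cheap flag check, no receive call
    while (ex_msg_receive(&node, &size, &type, EX_RCV_NOBLOCK))
        handled++;
    return handled;
}
```

In the polled case every step costs at least one receive call, even
when nothing is waiting; in the asynchronous case an idle step costs
only the test of the flag.
*/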