This is pcn_tags.h in view mode; [Download] [Up]

 * PCN System
 * Author:      Steve Tuecke
 *              Argonne National Laboratory
 * Please see the DISCLAIMER file in the top level directory of the
 * distribution regarding the provisions under which this software
 * is distributed.
 * pcn_tags.h
 * The tag values used by the assembler, linker, and runtime system.
 * A discussion of the tags can be found in a comment near the end
 * of this file.


 * Number of bits for the tag
#define TAG_SIZE	5

 * All tags (except REF_TAG) must have a low order bit of 1.

#define REF_TAG			0
#define UNDEF_TAG		1
#define RREF_TAG		3
#define TUPLE_TAG		5
#define INT_TAG			7
#define STRING_TAG		9
#define DOUBLE_TAG		11
#define TRAILER_TAG		13
#define PROC_RECORD_TAG		19
#define VALUE_NOTE_TAG		21

#ifdef STREAMS
#define STREAM_TAG		23

 * The following are the remaining available tags:
 *	25  27  29 31

#define PCN_TAG_NAMES \
{ \
    "REF_TAG", \
    "UNDEF_TAG", \
    "REF_TAG", \
    "RREF_TAG", \
    "REF_TAG", \
    "TUPLE_TAG", \
    "REF_TAG", \
    "INT_TAG", \
    "REF_TAG", \
    "STRING_TAG", \
    "REF_TAG", \
    "DOUBLE_TAG", \
    "REF_TAG", \
    "TRAILER_TAG", \
    "REF_TAG", \
    "REF_TAG", \
    "REF_TAG", \
    "STREAM_TAG", \
    "REF_TAG", \
    "UNUSED_TAG", \
    "REF_TAG", \
    "UNUSED_TAG", \
    "REF_TAG", \
    "UNUSED_TAG", \
    "REF_TAG", \
    "UNUSED_TAG", \
    "REF_TAG", \
    "UNUSED_TAG", \
    "REF_TAG", \
    "UNUSED_TAG" \

#ifndef PCN


#ifdef GLOBAL

PCN_TAGS_GLOBAL char *_p_tag_name[]


#endif /* !PCN */

Discussion of tags (or "Why tags are the way they are")

To support the Morris garbage collection algorithm in PCN, all
pointers (conceptually, at least) need to have two extra flags during
the garbage collection:

	marked	: This pointer is in use, so it needs to be gc's
	reversed: This pointer has been reversed.

However, there is no need to actually support a 'marked' bit in each
pointer, because the location of a pointer on the heap implies if its
marked or not.  That is, the only pointers that we will encounter on
the heap that will point to other structures on the heap will be in
tuples, and if the tuple is marked then this implies that the pointer
in the tuple is marked.  Thus, there is no need for a 'marked bit in
each pointer, but only in the header cell of data structures.

In addition, during the emulator run, pointers must be distinguishable
from tags.  This is accomplished by exploiting the fact that pointers
are word aligned, so the low two bits are 0.  

But during the gc, since we need an extra bit for the 'reversed' flag,
we'll grab the second bit.  Therefore, during the gc, pointers are
distinguished by the fact that the low bit is 0.  The second lowest
bit is 0 if it is not reversed, or 1 if it is reversed.

Therefore, tags must have a low order bit of 1.  Assuming four
additional tag bits, this yields 16 legal tags.  (i.e., The odd
integers between 1 and 31, inclusive.)

In addition, assume that the pointer dereferencing routine stops when
it hits a cell where the two low order bits are not both zero.  This
has the nice effect that a reversed pointer will stop a pointer
dereference.  This property is exploited in the combined mark stage +
offheap pointer reversal stage, as described below.

Double word alignment

Most of the new microprocessors require double word alignment of
double precision floats (at least to achieve maximum efficiency).  In
the old implementation (stop and copy garbage collector), this was
handled by just skipping a cell on the heap if need be before
allocating room for the double.

However, this method will not work with the Morris algorithm, because
it needs to know the exact amount of garbage before collection begins.
So we cannot just add spaces whereever we feel like.

Therefore, if double word alignment is required, all data structures
must consume an even number of words, and must be double word aligned
on odd words (i.e., the data header cell is double word aligned on an
odd word so that the actual data contained in the data structure is
double word aligned on the even word).  In some cases (tuples,
integers, characters) this means padding the size out.

Trailer tag cell

Since the Morris garbage collection algorithm must be able to
distinguish data structures when linearly scanning the heap from
either direction, all data structures that are more than one cell in
size must have some sort of tag cell at each end of all data

However, this trailing tag is not used for anything other than the
reverse scan in the gc.  Therefore, it need not carry complete
information (i.e., tag, size, etc).  Instead, a special tag
(TRAILER_TAG) is used at the end of all multi-cell data structures on
the heap, with the exception of of undefs and rref, which have their
own special trailer cell.  The non-tag bits of the trailer cell
contain the offset (in cells) from that trailer cell to the data
header tag cell.  Thus, the header tag cell of the data structure can
be found by simply subtracting the trailer cell size field from the
trailer cell location.

This results in data structures having the following sizes (in cells,
assuming 32-bit cells with double word alignment of doubles):

	undef	- 2 (tag + undef_trailer)
	rref	- 6 (tag + weight + irt_location + node + pad + rref_trailer)
	tuple	- 2 + arity + possible_pad
	int	- 2 + size + possible_pad
	string	- 2 + size_in_cells + possible_pad
	double	- 2 + 2*size

On 64-bit machines, there is no need for inserting pad word, since
all cells are already double word aligned.

Likewise, on machines that do not require double word alignment of
doubles, there is no need for inserting pad words, since we do not
need to make all data structures have an even number of words.

Data header cell layout + resulting size limitations

Given these constraints, a data header cell has the following layout:

    bits 0 - 4	: tag, where the low bit is 0 if its a pointer (reference),
			or 1 if its a data structure tag
    bit  5	: mark, for garbage collection, and for temporary use by PDB
    bits 6 - ??	: size

Assuming a 32 bit machine:

  1) This leaves 26 bits for the 'size' field.  Since the 'size' field
of a trailer cell contains the offset in cells to the header cell, the
maximum size in cells of any data structure is 2^26-2 (67108862 or
about 67 million cells).  Similarly, the 'size' field of a header cell
contains the number of elements in that data structure.  These two constraints translate to maximum array sizes of:
	Double:		  (2^26-2)/2 = 33554431 elements (~33.5 million)
	Tuple or Integer: 2^26-2 = 67108862 elements (~67 million)
	Character:	  2^26-2 = 67108862 elements (~67 million)

  2) Undefs and rrefs use the header and trailer cells somewhat
differently than the other data structures.  Since undefs and rrefs
both have a constant size, the 'size' field of the header and trailer
cells can be used for other purposes.  In particular, these extraneous
'size' fields are used to hold information about the suspension queue
that is hung off of this undef or rref.

    Since process records are alloced off the heap using malloc,
undefs and rrefs need full 32 bit pointers to point to a suspension
queue.  However, we do not want to add an additional full word to each
of these data structures just to hold this pointer.  (Note: On a
double word aligned machine, this would require undefs to be 4 words
-- header, pointer, pad, trailer.)  Instead, we break the pointer up
into two parts (its high and low order bits) and put those parts in
the 'size' fields of the header and trailer cells, respectively.

    Therefore, there is no limitation on the location or number of
process records.


#endif /* _PCN_INCLUDE_PCN_TAGS_H */

These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Netfuture.ch.