Previous Up Next

Chapter 13  Precurs

Precur supports the description of recursive (i.e., tree-structured) data.

13.1  Syntax

The syntax for Precur is given by the following BNF grammar fragment:

recur_ty ::= Precur [(p_ty)] identifier [p_formals] ;
 Precur struct_ty
 Precur union_ty

We explain the meaning of this syntax in the remainder of this chapter. All non-terminals not defined in this grammar fragment were defined previously: Pads types (p_ty) and parameter lists (p_formals) in Section 3.6, struct_ty in Section 5.1, and union_ty in Section 6.1.

13.1.1  Examples

The Precur construct has two components: a (forward) declaration and a definition. The forward declaration allows later types to refer to the recursive type before its definition, just as in C. The definition specifies the structure of the recursive data structure. Currently, the definition can be either a Pstruct or Punion.

We begin this example with a simple data fragment encoding a sorted binary tree containing the numbers 3,4,7,9, and 10.

(3,((4,(7,9)),10))

In this encoding (based on the Newick format [New]), a leaf is an integer while a full tree is a matched pair of parentheses enclosing two child trees separated by a comma. This encoding is naturally described recursively, as shown in the simple format below:

Precur tree;

Pstruct fullTree{
  
’(’;
  tree left;
  
’,’;
  tree right;
  
’)’;
};

Precur Punion tree{
  Pint32     value;
  fullTree   nested;
};

In this example, types tree and fullTree are mutually recursive. As recursive types are treated differently than other types, we must explicitly designate at least one of them as a recursive type. In this example, we choose tree to be the recursive type and so begin with a forward declaration of tree so that it may be referenced in fullTree.

13.2  Generated library

Precurs are different than other types in two important ways. First, in C, mutual recursion (of both functions and types) is handled through forward declarations. Therefore, for every recursive type, the compiler generates type and function forward declarations in addition to the standard type and function definitions.

Second, in general, we can’t know the size of a data source described with recursive types. Therefore, the data structures corresponding to recursive types must be dynamically allocated. To accomplish this, the compiler generates two types for each of the generated data structures. One is an anonymous 1 type that is generated from the recursive type’s definition. The other is a named type (where the name corresponds to the name of the recursive type) that is a pointer to the anonymous type.

We will use the tree examples to illustrate the data structures and functions generated for recursive types.

13.2.1  In-memory representation

The in-memory representation of a recursive type has two parts: the representation of the underlying type and a pointer to that representation. The name of the underlying type is derived from the name of the type by prefixing the name with an underscore (’_’). In our example, type tree is defined as follows:

typedef struct _tree_s *tree;

The definition of struct _tree_s is just the representation of the underlying Punion.

13.2.2  Mask

The mask of a recursive type has two parts: the mask of the underlying type and a pointer to that mask. The name of the underlying type is derived from the name of the type by prefixing the name with an underscore (’_’). In our example, type tree_m is defined as follows:

typedef struct _tree_m_s *tree_m;

The definition of struct _tree_m_s is just the mask of the underlying Punion. Note the difference from Ptypedefs, which provide a new mask type for the new type.

13.2.3  Parse descriptor

The parse descriptor of a Precur with name myRecur is a C struct with name myRecur_pd. This struct has the fields described in Section 3.13. In addition, it contains a field val, the type of which is a pointer to the parse descriptor for the underlying type.

For example, the parse descriptor type tree_pd has the following structure:

typedef struct tree_pd_s tree_pd;

struct tree_pd_s {
  Pflags_t pstate;
  Puint32 nerr;
  PerrCode_t errCode;
  Ploc_t loc;
  
struct _tree_pd_s *val;
};

The definition of struct _tree_pd_s is just the parse descriptor of the underlying Punion.

13.2.4  Operations

The operations generated by the Pads compiler for a recursive type are those described in Chapter 3. However, the compiler generates two sets of functions for every recursive type. One is the standard set of functions for the underlying type. Note that these are not intended for use by the user. The other set of functions are those for operating on the type itself. These functions essentially wrap the operations of the underlying type and handle the dynamic allocation and deallocation of the representation and parse descriptor.

The operations appear in Figure 13.1.

Note that the mask is handled differently from the parse descriptor and representation. As the mask is an input to the parsing process, it must be fully allocated and initialized before parsing begins. However, due to the fact the size of the data is not known before parsing it, the mask must able to guide the parsing process for data of arbitrary size. The simplest way to accomplish this is to set the (recursive) pointers in the mask to point back to the root of the mask. In this way, we make the mask a cyclic data structure thereby simulating an infinitely large data structure. However, as traversal of the mask is driven by the data itself, this structure does not introduce a danger of infinite loops.

In the current release there is no support for automatically initializing masks to be cyclic. Instead, users must initialize the pointers correctly themselves. To assist, we provide a macro that bundles the various steps in initializing the mask. It allocates the mask, initializes it and then sets the specified pointer field to point back the root of the mask. It has the following signature:

P_DynamicMaskInit(m, maskTy, baseMaskTy, maskVal, PATH_TO_MASK)

The argument m is the (pointer) variable for the root of the mask. Argument maskTy is the type of the mask itself, and baseMaskTy is the underlying mask type. Argument maskVal specifies the default setting for all of the mask’s fields. Finally, PATH_TO_MASK, specifies the path from the root of the mask to the pointer field in the mask. Note that this assumes that there is only one pointer field. In the case of more, the user must initialize those herself. Here is an example use of this macro for our tree format:

P_DynamicMaskInit(m, tree_m, _tree_m, READ_MASK, m->nested.left); m->nested.right = m;

Note the need to initialize the second pointer separately.

Code such as this can be specified for some of the template programs with the macro CUSTOM_MASK_CODE.


Perror_t tree_init (P_t *,tree *);

Perror_t tree_pd_init (P_t *,tree_pd *);

Perror_t tree_cleanup (P_t *,tree *);

Perror_t tree_pd_cleanup (P_t *,tree_pd *);

Perror_t tree_copy (P_t *,tree *,tree *);

Perror_t tree_pd_copy (P_t *,tree_pd *,tree_pd *);

void tree_m_init (P_t *,tree_m *,Pbase_m);

Perror_t tree_read (P_t *,tree_m *,tree_pd *,tree *);

int tree_verify (tree *);

int tree_genPD (P_t *,tree *,tree_pd *);

ssize_t tree_write2buf (P_t *,Pbyte *,size_t,
int *,tree_pd *,tree *);

ssize_t tree_write2io (P_t *,Sfio_t *,tree_pd *,tree *);

ssize_t tree_write_xml_2buf (P_t *,Pbyte *,size_t,
int *,tree_pd *,tree *,char const *,int);

ssize_t tree_write_xml_2io (P_t *,Sfio_t *,tree_pd *,tree *,
char const *,int);
Figure 13.1: Prototypes of operations generated for the Precur tree.


Read function

The error codes for Precurs are:

CodeMeaning
P_OKIndicates no error occurred
P_ERRIndicates an unspecified error occured.

Accumulator functions

Accumulator functions for recursive types are not yet supported.

Histogram functions

Histogram functions for recursive types are not yet supported.

Clustering functions

Clustering functions for recursive types are not yet supported.


1
For technical reasons, we do not actually generate an anonymous type, but rather generate for the type a fresh name that is not chosen by the user.

Previous Up Next