Previous Up Next

Chapter 7  Parrays

Parrays are used to describe sequences of values of the same type. We call the repeated type the base type of the sequence.

7.1  Syntax

The syntax for Parrays is given by the following BNF grammar fragment:

p_size_spec ::=  [expresssion] ∣ [expression] : [expression]
 
p_term_expression ::= Pnosepp_expression
p_array_constraint ::= Psep(p_expression) ∣ Pterm(p_term_expression)
 Plast(predicate) ∣ Pended(predicate)
 PlongestPomit(predicate)
p_array_constraints ::= p_array_constraintp_array_constraint && p_array_constraints
 
p_range ::= ‘[’ expression .. expression‘]’ ∣ identifier
p_forall ::= Pforall ( identifier Pin p_range : expression )
p_array_post ::= predicatep_forall
p_array_posts ::= p_array_postp_array_post && p_array_posts
 
array_ty ::= Parray identifier [p_formals] {
    p_ty ‘[’p_size_spec‘]’ [: p_array_constraints]
  } [ Pwhere { p_array_posts }] ;

We explain the meaning of this syntax in the remainder of this chapter. All non-terminals not defined in this grammar fragment were defined previously. Predicates (predicate) are described in Section 3.3. Pads types (p_ty) and formal parameters (p_formals) are described in Section 3.6. Pads expressions p_expressions are defined in Section 3.15. Literals (p_literal) are described in Section 3.4. Expressions (expression) represent any C expression. We put single quotation marks around the left and right brackets to indicate they appear in the grammer, rather than as a meta-notation for optionality.

7.1.1  Examples

In this section, we illustrate various uses of Parrays.

Resolved IP address

To describe a resolved IP address in ASCII, an example of which is:

135.207.26.22

we use the specification:

Parray nIP {
  Puint8 [
4] : Psep(’.’) && Pterm(’ ’);
};

which indicates that Parray nIP is a sequence of four Puint8’s. The elements of the sequence are separated by dots and the sequence is terminated by a space. As a part of parsing the sequence, the generated read function for this type will read the separators. It will check that the terminator is present, but will not consume it. This specification has two termination conditions: a maximum size (4) and a terminator (a space). Parsing will terminate when either condition is satisified. An error will be reported if the other condition is not also satisifed.

Binary sequence of integers

As another example, the Parray seq_t

Parray seq_t{
  Pb_int32 [] : 
Plast(elts[current] > 10);
};

uses the Plast predicate to terminate a sequence of 32-bit binary integers as soon as one of those integers is greater than ten. The special variable elts refers to the sequence matched so far, while current is the index of the most recently read element. If the expression within the Plast clause evaluates to true, parsing for the array terminates. The current element is then the last element in the array.

Sorted sequence of structs

In the next example, the Parray sorted_t uses a Pwhere clause to check that the elements of the sequence were sorted by the id field of the element type..

Precord Pstruct elem_t{
  Puint32 id;
  
’|’;
  Puint32 val;
};

Parray sorted_t(: Pint32 size :) {
  elem_t [size];
Pwhere {
  
Pforall( i Pin [0..length-2] : elts[i].id < elts[i+1].id );
};

A Pforall expression executes its body once for each value in the range. The index variable, i in the example, is bound to the range value in the body.

7.1.2  Special variables

Within the various expression contexts of Parray declarations, a number of variables are in scope. The following table lists the variables, their types, and their bindings.

VariableTypeContextsBinding
numReadintPparsecheckNumber of elements read from source.
lengthintallNumber of elements in in-memory representation of sequence.
eltsrep_t *allIn-memory representation of element sequence.
pdspd_t *allIn-memory representation of parse descriptor sequence.
currentintallIndex of most recently read element.
eltrep_tallMost recently read element of sequence.
pdpd_tPparsecheckMost recently set parse descriptor.
consumeintPendedSee Pended section for explanation.
beginPpos_tPparsecheckPosition in input source before reading sequence.
endPpos_tPparsecheck within PwherePosition in input source after reading sequence.
eltBeginPpos_tPparsecheckPosition in input source before reading element.
eltEndPpos_tPparsecheckPosition in input source after reading element.

In the table, we use rep_t to denote the type of the in-memory representation of the base type of the sequence i.e., rep_t is the base type for the sequence. Similarly, pd_t denotes the type of the parse descriptor for the base type of the sequence. In the table, arrayName denotes the name of the array; it is a synonym for the special variable elts.

7.1.3  Size specifications

There are five different kinds of size specifications possible:

[]unbounded
[n]exactly size n
[low:high]at least low, at most high
[low:]at least low, no upper bound
[:high]at most high, no lower bound

As an example, the following specification describes an integer sequence of length at least min and at most max:

Parray list(:Puint32 min, Puint32 max:) {
  Pint32 [min : max] : 
Psep(’|’);
};

A maximum size specification constitutes a termination condition for reading the array. It is an error for the lower bound to be greater than the upper bound.

7.1.4  Psep

The Psep constraint is used to specify separators, i.e. data in the source that occurs between sequence elements. The body of the Psep constraint can be a character, string, or regular expression. It is an error for a Parray to contain more than one Psep clause.

7.1.5  Pterm

The Pterm constraint is used to specify a terminator, i.e. data in the source that occurs after the last element of the sequence and indicates that the sequence has ended. The body of the Pterm constraint can be a character, string, or regular expression, or the special keyword Pnosep, which indicates that the lack of a separator should be interpreted as signalling the end of the sequence. The terminator is not consumed by the read function for the array. It is an error for a Parray to contain more than one Pterm clause. The following description uses Pnosep to indicate that an IP address should be terminated by the lack of a period.

Parray nIP2 {
  Puint8 [
4] : Psep(’.’) && Pterm(Pnosep);
};

7.1.6  Plast

The Plast constraint is used to specify an arbitrary termination condition. The body of the constraint is a predicate expression (cf. 3.3). It is evaluated after reading each element in the array. If it evaluates to true, array parsing terminates. Section 7.1.2 lists the variables that are in scope within the predicate. The following description uses a Plast predicate to terminate array parsing if the number of input bytes consumed by reading the sequence is greater than a specified amount.

Pstruct character_string {
  Psbh_uint8(:
1:)        length;
  Pa_string_FW(:length:) bytes;
};

Parray TXT_t(:Puint16 rdlength:) {
  character_string [] :
      
Plast(Pparsecheck(eltEnd.offset - begin.offset >= rdlength));
};

If an array has a Plast constraint, it cannot have a Plongest or a Pended constraint.

7.1.7  Plongest

The Plongest specifier indicates that parsing the sequence continues until the parser detects an error, either in parsing a separator for the array or in parsing an element. The parser returns the data that produced the error to the input. If the array declaration includes a separator, the separator preceding the returned element is also returned to the input. The following Pads code describes a non-empty sequence of even integers separated by the character (’a’), followed by a character (’a’), followed by a non-empty sequence of odd integers, separated by the character (’a’) and terminated by the character (’b’):

Ptypedef Pint32 even_t : even_t x => {x % 2 == 0};
Ptypedef Pint32 odd_t  : odd_t  x => {x % 2 == 1};

Parray eseq_t {
  even_t [
1:] : Psep(’a’) && Plongest;
};

Parray oseq_t {
  odd_t [
1:] :  Psep(’a’) && Plongest && Pterm(’b’);
};

Precord Pstruct entry{
  eseq_t evens;
  
’a’;
  oseq_t odds;
  
’b’;
};

If an array has a Plongest constraint, it cannot have a Plast or a Pended constraint.

7.1.8  Pended

The Pended clause is similar to Plast, except that it allows the specification writer to indicate whether to consume the terminating element. As with Plast, Pended takes a predicate and has in scope the variables describd in Section 7.1.2. When the predicate returns true, the array terminates. By default, the terminating element is returned to the input, as is any preceeding separator. To indicate that the terminating element should instead be considered part of the array, the predicate can set the special variable consume to true as a side-effect.

The specification in Figure 7.1 describes a sequence of comma-separated integers. The isDone function in the Pended clause examines the most recently read value and the parse descriptor to determine if the sequence has terminated. Because the function takes the parse descriptor as an argument, it must be within the scope of a Pparsecheck clause.


int isDone(Pint32 value, Pbase_pd p, int *consume){
  
if (p.errCode != P_NO_ERR) return 0;   /* continue sequence */
  
if (value == 1) {
    *consume = 
1;        /* consume value */
    
return 1;            /* terminate sequence */
  };
  
if (value == -1){
    *consume = 
0;        /* return value to input */
    
return 1;            /* terminate sequence */
  };
  
return 0;              /* continue sequence */
};

Parray fseq_t {
  Pint32 [] : 
Psep(’,’)
              && 
Pended(Pparsecheck(isDone(fseq_t[current],
                                           pds[current], &consume)));
};
Figure 7.1: Parray with Pomit clause.

If an array has a Pended constraint, it cannot have a Plast or a Plongest constraint.

7.1.9  Pomit

Like Plast and Pended, the Pomit clause takes a predicate and has the variables in scope described in Section 7.1.2. The predicate is evaluated after reading each element in the array. When the predicate returns true, the just-read element is not stored into the array; it is “omitted”. It is because of the Pomit predicate that the special variables numRead and length need not be the same. Variable numRead indicates the total number of possible elements that have been read, while length indicates the number that have been stored into the array. Consequently, numRead will always be greater than or equal to length.

The following code reads a sequence of up to four space-separated, signed integers terminated by the end of the record. It omits each of the negative numbers from the in-memory representation.

Parray nseq_t{
  Pint32 [:
4] : Psep(’ ’) && Pomit(elt < 0) && Pterm(Peor);
};

7.1.10  Optional Pwhere clause

If given, a Pwhere clause expresses constraints over the entirety of a Parray value. The special variables that are in scope are described in Section 7.1.2. If the predicate given in the Pwhere clause evaluates to false (i.e., zero), the error code in the associated parse descriptor will indicate a user-constraint error has occurred. Syntactically, Pwhere clauses for arrays are a &&-separated sequence of array predicates. There are two types of array predicates: general predicates, which include Pparsecheck constraints, and Pforall predicates. The value of the Pwhere clause is the conjunction of the values of each of the array predicates. The predicates are evaluated in the order they are listed.

7.1.11  General predicates

General predicates, which include Pparsecheck predicates, are described in Section 3.3

7.1.12  Pforall

Pforall predicates provide a way to write constraints over the entirety of the in-memory representation of an array after it has been fully parsed. The following example checks that the resulting integer sequence is in sorted order

Precord Parray intList {
  Puint32 [INTLIST_SIZE] : 
Psep(’|’);
Pwhere {
    
Pforall( i Pin [0..length-2] : (intList[i] < intList[i+1]) )  &&
    
Pparsecheck(end.offset - begin.offset > 10);
};

Syntactically, the Pforall clause gives an index variable (i in the example) and a range for that variable (0 to length-2, inclusive). Ranges can be written either as an integer lower and upper bound, as in the example, or as the name of the array, in which case the lower bound is zero and the upper bound is one less than the length of the array. Following the colon, it gives a boolean-valued expression. This expression is executed once for each value of the index in the range; the value of the clause is the conjunction of the resulting values.

7.1.13  In-line arrays

For conciseness, Parrays can be declared in-line in Pstruct and Punion declartions (cf. Section 5.1.5 and Section 6.1.7).

7.2  Termination conditions

There are a number of conditions under which parsing for an array will terminate:

A single array specification can include many of these termination conditions. If any of the conditions trigger, then array parsing terminates.

After a termination condition has been detected, the parse function does error checking. In particular, if an array specification has a minimum size and the length of the in-memory representation does not satisfy this requirement, the parse descriptor will record the error P_ARRAY_SIZE_ERR.

7.3  Generated library

To explain the data structures and operations generated for array declarations, we will use the description of resolved IP addresses as an example.

Parray nIP {
  Puint8 [
4] : Psep(’.’) && Pterm(’ ’);
};

7.3.1  In-memory representation

The in-memory representation of a Parray is a struct with three fields.

typedef struct nIP_s nIP;

struct nIP_s {
  Puint32 length;
  Puint8 *elts;
  RBuf_t *_internal;
};

The first of these is an integer recording the length of the array in memory. The second is an array of elements of the base type of the array. The third field is a pointer to a growable buffer that manages the dynamically-allocated space for the array.

7.3.2  Mask

The mask for a Parray is a struct containing a pair of masks. The element field stores a mask of the base element mask type. This mask specifies how to handle each element in the array. The arrayLevel mask allows the user to toggle behavior at the level of the Parray as a whole.

typedef struct nIP_m_s nIP_m;

struct nIP_m_s {
  Pbase_m element;              
/* per-element */
  Pbase_m arrayLevel;           
/* entire array */
};

7.3.3  Parse descriptor

The parse descriptor for a Parray is a ¸ struct, with all the fields described in Section 3.13.

typedef struct nIP_pd_s nIP_pd;

struct nIP_pd_s {
  Pflags_t pstate;
  Puint32 nerr;                 
/* Number of array errors */
  PerrCode_t errCode;
  Ploc_t loc;
  Puint32 neerr;                
/* Number of element errors */
  Puint32 firstError;           
/* if errCode == ARRAY_ELEM_ERR, 
                                   index of first error */
  Puint32 numRead;              
/* Number of elements read */
  Puint32 length;               
/* Number of elements in memory */
  Pbase_pd *elts;
  RBuf_t *_internal;
};

In addition, the field neerr records the number of errors that occurred while processing elements of the array. The field firstError records the index of the first element that triggered an error. This field is only valid if the errCode has the value ARRAY_ELEM_ERR. The numRead field records the number of elements in the array storing the parse descriptors for each of the array elements. The field elts stores the array of base parse descriptors. Finally, field _internal contains a pointer to the growable buffer that manages the dynamically-allocated space for the array of base parse descriptors.

7.3.4  Operations

The operations generated by the Pads compiler for a Parray are those described in Chapter 3. Figure 7.2 lists these operations.


Perror_t nIP_init (P_t *pads,nIP *rep);

Perror_t nIP_pd_init (P_t *pads,nIP_pd *pd);

Perror_t nIP_cleanup (P_t *pads,nIP *rep);

Perror_t nIP_pd_cleanup (P_t *pads,nIP_pd *pd);

Perror_t nIP_copy (P_t *pads,nIP *rep_dst,nIP *rep_src);

Perror_t nIP_pd_copy (P_t *pads,nIP_pd *pd_dst,nIP_pd *pd_src);

void nIP_m_init (P_t *pads,nIP_m *mask,Pbase_m baseMask);

Perror_t nIP_read (P_t *pads,nIP_m *m,nIP_pd *pd,nIP *rep);

int nIP_verify (nIP *rep);

int nIP_genPD (P_t *pads, nIP *rep, nIP_pd *pd);
Figure 7.2: Prototypes of operations generated for the Parray nIP.

Read function

The error codes for Pstructs appear in Figure 7.3.


CodeMeaning
P_NO_ERRNo error occurred
P_ARRAY_ELEM_ERRAn error occurred during parsing one of the elements of the Parray. The parse descriptor for the element whose index is firstError will contain more information describing the precise nature of the error.
P_ARRAY_SEP_ERRError reading a separator.
P_ARRAY_TERM_ERRError reading the terminator.
P_STRUCT_EXTRA_BEFORE_SEPUnexpected data before a separator.
P_STRUCT_EXTRA_BEFORE_TERMUnexpected data before the terminator.
P_ARRAY_SEP_TERM_SAME_ERRThe separator and terminator sequences were the same.
P_ARRAY_SIZE_ERRThe size of the parsed Parray did not match the specified size constraints.
P_ARRAY_MIN_BIGGER_THAN_MAX_ERRMinimum size larger than maximum size.
P_ARRAY_MIN_NEGATIVENegative minimum size.
P_ARRAY_MAX_NEGATIVENegative maximum size.
P_ARRAY_USER_CONSTRAINT_ERRPwhere clause returned false.
Figure 7.3: Error codes that can be returned by Parray read functions.

If multiple errors occur during the parsing of a Parray, the errCode field will reflect the first detected error. The array of parse descriptors will describe any errors detected while reading the Parray elements.

Accumulator functions

Accumulator functions for Parrays are described in Chapter 16.

Histogram functions

Histogram functions for Parrays are described in Chapter 17.

Clustering functions

Clustering functions for Parrays are described in Chapter 18.


Previous Up Next