Parrays are used to describe sequences of values of the same type. We call the repeated type the base type of the sequence.
The syntax for Parrays is given by the following BNF grammar fragment:
p_size_spec | ::= | [expresssion] ∣ [expression] : [expression] |
p_term_expression | ::= | Pnosep ∣ p_expression |
p_array_constraint | ::= | Psep(p_expression) ∣ Pterm(p_term_expression) |
∣ | Plast(predicate) ∣ Pended(predicate) | |
∣ | Plongest ∣ Pomit(predicate) | |
p_array_constraints | ::= | p_array_constraint ∣ p_array_constraint && p_array_constraints |
p_range | ::= | ‘[’ expression .. expression‘]’ ∣ identifier |
p_forall | ::= | Pforall ( identifier Pin p_range : expression ) |
p_array_post | ::= | predicate ∣ p_forall |
p_array_posts | ::= | p_array_post ∣ p_array_post && p_array_posts |
array_ty | ::= | Parray identifier [p_formals] { |
p_ty ‘[’p_size_spec‘]’ [: p_array_constraints] | ||
} [ Pwhere { p_array_posts }] ; |
We explain the meaning of this syntax in the remainder of this chapter. All non-terminals not defined in this grammar fragment were defined previously. Predicates (predicate) are described in Section 3.3. Pads types (p_ty) and formal parameters (p_formals) are described in Section 3.6. Pads expressions p_expressions are defined in Section 3.15. Literals (p_literal) are described in Section 3.4. Expressions (expression) represent any C expression. We put single quotation marks around the left and right brackets to indicate they appear in the grammer, rather than as a meta-notation for optionality.
In this section, we illustrate various uses of Parrays.
To describe a resolved IP address in ASCII, an example of which is:
135.207.26.22
we use the specification:
Parray nIP {
Puint8 [4] : Psep(’.’) && Pterm(’ ’);
};
which indicates that Parray nIP is a sequence of four Puint8’s. The elements of the sequence are separated by dots and the sequence is terminated by a space. As a part of parsing the sequence, the generated read function for this type will read the separators. It will check that the terminator is present, but will not consume it. This specification has two termination conditions: a maximum size (4) and a terminator (a space). Parsing will terminate when either condition is satisified. An error will be reported if the other condition is not also satisifed.
As another example, the Parray seq_t
Parray seq_t{
Pb_int32 [] : Plast(elts[current] > 10);
};
uses the Plast predicate to terminate a sequence of 32-bit binary integers as soon as one of those integers is greater than ten. The special variable elts refers to the sequence matched so far, while current is the index of the most recently read element. If the expression within the Plast clause evaluates to true, parsing for the array terminates. The current element is then the last element in the array.
In the next example, the Parray sorted_t uses a Pwhere clause to check that the elements of the sequence were sorted by the id field of the element type..
Precord Pstruct elem_t{
Puint32 id;
’|’;
Puint32 val;
};
Parray sorted_t(: Pint32 size :) {
elem_t [size];
} Pwhere {
Pforall( i Pin [0..length-2] : elts[i].id < elts[i+1].id );
};
A Pforall expression executes its body once for each value in the range. The index variable, i in the example, is bound to the range value in the body.
Within the various expression contexts of Parray declarations, a number of variables are in scope. The following table lists the variables, their types, and their bindings.
Variable | Type | Contexts | Binding |
numRead | int | Pparsecheck | Number of elements read from source. |
length | int | all | Number of elements in in-memory representation of sequence. |
elts | rep_t * | all | In-memory representation of element sequence. |
pds | pd_t * | all | In-memory representation of parse descriptor sequence. |
current | int | all | Index of most recently read element. |
elt | rep_t | all | Most recently read element of sequence. |
pd | pd_t | Pparsecheck | Most recently set parse descriptor. |
consume | int | Pended | See Pended section for explanation. |
begin | Ppos_t | Pparsecheck | Position in input source before reading sequence. |
end | Ppos_t | Pparsecheck within Pwhere | Position in input source after reading sequence. |
eltBegin | Ppos_t | Pparsecheck | Position in input source before reading element. |
eltEnd | Ppos_t | Pparsecheck | Position in input source after reading element. |
In the table, we use rep_t to denote the type of the in-memory representation of the base type of the sequence i.e., rep_t is the base type for the sequence. Similarly, pd_t denotes the type of the parse descriptor for the base type of the sequence. In the table, arrayName denotes the name of the array; it is a synonym for the special variable elts.
There are five different kinds of size specifications possible:
[] | unbounded |
[n] | exactly size n |
[low:high] | at least low, at most high |
[low:] | at least low, no upper bound |
[:high] | at most high, no lower bound |
As an example, the following specification describes an integer sequence of length at least min and at most max:
Parray list(:Puint32 min, Puint32 max:) {
Pint32 [min : max] : Psep(’|’);
};
A maximum size specification constitutes a termination condition for reading the array. It is an error for the lower bound to be greater than the upper bound.
The Psep constraint is used to specify separators, i.e. data in the source that occurs between sequence elements. The body of the Psep constraint can be a character, string, or regular expression. It is an error for a Parray to contain more than one Psep clause.
The Pterm constraint is used to specify a terminator, i.e. data in the source that occurs after the last element of the sequence and indicates that the sequence has ended. The body of the Pterm constraint can be a character, string, or regular expression, or the special keyword Pnosep, which indicates that the lack of a separator should be interpreted as signalling the end of the sequence. The terminator is not consumed by the read function for the array. It is an error for a Parray to contain more than one Pterm clause. The following description uses Pnosep to indicate that an IP address should be terminated by the lack of a period.
Parray nIP2 {
Puint8 [4] : Psep(’.’) && Pterm(Pnosep);
};
The Plast constraint is used to specify an arbitrary termination condition. The body of the constraint is a predicate expression (cf. 3.3). It is evaluated after reading each element in the array. If it evaluates to true, array parsing terminates. Section 7.1.2 lists the variables that are in scope within the predicate. The following description uses a Plast predicate to terminate array parsing if the number of input bytes consumed by reading the sequence is greater than a specified amount.
Pstruct character_string {
Psbh_uint8(:1:) length;
Pa_string_FW(:length:) bytes;
};
Parray TXT_t(:Puint16 rdlength:) {
character_string [] :
Plast(Pparsecheck(eltEnd.offset - begin.offset >= rdlength));
};
If an array has a Plast constraint, it cannot have a Plongest or a Pended constraint.
The Plongest specifier indicates that parsing the sequence continues until the parser detects an error, either in parsing a separator for the array or in parsing an element. The parser returns the data that produced the error to the input. If the array declaration includes a separator, the separator preceding the returned element is also returned to the input. The following Pads code describes a non-empty sequence of even integers separated by the character (’a’), followed by a character (’a’), followed by a non-empty sequence of odd integers, separated by the character (’a’) and terminated by the character (’b’):
Ptypedef Pint32 even_t : even_t x => {x % 2 == 0};
Ptypedef Pint32 odd_t : odd_t x => {x % 2 == 1};
Parray eseq_t {
even_t [1:] : Psep(’a’) && Plongest;
};
Parray oseq_t {
odd_t [1:] : Psep(’a’) && Plongest && Pterm(’b’);
};
Precord Pstruct entry{
eseq_t evens;
’a’;
oseq_t odds;
’b’;
};
If an array has a Plongest constraint, it cannot have a Plast or a Pended constraint.
The Pended clause is similar to Plast, except that it allows the specification writer to indicate whether to consume the terminating element. As with Plast, Pended takes a predicate and has in scope the variables describd in Section 7.1.2. When the predicate returns true, the array terminates. By default, the terminating element is returned to the input, as is any preceeding separator. To indicate that the terminating element should instead be considered part of the array, the predicate can set the special variable consume to true as a side-effect.
The specification in Figure 7.1 describes a sequence of comma-separated integers. The isDone function in the Pended clause examines the most recently read value and the parse descriptor to determine if the sequence has terminated. Because the function takes the parse descriptor as an argument, it must be within the scope of a Pparsecheck clause.
int isDone(Pint32 value, Pbase_pd p, int *consume){
if (p.errCode != P_NO_ERR) return 0; /* continue sequence */
if (value == 1) {
*consume = 1; /* consume value */
return 1; /* terminate sequence */
};
if (value == -1){
*consume = 0; /* return value to input */
return 1; /* terminate sequence */
};
return 0; /* continue sequence */
};
Parray fseq_t {
Pint32 [] : Psep(’,’)
&& Pended(Pparsecheck(isDone(fseq_t[current],
pds[current], &consume)));
};
If an array has a Pended constraint, it cannot have a Plast or a Plongest constraint.
Like Plast and Pended, the Pomit clause takes a predicate and has the variables in scope described in Section 7.1.2. The predicate is evaluated after reading each element in the array. When the predicate returns true, the just-read element is not stored into the array; it is “omitted”. It is because of the Pomit predicate that the special variables numRead and length need not be the same. Variable numRead indicates the total number of possible elements that have been read, while length indicates the number that have been stored into the array. Consequently, numRead will always be greater than or equal to length.
The following code reads a sequence of up to four space-separated, signed integers terminated by the end of the record. It omits each of the negative numbers from the in-memory representation.
Parray nseq_t{
Pint32 [:4] : Psep(’ ’) && Pomit(elt < 0) && Pterm(Peor);
};
If given, a Pwhere clause expresses constraints over the entirety of a Parray value. The special variables that are in scope are described in Section 7.1.2. If the predicate given in the Pwhere clause evaluates to false (i.e., zero), the error code in the associated parse descriptor will indicate a user-constraint error has occurred. Syntactically, Pwhere clauses for arrays are a &&-separated sequence of array predicates. There are two types of array predicates: general predicates, which include Pparsecheck constraints, and Pforall predicates. The value of the Pwhere clause is the conjunction of the values of each of the array predicates. The predicates are evaluated in the order they are listed.
General predicates, which include Pparsecheck predicates, are described in Section 3.3
Pforall predicates provide a way to write constraints over the entirety of the in-memory representation of an array after it has been fully parsed. The following example checks that the resulting integer sequence is in sorted order
Precord Parray intList {
Puint32 [INTLIST_SIZE] : Psep(’|’);
} Pwhere {
Pforall( i Pin [0..length-2] : (intList[i] < intList[i+1]) ) &&
Pparsecheck(end.offset - begin.offset > 10);
};
Syntactically, the Pforall clause gives an index variable (i in the example) and a range for that variable (0 to length-2, inclusive). Ranges can be written either as an integer lower and upper bound, as in the example, or as the name of the array, in which case the lower bound is zero and the upper bound is one less than the length of the array. Following the colon, it gives a boolean-valued expression. This expression is executed once for each value of the index in the range; the value of the clause is the conjunction of the resulting values.
For conciseness, Parrays can be declared in-line in Pstruct and Punion declartions (cf. Section 5.1.5 and Section 6.1.7).
There are a number of conditions under which parsing for an array will terminate:
A single array specification can include many of these termination conditions. If any of the conditions trigger, then array parsing terminates.
After a termination condition has been detected, the parse function does error checking. In particular, if an array specification has a minimum size and the length of the in-memory representation does not satisfy this requirement, the parse descriptor will record the error P_ARRAY_SIZE_ERR.
To explain the data structures and operations generated for array declarations, we will use the description of resolved IP addresses as an example.
Parray nIP {
Puint8 [4] : Psep(’.’) && Pterm(’ ’);
};
The in-memory representation of a Parray is a struct with three fields.
typedef struct nIP_s nIP;
struct nIP_s {
Puint32 length;
Puint8 *elts;
RBuf_t *_internal;
};
The first of these is an integer recording the length of the array in memory. The second is an array of elements of the base type of the array. The third field is a pointer to a growable buffer that manages the dynamically-allocated space for the array.
The mask for a Parray is a struct containing a pair of masks. The element field stores a mask of the base element mask type. This mask specifies how to handle each element in the array. The arrayLevel mask allows the user to toggle behavior at the level of the Parray as a whole.
typedef struct nIP_m_s nIP_m;
struct nIP_m_s {
Pbase_m element; /* per-element */
Pbase_m arrayLevel; /* entire array */
};
The parse descriptor for a Parray is a ¸ struct, with all the fields described in Section 3.13.
typedef struct nIP_pd_s nIP_pd;
struct nIP_pd_s {
Pflags_t pstate;
Puint32 nerr; /* Number of array errors */
PerrCode_t errCode;
Ploc_t loc;
Puint32 neerr; /* Number of element errors */
Puint32 firstError; /* if errCode == ARRAY_ELEM_ERR,
index of first error */
Puint32 numRead; /* Number of elements read */
Puint32 length; /* Number of elements in memory */
Pbase_pd *elts;
RBuf_t *_internal;
};
In addition, the field neerr records the number of errors that occurred while processing elements of the array. The field firstError records the index of the first element that triggered an error. This field is only valid if the errCode has the value ARRAY_ELEM_ERR. The numRead field records the number of elements in the array storing the parse descriptors for each of the array elements. The field elts stores the array of base parse descriptors. Finally, field _internal contains a pointer to the growable buffer that manages the dynamically-allocated space for the array of base parse descriptors.
The operations generated by the Pads compiler for a Parray are those described in Chapter 3. Figure 7.2 lists these operations.
Perror_t nIP_init (P_t *pads,nIP *rep);
Perror_t nIP_pd_init (P_t *pads,nIP_pd *pd);
Perror_t nIP_cleanup (P_t *pads,nIP *rep);
Perror_t nIP_pd_cleanup (P_t *pads,nIP_pd *pd);
Perror_t nIP_copy (P_t *pads,nIP *rep_dst,nIP *rep_src);
Perror_t nIP_pd_copy (P_t *pads,nIP_pd *pd_dst,nIP_pd *pd_src);
void nIP_m_init (P_t *pads,nIP_m *mask,Pbase_m baseMask);
Perror_t nIP_read (P_t *pads,nIP_m *m,nIP_pd *pd,nIP *rep);
int nIP_verify (nIP *rep);
int nIP_genPD (P_t *pads, nIP *rep, nIP_pd *pd);
The error codes for Pstructs appear in Figure 7.3.
Code Meaning P_NO_ERR No error occurred P_ARRAY_ELEM_ERR An error occurred during parsing one of the elements of the Parray. The parse descriptor for the element whose index is firstError will contain more information describing the precise nature of the error. P_ARRAY_SEP_ERR Error reading a separator. P_ARRAY_TERM_ERR Error reading the terminator. P_STRUCT_EXTRA_BEFORE_SEP Unexpected data before a separator. P_STRUCT_EXTRA_BEFORE_TERM Unexpected data before the terminator. P_ARRAY_SEP_TERM_SAME_ERR The separator and terminator sequences were the same. P_ARRAY_SIZE_ERR The size of the parsed Parray did not match the specified size constraints. P_ARRAY_MIN_BIGGER_THAN_MAX_ERR Minimum size larger than maximum size. P_ARRAY_MIN_NEGATIVE Negative minimum size. P_ARRAY_MAX_NEGATIVE Negative maximum size. P_ARRAY_USER_CONSTRAINT_ERR Pwhere clause returned false.
If multiple errors occur during the parsing of a Parray, the errCode field will reflect the first detected error. The array of parse descriptors will describe any errors detected while reading the Parray elements.
Accumulator functions for Parrays are described in Chapter 16.
Histogram functions for Parrays are described in Chapter 17.
Clustering functions for Parrays are described in Chapter 18.