Punions are used to express variations in data. Pads supports two forms of union: switched and in-place. The first form supports data sources where there is an indication (i.e., a switch) in the data prior to the union indicating which alternative should be chosen. The second form supports data sources where no such switch is present. The default for this case is to try the branches in order until one parses without any errors. There is also a qualifier (Plongest) that indicates the parser should take the branch that consumes the most input.
union_field ::= full_field | comp_field | literal_field | array_field | opt_field | Pempty
branch      ::= Pcase expression : union_field | Pdefault : union_field
branches    ::= branch | branch branches
switched    ::= Pswitch (expression) { branches }
in_place    ::= union_field | union_field in_place
union_bdy   ::= switched | in_place
union_ty    ::= [Plongest] Punion identifier [p_formals] {
                  union_bdy
                } [Pwhere { predicate }] ;
We explain the meaning of this syntax in the remainder of this chapter. All non-terminals not defined in this grammar fragment were defined previously, as follows. Full fields (full_field), in-line array declarations (array_field), and in-line option declarations (opt_field) appear in Section 5.1.2, computed fields (comp_field) in Section 5.1.3, literals (literal_field) in Section 3.4, and Pads parameter lists (p_formals) in Section 3.6. Expressions (expression) represent any C expression.
The Pads declarations in Figure 6.1 describe data that uses an integer tag to determine the format of the rest of the data. The Pstruct choice reads the integer field which and passes it as a parameter to the switched Punion branches. The branches declaration describes three alternatives, selected by the value of the tag which.
Punion branches(:Puint32 which:) {
Pswitch (which) {
Pcase 1 : Pint32 number : number % 2 == 0;
Pcase 2 : Pstring_SE(:"EOR":) name;
Pdefault : Pcompute Puint32 other = which;
}
};
Precord Pstruct choice{
Puint32 which;
branches(:which:) branch;
};
A tag value of 1 indicates that a signed integer, constrained to be even, will follow:
1 4
while a tag value of 2 indicates a string terminated by an end-of-record mark:
2 hello
Any other value for the tag will fall into the default clause of the union, which indicates that no further data exists:
3
The Pads declarations in Figure 6.2 are similar to those in Figure 6.1 except that instead of computing a default value for the default case, this declaration uses the special type Pempty, which indicates that there is nothing on disk for this case.
Punion body_t(:Puint8 j:) {
Pswitch (j) {
Pcase 0 : Puint32 i;
Pcase 1 : Pchar c;
Pdefault : Pempty;
}
};
Precord Pstruct entry_t {
Puint8 i;
':';
body_t(:i:) body;
};
The following in-place Punion describes a data fragment that is either a resolved or a symbolic IP address:
Punion clihost_t {
nIP resolved; /- 135.207.23.32
sIP symbolic; /- www.research.att.com
};
The (omitted) types nIP and sIP describe numeric and symbolic IP addresses, respectively. The comments embedded in the description give an example of each of the two forms. With in-place Punions, the parser tries each of the branches in turn until it finds one that matches the data without any errors.
The following in-place Punion describes a data fragment that is either an integer or nothing using the special type Pempty.
Punion Choice_t {
Puint32 i;
Pempty;
};
The expression on which a switched Punion branches can be any C expression of integer type (as in a switch statement in C). Typically, this expression is computed from a parameter to the switched Punion.
The body of a switched Punion is a non-empty sequence of branches. Each branch in a switched Punion can have one of two forms: a Pcase statement or a Pdefault statement. The Pcase form specifies an integer value. During parsing, the first Pcase branch whose expression equals the value of the Pswitch expression is selected as the active description. If no Pcase expression matches, the Pdefault branch (if present) is selected instead.
The body of each branch in a switched Punion is a union_field, while the body of an in-place Punion is a non-empty sequence of such fields. There are six varieties of union field. Five are borrowed from Pstructs: full fields (Section 5.1.2), computed fields (Section 5.1.3), in-line array declarations (Section 5.1.5), in-line option declarations (Section 5.1.5), and literal fields. The sixth, Pempty, is special to Punions. The only semantic change from Pstructs is that in Punions, earlier fields are not in scope for later fields because only one branch of a union can be active at a time. A Pempty branch corresponds to nothing in the physical representation.
The name of a branch in either kind of union is the name of the declared identifier for full and computed fields. For literal fields, it is the literal itself, unless the programmer specifies a different name using the Pfrom form. The Pfrom form must be used when the literal is not a valid C identifier. For example, the following in-place union uses the Pfrom form to provide names for the string literal "*" and the regular expression literal Pre "/a+/".
Precord Punion test {
"baz";
'c';
star Pfrom("*");
as Pfrom(Pre "/a+/");
Pint32 f;
};
Within a Punion, the type Pvoid describes no physical data. There is no corresponding field in the in-memory representation of the union, but the union tag is set to indicate which branch of the union was used during the parse. This type is useful with switched unions where the value of the tag indicates that there is no data associated with the tag. For example, the following Punion branches
Punion branches(:Puint32 which:) {
Pswitch (which) {
Pcase 0 : Pvoid noValue;
Pcase 1 : Pint32 someValue;
}
};
indicates that when the value of the tag which is 0, there is no more data associated with the type branches in the input source, while the tag value 1 indicates there is a Pint32 following.
In-line declarations in Punions have the same form as in Pstructs (cf. Section 5.1.5).
By default, in-place Punions commit to the first branch that parses without any errors. Adding the Plongest qualifier to the Punion declaration, however, indicates that the parser should instead select the longest match, i.e., the match that consumes the most input.
If given, a Pwhere clause expresses constraints over the entirety of a Punion value. Special constants tag and val are in scope, of the tag and value types for the union, respectively (cf. Section 6.2.1). The first indicates which branch of the union matched, while the second contains the representation of the matched branch. Within the context of a Pparsecheck clause, constants begin and end, each of type Ppos_t, are also available. Constant begin is bound to the input position of the beginning of the Punion; end is bound to its end. If the predicate given in the Pwhere clause evaluates to false (i.e., zero), the error code in the associated parse descriptor will indicate that a user-constraint error has occurred.
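The effect of a Pwhere predicate can be modeled in plain C. The following is only a sketch, not generated code: the types, the tag names i_tag and empty_tag, the error constants, and the predicate itself (a Puint32 branch i that must stay below 100) are all assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Puint32;   /* stand-in for the Pads base type */

/* Stand-in representation of an in-place Punion with a Puint32 branch i
   and a Pempty branch. The tag names are assumptions. */
typedef enum { Choice_t_err = 0, i_tag = 1, empty_tag = 2 } Choice_t_tag;
typedef union { Puint32 i; } Choice_t_u;
typedef struct { Choice_t_tag tag; Choice_t_u val; } Choice_t;

/* Hypothetical error codes; the real constants are not shown in this chapter. */
enum { DEMO_NO_ERR = 0, DEMO_USER_CONSTRAINT_ERR = 1 };

/* Models a predicate of the form: tag != i || val.i < 100.
   Both tag and val are visible to the predicate. */
static int where_predicate(Choice_t_tag tag, const Choice_t_u *val)
{
    return tag != i_tag || val->i < 100;
}

/* A false (zero) predicate is reported as a user-constraint error. */
int check_where(const Choice_t *rep)
{
    assert(rep != NULL);
    return where_predicate(rep->tag, &rep->val) ? DEMO_NO_ERR
                                                : DEMO_USER_CONSTRAINT_ERR;
}
```

The predicate consults tag before touching val.i, since only the member of val selected by tag holds meaningful data.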
In addition to the types generated for every Pads specification, the Pads compiler generates an extra type declaration for every Punion: an enumerated type with one component for each branch in the union, plus an extra component corresponding to a match failure. The names of the tags correspond to the names of the branches in the union, unless that name has already been defined elsewhere. In this case, the name of the tag is unionName_branchName. The name of the tag enumeration is the name of the Punion followed by the suffix _tag. For example, the generated enumeration type for the Punion branches is the following:
typedef enum branches_tag_e branches_tag;
enum branches_tag_e {
branches_err=0,
number=1,
name=2,
other=3
};
The in-memory representation of both forms of Punion is a C struct containing a tag field to indicate which branch of the union has been populated followed by a val field storing the union itself. We represent unions as C unions, with one component per non-literal branch of the union. The representation-related type declarations for the Punion branches appear in Figure 6.3.
typedef union branches_u_u branches_u;
union branches_u_u {
Pint32 number; /* number % 2 == 0 */
Pstring name;
Puint32 other; /* other = which */
};
typedef struct branches_s branches;
struct branches_s {
branches_tag tag;
branches_u val;
};
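Client code typically inspects the tag field before touching val. The sketch below illustrates this for the branches representation. The base-type typedefs (including the layout of Pstring) are stand-ins assumed for the sake of a self-contained example, and describe_branches is a hypothetical helper, not part of the generated library.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Stand-ins for Pads base types (assumptions, not the library definitions). */
typedef int          Pint32;
typedef unsigned int Puint32;
typedef struct { const char *str; size_t len; } Pstring;  /* hypothetical layout */

/* Declarations mirroring the generated types for the Punion branches. */
enum branches_tag_e { branches_err = 0, number = 1, name = 2, other = 3 };
typedef enum branches_tag_e branches_tag;

typedef union branches_u_u {
    Pint32  number;
    Pstring name;
    Puint32 other;
} branches_u;

typedef struct branches_s {
    branches_tag tag;
    branches_u   val;
} branches;

/* Hypothetical helper: format the branch that the tag marks as populated.
   Only the union member selected by the tag is read. */
int describe_branches(const branches *rep, char *out, size_t out_len)
{
    assert(rep != NULL && out != NULL);
    switch (rep->tag) {
    case number:
        return snprintf(out, out_len, "number=%d", rep->val.number);
    case name:
        return snprintf(out, out_len, "name=%.*s",
                        (int)rep->val.name.len, rep->val.name.str);
    case other:
        return snprintf(out, out_len, "other=%u", rep->val.other);
    default:  /* branches_err: no member of val is valid */
        return snprintf(out, out_len, "error");
    }
}
```

Reading a member of val other than the one indicated by tag yields meaningless data, which is why the switch on tag comes first.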
The mask of a Punion is a C struct. For each full and computed field in the union, there is a corresponding field in the mask whose type is the mask type for that field. For example, the mask type branches_m has the following structure:
typedef struct branches_m_s branches_m;
struct branches_m_s {
Pbase_m unionLevel;
Pbase_m number; /* nested constraints */
Pbase_m number_con; /* union constraints */
Pbase_m name; /* nested constraints */
};
Union masks have one additional field unionLevel that allows the programmer to toggle behavior at the level of the union as a whole.
The parse descriptor of a Punion is a C struct, with all the fields described in Section 3.13. In addition, there is a tag field indicating which branch of the union was populated during parsing and a val field which stores the parse descriptor of the populated branch, represented as a C union. The parse descriptor declarations corresponding to the Pads type branches appear in Figure 6.4.
typedef union branches_pd_u_u branches_pd_u;
union branches_pd_u_u {
Pbase_pd number;
Pbase_pd name;
Pbase_pd other; /* other = which */
};
typedef struct branches_pd_s branches_pd;
struct branches_pd_s {
Pflags_t pstate;
Puint32 nerr;
PerrCode_t errCode;
Ploc_t loc;
branches_tag tag;
branches_pd_u val;
};
The operations generated by the Pads compiler for a Punion are those described in Chapter 3. In addition, there is an extra function that converts a value of the tag type for the union to a string. For a Punion named myUnion, this function has the name myUnion_tag2str. For the Punion branches, the prototypes for all the generated functions appear in Figure 6.5.
char const *branches_tag2str (branches_tag which);
Perror_t branches_init (P_t *pads,branches *rep);
Perror_t branches_pd_init (P_t *pads,branches_pd *pd);
Perror_t branches_cleanup (P_t *pads,branches *rep);
Perror_t branches_pd_cleanup (P_t *pads,branches_pd *pd);
Perror_t branches_copy (P_t *pads,branches *rep_dst,branches *rep_src);
Perror_t branches_pd_copy (P_t *pads,branches_pd *pd_dst,
branches_pd *pd_src);
void branches_m_init (P_t *pads,branches_m *mask,Pbase_m baseMask);
Perror_t branches_read (P_t *pads,branches_m *m,Puint32 which,
branches_pd *pd,branches *rep);
ssize_t branches_write2buf (P_t *pads,Pbyte *buf,size_t buf_len,int *buf_full,
Puint32 which,branches_pd *pd,branches *rep);
ssize_t branches_write2io (P_t *pads,Sfio_t *io,Puint32 which,
branches_pd *pd,branches *rep);
int branches_verify (branches *rep,Puint32 which);
int branches_genPD (P_t *pads,branches *rep, branches_pd *pd, Puint32 which);
The read function for a switched Punion evaluates the Pswitch expression and uses a C switch statement to jump to the appropriate branch. If there is no Pdefault branch and none of the Pcase branches match, the read function will return the error code P_UNION_MATCH_ERR and set the tag fields of the parse descriptor and in-memory representation to the error tag.
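The control flow just described can be sketched in C for a two-branch switched union with no Pdefault (tag value 0 selects a branch with no data, tag value 1 selects a Pint32). This is a model only: the demo_ types, the error-tag name, and the error-code constants are stand-ins assumed for illustration, and the actual parsing of the selected branch is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in tag enumeration; the error tag name is an assumption. */
typedef enum { demo_err = 0, noValue = 1, someValue = 2 } demo_tag;

/* Stand-in error codes; the real constant values are not shown here. */
typedef enum { DEMO_NO_ERR = 0, DEMO_UNION_MATCH_ERR = 1 } demo_err_code;

typedef struct { demo_tag tag; } demo_rep;                        /* val omitted */
typedef struct { demo_tag tag; demo_err_code errCode; } demo_pd;  /* reduced */

/* Models the dispatch of a switched read: a C switch on the Pswitch value.
   Reading the data of the selected branch is left out of this sketch. */
demo_err_code demo_switch_read(unsigned which, demo_pd *pd, demo_rep *rep)
{
    assert(pd != NULL && rep != NULL);
    switch (which) {
    case 0:
        rep->tag = pd->tag = noValue;
        break;
    case 1:
        rep->tag = pd->tag = someValue;
        break;
    default:
        /* No Pcase matched and there is no Pdefault: set both tags to
           the error tag and report a union match error. */
        rep->tag = pd->tag = demo_err;
        pd->errCode = DEMO_UNION_MATCH_ERR;
        return DEMO_UNION_MATCH_ERR;
    }
    pd->errCode = DEMO_NO_ERR;
    return DEMO_NO_ERR;
}
```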
For an in-place Punion without the Plongest qualifier, the read function speculatively reads each branch in turn until it finds one that parses without errors. Before reading a branch, the read function marks the current location in the input. It then tries to read the data described by the type of the branch. If the nested read function succeeds and any user-level constraint on the branch also succeeds, the read function commits to the parse, sets the tag fields to the name of the successful branch, and returns P_NO_ERR. If an error occurs, the read function aborts the read, rolling back the input to the marked location, and tries the next branch. If the last branch in an in-place Punion fails, the read function returns the error code P_UNION_MATCH_ERR and sets the tag fields of the parse descriptor and in-memory representation to the error tag. Errors that occur during the parsing of individual branches of an in-place Punion are suppressed because of the speculative nature of the parsing.
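The mark, try, and roll-back cycle can be modeled with a small cursor over an in-memory buffer. The sketch below is not the generated read function: the cursor type, the two branch readers (a dotted number and a plain unsigned number), and the convention of returning the 1-based branch index as the tag (0 for the error tag) are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical input cursor over a NUL-terminated buffer. */
typedef struct { const char *buf; size_t pos; } cursor;

/* A branch reader returns 1 on success; on failure it may leave the
   cursor anywhere, so the caller is responsible for rolling back. */
typedef int (*branch_fn)(cursor *c);

size_t skip_digits(cursor *c)
{
    size_t start = c->pos;
    while (isdigit((unsigned char)c->buf[c->pos]))
        c->pos++;
    return c->pos - start;
}

/* Branch 1: digits '.' digits. It can fail after consuming input. */
int read_dotted(cursor *c)
{
    if (skip_digits(c) == 0) return 0;
    if (c->buf[c->pos] != '.') return 0;
    c->pos++;
    return skip_digits(c) > 0;
}

/* Branch 2: one or more digits. */
int read_uint(cursor *c)
{
    return skip_digits(c) > 0;
}

/* First-match semantics: returns the 1-based index of the first branch
   that parses cleanly, or 0 (the error tag) if every branch fails. */
int union_read_first(cursor *c, const branch_fn *branches, int n)
{
    assert(c != NULL && branches != NULL);
    for (int i = 0; i < n; i++) {
        size_t mark = c->pos;        /* mark the current location */
        if (branches[i](c))
            return i + 1;            /* commit to this branch */
        c->pos = mark;               /* roll back and try the next one */
    }
    return 0;
}
```

On the input `12;`, the first branch consumes `12` before failing on the missing dot. The rollback restores the cursor, so the second branch starts from the same position and succeeds.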
For in-place Punions with the Plongest qualifier, the read function parses each branch, preserving in the in-memory representation the non-erroneous branch that occupied the most space in the physical source. Ties are resolved in favor of the earliest branch. Non-erroneous branches that occupy zero space in the physical representation, such as Pcompute fields or non-matching options, are considered matches; the first such branch will be preserved in the in-memory representation if no longer non-erroneous branch is found. The read function reports an error if no branch matches without an error.
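The selection rule for Plongest can be sketched as a maximum search over the lengths consumed by each branch. In this model, which is an illustration rather than the generated code, each hypothetical branch function reports the number of bytes it would consume from the same starting position, or -1 on error. The returned tag is the 1-based branch index, with 0 standing for the error tag.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical branch model: bytes consumed from `in`, or -1 on error. */
typedef long (*branch_len_fn)(const char *in);

/* One or more digits. */
long len_uint(const char *in)
{
    size_t n = 0;
    while (isdigit((unsigned char)in[n]))
        n++;
    return n ? (long)n : -1;
}

/* digits '.' digits */
long len_dotted(const char *in)
{
    long a = len_uint(in);
    if (a < 0 || in[a] != '.') return -1;
    long b = len_uint(in + a + 1);
    return b < 0 ? -1 : a + 1 + b;
}

/* A branch that always matches and occupies zero bytes. */
long len_empty(const char *in)
{
    (void)in;
    return 0;
}

/* Longest-match semantics: every branch is tried from the same position.
   The strict comparison keeps the earliest branch on ties, and a
   zero-length match still beats "no match" (best_len starts at -1). */
int union_read_longest(const char *in, const branch_len_fn *branches,
                       int n, long *consumed)
{
    assert(in != NULL && branches != NULL && consumed != NULL);
    int best = 0;
    long best_len = -1;
    for (int i = 0; i < n; i++) {
        long len = branches[i](in);
        if (len > best_len) {
            best = i + 1;
            best_len = len;
        }
    }
    *consumed = best ? best_len : 0;
    return best;
}
```

With the branches ordered as unsigned number, dotted number, and empty, the input `12.5` selects the dotted branch even though the first branch also parses cleanly. Under first-match semantics the first branch would have won.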
The error codes for Punions are:
Code | Meaning
P_NO_ERR | No error occurred.
P_UNION_MATCH_ERR | No branch of the union parsed without error.
Accumulator functions for Punions are described in Chapter 16.
Histogram functions for Punions are described in Chapter 17.
Clustering functions for Punions are described in Chapter 18.