
Chapter 5  Pstructs

Pstructs are used to describe sequences of values with potentially unrelated types. Intuitively, they correspond to record-like structures externally and C-structs in memory.

5.1  Syntax

The syntax for Pstructs is given by the following BNF grammar fragment:

qualifier ::= Pomit | Pendian
qualifiers ::= qualifier | qualifier qualifiers
constraint ::= : predicate
ty ::= c_ty | p_ty
full_field ::= [qualifiers] p_ty identifier [constraint] ; [p_comment]
comp_field ::= Pcompute [Pomit] ty identifier = expression [constraint] ;
literal_field ::= p_coreliteral ;
array_field ::= [qualifiers] p_ty ‘[’ p_size_spec ‘]’ identifier [: p_array_constraints] ; [p_comment]
opt_field ::= [qualifiers] Popt p_ty identifier [: opt_predicates] ; [p_comment]
field ::= full_field | comp_field | literal_field | array_field | opt_field
fields ::= field | field fields
struct_ty ::= Pstruct identifier [p_formals] {
    fields
  } [ Pwhere { predicate }] ;

We explain the meaning of this syntax in the remainder of this chapter. All non-terminals not defined in this grammar fragment are defined elsewhere. Predicates (predicate) are described in Section 3.3. Pads types (p_ty) and formal parameters (p_formals) are described in Section 3.6. Pads comments (p_comment) are described in Section 3.2. Core literals (p_coreliteral) are described in Section 3.4. For in-line arrays, size specifications (p_size_spec) and array constraints (p_array_constraints) are described in Chapter 7. Option constraints (opt_predicates) are defined in Section 9.1. Expressions (expression) represent any C expression, while c_ty denotes any C type.

5.1.1  Example

The following Pstruct describes the request portion of a common-log format web-server log, an example of which is:

GET /research.att.com/projects/PADS/index.html HTTP/1.0

Pstruct http_request_t {
  '\"'; http_method_t   meth;     /- Method used during request
  ' ';  Pstring(:' ':)  req_uri;  /- Requested uri.
  ' ';  http_v_t        version : checkVersion(version, meth);
                                  /- HTTP version
  '\"';
};

The Pstruct http_request_t has full fields meth, req_uri, and version that use the (omitted) auxiliary types http_method_t, Pstring, and http_v_t to describe the HTTP method, URI, and version formats, respectively. It has literal fields '\"' and ' ' to describe the quotations and spaces in the external representation. The version field uses the C function checkVersion:

int checkVersion(http_v_t version, http_method_t meth) {
  if ((version.major == 1) && (version.minor == 0)) return 1;
  if ((meth == LINK) || (meth == UNLINK)) return 0;
  return 1;
}

to ensure that the obsolete HTTP methods LINK and UNLINK are only used with HTTP version 1.0.
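
The behavior of this predicate can be checked in isolation. The sketch below compiles checkVersion against stand-in types. The definitions of http_v_t and http_method_t are omitted from this chapter, so the struct and enum shown here, including the method names other than LINK and UNLINK, are assumptions made only so the function can be exercised on its own.

```c
/* Stand-in types: assumptions for illustration only. The real
   definitions are the (omitted) auxiliary PADS types. */
typedef enum { GET, PUT, POST, HEAD, LINK, UNLINK } http_method_t;
typedef struct { int major; int minor; } http_v_t;

/* Predicate from the text: LINK and UNLINK are legal only with
   HTTP version 1.0; every other method is legal with any version. */
int checkVersion(http_v_t version, http_method_t meth) {
  if ((version.major == 1) && (version.minor == 0)) return 1;
  if ((meth == LINK) || (meth == UNLINK)) return 0;
  return 1;
}
```

With these stand-ins, a LINK request under version 1.0 satisfies the constraint, while the same method under version 1.1 yields zero, which the generated parser would report as a user-constraint violation on the version field.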

5.1.2  Full fields

Each full field in a Pstruct must include the name of the field and its type. The name serves to document the data and to permit later reference. The type determines how that piece of the Pstruct will be processed. Optionally, each full field may be preceded by a qualifier sequence (cf. Section 5.1.6).

Each full field may be followed by a constraint (cf. Section 3.3). Such a constraint is used to express the conditions under which a properly parsed value of the field type is a legal value for the field. The field itself and all earlier fields in the Pstruct are in scope in the constraint, as are any parameters to the Pstruct. In the example, the checkVersion predicate on the version field uses the values of the meth and version fields to determine if the version value is legal. If the constraint associated with a field evaluates to false (i.e., zero) after parsing, then the parse descriptor returned with the in-memory representation will indicate a user-constraint violation has occurred for the field.

Each full field in a Pstruct may optionally be followed by a Pads comment. Such comments are reflected by the Pads compiler into the output library as comments.

5.1.3  Computed fields

Instead of being read from an external source, the value of a computed field is set from an initializing expression. Such fields are marked by the Pcompute keyword. Each such field gives its name and the type to be included in the in-memory representation. If the given type is a Pads type, the field will behave exactly as if it were read from the external source. With a C type, some services may not be available in the generated library, such as automatic accumulation and printing. Each computed field also gives a C expression to initialize the field. This expression must have the type declared for the field. Previously read fields in the Pstruct and any parameters to the Pstruct are in scope in this expression. Like full fields, computed fields admit the Pomit qualifier and may have an associated constraint.

The computeExample Pstruct sets the value of its computed field index from the full field base and the offset parameter.

Pstruct computeExample(:int offset:){
  Pint32 base;
  Pcompute int index = base + offset;
};
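
As a rough model of what this declaration means for the in-memory value, the following C sketch fills in a struct once base has been parsed. It is not generated code: Pint32 is modeled as int32_t, and the helper computeExample_fill is a hypothetical name introduced only for this illustration.

```c
#include <stdint.h>

/* Stand-in for the PADS base type Pint32 (assumption). */
typedef int32_t Pint32;

/* Plausible shape of the in-memory representation: one member per
   full or computed field. */
typedef struct {
  Pint32 base;   /* full field: read from the external source        */
  int    index;  /* computed field: set from the initializing expression */
} computeExample;

/* Hypothetical helper, not part of the generated library. It models the
   state after `base` has been parsed, with `offset` as the parameter. */
void computeExample_fill(computeExample *rep, Pint32 parsed_base, int offset) {
  rep->base  = parsed_base;
  rep->index = rep->base + offset;  /* Pcompute int index = base + offset; */
}
```

Note that index never consumes input: only base is read from the external source.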

5.1.4  Literal fields

Literal fields can be character, string, or regular expression literals. They are written using the notation described in Section 3.4.

In addition to specifying literals to consume from the external representation, literal fields also play a role in error recovery. If the generated parser encounters a syntactic error while parsing a full field, causing it to enter panic mode (cf. Section 3.10), the parser will scan to find the next literal, marking all intervening fields as errors in the associated parse descriptor. The library discipline has parameters that allow the library user to tune the extent of such scanning (cf. Section 15.1.5).
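
The idea of scanning for the next literal can be sketched in plain C. This is a simplified model under stated assumptions, not the library's recovery code: the literal is a single character, and the parameter max_scan stands in for whatever limit the discipline imposes. The function name scan_to_literal is hypothetical.

```c
#include <stddef.h>

/* Simplified model of panic-mode recovery (an assumption, not the
   library's code): starting at `pos`, look for the literal character
   `lit`, giving up after `max_scan` bytes. Returns the index of the
   literal, or -1 if it was not found within the limit. */
long scan_to_literal(const char *buf, size_t len, size_t pos,
                     char lit, size_t max_scan) {
  size_t end = pos + max_scan;
  if (end > len || end < pos) end = len;  /* clamp, guarding overflow */
  for (size_t i = pos; i < end; i++) {
    if (buf[i] == lit) return (long)i;
  }
  return -1;
}
```

A small max_scan limits how far a single bad field can drag the parser through the input, at the cost of giving up on recoveries that need a longer scan.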

5.1.5  In-line declarations

For conciseness, Pads allows anonymous option and array types to be declared within Pstruct field declarations.

Array declarations

In-line array declarations include a size specification after the type of the field. For example, the following Pstruct matches a resolved IP address (of the form 135.27.24.12) and an integer recording a number of bytes, separated by a vertical bar:

Pstruct log_t {
  Puint8 [4] ip  : Psep('.') && Pterm(Pnosep); /- resolved ip address
  '|';
  Puint32 numBytes;
};

After the field name, Pads permits an optional colon followed by array constraints. Details about size specifications, array constraints, and the in-memory representation of arrays may be found in Chapter 7.
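
To make the external format of log_t concrete, here is a hand-written C sketch that accepts the same kind of text, for example 135.27.24.12|4096. It is an illustration only: the struct log_rec and the function parse_log are hypothetical names, the code does not use the Pads library, and sscanf is more lenient than a generated parser (it skips leading whitespace, for instance).

```c
#include <stdio.h>
#include <stdint.h>

/* Hand-written stand-in for the in-memory form of log_t (assumption). */
typedef struct {
  uint8_t  ip[4];
  uint32_t numBytes;
} log_rec;

/* Parses text such as "135.27.24.12|4096". Returns 1 on success and 0
   if the shape is wrong, a value is out of range, or extra characters
   follow the byte count. */
int parse_log(const char *s, log_rec *out) {
  unsigned a, b, c, d;
  unsigned long long n;
  char extra;
  /* Exactly five conversions must succeed; a sixth means trailing input. */
  if (sscanf(s, "%3u.%3u.%3u.%3u|%10llu%c", &a, &b, &c, &d, &n, &extra) != 5)
    return 0;
  if (a > 255 || b > 255 || c > 255 || d > 255) return 0;  /* Puint8 range  */
  if (n > 0xFFFFFFFFull) return 0;                         /* Puint32 range */
  out->ip[0] = (uint8_t)a;
  out->ip[1] = (uint8_t)b;
  out->ip[2] = (uint8_t)c;
  out->ip[3] = (uint8_t)d;
  out->numBytes = (uint32_t)n;
  return 1;
}
```

The four octets play the role of the array elements separated by '.', and the '|' in the format string corresponds to the literal field.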

Option declarations

In-line options are marked by the keyword Popt. For example, the following Pstruct matches two optional integers separated by a vertical bar and terminated by a newline.

Precord Pstruct entry2{
  Popt Puint32 f;
  '|';
  Popt Puint32 g;
};

This declaration is equivalent to the entry1 type defined in Section 9.1.1. Fields with in-line option declarations admit the option form of constraints, which are described in Section 9.1.2. As an example, the Pstruct entry4

Precord Pstruct entry4{
  Popt Puint32 x1 : Psome i => { i % 2 == 0};
  Popt Puint32 x2 : Psome i => { i % 2 != 0};
  '|';
  Popt Puint32 y1 : Psome i => { i % 2 == 0};
  Popt Puint32 y  : Psome i => { i % 2 != 0};
  '|';
};

uses option constraints to specify when the option should match. Type entry4 is equivalent to the type entry3 defined in Section 9.1.1.

Details about the in-memory representation of options appear in Section 9.2.2.

5.1.6  Qualifiers

Non-literal fields can take one or more qualifiers.

Pomit
This qualifier indicates that the field should not be included in the in-memory representation of the Pstruct. Because omitted fields are not kept in memory, they cannot be accumulated or printed.
Pendian
During initialization, the Pads library determines the endian-ness of the underlying machine and stores the result in the library handle. The discipline of each library handle stores the endian-ness of the data being parsed, initially assuming that it matches the endian-ness of the machine. The Pendian qualifier directs the generated parser to check the endian-ness of the data; it can only be used in the presence of a user constraint. The qualifier causes the parser to read the field and check the associated constraint. If the constraint is violated, the bytes associated with the field are swapped, and the constraint is re-tested. If this second attempt succeeds, the endian-ness of the data is toggled in the library discipline. The value of the data endian-ness flag can also be set programmatically (cf. Section 15.1.8).
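
The read, test, swap, and re-test sequence described for Pendian can be modeled in a few lines of C. The sketch below is an assumption-laden illustration rather than the library's implementation: the field is taken to be a 32-bit unsigned value, the integer pointed to by swap_data stands in for the data endian-ness flag in the discipline, and the names swap32, read_with_pendian, and small_version are hypothetical.

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical model of the Pendian check. `*swap_data` plays the role
   of the discipline's flag: nonzero means the data byte order differs
   from the machine's. Returns 1 if the constraint holds (possibly after
   swapping), 0 otherwise. */
int read_with_pendian(uint32_t raw, int (*constraint)(uint32_t),
                      int *swap_data, uint32_t *out) {
  uint32_t v = *swap_data ? swap32(raw) : raw;  /* read under current flag */
  if (constraint(v)) { *out = v; return 1; }
  v = swap32(v);                  /* swap the field's bytes and re-test */
  if (constraint(v)) {
    *swap_data = !*swap_data;     /* second attempt succeeded: toggle   */
    *out = v;
    return 1;
  }
  *out = swap32(v);               /* keep the first reading; report failure */
  return 0;
}

/* Example constraint: a version number that is expected to be small. */
int small_version(uint32_t v) { return v < 256; }
```

This also shows why the qualifier requires a constraint: without a predicate that distinguishes a plausible value from its byte-swapped counterpart, there is nothing to test.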

5.1.7  Optional Pwhere clause

If given, a Pwhere clause expresses constraints over the entirety of a Pstruct value. The values of all previous fields and any parameters to the Pstruct are in scope. Within the context of a Pparsecheck clause, the constants begin and end, each of type Ppos_t, are available. Constant begin is bound to the input position of the beginning of the Pstruct; end is bound to its end. If the predicate given in the Pwhere clause evaluates to false (i.e., zero), the error code in the associated parse descriptor will indicate a user-constraint error has occurred.

The Pwhere clause in the whereExample Pstruct ensures that the sum of the first two fields is less than the given limit.

Pstruct whereExample(:int limit:){
  Pint32 first;
  Pint32 second;
} Pwhere {first + second < limit;};

5.2  Generated library

5.2.1  In-memory representation

The in-memory representation of a Pstruct is a C struct of the same name. Each field of the C struct corresponds to a full or computed field of the Pstruct. The type of each full field in the C struct is the in-memory representation of the Pads type associated with the field. The type of each computed field is the given C type.

The C type http_request_t is the in-memory representation of the Pads type of the same name.

The type Pstring is the in-memory representation of the base type Pstring (cf. Chapter 4). Note that literal fields do not appear in the in-memory representation.
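
A plausible shape for the C type http_request_t, following the naming pattern of the mask and parse descriptor listings in this chapter, is sketched below. It is a reconstruction, not compiler output: the definitions of http_method_t, http_v_t, and Pstring shown here are stand-ins (assumptions) so that the fragment compiles on its own.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles on its own (assumptions). */
typedef enum { GET, PUT, POST, HEAD, LINK, UNLINK } http_method_t;
typedef struct { int major; int minor; } http_v_t;
typedef struct { char *str; size_t len; } Pstring;

typedef struct http_request_t_s http_request_t;

/* One member per full field, in declaration order; the literal
   fields of the Pstruct have no counterpart here. */
struct http_request_t_s {
  http_method_t meth;     /* Method used during request */
  Pstring       req_uri;  /* Requested uri. */
  http_v_t      version;  /* HTTP version */
};
```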

5.2.2  Mask

The mask of a Pstruct with name myStruct is a C struct with name myStruct_m. For each full field in myStruct, there is a corresponding field in the mask struct, the type of which is the mask type for the field. In addition, there is a structLevel field, which has the base mask type. This field allows library users to toggle operations at the level of the structure as a whole.

For example, the mask type http_request_t_m has the following structure:

typedef struct http_request_t_m_s http_request_t_m;

struct http_request_t_m_s {
  Pbase_m structLevel;
  http_method_t_m meth;         /* nested constraints */
  Pbase_m req_uri;              /* nested constraints */
  http_v_t_m version;           /* nested constraints */
  Pbase_m version_con;          /* struct constraints */
};

5.2.3  Parse descriptor

The parse descriptor of a Pstruct with name myStruct is a C struct with name myStruct_pd. This struct has the fields described in Section 3.13. In addition, for each full field in myStruct, there is a corresponding field in the parse descriptor struct, the type of which is the parse descriptor type for the field.

For example, the parse descriptor type http_request_t_pd has the following structure:

typedef struct http_request_t_pd_s http_request_t_pd;

struct http_request_t_pd_s {
  Pflags_t pstate;
  Puint32 nerr;
  PerrCode_t errCode;
  Ploc_t loc;
  http_method_t_pd meth;
  Pbase_pd req_uri;
  http_v_t_pd version;
};

5.2.4  Operations

The operations generated by the Pads compiler for a Pstruct are those described in Chapter 3. For the Pstruct http_request_t, the prototypes of the generated functions appear in Figure 5.1.


Perror_t http_request_t_init (P_t *pads,http_request_t *rep);

Perror_t http_request_t_pd_init (P_t *pads,http_request_t_pd *pd);

Perror_t http_request_t_cleanup (P_t *pads,http_request_t *rep);

Perror_t http_request_t_pd_cleanup (P_t *pads,http_request_t_pd *pd);

Perror_t http_request_t_copy (P_t *pads,http_request_t *rep_dst,
                              http_request_t *rep_src);

Perror_t http_request_t_pd_copy (P_t *pads,http_request_t_pd *pd_dst,
                                 http_request_t_pd *pd_src);

void http_request_t_m_init (P_t *pads,http_request_t_m *mask,
                            Pbase_m baseMask);

Perror_t http_request_t_read (P_t *pads,http_request_t_m *m,
                              http_request_t_pd *pd,http_request_t *rep);

ssize_t http_request_t_write2buf (P_t *pads,Pbyte *buf,size_t buf_len,
                                  int *buf_full,
                                  http_request_t_pd *pd,http_request_t *rep);

ssize_t http_request_t_write2io (P_t *pads,Sfio_t *io,
                                 http_request_t_pd *pd,http_request_t *rep);

int http_request_t_verify (http_request_t *rep);

int http_request_t_genPD (P_t *pads, http_request_t *rep,
                          http_request_t_pd *pd);
Figure 5.1: Prototypes of operations generated for the Pstruct http_request_t.

Read function

The error codes for Pstructs are:

Code                         Meaning
P_NO_ERR                     Indicates no error occurred.
P_STRUCT_FIELD_ERR           Indicates that an error occurred while parsing one of the fields of the Pstruct. The parse descriptor for each full field with an error will contain more information describing the precise nature of the error.
P_STRUCT_EXTRA_BEFORE_SEP    Indicates that there were unexpected data before a literal field in the Pstruct.
P_MISSING_LITERAL            Indicates that the read function failed to find a literal field.

If multiple errors occur during the parsing of a Pstruct, the errCode field will reflect the first detected error. The parse descriptors for nested pieces will describe any errors detected while reading those pieces.
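
The first-error rule can be pictured with a reduced parse descriptor. In the sketch below, the error-code names come from the table above, but their numeric values, the struct demo_pd, the helper record_error, and the reading of nerr as a running error count are assumptions made for illustration.

```c
/* Stand-in error codes: names from the table above, numeric values
   are assumptions. */
typedef enum {
  P_NO_ERR = 0,
  P_STRUCT_FIELD_ERR,
  P_STRUCT_EXTRA_BEFORE_SEP,
  P_MISSING_LITERAL
} PerrCode_t;

/* Reduced parse descriptor: only the two members needed here. */
typedef struct {
  unsigned   nerr;     /* number of errors seen so far (assumed meaning) */
  PerrCode_t errCode;  /* code of the first detected error */
} demo_pd;

/* Hypothetical helper: count every error, keep only the first code. */
void record_error(demo_pd *pd, PerrCode_t code) {
  if (pd->nerr == 0) pd->errCode = code;
  pd->nerr++;
}
```

A caller that needs the later errors as well would consult the nested parse descriptors for the individual fields rather than the top-level errCode.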

Warning: At the moment, read functions do not check that all referenced data in constraint expressions are meaningful before checking the constraint. Referenced data might be meaningless either because there was an error parsing earlier data or because the supplied mask directed the read function to skip the field.

Accumulator functions

Accumulator functions for Pstructs are described in Chapter 16.

Histogram functions

Histogram functions for Pstructs are described in Chapter 17.

Clustering functions

Clustering functions for Pstructs are described in Chapter 18.

