Chapter 21  Filters

Filters partition an input file into two output files: one with data conforming to the specification, the other with data containing errors. Filters apply to data formats that contain an optional header followed by a sequence of records.

21.1  Template Program

Because generating a filter from a Pads description is a very routine task, Pads provides a template program to automate the task for common data formats. In particular, the template applies to data that can be viewed as an optional header followed by a sequence of records.

When instantiated, the template program takes an optional command-line argument specifying the path to the data source. If no argument is given, it uses the default location specified by the template user. The locations of the clean and error output files can be set in the template program.
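The argument handling described above can be sketched as follows. This is a minimal illustration, not the template's actual code: the function name choose_input_path and the default path shown are hypothetical, and only the macro name DEF_INPUT_FILE comes from this chapter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default; in a real instantiation the template user
   supplies DEF_INPUT_FILE before including the template. */
#ifndef DEF_INPUT_FILE
#define DEF_INPUT_FILE "data/input.txt"
#endif

/* Storage for the default path, so callers can compare against it. */
const char default_input_path[] = DEF_INPUT_FILE;

/* Return the data source path: the first command-line argument if one
   was given, otherwise the user-specified default. */
const char *choose_input_path(int argc, char **argv)
{
    assert(argv != NULL);
    if (argc >= 2 && argv[1] != NULL) {
        return argv[1];
    }
    return default_input_path;
}
```

A main function would typically call choose_input_path(argc, argv) once and pass the result to whatever routine opens the data source.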

The template first reads the optional header, echoing it to either the clean or the error file, depending upon the resulting parse descriptor. It then reads each record, echoing it to the appropriate file until the data source is exhausted.
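The control flow just described can be sketched in plain C without the Pads runtime. Everything in this sketch is an assumption made for illustration: toy_header_errors and toy_parse_errors stand in for the generated read functions and their parse descriptors, a header counts as clean when it begins with '#', and a body record counts as clean when it is a non-empty run of decimal digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for reading a header: 0 errors if it starts with '#'. */
int toy_header_errors(const char *header)
{
    return header[0] == '#' ? 0 : 1;
}

/* Toy stand-in for reading a body record: 0 errors if the record is a
   non-empty run of decimal digits, 1 otherwise. */
int toy_parse_errors(const char *record)
{
    size_t i;
    if (record[0] == '\0') {
        return 1;
    }
    for (i = 0; record[i] != '\0'; i++) {
        if (!isdigit((unsigned char)record[i])) {
            return 1;
        }
    }
    return 0;
}

/* Read newline-terminated records from in until it is exhausted, echoing
   each one to clean or bad. If has_header is nonzero, the first line is
   treated as the header. Returns the number of body records sent to bad.
   Records longer than the buffer are split; a real filter would not do that. */
long filter_records(FILE *in, FILE *clean, FILE *bad, int has_header)
{
    char line[256];
    long bad_count = 0;

    if (has_header && fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        fprintf(toy_header_errors(line) == 0 ? clean : bad, "%s\n", line);
    }
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the record terminator */
        if (toy_parse_errors(line) == 0) {
            fprintf(clean, "%s\n", line);
        } else {
            fprintf(bad, "%s\n", line);
            bad_count++;
        }
    }
    return bad_count;
}
```

In the real template the decision is driven by the parse descriptor filled in by the generated read function rather than by a hand-written check.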

Like the accumulator template, the filter template is a C header file parameterized by a number of macros. The user customizes the template by defining appropriate values for these macros before including it. The following list describes the macros used by the filter template:

DATE_IN_FMT
If defined, this macro sets the default input format for dates described by Pdate. See Section 15.1.12 for more information.
DATE_OUT_FMT
If defined, this macro sets the default output format for Pdate and Pdate_explicit. See Section 15.1.13 for more information.
DEF_INPUT_FILE
If defined, this macro specifies a string representation of the path to the default data source. If no path to the data is supplied on the command line, this is the location used for input data.
EXTRA_BAD_READ_CODE
If defined, this macro points to a C statement that will be executed after any body record containing an error.
EXTRA_BEGIN_CODE
If defined, this macro points to a C statement that will be executed after all initialization code is performed, but before the optional header is read.
EXTRA_DECLS
This optional macro defines additional C declarations that precede all filter code.
EXTRA_DONE_CODE
If defined, this macro points to a C statement that will be executed after all records have been processed.
EXTRA_GOOD_READ_CODE
If defined, this macro points to a C statement that will be executed after any body record not containing an error.
EXTRA_HEADER_READ_ARGS
If the type of the header record was parameterized, this macro allows the user to supply corresponding parameters.
EXTRA_READ_ARGS
If the type of the repeated record was parameterized, this macro allows the user to supply corresponding parameters.
IN_TIME_ZONE
If set, this macro specifies the input time zone of date types that do not include time zone information. See Section 15.1.10 for more detail.
IO_DISC_MK
If defined, this macro specifies the interpretation of Precord by indicating which IO discipline the system should install. It specifies the discipline by naming the function that creates it. Section 15.2 describes the available IO discipline creation functions. If the user does not define this macro, the system installs the IO discipline corresponding to new-line terminated ASCII records.
MAX_RECS
If defined, this macro specifies an integer that limits the number of repeated records that the filter program should read.
OUT_TIME_ZONE
If set, this macro specifies the output time zone of date types. See Section 15.1.11 for more detail.
PADS_HDR_TY
Intuitively, this macro defines the type of the header record in the data source. This macro need only be defined if the data source has a header record. It defines a function used by the template program to generate the various function and type names derived from the name of the header record type, i.e., the type of the associated in-memory representation, mask, parse descriptor, read function, etc.
PADS_TY
Intuitively, this macro defines the type of the repeated record in the data source, i.e., the type of the value to be filtered. This macro must be defined to use the filter template. It defines a function used by the template program to generate the various function and type names derived from the name of the record type, i.e., the type of the associated in-memory representation, mask, parse descriptor, read function, etc.
READ_MASK
This macro specifies the mask to use in reading the repeated record. If not defined by the user, the template uses the value P_CheckAndSet.
TIME_IN_FMT
If defined, this macro sets the default input format for Ptime. See Section 15.1.12 for more information.
TIME_OUT_FMT
If defined, this macro sets the default output format for Ptime and Ptime_explicit. See Section 15.1.13 for more information.
TIMESTAMP_IN_FMT
If defined, this macro sets the default input format for Ptimestamp. See Section 15.1.12 for more information.
TIMESTAMP_OUT_FMT
If defined, this macro sets the default output format for the Pads types Ptimestamp and Ptimestamp_explicit. See Section 15.1.13 for more information.
WSPACE_OK
If defined, this macro indicates that leading white space for variable-width ASCII integers is okay, as well as leading and trailing white space for fixed-width ASCII integers.
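
The way a function-like macro such as PADS_TY can generate the names derived from a record type is sketched below using token pasting. The record type entry_t, its parse descriptor entry_t_pd, the simplified entry_t_read function, and count_clean are all hand-written stand-ins invented for this illustration; the generated Pads read functions take different arguments. Only the macro names PADS_TY and MAX_RECS come from the list above.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for names a generated library might export for entry_t. */
typedef struct { int value; } entry_t;      /* in-memory representation */
typedef struct { int nerr; } entry_t_pd;    /* parse descriptor */

/* Simplified, hypothetical read function: parses a decimal integer and
   records one error if the text is not entirely a number. */
int entry_t_read(const char *text, entry_t_pd *pd, entry_t *rep)
{
    char *end;
    long v = strtol(text, &end, 10);
    pd->nerr = (end == text || *end != '\0') ? 1 : 0;
    rep->value = (int)v;
    return pd->nerr;
}

/* What the template user writes: a suffix-pasting macro and a limit. */
#define PADS_TY(suf) entry_t ## suf
#define MAX_RECS 2

/* Template-side code written only in terms of the macros. It counts the
   clean records among the first n, stopping early at MAX_RECS if set. */
int count_clean(const char *const *records, int n)
{
    int i, clean = 0;
    for (i = 0; i < n; i++) {
        PADS_TY() rep;        /* expands to entry_t */
        PADS_TY(_pd) pd;      /* expands to entry_t_pd */
#ifdef MAX_RECS
        if (i >= MAX_RECS) {
            break;
        }
#endif
        if (PADS_TY(_read)(records[i], &pd, &rep) == 0) {  /* entry_t_read */
            clean++;
        }
    }
    return clean;
}
```

Because the template body mentions only PADS_TY, switching to a different record type means changing a single macro definition rather than editing the template.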
