
Chapter 4  Base Types Overview

Pads base types describe individual, small values: numbers, strings, dates, and so on. This chapter provides an overview of some of the most important built-in Pads base types, with a focus on how these types are used within Pads source files. Appendix B gives detailed descriptions of each of the built-in base types, including the full set of library API calls for each type. (As discussed in Chapter 3, each type has corresponding read, write, format, and accumulator functions.)

In addition to the built-in types, it is possible to extend Pads with new base types; see Section 15.3.

4.1  In-Memory Representation

Each base type has an external and an in-memory representation. Related base types share the same in-memory representation. For example, while there are 18 different string base types, all of them use Pstring as their in-memory representation.

This section reviews the different in-memory representation types.

4.1.1  Pchar

Type Pchar is the in-memory representation of an external character. It is equivalent to the C type unsigned char, or type Puint8: all are 8-bit unsigned values. N.B.: Regardless of the external character that is read, the corresponding ASCII character is stored in the in-memory representation.

4.1.2  Pstring

Type Pstring is the in-memory representation for all forms of external strings. A Pstring s has two fields of general interest: s.str, which points to the characters of the string, and s.len, which gives the number of characters in the string.

In addition, Pstring has fields that are manipulated by various string functions (some of which are described below). Most programmers should only use s.len and s.str.

The library discipline has a field copy_strings which controls copying behavior for string read calls. If copy_strings is non-zero, the string read functions always copy strings. Otherwise, a copy is not made and the target Pstring points to memory managed by the current IO discipline. copy_strings should only be set to zero for record-based IO disciplines where strings from record K are not used after P_io_next_rec has been called to move the IO cursor to record K+1. Note: Pstring_preserve can be used to force a string that is using sharing to make a copy so that the string is ’preserved’ (remains valid) across calls to P_io_next_rec.

When copying is used, the string copies are stored in an internal resizable buffer, and s.str points into this buffer. To ensure correct behavior, function Pstring_init should be called prior to using a Pstring, and function Pstring_cleanup should be called when a given Pstring is no longer in use. Generated initialization and cleanup functions call these routines for any Pads type containing a Pstring.

The full set of string helper functions appears in Figure 4.1.


Perror_t Pstring_init(P_t *pads, Pstring *s);
Perror_t Pstring_cleanup(P_t *pads, Pstring *s);
Perror_t Pstring_share(P_t *pads, Pstring *targ, const Pstring *src);
Perror_t Pstring_cstr_share(P_t *pads, Pstring *targ, const char *src, size_t len);
Perror_t Pstring_copy(P_t *pads, Pstring *targ, const Pstring *src);
Perror_t Pstring_cstr_copy(P_t *pads, Pstring *targ, const char *src, size_t len);
Perror_t Pstring_preserve(P_t *pads, Pstring *s);
int Pstring_eq(const Pstring *str1, const Pstring *str2);
int Pstring_eq_cstr(const Pstring *str, const char *cstr);

Pint8    Pstring2int8   (const Pstring *str);  /* returns P_MIN_INT8 on error    */
Pint16   Pstring2int16  (const Pstring *str);  /* returns P_MIN_INT16 on error   */
Pint32   Pstring2int32  (const Pstring *str);  /* returns P_MIN_INT32 on error   */
Pint64   Pstring2int64  (const Pstring *str);  /* returns P_MIN_INT64 on error   */

Puint8   Pstring2uint8  (const Pstring *str);  /* returns P_MAX_UINT8 on error   */
Puint16  Pstring2uint16 (const Pstring *str);  /* returns P_MAX_UINT16 on error  */
Puint32  Pstring2uint32 (const Pstring *str);  /* returns P_MAX_UINT32 on error  */
Puint64  Pstring2uint64 (const Pstring *str);  /* returns P_MAX_UINT64 on error  */

Pfloat32 Pstring2float32(const Pstring *str);  /* returns P_MIN_FLOAT32 on error */
Pfloat64 Pstring2float64(const Pstring *str);  /* returns P_MIN_FLOAT64 on error */
Figure 4.1: Library functions for manipulating Pstrings.

The following list describes the behaviors of these functions.

Pstring_init(pads, s)
Initialize s to a valid empty string (no dynamic memory allocated yet).
Pstring_cleanup(pads, s)
Free any dynamic memory allocated for s.
Pstring_share(pads, targ, src)
Make targ refer to the string in src, sharing the space with the original owner.
Pstring_cstr_share(pads, targ, src, len)
Make targ refer to len characters in C-string src.
Pstring_copy(pads, targ, src)
Copy the string in src into targ; sharing is not used.
Pstring_cstr_copy(pads, targ, src, len)
Copy len characters from C-string src into targ; sharing is not used.
Pstring_eq(str1, str2)
Returns 1 if str1 and str2 are of equal length and str1 equals str2 (based on strncmp). Otherwise, returns 0.
Pstring_eq_cstr(str, cstr)
Returns 1 if str and the C-string cstr are of equal length and str equals cstr (based on strncmp). Otherwise, returns 0.

Although not strictly necessary, both Pstring_copy and Pstring_cstr_copy null-terminate targ->str. Each copy function returns P_ERR on bad arguments or on failure to allocate space, otherwise it returns P_OK.

The various Pstring2NUMERIC functions convert a string to the specified numeric type. If the contents of the string cannot be converted to the specified type, the minimum value for the numeric type is returned for signed numeric types, or the maximum value is returned for unsigned numeric types.
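The sentinel convention described above can be sketched in plain C. This is only an illustration under stated assumptions: demo_string and demo_string2int32 are hypothetical stand-ins, not the Pads types or functions, and the exact set of inputs that the real Pstring2int32 accepts may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Pstring: only the two commonly used fields. */
typedef struct {
    const char *str;  /* characters; not necessarily null-terminated */
    size_t      len;  /* number of characters */
} demo_string;

/* Convert s to a 32-bit signed integer. INT32_MIN plays the role of
 * P_MIN_INT32: it is returned when s is not a valid in-range decimal. */
int32_t demo_string2int32(const demo_string *s)
{
    assert(s != NULL);
    size_t i = 0;
    int negative = 0;
    int64_t value = 0;

    if (i < s->len && (s->str[i] == '+' || s->str[i] == '-')) {
        negative = (s->str[i] == '-');
        i++;
    }
    if (i == s->len)
        return INT32_MIN;                 /* no digits at all */
    for (; i < s->len; i++) {
        if (s->str[i] < '0' || s->str[i] > '9')
            return INT32_MIN;             /* non-digit character */
        value = value * 10 + (s->str[i] - '0');
        if (value > (int64_t)INT32_MAX + 1)
            return INT32_MIN;             /* magnitude out of range */
    }
    if (negative)
        value = -value;
    else if (value > INT32_MAX)
        return INT32_MIN;                 /* positive value out of range */
    return (int32_t)value;
}
```

One consequence of this convention is that the sentinel is itself a representable value: a string holding the minimum signed value converts to the same result as an invalid string, so a caller that must distinguish the two needs an additional check on the string contents.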

4.1.3  Integer types

There are eight in-memory representations for integers, four types for signed values (Pint8, Pint16, Pint32, and Pint64) and four types for unsigned values (Puint8, Puint16, Puint32, and Puint64). The number in these type names indicates the number of bits in the in-memory representation, thus there are signed and unsigned integers that use 1, 2, 4, or 8 bytes of memory. The endian-ness of these in-memory representation types is the same as the endian-ness of the processor that the code is executing on. The external representation, which may be in another format, is always converted to the primitive in-memory representation supported by the processor.

For programming convenience, the header file pads.h includes definitions of the minimum and maximum values for each signed type, and of the maximum value for each unsigned type. Figure 4.2 lists these constants.


P_MIN_INT8
P_MAX_INT8
P_MAX_UINT8

P_MIN_INT16
P_MAX_INT16
P_MAX_UINT16

P_MIN_INT32
P_MAX_INT32
P_MAX_UINT32

P_MIN_INT64
P_MAX_INT64
P_MAX_UINT64
Figure 4.2: Minimum and maximum values for Pads integer types.

4.1.4  Floating-point types

Pads has only two in-memory floating point representations, Pfloat32 and Pfloat64, which correspond to ANSI C types float and double, respectively.

4.1.5  Fixed-point types

A fixed-point number is a number with a fixed number of decimal digits (digits after the ’dot’). For example, 123.456 is a fixed-point number with three decimal digits. External formats for such numbers occur, e.g., in COBOL data. We have chosen a very simple in-memory representation for such numbers: a struct with a numerator (num) that contains all of the digits and a denominator (denom) that contains some power of 10. For example, 123.456 would be represented with a numerator containing 123456 and a denominator containing 1000 (10^3).

The in-memory representation always has an unsigned denominator. We provide signed and unsigned representations that use signed and unsigned numerators, respectively. One can choose the number of bits used for both numerator and denominator in the same way as for the integer types, thus there are four types for signed values and four types for unsigned values:

typedef struct { Pint8   num; Puint8  denom; } Pfpoint8;
typedef struct { Pint16  num; Puint16 denom; } Pfpoint16;
typedef struct { Pint32  num; Puint32 denom; } Pfpoint32;
typedef struct { Pint64  num; Puint64 denom; } Pfpoint64;

typedef struct { Puint8  num; Puint8  denom; } Pufpoint8;
typedef struct { Puint16 num; Puint16 denom; } Pufpoint16;
typedef struct { Puint32 num; Puint32 denom; } Pufpoint32;
typedef struct { Puint64 num; Puint64 denom; } Pufpoint64;

There are two macros to help one use values of these in-memory types. P_FPOINT2FLOAT32(fp) calculates fp.num/fp.denom as a Pfloat32, while P_FPOINT2FLOAT64(fp) calculates fp.num/fp.denom as a Pfloat64.
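As a rough illustration of what such a conversion involves, the sketch below divides numerator by denominator in floating point. The names demo_fpoint32 and demo_fpoint2float64 are hypothetical stand-ins written as an ordinary function; the actual macros in pads.h may be defined differently.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mirroring the Pfpoint32 layout shown above. */
typedef struct { int32_t num; uint32_t denom; } demo_fpoint32;

/* Compute num/denom as a double. Both operands are converted before
 * dividing; dividing the integers directly would truncate the result. */
double demo_fpoint2float64(demo_fpoint32 fp)
{
    assert(fp.denom != 0);
    return (double)fp.num / (double)fp.denom;
}
```

With a numerator of 123456 and a denominator of 1000, the function yields a value close to 123.456, subject to the usual binary floating-point rounding.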

4.2  Base Type Mask

The mask for base types is just an integer type, treated as an array of bits:

typedef Puint32 Pbase_m;

Masks for accessing individual bits of the base type masks are described in the sections on operations that use masks (cf. Section 3.16.3). More information about how base types handle masks is available in Appendix B.

4.3  Base Type Parse Descriptor

Parse descriptors for base types contain only the fields described in Section 3.13. Specific error codes are discussed when the base type read functions are described.

4.4  Character Sets

Pads currently supports two character sets for external data, ASCII and EBCDIC. As discussed in Section 3.5, the library discipline contains a field def_charset that selects which character set to use when one is not specified explicitly. As a result, for each ’kind’ of data that has an external form made up of characters (characters, strings, character-based dates, character-based integer and floating point numbers, etc.), Pads has three types: a type that indicates the external form is always ASCII, a type that indicates the external form is always EBCDIC, and a type that indicates that the external form uses the character set specified in def_charset.

In each section describing character-based types, we give a three-column table indicating the type(s) that use ASCII, EBCDIC, or DEFAULT character sets. For example, the next section begins with a table showing types Pa_char (ASCII), Pe_char (EBCDIC), Pchar (DEFAULT).

4.5  Character Base Types

4.5.1  Fixed-width character-based encoding

ASCII  EBCDIC  DEFAULT
Pa_char  Pe_char  Pchar

For example, writing

Pa_char c;

in a Pads source file (within a Pstruct, for example) indicates that a single ASCII character is expected. Writing a constraint such as

Pe_char  c : c == 'A' || c == 'B';

indicates that an EBCDIC capital letter A or B is expected. N.B.: The constraint expression is applied to the in-memory representation of c, which is an ASCII value, thus the C character constants (ASCII constants) are used to specify letters A and B.

4.5.2  Special character counting base types

ASCII  EBCDIC  DEFAULT
Pa_countX  Pe_countX  PcountX
Pa_countXtoY  Pe_countXtoY  PcountXtoY

Unlike all other base types, these counting base types never advance the IO cursor. You can think of these types as “peeking ahead” to see how many occurrences of a given character appear forward of the current IO cursor position.

The PcountX types count the number of occurrences of character X between the current IO cursor position and the first EOR (end of record) or EOF (end of file). They take three parameters, x, eor_required, and count_max. x is the character to count. If eor_required is non-zero, then encountering EOF before EOR produces an error. If count_max is non-zero, EOR/EOF must be encountered before scanning count_max characters, otherwise an error is returned. For example,

Pa_countX(: '=', 0, 0 :) my_count;

will count the number of ASCII equals-sign characters between the IO cursor and the next EOR or EOF, with no limit on the maximum scan distance.
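The peek-ahead behavior can be modeled in plain C as a scan over a buffer that never moves the caller's cursor. This is a sketch only: demo_count_x is a hypothetical function, the record terminator is assumed to be a newline, and the reading of count_max (at most count_max data characters may be scanned before EOR/EOF) is one plausible interpretation of the rule above.

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of x in buf[pos..] up to the first end-of-record
 * marker (assumed here to be '\n') or the end of the buffer (EOF).
 * pos is passed by value, so the caller's cursor is never advanced.
 * Returns 0 and stores the count in *count_out, or -1 on error. */
int demo_count_x(const char *buf, size_t len, size_t pos,
                 char x, int eor_required, size_t count_max,
                 size_t *count_out)
{
    assert(buf != NULL && count_out != NULL && pos <= len);
    size_t count = 0;
    size_t scanned = 0;

    for (size_t i = pos; ; i++, scanned++) {
        if (i == len) {                        /* reached EOF */
            if (eor_required)
                return -1;                     /* EOF before EOR */
            break;
        }
        if (buf[i] == '\n')                    /* reached EOR */
            break;
        if (count_max != 0 && scanned >= count_max)
            return -1;                         /* scan limit exceeded */
        if (buf[i] == x)
            count++;
    }
    *count_out = count;
    return 0;
}
```

Calling the function with eor_required and count_max both zero corresponds to the unlimited scan in the example declaration above.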

4.6  String Base Types (including dates and times)

The large number of string base types arises from the fact that there are many different ways to indicate the extent of a string. The entire input (up to end-of-file or end-of-record) is a sequence of bytes that can be included in a string, so when specifying a string type in a Pads description, we need to indicate how much of that input we would like included in the string.

4.6.1  Pstring_FW

ASCII  EBCDIC  DEFAULT
Pa_string_FW  Pe_string_FW  Pstring_FW

One of the simplest ways to specify the extent of a string is to give the exact number of characters, or width, that will be included in the string. For example,

Pstring_FW(: 10 :) my_string;

specifies a string with width 10. In this case the default character set will determine whether ASCII or EBCDIC characters are expected in the input stream. Regardless of the input character set, the resulting in-memory Pstring contains ASCII characters.

An error occurs if the specified width is not available. See Appendix B for details.

4.6.2  Pstring

ASCII  EBCDIC  DEFAULT
Pa_string  Pe_string  Pstring

For the Pstring type one specifies a ’stop character’ that is expected immediately following the string. The extent of the string is all characters from the IO cursor up to but not including the first occurrence of the stop char. For example,

Pe_string(:'|':) my_string;

indicates that a series of EBCDIC characters is expected, followed by an EBCDIC vertical bar. Note that the stop char is always specified in ASCII (in this case using a C character constant). When the character set that is being read from the input is EBCDIC, the read function looks for the EBCDIC character that is equivalent to the specified ASCII character.

4.6.3  Pstring_ME

ASCII  EBCDIC  DEFAULT
Pa_string_ME  Pe_string_ME  Pstring_ME

For type Pstring_ME, a regular expression called the matching expression is given, and the extent of the string is the longest sequence of characters starting at the current IO position which match this expression. Note that when you specify a regular expression as a C string, you must use two backslashes to indicate a single backslash character (otherwise C will think you are applying the special backslash operator to the following character). In a language such as Perl, which does not have this requirement, you might write

/\S*/

to create a regular expression which will match a sequence of zero or more non-space characters. To use the same regular expression in a PADSL description with the Pa_string_ME type, you would write:

Pa_string_ME(: "/\\S*/" :) my_string;

This will match a sequence of zero or more non-space characters and assign it to my_string. Note that if a space occurs immediately, a match still occurs, since we specified that zero characters is acceptable; my_string would be a string of length zero. The extent is bounded by end-of-record/end-of-file, so if there are no spaces before an end-of-record, my_string will end up containing all characters remaining in the record.

As a concrete example, if the input at the current IO cursor is hello world when the above string declaration is used to read from the input, my_string will end up containing hello. Note that the space that follows hello is not included, since it is not part of the match.

4.6.4  Pstring_SE

ASCII  EBCDIC  DEFAULT
Pa_string_SE  Pe_string_SE  Pstring_SE

For type Pstring_SE, a regular expression called the stop expression is given, and the extent of the string is the longest sequence of characters starting at the current IO position such that the characters immediately following successfully match the stop expression. None of the characters matching the stop expression are included in the result. For example,

Pa_string_SE(: "/\\s|$/" :) my_string;

The stop expression will match either a space character (due to the backslash-s) or end-of-record/end-of-file (due to the special dollar-sign character). As a result, my_string will end up containing all non-space characters up to (but not including) the first space character that is found, or up to the end of the current record if no space character is found.

You may have noticed that the Pa_string_ME and Pa_string_SE examples actually specify exactly the same extent. Because of the power of regular expressions, it is often the case that you can choose to use either type. You should use whichever type results in a clearer description of what is expected in the input. (In this case, the Pa_string_ME form is simpler and therefore clearer.)

4.6.5  Timestamp_explicit

ASCII  EBCDIC  DEFAULT
Pa_timestamp_explicit_FW  Pe_timestamp_explicit_FW  Ptimestamp_explicit_FW
Pa_timestamp_explicit  Pe_timestamp_explicit  Ptimestamp_explicit
Pa_timestamp_explicit_ME  Pe_timestamp_explicit_ME  Ptimestamp_explicit_ME
Pa_timestamp_explicit_SE  Pe_timestamp_explicit_SE  Ptimestamp_explicit_SE

A timestamp is a combination of a calendar date and a time of day. The corresponding in-memory representation is a Puint32 which represents the number of seconds since 00:00:00 1-Jan-1970 UTC, also known as “seconds since the epoch.” Thus, the time 00:20:00 1-Jan-1970 UTC would be represented internally as the number 1200, since it occurs 1200 seconds (20 minutes) past the epoch.

If the input is “midnight Jan 1 1970” and the time zone is UTC, then this produces a value of 0 since this is actually the epoch. If the time zone is EST, then this produces 5 * 60 * 60 since midnight in the EST time zone occurred 5 hours after the start of the epoch.

If the input explicitly has a time zone, as in “midnight Jan 1 1970 UTC”, then the time zone in the input is used, so this would produce 0, regardless of the time zone specified for the type. Of course, not all timestamp input formats allow you to explicitly give the time zone!
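The epoch arithmetic in these examples can be checked with a few lines of C. The function below is a hypothetical helper, not part of Pads; it handles only wall-clock times on 1-Jan-1970 and takes the zone's offset from UTC in seconds (EST is assumed to be five hours behind UTC).

```c
#include <assert.h>
#include <stdint.h>

/* Seconds since the epoch for a wall-clock time on 1-Jan-1970 in a zone
 * whose offset from UTC is utc_offset_sec (negative west of Greenwich). */
uint32_t demo_epoch_seconds_day0(int hour, int min, int sec,
                                 int32_t utc_offset_sec)
{
    assert(hour >= 0 && hour < 24);
    assert(min >= 0 && min < 60);
    assert(sec >= 0 && sec < 60);
    int32_t local = hour * 3600 + min * 60 + sec;
    int32_t utc = local - utc_offset_sec;   /* convert local time to UTC */
    assert(utc >= 0);  /* a time before the epoch has no unsigned count */
    return (uint32_t)utc;
}
```

Twenty minutes past midnight UTC gives 1200, and midnight EST gives 5 * 60 * 60 = 18000, matching the values discussed above.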

The input values are ASCII or EBCDIC strings. Each Ptimestamp_explicit type takes as first argument the same form of specifying the string’s extent as the corresponding Pstring type, and takes as second argument a timestamp format string which describes what the input string should contain.

A timestamp format consists of literal characters, which are simply expected to be present in the input, and special combinations of a percent sign and a character, which indicate expected parts of the timestamp. For example, the input format "%Y-%m-%d+%H:%M" indicates a format that starts with a four-digit year, then a dash, then a two-digit month, then a dash, then a two-digit day, then a plus sign, then a two-digit hour, then a colon, then a two-digit minute. (To specify that a literal percent sign must appear in the input, use two percent signs in a row.) A full description of supported formats appears on the webpage: www.research.att.com/gsf/man/man3/tm.html

Each of the Ptimestamp_explicit types corresponds to one of the Pstring types that has already been described, where each takes one additional argument to specify the input format. For example,

Pa_timestamp_explicit(: '|', "%Y-%m-%d+%H:%M", P_cstr2timezone("-0500") :) my_timestamp;

reads an ASCII string, up to but not including a vertical bar, and converts that string into a Puint32 timestamp. The conversion will be successful only if the string has the specified format.

Some timestamp formats include explicit time zone information, such as the one above. Pads provides the function P_cstr2timezone to convert a string representation of a time zone into a value of type Tm_zone_t *. This function is described in Chapter 14.

For the rest, the input time zone is taken from the Pads discipline field disc->in_time_zone, as described in Section 15.1.10.

4.6.6  Timestamp

ASCII  EBCDIC  DEFAULT
Pa_timestamp_FW  Pe_timestamp_FW  Ptimestamp_FW
Pa_timestamp  Pe_timestamp  Ptimestamp
Pa_timestamp_ME  Pe_timestamp_ME  Ptimestamp_ME
Pa_timestamp_SE  Pe_timestamp_SE  Ptimestamp_SE

The timestamp types are the same as the timestamp_explicit types, except no timestamp format is given. Instead, the Pads discipline field disc->in_formats.timestamp is used for all Ptimestamp types.

4.6.7  Date_explicit

ASCII  EBCDIC  DEFAULT
Pa_date_explicit_FW  Pe_date_explicit_FW  Pdate_explicit_FW
Pa_date_explicit  Pe_date_explicit  Pdate_explicit
Pa_date_explicit_ME  Pe_date_explicit_ME  Pdate_explicit_ME
Pa_date_explicit_SE  Pe_date_explicit_SE  Pdate_explicit_SE

Dates are calendar days (no time of day). Like timestamps, we represent a date as a Puint32 recording “seconds since the epoch.”

The Pdate_explicit types take a second argument, a date format, which accepts the same special characters as the Ptimestamp_explicit types. So, technically there is nothing to stop you from using the date types to input time of day fields. However, we encourage you to use the Ptimestamp types when both a calendar day and a time of day are to be input, and to use the Pdate types when just the calendar day is to be input.

4.6.8  Date

ASCII  EBCDIC  DEFAULT
Pa_date_FW  Pe_date_FW  Pdate_FW
Pa_date  Pe_date  Pdate
Pa_date_ME  Pe_date_ME  Pdate_ME
Pa_date_SE  Pe_date_SE  Pdate_SE

The date types are the same as the date_explicit types, except no date format is given. Instead, the Pads discipline field disc->in_formats.date is used for all Pdate types.

4.6.9  Time_explicit

ASCII  EBCDIC  DEFAULT
Pa_time_explicit_FW  Pe_time_explicit_FW  Ptime_explicit_FW
Pa_time_explicit  Pe_time_explicit  Ptime_explicit
Pa_time_explicit_ME  Pe_time_explicit_ME  Ptime_explicit_ME
Pa_time_explicit_SE  Pe_time_explicit_SE  Ptime_explicit_SE

Times give the time of day, with no calendar date. They are represented as a Puint32 recording seconds since midnight. For example, the time 1am is represented as 3600 (i.e., 3600 seconds, or 60 minutes, after midnight).

The Ptime_explicit types take a second argument, a time format, which accepts the same special characters as the timestamp and date types. However, we encourage you to use the Ptime types when just a time of day is expected.

4.6.10  Time

ASCII  EBCDIC  DEFAULT
Pa_time_FW  Pe_time_FW  Ptime_FW
Pa_time  Pe_time  Ptime
Pa_time_ME  Pe_time_ME  Ptime_ME
Pa_time_SE  Pe_time_SE  Ptime_SE

The time types are the same as the time_explicit types, except no time format is given. Instead, the Pads discipline field disc->in_formats.time is used for all Ptime types.

4.6.11  IP

ASCII  EBCDIC  DEFAULT
Pa_ip_FW  Pe_ip_FW  Pip_FW
Pa_ip  Pe_ip  Pip
Pa_ip_ME  Pe_ip_ME  Pip_ME
Pa_ip_SE  Pe_ip_SE  Pip_SE

The Pip type reads an IP address string from the input that is in numeric dotted form (as in 10.1.0.17) using ASCII or EBCDIC digits and periods (dots). The string consists of up to four parts with values between 0 and 255, separated by periods, with an optional trailing period. When there are fewer than four parts, the missing parts are treated as implicitly zero, and are inserted as shown in the following diagram, which shows the eight legal input forms and the equivalent expanded form.

<part1>  —→  <part1>.0.0.0
<part1>.  —→  <part1>.0.0.0.
<part1>.<part4>  —→  <part1>.0.0.<part4>
<part1>.<part4>.  —→  <part1>.0.0.<part4>.
<part1>.<part2>.<part4>  —→  <part1>.<part2>.0.<part4>
<part1>.<part2>.<part4>.  —→  <part1>.<part2>.0.<part4>.
<part1>.<part2>.<part3>.<part4>  —→  same
<part1>.<part2>.<part3>.<part4>.  —→  same

Each <part> is made up of 1 to 3 digits which specify a number in the range [0, 255].

The result is a single Puint32 value with each part encoded in one of the four bytes. part1 is stored in the high-order byte, part4 in the low-order byte. You can obtain each part using the macro

P_IP_PART(addr, part)

where part must be an integer between 1 and 4.
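The expansion rules and the byte layout can be illustrated with a small self-contained C sketch. Both functions are hypothetical stand-ins: demo_ip_part plays the role described for P_IP_PART, and demo_parse_ip handles only null-terminated ASCII input, whereas the real read functions work on the Pads input stream and also support EBCDIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same role as described for P_IP_PART: part 1 is the high-order byte,
 * part 4 the low-order byte. */
unsigned demo_ip_part(uint32_t addr, int part)
{
    assert(part >= 1 && part <= 4);
    return (unsigned)((addr >> (8 * (4 - part))) & 0xFFu);
}

/* Parse one to four dotted parts with an optional trailing '.'.
 * Returns 0 on success, -1 on malformed input. */
int demo_parse_ip(const char *s, uint32_t *addr_out)
{
    assert(s != NULL && addr_out != NULL);
    unsigned parts[4];
    int n = 0;

    while (*s != '\0') {
        unsigned value = 0;
        int digits = 0;
        if (n == 4)
            return -1;                       /* more than four parts */
        while (*s >= '0' && *s <= '9') {
            value = value * 10 + (unsigned)(*s - '0');
            if (++digits > 3)
                return -1;                   /* a part has 1 to 3 digits */
            s++;
        }
        if (digits == 0 || value > 255)
            return -1;
        parts[n++] = value;
        if (*s == '.')
            s++;                             /* separator or trailing dot */
        else if (*s != '\0')
            return -1;                       /* unexpected character */
    }
    if (n == 0)
        return -1;

    /* The last part given is <part4>; missing middle parts are zero. */
    unsigned p1 = parts[0];
    unsigned p2 = (n >= 3) ? parts[1] : 0;
    unsigned p3 = (n == 4) ? parts[2] : 0;
    unsigned p4 = (n >= 2) ? parts[n - 1] : 0;
    *addr_out = ((uint32_t)p1 << 24) | ((uint32_t)p2 << 16)
              | ((uint32_t)p3 << 8)  | (uint32_t)p4;
    return 0;
}
```

For instance, parsing the three-part input 10.1.17 with this sketch produces the same value as 10.1.0.17, and demo_ip_part applied to that value with part set to 4 returns 17.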

The digits and the "." char are read as EBCDIC chars if the EBCDIC form is used or if the default form is used and pads->disc->def_charset is Pcharset_EBCDIC. Otherwise the data is read as ASCII chars.

4.7  Integer Base Types

4.7.1  Fixed-width character-based encoding

ASCII  EBCDIC  DEFAULT
Pa_int8_FW  Pe_int8_FW  Pint8_FW
Pa_int16_FW  Pe_int16_FW  Pint16_FW
Pa_int32_FW  Pe_int32_FW  Pint32_FW
Pa_int64_FW  Pe_int64_FW  Pint64_FW
Pa_uint8_FW  Pe_uint8_FW  Puint8_FW
Pa_uint16_FW  Pe_uint16_FW  Puint16_FW
Pa_uint32_FW  Pe_uint32_FW  Puint32_FW
Pa_uint64_FW  Pe_uint64_FW  Puint64_FW

The above types are used when the input representation for an integer is a fixed number of ASCII or EBCDIC characters. The int types are signed types, while the uint types are unsigned. The number in the type name specifies how many bits are used in the in-memory representation, thus a Puint32 is a 32-bit (4 byte) representation of an unsigned integer.

The characters in the input can have an optional plus or minus sign for signed types, or an optional plus sign for unsigned types, followed by a sequence of one or more digits. In addition, leading or trailing whitespace can occur, but only if the Pads discipline field disc->flags has the WSPACE_OK flag set. The data is read as EBCDIC chars if an EBCDIC form (such as Pe_int8) is used, or if the default form (Pint8) is used and pads->disc->def_charset is Pcharset_EBCDIC. Otherwise, the data is read as ASCII chars.

4.7.2  Variable-width character-based encoding

ASCII  EBCDIC  DEFAULT
Pa_int8  Pe_int8  Pint8
Pa_int16  Pe_int16  Pint16
Pa_int32  Pe_int32  Pint32
Pa_int64  Pe_int64  Pint64
Pa_uint8  Pe_uint8  Puint8
Pa_uint16  Pe_uint16  Puint16
Pa_uint32  Pe_uint32  Puint32
Pa_uint64  Pe_uint64  Puint64

The expected input for these types is an optional sign character followed by a sequence of digits. The number of characters that make up the input is variable: after the first digit, the digits are read until (but not including) the first non-digit or EOR/EOF. If the Pads discipline field disc->flags has the WSPACE_OK flag set, then leading whitespace is allowed.

4.7.3  Raw binary encoding

RAW
Pb_int8
Pb_int16
Pb_int32
Pb_int64
Pb_uint8
Pb_uint16
Pb_uint32
Pb_uint64

These are the first binary types described in this chapter. The input is not made up of ASCII or EBCDIC characters that need to be interpreted to see what number they are describing. Instead, the number itself is encoded in binary form, as a sequence of bytes.

There are binary types for signed or unsigned binary integers of common bit widths (8, 16, 32, and 64 bit widths). Pb_int8 corresponds to one byte of input, Pb_int16 to two bytes of input, and so on.

The representation in memory is just the corresponding signed or unsigned type, thus Pb_uint16 has representation type Puint16. The bytes from the input are simply copied into the bytes that make up the representation. If the endian-ness of input data is different from the endian-ness of the machine, then the byte order is reversed to form the in-memory representation; otherwise the byte order is preserved.

The endian-ness of the machine running the Pads program is fixed: it is determined automatically by the Pads library. The endian-ness of the input data is described by the Pads discipline field disc->d_endian.

In some cases it is possible to have Pads determine the proper setting for disc->d_endian automatically, by using the annotation Pendian with the first multi-byte binary integer field that appears in the data. For example, consider this header definition:

Pstruct header {
  Pendian Pb_uint16 version : version < 10;
  ...
};

This Pads description indicates the first value in the header is a 2-byte unsigned binary integer, version, whose value should be less than ten. The Pendian annotation indicates that there should be two attempts at reading the version field: once with the current disc->d_endian setting, and (if the read fails) once with the opposite disc->d_endian setting. If the second read succeeds, then the new disc->d_endian setting is retained, otherwise the original setting is retained.

Note that the Pendian annotation is only able to determine the correct endian choice for a field that has an attached constraint, where the wrong choice of endian setting will always cause the constraint to fail. (In the above example, if a value less than ten is read with the wrong d_endian setting, the result is a value that is much greater than ten.)

4.7.4  Serialized binary encoding

SBL  SBH
Psbl_int8  Psbh_int8
Psbl_int16  Psbh_int16
Psbl_int32  Psbh_int32
Psbl_int64  Psbh_int64
Psbl_uint8  Psbh_uint8
Psbl_uint16  Psbh_uint16
Psbl_uint32  Psbh_uint32
Psbl_uint64  Psbh_uint64

These types describe signed or unsigned binary integers that have been encoded with a specified number of bytes K. For the Psbl_ types, the first byte on the input stream is treated as the low-order byte of the K-byte value. For the Psbh_ types, the first byte on the input stream is treated as the high-order byte of the K-byte value. For example, Psbl_int32(:3:) describes a 3-byte binary encoding where the first byte encountered is the low-order byte.

These types are more general than the simpler Pb_ types because you explicitly specify the number of bytes (from 1 to 8) independently of the target in-memory type, allowing for types such as the Psbl_int32(:3:) type just described. These types also explicitly specify the endian-ness of the data bytes, rather than using disc->d_endian.
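A minimal sketch of the two byte orders, restricted to unsigned values, might look as follows. The function demo_sb_decode is hypothetical; how the real library handles signed values narrower than the target type is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode k bytes (1 to 8) as an unsigned value. A non-zero low_first
 * selects the Psbl_-style order (first input byte is the low-order byte);
 * zero selects the Psbh_-style order (first input byte is high-order). */
uint64_t demo_sb_decode(const unsigned char *bytes, size_t k, int low_first)
{
    assert(bytes != NULL && k >= 1 && k <= 8);
    uint64_t value = 0;
    for (size_t i = 0; i < k; i++) {
        /* Visit bytes from high-order to low-order. */
        size_t idx = low_first ? (k - 1 - i) : i;
        value = (value << 8) | bytes[idx];
    }
    return value;
}
```

For the three input bytes 0x01 0x02 0x03, the low-order-first reading gives 0x030201 and the high-order-first reading gives 0x010203, independent of the endian-ness of the machine running the code.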

The following table shows those cases where serialized binary types have equivalent simple binary types.

 
Serialized binary type  Equivalent type if disc->d_endian is PbigEndian  Equivalent type if disc->d_endian is PlittleEndian
Psbl_int8(:1:)  -  Pb_int8
Psbl_int16(:2:)  -  Pb_int16
Psbl_int32(:4:)  -  Pb_int32
Psbl_int64(:8:)  -  Pb_int64
Psbl_uint8(:1:)  -  Pb_uint8
Psbl_uint16(:2:)  -  Pb_uint16
Psbl_uint32(:4:)  -  Pb_uint32
Psbl_uint64(:8:)  -  Pb_uint64
Psbh_int8(:1:)  Pb_int8  -
Psbh_int16(:2:)  Pb_int16  -
Psbh_int32(:4:)  Pb_int32  -
Psbh_int64(:8:)  Pb_int64  -
Psbh_uint8(:1:)  Pb_uint8  -
Psbh_uint16(:2:)  Pb_uint16  -
Psbh_uint32(:4:)  Pb_uint32  -
Psbh_uint64(:8:)  Pb_uint64  -

4.7.5  EBC encoding

EBC
Pebc_int8
Pebc_int16
Pebc_int32
Pebc_int64
Pebc_uint8
Pebc_uint16
Pebc_uint32
Pebc_uint64

These types describe signed or unsigned EBCDIC numeric encoded integers with a specified number of digits. N.B.: the specified number of digits must be odd if the value on disk can be negative. For example, Pebc_int32(:5:) describes a 5 digit signed integer.

Each byte on disk encodes one digit (using the low 4 bits). For signed values, the final byte encodes the sign (high 4 bits == 0xD for negative). E.g., a signed or unsigned 5 digit value is encoded in 5 bytes.
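The digit-per-byte layout can be decoded with a short loop. The sketch below is a hypothetical illustration of the encoding as described, not the library's read function; it limits the digit count to 18 so that the accumulated value always fits in a 64-bit signed integer, and it does not validate the high nibbles of non-final bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode num_digits bytes, one decimal digit per byte in the low 4 bits.
 * For signed data, a high nibble of 0xD in the final byte marks a
 * negative value. Returns 0 on success, -1 if a low nibble is not 0..9. */
int demo_ebc_decode(const unsigned char *bytes, size_t num_digits,
                    int is_signed, int64_t *out)
{
    assert(bytes != NULL && out != NULL);
    assert(num_digits >= 1 && num_digits <= 18);
    int64_t value = 0;

    for (size_t i = 0; i < num_digits; i++) {
        unsigned digit = bytes[i] & 0x0Fu;
        if (digit > 9)
            return -1;
        value = value * 10 + (int64_t)digit;
    }
    if (is_signed && (bytes[num_digits - 1] >> 4) == 0xD)
        value = -value;
    *out = value;
    return 0;
}
```

For example, the three bytes 0xF1 0xF2 0xD3 decode to -123 when read as signed, since the final byte carries 0xD in its high 4 bits.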

The legal range of values for the number of digits, num_digits, depends on target type:

 
Type  num_digits  Min / Max values
Pint8  1−3  P_MIN_INT8 / P_MAX_INT8
Puint8  1−3  0 / P_MAX_UINT8
Pint16  1−5  P_MIN_INT16 / P_MAX_INT16
Puint16  1−5  0 / P_MAX_UINT16
Pint32  1−10  P_MIN_INT32 / P_MAX_INT32
Puint32  1−10  0 / P_MAX_UINT32
Pint64  1−19  P_MIN_INT64 / P_MAX_INT64
Puint64  1−20  0 / P_MAX_UINT64

4.7.6  BCD encoding

BCD
Pbcd_int8
Pbcd_int16
Pbcd_int32
Pbcd_int64
Pbcd_uint8
Pbcd_uint16
Pbcd_uint32
Pbcd_uint64

These types describe signed or unsigned BCD numeric encoded integers with a specified number of digits. N.B.: the specified number of digits must be odd if the value on disk can be negative. For example, Pbcd_int32(:5:) describes a 5 digit signed integer.

Each byte on disk encodes two digits, 4 bits per digit. For signed values, a negative number is encoded by having the number of digits be odd, so that the remaining low 4 bits in the last byte are available for the sign (low 4 bits == 0xD for negative). A signed or unsigned 5 digit value is encoded in 3 bytes, where the unsigned value ignores the final 4 bits and the signed value uses them to get the sign.
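The packed layout can be decoded nibble by nibble. The following C sketch is a hypothetical illustration of the encoding as described, not the library's read function; it caps the digit count at 18 so the result fits in a 64-bit signed integer, and it consults the sign nibble only when the digit count is odd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode num_digits packed decimal digits, two per byte, high nibble
 * first. With an odd digit count the low nibble of the last byte is
 * spare; for signed data, 0xD there marks a negative value.
 * Returns 0 on success, -1 if a digit nibble is not in 0..9. */
int demo_bcd_decode(const unsigned char *bytes, size_t num_digits,
                    int is_signed, int64_t *out)
{
    assert(bytes != NULL && out != NULL);
    assert(num_digits >= 1 && num_digits <= 18);
    int64_t value = 0;

    for (size_t i = 0; i < num_digits; i++) {
        unsigned char b = bytes[i / 2];
        unsigned digit = (i % 2 == 0) ? (unsigned)(b >> 4) : (b & 0x0Fu);
        if (digit > 9)
            return -1;
        value = value * 10 + (int64_t)digit;
    }
    if (is_signed && num_digits % 2 == 1
        && (bytes[num_digits / 2] & 0x0Fu) == 0xD)
        value = -value;
    *out = value;
    return 0;
}
```

The three bytes 0x12 0x34 0x5D hold the five digits 1, 2, 3, 4, 5 followed by the sign nibble 0xD, so a signed read yields -12345 while an unsigned read ignores the final 4 bits and yields 12345.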

The legal range of values for the number of digits, num_digits, depends on target type:

 
Type  num_digits  Min / Max values
Pint8  1−3  P_MIN_INT8 / P_MAX_INT8
Puint8  1−3  0 / P_MAX_UINT8
Pint16  1−5  P_MIN_INT16 / P_MAX_INT16
Puint16  1−5  0 / P_MAX_UINT16
Pint32  1−11**  P_MIN_INT32 / P_MAX_INT32
Puint32  1−10  0 / P_MAX_UINT32
Pint64  1−19  P_MIN_INT64 / P_MAX_INT64
Puint64  1−20  0 / P_MAX_UINT64

** Note: For type Pbcd_int32 only, even though the min and max int32 have 10 digits, we allow num_digits == 11 due to the fact that 11 is required for a 10 digit negative value. (An actual 11 digit number would cause a range error, so the leading digit must be 0.)

4.8  Floating Point Base Types

4.8.1  Variable-width character-based encoding

ASCII  EBCDIC  DEFAULT
Pa_float32  Pe_float32  Pfloat32
Pa_float64  Pe_float64  Pfloat64

These types describe ASCII or EBCDIC character-based encodings of floating point numbers. The input representation must have this form:

[+|-]DIGITS[.][DIGITS][(e|E)[+|-]DIGITS]

Where DIGITS is a sequence of one or more digit characters, (e|E) indicates either a lower- or upper-case letter ’E’, and elements in square brackets are optional. Note that there must be at least one digit before the (optional) dot (period) character.

If the input contains a valid sequence of characters that make up a float, that sequence is converted to a Pfloat32 or Pfloat64, according to the type. For example, if you specify a Pa_float32, then the characters making up a float are read from the input and converted to an in-memory Pfloat32.
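The input form above can be checked with a small scanner. The sketch below is illustrative only: float_prefix_len is a hypothetical helper, it handles ASCII input, and it assumes that an incomplete exponent (an 'e' with no digits after it) is simply left unconsumed. The Pads read functions may treat that case differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not a Pads library call): return the length of the
 * longest prefix of s that matches
 *     [+|-]DIGITS[.][DIGITS][(e|E)[+|-]DIGITS]
 * or 0 if s does not start with a valid float. */
static size_t float_prefix_len(const char *s)
{
    size_t i = 0;

    if (s[i] == '+' || s[i] == '-')
        i++;

    /* At least one digit is required before the optional dot. */
    size_t int_start = i;
    while (isdigit((unsigned char)s[i]))
        i++;
    if (i == int_start)
        return 0;

    /* Optional dot, followed by optional fraction digits. */
    if (s[i] == '.') {
        i++;
        while (isdigit((unsigned char)s[i]))
            i++;
    }

    /* Optional exponent; accepted only if it has at least one digit. */
    if (s[i] == 'e' || s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-')
            j++;
        size_t exp_start = j;
        while (isdigit((unsigned char)s[j]))
            j++;
        if (j > exp_start)
            i = j;
    }

    return i;
}
```

Under these assumptions "2." is accepted in full (the dot needs no following digits), while ".5" is rejected because no digit precedes the dot.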

4.9  Fixed Point Base Types

The following types encode a numerator value on the input stream in different formats, as described below. They all produce an in-memory Pfpoint value whose denominator is determined from the second type argument, d_exp, where the denominator is implicitly 10^d_exp and is not encoded on disk.

The legal range of values for d_exp depends on the target in-memory type:

 
Type                   d_exp  Max denominator (min is 1)
Pfpoint8 / ufpoint8    0-2    100
Pfpoint16 / ufpoint16  0-4    10,000
Pfpoint32 / ufpoint32  0-9    1,000,000,000
Pfpoint64 / ufpoint64  0-19   10,000,000,000,000,000,000
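The relationship between numerator, d_exp, and denominator can be sketched as follows for the 32-bit case. The struct and its field names are hypothetical stand-ins for illustration, not the Pads definition of Pfpoint32, and both functions are assumed helpers rather than library calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-memory 32-bit fixed-point value; the
 * field names are assumptions, not the Pads definition. */
typedef struct {
    int32_t  num;    /* numerator, as read from the input */
    uint32_t denom;  /* 10^d_exp, never stored on disk */
} fpoint32_sketch;

/* Return 10^d_exp, or 0 if d_exp is outside the 0-9 range that the table
 * gives for 32-bit fixed-point types. */
static uint32_t denom_for_d_exp(unsigned d_exp)
{
    uint32_t denom = 1;

    if (d_exp > 9)
        return 0;
    while (d_exp-- > 0)
        denom *= 10;
    return denom;
}

/* Convert to a double; this may round, unlike the exact num/denom pair. */
static double fpoint32_to_double(fpoint32_sketch fp)
{
    return (double)fp.num / (double)fp.denom;
}
```

For example, a numerator of 12345 with d_exp == 2 gives a denominator of 100, representing the value 123.45.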

4.9.1  Serialized binary encoding

SBL                        SBH
Psbl_fpoint8(K, d_exp)     Psbh_fpoint8(K, d_exp)
Psbl_fpoint16(K, d_exp)    Psbh_fpoint16(K, d_exp)
Psbl_fpoint32(K, d_exp)    Psbh_fpoint32(K, d_exp)
Psbl_fpoint64(K, d_exp)    Psbh_fpoint64(K, d_exp)
Psbl_ufpoint8(K, d_exp)    Psbh_ufpoint8(K, d_exp)
Psbl_ufpoint16(K, d_exp)   Psbh_ufpoint16(K, d_exp)
Psbl_ufpoint32(K, d_exp)   Psbh_ufpoint32(K, d_exp)
Psbl_ufpoint64(K, d_exp)   Psbh_ufpoint64(K, d_exp)

These types describe fixed-point numbers where the numerator is encoded in serialized binary form on the input stream. Serialized binary encodings are described above for the Psbl_ and Psbh_ integer types. Like those integer types, the number of bytes on the input is specified as the first type argument. The legal range of values for the number of bytes depends on target type, and follows the same rule specified for the Psbl_ and Psbh_ integer types.

Each type takes a second argument, d_exp, where the denominator value is implicitly 10^d_exp and is not encoded on disk. For example, Psbl_fpoint32(:3, 2:) specifies that a numerator is encoded on the input as three binary bytes with the low-order byte appearing first, where the resulting in-memory Pfpoint32 has the value read from the input as its numerator, and the number 10^2 (i.e., 100) as its denominator.
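As a rough sketch of the byte ordering, the two hypothetical helpers below assemble an unsigned numerator from K bytes (K between 1 and 4). They assume that SBL means low-order byte first, as in the example above, and that SBH means high-order byte first. Signed values are left out because their handling is covered with the Psbl_ and Psbh_ integer types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: low-order byte first (SBL). nbytes must be 1-4. */
static uint32_t sbl_read_u32(const uint8_t *buf, size_t nbytes)
{
    uint32_t v = 0;

    for (size_t i = 0; i < nbytes; i++)
        v |= (uint32_t)buf[i] << (8 * i);
    return v;
}

/* Hypothetical helper: high-order byte first (SBH). nbytes must be 1-4. */
static uint32_t sbh_read_u32(const uint8_t *buf, size_t nbytes)
{
    uint32_t v = 0;

    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | buf[i];
    return v;
}
```

The three bytes 0x39 0x30 0x00 read low-order first give the numerator 12345; with d_exp == 2 the denominator is 100, so the value represented is 123.45. The bytes 0x00 0x30 0x39 read high-order first give the same numerator.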

4.9.2  EBC encoding

EBC
Pebc_fpoint8
Pebc_fpoint16
Pebc_fpoint32
Pebc_fpoint64
Pebc_ufpoint8
Pebc_ufpoint16
Pebc_ufpoint32
Pebc_ufpoint64

These types describe fixed-point numbers where the numerator is encoded as EBCDIC numeric digits on the input stream. This encoding is described above for the Pebc_ integer types. Like those integer types, the number of digits on the input is specified as the first type argument. The legal range of values for the number of digits depends on target type, and follows the same rule specified for the Pebc_ integer types.

Each type takes a second argument, d_exp, where the denominator value is implicitly 10^d_exp and is not encoded on disk. For example, Pebc_fpoint32(:3, 2:) specifies that a numerator is encoded on the input as three EBCDIC digits, where the resulting in-memory Pfpoint32 has the value read from the input as its numerator, and the number 10^2 (i.e., 100) as its denominator.
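A minimal sketch of reading such a numerator follows. It is an illustration under stated assumptions, not the Pads implementation: ebc_digits_to_u32 is a hypothetical helper, and it covers only unsigned input in which every byte is an EBCDIC digit character (0xF0 through 0xF9). Sign encoding is described with the Pebc_ integer types and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Pads library call): convert num_digits EBCDIC
 * digit characters (0xF0-0xF9) to an unsigned numerator. Returns 0 on
 * success, -1 on a non-digit byte or an unsupported digit count. */
static int ebc_digits_to_u32(const uint8_t *buf, size_t num_digits,
                             uint32_t *out)
{
    uint32_t v = 0;

    if (num_digits == 0 || num_digits > 9)
        return -1;                      /* 9 digits always fit in uint32_t */

    for (size_t i = 0; i < num_digits; i++) {
        if ((buf[i] & 0xF0) != 0xF0 || (buf[i] & 0x0F) > 9)
            return -1;                  /* not an EBCDIC digit character */
        v = v * 10 + (buf[i] & 0x0F);
    }

    *out = v;
    return 0;
}
```

The bytes 0xF1 0xF2 0xF3 yield the numerator 123; with d_exp == 2 the denominator is 100, so the value represented is 1.23.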

4.9.3  BCD encoding

BCD
Pbcd_fpoint8
Pbcd_fpoint16
Pbcd_fpoint32
Pbcd_fpoint64
Pbcd_ufpoint8
Pbcd_ufpoint16
Pbcd_ufpoint32
Pbcd_ufpoint64

These types describe fixed-point numbers where the numerator is encoded as BCD numeric digits on the input stream. This encoding is described above for the Pbcd_ integer types. Like those integer types, the number of digits on the input is specified as the first type argument. The legal range of values for the number of digits depends on target type, and follows the same rule specified for the Pbcd_ integer types.

Each type takes a second argument, d_exp, where the denominator value is implicitly 10^d_exp and is not encoded on disk. For example, Pbcd_fpoint32(:3, 2:) specifies that a numerator is encoded on the input as three BCD digits, where the resulting in-memory Pfpoint32 has the value read from the input as its numerator, and the number 10^2 (i.e., 100) as its denominator.
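To make the numerator and denominator pairing concrete, the sketch below renders a fixed-point value as a decimal string using integer arithmetic only, so no rounding is introduced. fpoint_format is a hypothetical helper, not part of the Pads library, and it handles the 32-bit range of d_exp (0 through 9) only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Pads library call): write num / 10^d_exp into
 * buf as a decimal string. Returns 0 on success, -1 if d_exp is out of
 * range or buf is too small. */
static int fpoint_format(int32_t num, unsigned d_exp, char *buf, size_t size)
{
    uint32_t denom = 1;
    /* Magnitude computed in unsigned arithmetic so INT32_MIN is safe. */
    uint32_t mag = (num < 0) ? (uint32_t)0 - (uint32_t)num : (uint32_t)num;
    const char *sign = (num < 0) ? "-" : "";
    int n;

    if (d_exp > 9)
        return -1;
    for (unsigned i = 0; i < d_exp; i++)
        denom *= 10;

    if (d_exp == 0)
        n = snprintf(buf, size, "%s%lu", sign, (unsigned long)mag);
    else
        n = snprintf(buf, size, "%s%lu.%0*lu", sign,
                     (unsigned long)(mag / denom), (int)d_exp,
                     (unsigned long)(mag % denom));

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For instance, three BCD digits 1, 2, 3 followed by a 0xD sign nibble give the numerator -123; with d_exp == 2 this helper produces the string "-1.23".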

