Chapter 4 Base Types Overview

Pads base types describe individual, small values: numbers, strings, dates, and so on. This chapter provides an overview of some of the most important built-in Pads base types, with a focus on how these types are used within Pads source files. Appendix B gives detailed descriptions of each of the built-in base types, including the full set of library API calls for each type. (As discussed in Chapter 3, each type has corresponding read, write, format, and accumulator functions.)

In addition to the built-in types, it is possible to extend Pads with new base types; see Section 15.3.

4.1 In-Memory Representation

Each base type has an external and an in-memory representation. Related base types share the same in-memory representation. For example, while there are 18 different string base types, all of them use Pstring as their in-memory representation.

This section reviews the different in-memory representation types.

4.1.1 `Pchar`

Type Pchar is the in-memory representation of an external character. It is equivalent to the C type unsigned char, or type Puint8: all are 8-bit unsigned values. N.B.: Regardless of the external character that is read, the corresponding ASCII character is stored in the in-memory representation.

4.1.2 `Pstring`

Type Pstring is the in-memory representation for all forms of external strings. A Pstring s has two fields:

s.len : the length of the string.
s.str : a pointer to a sequence of s.len characters.

In addition, Pstring has fields that are manipulated by various string functions (some of which are described below). Most programmers should only use s.len and s.str.

The library discipline has a field copy_strings which controls copying behavior for string read calls. If copy_strings is non-zero, the string read functions always copy strings. Otherwise, a copy is not made and the target Pstring points to memory managed by the current IO discipline. copy_strings should only be set to zero for record-based IO disciplines where strings from record K are not used after P_io_next_rec has been called to move the IO cursor to record K+1. Note: Pstring_preserve can be used to force a string that is using sharing to make a copy so that the string is ’preserved’ (remains valid) across calls to P_io_next_rec.

When copying is used, the string copies are stored in an internal resizable buffer, and s.str points into this buffer. To ensure correct behavior, function Pstring_init should be called prior to using a Pstring, and function Pstring_cleanup should be called when a given Pstring is no longer in use. Generated initialization and cleanup functions call these routines for any Pads type containing a Pstring.

The full set of string helper functions appears in Figure 4.1.

Perror_t Pstring_init(P_t *pads, Pstring *s);
Perror_t Pstring_cleanup(P_t *pads, Pstring *s);
Perror_t Pstring_share(P_t *pads, Pstring *targ, const Pstring *src);
Perror_t Pstring_cstr_share(P_t *pads, Pstring *targ, const char *src, size_t len);
Perror_t Pstring_copy(P_t *pads, Pstring *targ, const Pstring *src);
Perror_t Pstring_cstr_copy(P_t *pads, Pstring *targ, const char *src, size_t len);
Perror_t Pstring_preserve(P_t *pads, Pstring *s);
int Pstring_eq(const Pstring *str1, const Pstring *str2);
int Pstring_eq_cstr(const Pstring *str, const char *cstr);
Pint8 Pstring2int8 (const Pstring *str); /* returns P_MIN_INT8 on error */
Pint16 Pstring2int16 (const Pstring *str); /* returns P_MIN_INT16 on error */
Pint32 Pstring2int32 (const Pstring *str); /* returns P_MIN_INT32 on error */
Pint64 Pstring2int64 (const Pstring *str); /* returns P_MIN_INT64 on error */
Puint8 Pstring2uint8 (const Pstring *str); /* returns P_MAX_UINT8 on error */
Puint16 Pstring2uint16(const Pstring *str); /* returns P_MAX_UINT16 on error */
Puint32 Pstring2uint32(const Pstring *str); /* returns P_MAX_UINT32 on error */
Puint64 Pstring2uint64(const Pstring *str); /* returns P_MAX_UINT64 on error */
Pfloat32 Pstring2float32(const Pstring *str); /* returns P_MIN_FLOAT32 on error */
Pfloat64 Pstring2float64(const Pstring *str); /* returns P_MIN_FLOAT64 on error */

Figure 4.1: Library functions for manipulating Pstrings.

The following list describes the behaviors of these functions.

Pstring_init(pads, s): Initialize s to valid empty string (no dynamic memory allocated yet).
Pstring_cleanup(pads, s): Free any dynamic memory allocated for s.
Pstring_share(pads, targ, src): Make targ refer to the string in src, sharing the space with the original owner.
Pstring_cstr_share(pads, targ, src, len): Make targ refer to len characters in C-string src.
Pstring_copy(pads, targ, src): Copy the string in src into targ; sharing is not used.
Pstring_cstr_copy(pads, targ, src, len): Copy len characters from C-string src into targ; sharing is not used.
Pstring_eq(str1, str2): Returns 1 if str1 and str2 are of equal length and str1 equals str2 (based on strncmp). Otherwise, returns 0.
Pstring_eq_cstr(str, cstr): Returns 1 if str and the C string cstr are of equal length and str equals cstr (based on strncmp). Otherwise, returns 0.

Although not strictly necessary, both Pstring_copy and Pstring_cstr_copy null-terminate targ->str. Each copy function returns P_ERR on bad arguments or on failure to allocate space, otherwise it returns P_OK.

The various Pstring2NUMERIC functions convert a string to the specified numeric type. If the contents of the string cannot be converted to the specified type, the minimum value for the numeric type is returned for signed numeric types, or the maximum value is returned for unsigned numeric types.
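The sentinel-on-error convention can be illustrated in plain C. The following is a minimal sketch and not the library's implementation: demo_string and demo_string2int32 are hypothetical stand-ins for Pstring and Pstring2int32, and INT32_MIN stands in for P_MIN_INT32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Pstring: a length plus a pointer to
 * characters that need not be null-terminated. */
typedef struct {
    size_t      len;
    const char *str;
} demo_string;

/* Convert s to a 32-bit signed integer. Returns INT32_MIN (standing in
 * for P_MIN_INT32) if s has no digits, contains a non-digit, or is out
 * of range. */
static int32_t demo_string2int32(const demo_string *s)
{
    assert(s != NULL);
    size_t i = 0;
    int negative = 0;
    int64_t value = 0;

    if (s->len > 0 && (s->str[0] == '+' || s->str[0] == '-')) {
        negative = (s->str[0] == '-');
        i = 1;
    }
    if (i == s->len) {
        return INT32_MIN;               /* no digits at all */
    }
    for (; i < s->len; i++) {
        char c = s->str[i];
        if (c < '0' || c > '9') {
            return INT32_MIN;           /* not a number */
        }
        value = value * 10 + (c - '0');
        if (value > (int64_t)INT32_MAX + 1) {
            return INT32_MIN;           /* magnitude too large */
        }
    }
    if (negative) {
        value = -value;
    } else if (value > INT32_MAX) {
        return INT32_MIN;               /* positive overflow */
    }
    return (int32_t)value;
}
```

One consequence of this convention is visible in the sketch: a string that legitimately holds the minimum value cannot be distinguished from a conversion error by the return value alone.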

4.1.3 Integer types

There are eight in-memory representations for integers, four types for signed values (Pint8, Pint16, Pint32, and Pint64) and four types for unsigned values (Puint8, Puint16, Puint32, and Puint64). The number in these type names indicates the number of bits in the in-memory representation, thus there are signed and unsigned integers that use 1, 2, 4, or 8 bytes of memory. The endian-ness of these in-memory representation types is the same as the endian-ness of the processor that the code is executing on. The external representation, which may be in another format, is always converted to the primitive in-memory representation supported by the processor.

For programming convenience, the header file pads.h includes definitions of the minimum and maximum values for each signed type, and of the maximum value for each unsigned type. Figure 4.2 lists these constants.

Signed minimum	Signed maximum	Unsigned maximum
`P_MIN_INT8`	`P_MAX_INT8`	`P_MAX_UINT8`
`P_MIN_INT16`	`P_MAX_INT16`	`P_MAX_UINT16`
`P_MIN_INT32`	`P_MAX_INT32`	`P_MAX_UINT32`
`P_MIN_INT64`	`P_MAX_INT64`	`P_MAX_UINT64`

Figure 4.2: Minimum and maximum values for Pads integer types.

4.1.4 Floating-point types

Pads has only two in-memory floating point representations, Pfloat32 and Pfloat64, which correspond to ANSI C types float and double, respectively.

4.1.5 Fixed-point types

A fixed-point number is a number with a fixed number of decimal digits (digits after the ’dot’). For example, 123.456 is a fixed-point number with three decimal digits. External formats for such numbers occur, e.g., in COBOL data. We have chosen a very simple in-memory representation for such numbers: a struct with a numerator (num) that contains all of the digits and a denominator (denom) that contains some power of 10. For example, 123.456 would be represented with a numerator containing 123456 and a denominator containing 1000 (10³).

The in-memory representation always has an unsigned denominator. We provide signed and unsigned representations that use signed and unsigned numerators, respectively. One can choose the number of bits used for both numerator and denominator in the same way as for the integer types, thus there are four types for signed values and four types for unsigned values:

typedef struct { Pint8 num; Puint8 denom; } Pfpoint8;
typedef struct { Pint16 num; Puint16 denom; } Pfpoint16;
typedef struct { Pint32 num; Puint32 denom; } Pfpoint32;
typedef struct { Pint64 num; Puint64 denom; } Pfpoint64;
typedef struct { Puint8 num; Puint8 denom; } Pufpoint8;
typedef struct { Puint16 num; Puint16 denom; } Pufpoint16;
typedef struct { Puint32 num; Puint32 denom; } Pufpoint32;
typedef struct { Puint64 num; Puint64 denom; } Pufpoint64;

There are two macros to help one use values of these in-memory types. P_FPOINT2FLOAT32(fp) calculates fp.num/fp.denom as a Pfloat32, while P_FPOINT2FLOAT64(fp) calculates fp.num/fp.denom as a Pfloat64.
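The numerator/denominator representation can be sketched with standard C types. This is an illustration under stated assumptions: demo_fpoint32, demo_fpoint32_make, and demo_fpoint32_to_double are hypothetical names, with int32_t and uint32_t substituted for Pint32 and Puint32. The conversion performs the same division that the text attributes to P_FPOINT2FLOAT64.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the Pfpoint32 struct, using standard integer types. */
typedef struct {
    int32_t  num;    /* all of the digits */
    uint32_t denom;  /* a power of 10 */
} demo_fpoint32;

/* Build a value from its digits and the number of decimal digits.
 * decimal_digits must be at most 9 so that the power of 10 fits in
 * 32 bits. */
static demo_fpoint32 demo_fpoint32_make(int32_t num, unsigned decimal_digits)
{
    assert(decimal_digits <= 9);
    demo_fpoint32 fp = { num, 1u };
    while (decimal_digits-- > 0) {
        fp.denom *= 10u;
    }
    return fp;
}

/* Compute num / denom as a double. */
static double demo_fpoint32_to_double(demo_fpoint32 fp)
{
    return (double)fp.num / (double)fp.denom;
}
```

With this sketch, demo_fpoint32_make(123456, 3) yields a numerator of 123456 and a denominator of 1000, matching the 123.456 example.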

4.2 Base Type Mask

The mask for base types is just an integer type, treated as an array of bits:

typedef Puint32 Pbase_m;

Masks for accessing individual bits of the base type mask are described in the sections on operations that use masks (cf. Section 3.16.3). More information about how base types handle masks is available in Appendix B.

4.3 Base Type Parse Descriptor

Parse descriptors for base types contain only the fields described in Section 3.13. Specific error codes are discussed when the base type read functions are described.

4.4 Character Sets

Pads currently supports two character sets for external data, ASCII and EBCDIC. As discussed in Section 3.5, the library discipline contains a field def_charset that selects which character set to use when one is not specified explicitly. As a result, for each ’kind’ of data that has an external form made up of characters (characters, strings, character-based dates, character-based integer and floating point numbers, etc.), Pads has three types: a type that indicates the external form is always ASCII, a type that indicates the external form is always EBCDIC, and a type that indicates that the external form uses the character set specified in def_charset.

In each section describing character-based types, we give a three-column table indicating the type(s) that use ASCII, EBCDIC, or DEFAULT character sets. For example, the next section begins with a table showing types Pa_char (ASCII), Pe_char (EBCDIC), Pchar (DEFAULT).

4.5 Character Base Types

4.5.1 Fixed-width character-based encoding

ASCII	EBCDIC	DEFAULT
`Pa_char`	`Pe_char`	`Pchar`

For example, writing

Pa_char c;

in a Pads source file (within a Pstruct, for example) indicates that a single ASCII character is expected. Writing a constraint such as

Pe_char c : c == 'A' || c == 'B';

indicates that an EBCDIC capital letter A or B is expected. N.B.: The constraint expression is applied to the in-memory representation of c, which is an ASCII value; thus the C character constants (ASCII constants) are used to specify letters A and B.

4.5.2 Special character counting base types

ASCII	EBCDIC	DEFAULT
`Pa_countX`	`Pe_countX`	`PcountX`
`Pa_countXtoY`	`Pe_countXtoY`	`PcountXtoY`

Unlike all other base types, these counting base types never advance the IO cursor. You can think of these types as “peeking ahead” to see how many occurrences of a given character appear forward of the current IO cursor position.

The PcountX types count the number of occurrences of character X between the current IO cursor position and the first EOR (end of record) or EOF (end of file). They take three parameters, x, eor_required, and count_max. x is the character to count. If eor_required is non-zero, then encountering EOF before EOR produces an error. If count_max is non-zero, EOR/EOF must be encountered before scanning count_max characters, otherwise an error is returned. For example,

Pa_countX(: '=', 0, 0 :) my_count;

will count the number of ASCII equals-sign characters between the IO cursor and the next EOR or EOF, with no limit on the maximum scan distance.
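The "peek ahead" behavior can be sketched as a function that scans a buffer without modifying any cursor. This is an illustrative stand-in, not the library's read function: demo_count_x is a hypothetical name, a newline is assumed to mark EOR, the end of the buffer is assumed to mark EOF, and -1 is used as an ad hoc error value.

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of x between the start of buf and the first EOR,
 * without consuming any input. Returns -1 on either of the two error
 * conditions described in the text. */
static long demo_count_x(const char *buf, size_t len, char x,
                         int eor_required, size_t count_max)
{
    assert(buf != NULL);
    long count = 0;
    for (size_t i = 0; i < len; i++) {
        if (count_max != 0 && i >= count_max) {
            return -1;              /* scanned count_max chars, no EOR/EOF */
        }
        if (buf[i] == '\n') {
            return count;           /* reached EOR */
        }
        if (buf[i] == x) {
            count++;
        }
    }
    return eor_required ? -1 : count;   /* reached EOF before EOR */
}
```

Because the function takes the buffer by const pointer and returns only a count, the caller's position in the input is unchanged, which mirrors the way the counting types leave the IO cursor in place.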

4.6 String Base Types (including dates and times)

The large number of string base types arises from the fact that there are many different ways to indicate the extent of a string. The entire input (up to end-of-file or end-of-record) is a sequence of bytes that can be included in a string, so when specifying a string type in a Pads description, we need to indicate how much of that input we would like included in the string.

4.6.1 Pstring_FW

ASCII	EBCDIC	DEFAULT
`Pa_string_FW`	`Pe_string_FW`	`Pstring_FW`

One of the simplest ways to specify the extent of a string is to give the exact number of characters, or width, that will be included in the string. For example,

Pstring_FW(: 10 :) my_string;

specifies a string with width 10. In this case the default character set will determine whether ASCII or EBCDIC characters are expected in the input stream. Regardless of the input character set, the resulting in-memory Pstring contains ASCII characters.

An error occurs if the specified width is not available. See Appendix B for details.

4.6.2 Pstring

ASCII	EBCDIC	DEFAULT
`Pa_string`	`Pe_string`	`Pstring`

For the Pstring type one specifies a 'stop character' that is expected immediately following the string. The extent of the string is all characters from the IO cursor up to but not including the first occurrence of the stop char. For example,

Pe_string(:'|':) my_string;

indicates that a series of EBCDIC characters is expected, followed by an EBCDIC vertical bar. Note that the stop char is always specified in ASCII (in this case using a C character constant). When the character set that is being read from the input is EBCDIC, the read function looks for the EBCDIC character that is equivalent to the specified ASCII character.

4.6.3 Pstring_ME

ASCII	EBCDIC	DEFAULT
`Pa_string_ME`	`Pe_string_ME`	`Pstring_ME`

For type Pstring_ME, a regular expression called the matching expression is given, and the extent of the string is the longest sequence of characters starting at the current IO position which match this expression. Note that when you specify a regular expression as a C string, you must use two backslashes to indicate a single backslash character (otherwise C will think you are applying the special backslash operator to the following character). In a language such as Perl, which does not have this requirement, you might write

/\S*/

to create a regular expression which will match a sequence of zero or more non-space characters. To use the same regular expression in a PADSL description with the Pa_string_ME type, you would write:

Pa_string_ME(: "/\\S*/" :) my_string;

This will match a sequence of zero or more non-space characters and assign it to my_string. Note that if a space occurs immediately, a match still occurs, since we specified that zero characters is acceptable; my_string would be a string of length zero. The extent is bounded by end-of-record/end-of-file, so if there are no spaces before an end-of-record, my_string will end up containing all characters remaining in the record.

As a concrete example, if the input at the current IO cursor is hello world when the above string declaration is used to read from the input, my_string will end up containing hello. Note that the space that follows hello is not included, since it is not part of the match.

4.6.4 Pstring_SE

ASCII	EBCDIC	DEFAULT
`Pa_string_SE`	`Pe_string_SE`	`Pstring_SE`

For type Pstring_SE, a regular expression called the stop expression is given, and the extent of the string is the longest sequence of characters starting at the current IO position such that the characters immediately following successfully match the stop expression. None of the characters matching the stop expression are included in the result. For example,

Pa_string_SE(: "/\\s|$/" :) my_string;

The stop expression will match either a space character (due to the backslash-s) or end-of-record/end-of-file (due to the special dollar-sign character). As a result, my_string will end up containing all non-space characters up to (but not including) the first space character that is found, or up to the end of the current record if no space character is found.

You may have noticed that the Pa_string_ME and Pa_string_SE examples actually specify exactly the same extent. Because of the power of regular expressions, it is often the case that you can choose to use either type. You should use whichever type results in a clearer description of what is expected in the input. (In this case, the Pa_string_ME form is simpler and therefore clearer.)
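The claim that the two examples yield the same extent can be checked with a small sketch. The functions below are hand-coded equivalents of the two example expressions, not calls into a regular expression engine or into Pads: demo_extent_me and demo_extent_se are hypothetical names, the record is assumed to be a buffer of known length, and isspace is used as an approximation of the space class.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Matching-expression view: length of the longest prefix made up of
 * non-space characters (zero or more of them). */
static size_t demo_extent_me(const char *rec, size_t len)
{
    assert(rec != NULL);
    size_t n = 0;
    while (n < len && !isspace((unsigned char)rec[n])) {
        n++;
    }
    return n;
}

/* Stop-expression view: scan forward until the input at position n is
 * a space character or the end of the record has been reached. */
static size_t demo_extent_se(const char *rec, size_t len)
{
    assert(rec != NULL);
    for (size_t n = 0; n < len; n++) {
        if (isspace((unsigned char)rec[n])) {
            return n;       /* stop before the space */
        }
    }
    return len;             /* end of record acts as the stop */
}
```

For the record hello world both functions return 5, so either description selects hello.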

4.6.5 Timestamp_explicit

ASCII	EBCDIC	DEFAULT
`Pa_timestamp_explicit_FW`	`Pe_timestamp_explicit_FW`	`Ptimestamp_explicit_FW`
`Pa_timestamp_explicit`	`Pe_timestamp_explicit`	`Ptimestamp_explicit`
`Pa_timestamp_explicit_ME`	`Pe_timestamp_explicit_ME`	`Ptimestamp_explicit_ME`
`Pa_timestamp_explicit_SE`	`Pe_timestamp_explicit_SE`	`Ptimestamp_explicit_SE`

A timestamp is a combination of a calendar date and a time of day. The corresponding in-memory representation is a Puint32 which represents the number of seconds since 00:00:00 1-Jan-1970 UTC, also known as “seconds since the epoch.” Thus, the time 00:20:00 1-Jan-1970 UTC would be represented internally as the number 1200, since it occurs 1200 seconds (20 minutes) past the epoch.

If the input is “midnight Jan 1 1970” and the time zone is UTC, then this produces a value of 0 since this is actually the epoch. If the time zone is EST, then this produces 5 * 60 * 60 since midnight in the EST timezone occurred 5 hours after the start of the epoch.

If the input explicitly has a time zone, as in “midnight Jan 1 1970 UTC” then the time zone in the input is used, so this would produce 0, regardless of the time zone specified for the type. Of course, not all timestamp input formats allow you to explicitly give the time zone!
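The epoch arithmetic in these examples can be reproduced with a short sketch. It is independent of the Pads library: demo_days_from_civil and demo_timestamp are hypothetical helpers, the time zone is assumed to be given as an offset from UTC in seconds (EST is taken to be -5 hours), and a 64-bit result is used for simplicity instead of Puint32.

```c
#include <assert.h>
#include <stdint.h>

/* Days from 1970-01-01 to the given Gregorian calendar date. */
static int64_t demo_days_from_civil(int y, int m, int d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                              /* count years from March */
    int64_t era = (y >= 0 ? y : y - 399) / 400; /* 400-year cycles */
    int64_t yoe = y - era * 400;                /* year of era, 0..399 */
    int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Seconds since the epoch for a wall-clock time in a zone whose offset
 * from UTC is utc_offset_seconds (negative west of Greenwich). */
static int64_t demo_timestamp(int y, int mo, int d, int h, int mi, int s,
                              int32_t utc_offset_seconds)
{
    int64_t local = demo_days_from_civil(y, mo, d) * 86400
                  + (int64_t)h * 3600 + (int64_t)mi * 60 + s;
    return local - utc_offset_seconds;
}
```

Under these assumptions, midnight on 1 January 1970 gives 0 with a zero offset and 18000 (5 * 60 * 60) with an offset of -18000, and twenty minutes past the epoch in UTC gives 1200.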

The input values are ASCII or EBCDIC strings. Each Ptimestamp_explicit type takes as first argument the same form of specifying the string’s extent as the corresponding Pstring type, and takes as second argument a timestamp format string which describes what the input string should contain.

Timestamp formats consist of literal characters that are simply expected to be present in the input and special combinations of a percent sign and a character that indicate expected parts of the timestamp. For example, the input format "%Y-%m-%d+%H:%M" indicates a format that starts with a four-digit year, then a dash, then a two-digit month, then a dash, then a two-digit day, then a plus sign, then a two-digit hour, then a colon, then a two-digit minute. (To specify that a literal percent sign must appear in the input, use two percent signs in a row.) A full description of supported formats appears on the webpage: www.research.att.com/gsf/man/man3/tm.html

Each of the Ptimestamp_explicit types corresponds to one of the Pstring types that has already been described, where each takes one additional argument to specify the input format. For example,

Pa_timestamp_explicit(: '|' , "%Y-%m-%d+%H:%M", P_cstr2timezone("-0500"):) my_timestamp;

reads an ASCII string, up to but not including a vertical bar, and converts that string into a Puint32 timestamp. The conversion will be successful only if the string has the specified format.

Some timestamp formats include explicit time zone information, such as the one above. Pads provides the function P_cstr2timezone to convert a string representation of a time zone into a value of type Tm_zone_t *. This function is described in Chapter 14.

For the rest, the input time zone is taken from the Pads discipline field disc->in_time_zone, as described in Section 15.1.10.

4.6.6 Timestamp

ASCII	EBCDIC	DEFAULT
`Pa_timestamp_FW`	`Pe_timestamp_FW`	`Ptimestamp_FW`
`Pa_timestamp`	`Pe_timestamp`	`Ptimestamp`
`Pa_timestamp_ME`	`Pe_timestamp_ME`	`Ptimestamp_ME`
`Pa_timestamp_SE`	`Pe_timestamp_SE`	`Ptimestamp_SE`

The timestamp types are the same as the timestamp_explicit types, except no timestamp format is given. Instead, the Pads discipline field disc->in_formats.timestamp is used for all Ptimestamp types.

4.6.7 Date_explicit

ASCII	EBCDIC	DEFAULT
`Pa_date_explicit_FW`	`Pe_date_explicit_FW`	`Pdate_explicit_FW`
`Pa_date_explicit`	`Pe_date_explicit`	`Pdate_explicit`
`Pa_date_explicit_ME`	`Pe_date_explicit_ME`	`Pdate_explicit_ME`
`Pa_date_explicit_SE`	`Pe_date_explicit_SE`	`Pdate_explicit_SE`

Dates are calendar days (no time of day). Like timestamps, we represent a date as a Puint32 recording “seconds since the epoch.”

The Pdate_explicit types take a second argument, a date format, which accepts the same special characters as the Ptimestamp_explicit types. So, technically there is nothing to stop you from using the date types to input time of day fields. However, we encourage you to use the Ptimestamp types when both a calendar day and a time of day are to be input, and to use the Pdate types when just the calendar day is to be input.

4.6.8 Date

ASCII	EBCDIC	DEFAULT
`Pa_date_FW`	`Pe_date_FW`	`Pdate_FW`
`Pa_date`	`Pe_date`	`Pdate`
`Pa_date_ME`	`Pe_date_ME`	`Pdate_ME`
`Pa_date_SE`	`Pe_date_SE`	`Pdate_SE`

The date types are the same as the date_explicit types, except no date format is given. Instead, the Pads discipline field disc->in_formats.date is used for all Pdate types.

4.6.9 Time_explicit

ASCII	EBCDIC	DEFAULT
`Pa_time_explicit_FW`	`Pe_time_explicit_FW`	`Ptime_explicit_FW`
`Pa_time_explicit`	`Pe_time_explicit`	`Ptime_explicit`
`Pa_time_explicit_ME`	`Pe_time_explicit_ME`	`Ptime_explicit_ME`
`Pa_time_explicit_SE`	`Pe_time_explicit_SE`	`Ptime_explicit_SE`

Times give the time of day, with no calendar date. They are represented as a Puint32 recording seconds since midnight. For example, the time 1am is represented as 3600 (i.e., 3600 seconds, or 60 minutes, after midnight).

The Ptime_explicit types take a second argument, a time format, which accepts the same special characters as the timestamp and date types. However, we encourage you to use the Ptime types when just a time of day is expected.

4.6.10 Time

ASCII	EBCDIC	DEFAULT
`Pa_time_FW`	`Pe_time_FW`	`Ptime_FW`
`Pa_time`	`Pe_time`	`Ptime`
`Pa_time_ME`	`Pe_time_ME`	`Ptime_ME`
`Pa_time_SE`	`Pe_time_SE`	`Ptime_SE`

The time types are the same as the time_explicit types, except no time format is given. Instead, the Pads discipline field disc->in_formats.time is used for all Ptime types.

4.6.11 IP

ASCII	EBCDIC	DEFAULT
`Pa_ip_FW`	`Pe_ip_FW`	`Pip_FW`
`Pa_ip`	`Pe_ip`	`Pip`
`Pa_ip_ME`	`Pe_ip_ME`	`Pip_ME`
`Pa_ip_SE`	`Pe_ip_SE`	`Pip_SE`

The Pip type reads an IP address string from the input that is in numeric dotted form (as in 10.1.0.17) using ASCII or EBCDIC digits and periods (dots). The string consists of up to four parts with values between 0 and 255, separated by periods, with an optional trailing period. When there are fewer than four parts, the missing parts are treated as implicitly zero, and are inserted as shown in the following diagram, which shows the eight legal input forms and the equivalent expanded form.

`<part1>`	`—→`	`<part1>.0.0.0`
`<part1>.`	`—→`	`<part1>.0.0.0.`
`<part1>.<part4>`	`—→`	`<part1>.0.0.<part4>`
`<part1>.<part4>.`	`—→`	`<part1>.0.0.<part4>.`
`<part1>.<part2>.<part4>`	`—→`	`<part1>.<part2>.0.<part4>`
`<part1>.<part2>.<part4>.`	`—→`	`<part1>.<part2>.0.<part4>.`
`<part1>.<part2>.<part3>.<part4>`	`—→`	`same`
`<part1>.<part2>.<part3>.<part4>.`	`—→`	`same`

Each <part> is made up of 1 to 3 digits which specify a number in the range [0, 255].

The result is a single Puint32 value with each part encoded in one of the four bytes. part1 is stored in the high-order byte, part4 in the low-order byte. You can obtain each part using the macro

P_IP_PART(addr, part)

where part must be an integer between 1 and 4.

The digits and the "." char are read as EBCDIC chars if the EBCDIC form is used or if the default form is used and pads->disc->def_charset is Pcharset_EBCDIC. Otherwise the data is read as ASCII chars.
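The expansion rules and the byte layout can be sketched for the ASCII case as follows. This is an illustration only: demo_parse_ip is a hypothetical function, DEMO_IP_PART is a hypothetical equivalent of P_IP_PART written from the description above, and the return convention (0 for success, -1 for failure) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract part 1..4 from a packed address; part 1 is the high byte. */
#define DEMO_IP_PART(addr, part) (((addr) >> (8 * (4 - (part)))) & 0xFFu)

/* Parse 1 to 4 dotted parts with an optional trailing dot. */
static int demo_parse_ip(const char *s, size_t len, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    unsigned parts[4] = { 0, 0, 0, 0 };
    size_t nparts = 0;
    size_t i = 0;

    while (i < len) {
        if (nparts == 4) {
            return -1;                  /* more than four parts */
        }
        unsigned value = 0;
        size_t digits = 0;
        while (i < len && s[i] >= '0' && s[i] <= '9' && digits < 3) {
            value = value * 10 + (unsigned)(s[i] - '0');
            i++;
            digits++;
        }
        if (digits == 0 || value > 255) {
            return -1;                  /* 1 to 3 digits, range 0..255 */
        }
        parts[nparts++] = value;
        if (i < len) {
            if (s[i] != '.') {
                return -1;
            }
            i++;                        /* separator or trailing dot */
        }
    }
    if (nparts == 0) {
        return -1;
    }

    /* Missing parts are zero; with two or more parts given, the last
     * one always lands in position 4. */
    unsigned b[4] = { 0, 0, 0, 0 };
    if (nparts == 1) {
        b[0] = parts[0];
    } else {
        for (size_t k = 0; k + 1 < nparts; k++) {
            b[k] = parts[k];
        }
        b[3] = parts[nparts - 1];
    }
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 0;
}
```

Parsing 10.17 with this sketch produces the same packed value as 10.0.0.17, and DEMO_IP_PART(addr, 4) then returns 17.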

4.7 Integer Base Types

4.7.1 Fixed-width character-based encoding

ASCII	EBCDIC	DEFAULT
`Pa_int8_FW`	`Pe_int8_FW`	`Pint8_FW`
`Pa_int16_FW`	`Pe_int16_FW`	`Pint16_FW`
`Pa_int32_FW`	`Pe_int32_FW`	`Pint32_FW`
`Pa_int64_FW`	`Pe_int64_FW`	`Pint64_FW`
`Pa_uint8_FW`	`Pe_uint8_FW`	`Puint8_FW`
`Pa_uint16_FW`	`Pe_uint16_FW`	`Puint16_FW`
`Pa_uint32_FW`	`Pe_uint32_FW`	`Puint32_FW`
`Pa_uint64_FW`	`Pe_uint64_FW`	`Puint64_FW`

The above types are used when the input representation for an integer is a fixed number of ASCII or EBCDIC characters. The int types are signed types, while the uint types are unsigned. The number in the type name specifies how many bits are used in the in-memory representation, thus a Puint32 is a 32-bit (4 byte) representation of an unsigned integer.

The characters in the input can have an optional plus or minus sign for signed types, or an optional plus sign for unsigned types, followed by a set of one or more digits. In addition, leading or trailing whitespace can occur, but only if the Pads discipline field disc->flags has the WSPACE_OK flag set. The data is read as EBCDIC chars if an EBCDIC form (such as Pe_int8) is used, or if the default form (Pint8) is used and pads->disc->def_charset is Pcharset_EBCDIC. Otherwise, the data is read as ASCII chars.

4.7.2 Variable-width character-based encoding

ASCII	EBCDIC	DEFAULT
`Pa_int8`	`Pe_int8`	`Pint8`
`Pa_int16`	`Pe_int16`	`Pint16`
`Pa_int32`	`Pe_int32`	`Pint32`
`Pa_int64`	`Pe_int64`	`Pint64`
`Pa_uint8`	`Pe_uint8`	`Puint8`
`Pa_uint16`	`Pe_uint16`	`Puint16`
`Pa_uint32`	`Pe_uint32`	`Puint32`
`Pa_uint64`	`Pe_uint64`	`Puint64`

The expected input for these types is an optional sign character followed by a sequence of digits. The number of characters that make up the input is variable: after the first digit, the digits are read until (but not including) the first non-digit or EOR/EOF. If the Pads discipline field disc->flags has the WSPACE_OK flag set, then leading whitespace is allowed.

4.7.3 Raw binary encoding

RAW
`Pb_int8`
`Pb_int16`
`Pb_int32`
`Pb_int64`
`Pb_uint8`
`Pb_uint16`
`Pb_uint32`
`Pb_uint64`

These are the first binary types described in this chapter. The input is not made up of ASCII or EBCDIC characters that need to be interpreted to see what number they are describing. Instead, the number itself is encoded in binary form, as a sequence of bytes.

There are binary types for signed or unsigned binary integers of common bit widths (8, 16, 32, and 64 bit widths). Pb_int8 corresponds to one byte of input, Pb_int16 to two bytes of input, and so on.

The representation in memory is just the corresponding signed or unsigned type, thus Pb_uint16 has representation type Puint16. The bytes from the input are simply copied into the bytes that make up the representation. If the endian-ness of input data is different from the endian-ness of the machine, then the byte order is reversed to form the in-memory representation; otherwise the byte order is preserved.

The endian-ness of the machine running the Pads program is fixed: it is determined automatically by the Pads library. The input data endian-ness is described by the Pads discipline field disc->d_endian.

In some cases it is possible to have Pads determine the proper setting for disc->d_endian automatically, by using the annotation Pendian with the first multi-byte binary integer field that appears in the data. For example, consider this header definition:

Pstruct header {
  Pendian Pb_uint16 version : version < 10;
  ...
};

This Pads description indicates the first value in the header is a 2-byte unsigned binary integer, version, whose value should be less than ten. The Pendian annotation indicates that there should be two attempts at reading the version field: once with the current disc->d_endian setting, and (if the read fails) once with the opposite disc->d_endian setting. If the second read succeeds, then the new disc->d_endian setting is retained, otherwise the original setting is retained.

Note that the Pendian annotation is only able to determine the correct endian choice for a field that has an attached constraint, where the wrong choice of endian setting will always cause the constraint to fail. (In the above example, if a value less than ten is read with the wrong d_endian setting, the result is a value that is much greater than ten.)

4.7.4 Serialized binary encoding

SBL	SBH
`Psbl_int8`	`Psbh_int8`
`Psbl_int16`	`Psbh_int16`
`Psbl_int32`	`Psbh_int32`
`Psbl_int64`	`Psbh_int64`
`Psbl_uint8`	`Psbh_uint8`
`Psbl_uint16`	`Psbh_uint16`
`Psbl_uint32`	`Psbh_uint32`
`Psbl_uint64`	`Psbh_uint64`

These types describe signed or unsigned binary integers that have been encoded with a specified number of bytes K. For the Psbl_ types, the first byte on the input stream is treated as the low-order byte of the K-byte value. For the Psbh_ types, the first byte on the input stream is treated as the high-order byte of the K-byte value. For example, Psbl_int32(:3:) describes a 3 byte binary encoding where the first byte encountered is the low-order byte.

These types are more general than the simpler Pb_ types because you explicitly specify the number of bytes (from 1 to 8) independently of the target in-memory type, allowing for types such as the Psbl_int32(:3:) type just described. These types also explicitly specify the endian-ness of the data bytes, rather than using disc->d_endian.
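The difference between low-order-first and high-order-first decoding can be sketched as follows. demo_sbl_decode and demo_sbh_decode are hypothetical helpers, and the sketch covers unsigned values only, since how the signed types interpret the top bit of a short encoding is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First input byte is the low-order byte of the k-byte value. */
static uint64_t demo_sbl_decode(const unsigned char *in, size_t k)
{
    assert(in != NULL && k >= 1 && k <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < k; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}

/* First input byte is the high-order byte of the k-byte value. */
static uint64_t demo_sbh_decode(const unsigned char *in, size_t k)
{
    assert(in != NULL && k >= 1 && k <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < k; i++) {
        v = (v << 8) | in[i];
    }
    return v;
}
```

For the three input bytes 0x01 0x02 0x03, the low-order-first decoding yields 0x030201 and the high-order-first decoding yields 0x010203. Neither function consults the machine's byte order, which reflects the point that these types fix the endian-ness in the type itself.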

The following table shows those cases where serialized binary types have equivalent simple binary types.


Serialized binary type	Equivalent type if `disc->d_endian` is PbigEndian	Equivalent type if `disc->d_endian` is PlittleEndian
`Psbl_int8(:1:)`		`Pb_int8`
`Psbl_int16(:2:)`		`Pb_int16`
`Psbl_int32(:4:)`		`Pb_int32`
`Psbl_int64(:8:)`		`Pb_int64`
`Psbl_uint8(:1:)`		`Pb_uint8`
`Psbl_uint16(:2:)`		`Pb_uint16`
`Psbl_uint32(:4:)`		`Pb_uint32`
`Psbl_uint64(:8:)`		`Pb_uint64`
`Psbh_int8(:1:)`	`Pb_int8`
`Psbh_int16(:2:)`	`Pb_int16`
`Psbh_int32(:4:)`	`Pb_int32`
`Psbh_int64(:8:)`	`Pb_int64`
`Psbh_uint8(:1:)`	`Pb_uint8`
`Psbh_uint16(:2:)`	`Pb_uint16`
`Psbh_uint32(:4:)`	`Pb_uint32`
`Psbh_uint64(:8:)`	`Pb_uint64`

4.7.5 EBC encoding

EBC
`Pebc_int8`
`Pebc_int16`
`Pebc_int32`
`Pebc_int64`
`Pebc_uint8`
`Pebc_uint16`
`Pebc_uint32`
`Pebc_uint64`

These types describe signed or unsigned EBCDIC numeric encoded integers with a specified number of digits. N.B.: the specified number of digits must be odd if the value on disk can be negative. For example, Pebc_int32(:5:) describes a 5 digit signed integer.

Each byte on disk encodes one digit (using the low 4 bits). For signed values, the final byte encodes the sign (high 4 bits == 0xD for negative). E.g., a signed or unsigned 5 digit value is encoded in 5 bytes.
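The one-digit-per-byte layout can be sketched as a small decoder. demo_ebc_decode is a hypothetical function written from the description above; the limit of 18 digits and the 0 / -1 return convention are assumptions made to keep the sketch simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode num_digits bytes, one digit per byte in the low 4 bits.
 * If is_signed is non-zero, the high 4 bits of the final byte are
 * checked for the negative sign value 0xD. */
static int demo_ebc_decode(const unsigned char *in, size_t num_digits,
                           int is_signed, int64_t *out)
{
    assert(in != NULL && out != NULL);
    assert(num_digits >= 1 && num_digits <= 18);
    int64_t v = 0;
    for (size_t i = 0; i < num_digits; i++) {
        unsigned digit = in[i] & 0x0Fu;
        if (digit > 9) {
            return -1;                  /* not a decimal digit */
        }
        v = v * 10 + digit;
    }
    if (is_signed && (in[num_digits - 1] >> 4) == 0xD) {
        v = -v;
    }
    *out = v;
    return 0;
}
```

EBCDIC digit characters occupy 0xF0 through 0xF9, so the five bytes 0xF1 0xF2 0xF3 0xF4 0xD5 decode to -12345 when read as a signed value.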

The legal range of values for the number of digits, num_digits, depends on target type:


Type	num_digits	Min / Max values
`Pint8`	1−3	`P_MIN_INT8` / `P_MAX_INT8`
`Puint8`	1−3	`0` / `P_MAX_UINT8`
`Pint16`	1−5	`P_MIN_INT16` / `P_MAX_INT16`
`Puint16`	1−5	`0` / `P_MAX_UINT16`
`Pint32`	1−10	`P_MIN_INT32` / `P_MAX_INT32`
`Puint32`	1−10	`0` / `P_MAX_UINT32`
`Pint64`	1−19	`P_MIN_INT64` / `P_MAX_INT64`
`Puint64`	1−20	`0` / `P_MAX_UINT64`

4.7.6 BCD encoding

BCD
`Pbcd_int8`
`Pbcd_int16`
`Pbcd_int32`
`Pbcd_int64`
`Pbcd_uint8`
`Pbcd_uint16`
`Pbcd_uint32`
`Pbcd_uint64`

These types describe signed or unsigned BCD numeric encoded integers with a specified number of digits. N.B.: the specified number of digits must be odd if the value on disk can be negative. For example, Pbcd_int32(:5:) describes a 5 digit signed integer.

Each byte on disk encodes two digits, 4 bits per digit. For signed values, a negative number is encoded by having the number of digits be odd, so that the remaining low 4 bits in the last byte are available for the sign (low 4 bits == 0xD for negative). A signed or unsigned 5 digit value is encoded in 3 bytes, where the unsigned value ignores the final 4 bits and the signed value uses them to get the sign.
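The two-digits-per-byte layout can be sketched in the same way. demo_bcd_decode is a hypothetical function written from the description above; the assumption that the high 4 bits of each byte hold the earlier digit, the limit of 18 digits, and the 0 / -1 return convention are choices made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode num_digits packed digits. Digit i lives in byte i / 2: the
 * high 4 bits when i is even, the low 4 bits when i is odd. With an
 * odd digit count, the low 4 bits of the last byte are left over and
 * carry the sign for signed values. */
static int demo_bcd_decode(const unsigned char *in, size_t num_digits,
                           int is_signed, int64_t *out)
{
    assert(in != NULL && out != NULL);
    assert(num_digits >= 1 && num_digits <= 18);
    int64_t v = 0;
    for (size_t i = 0; i < num_digits; i++) {
        unsigned char byte = in[i / 2];
        unsigned digit = (i % 2 == 0) ? (unsigned)(byte >> 4)
                                      : (unsigned)(byte & 0x0F);
        if (digit > 9) {
            return -1;                  /* not a decimal digit */
        }
        v = v * 10 + digit;
    }
    if (is_signed && num_digits % 2 == 1) {
        unsigned sign = in[num_digits / 2] & 0x0Fu;
        if (sign == 0xD) {
            v = -v;
        }
    }
    *out = v;
    return 0;
}
```

The three bytes 0x12 0x34 0x5D hold five digits plus a sign: read as signed they give -12345, and read as unsigned the final 4 bits are ignored and the result is 12345.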

The legal range of values for the number of digits, num_digits, depends on target type:


Type	num_digits	Min / Max values
`Pint8`	1−3	`P_MIN_INT8` / `P_MAX_INT8`
`Puint8`	1−3	`0` / `P_MAX_UINT8`
`Pint16`	1−5	`P_MIN_INT16` / `P_MAX_INT16`
`Puint16`	1−5	`0` / `P_MAX_UINT16`
`Pint32`	1−11**	`P_MIN_INT32` / `P_MAX_INT32`
`Puint32`	1−10	`0` / `P_MAX_UINT32`
`Pint64`	1−19	`P_MIN_INT64` / `P_MAX_INT64`
`Puint64`	1−20	`0` / `P_MAX_UINT64`

** Note: For type Pbcd_int32 only, even though the min and max int32 have 10 digits, we allow num_digits == 11 due to the fact that 11 is required for a 10 digit negative value. (An actual 11 digit number would cause a range error, so the leading digit must be 0.)

4.8 Floating Point Base Types

4.8.1 Variable-width character-based encoding

ASCII	EBCDIC	DEFAULT
`Pa_float32`	`Pe_float32`	`Pfloat32`
`Pa_float64`	`Pe_float64`	`Pfloat64`

These types describe ASCII or EBCDIC character-based encodings of floating point numbers. The input representation must have this form:

[+|-]DIGITS[.][DIGITS][(e|E)[+|-]DIGITS]

Where DIGITS is a sequence of one or more digit characters, (e|E) indicates either a lower- or upper-case letter ’E’, and elements in square brackets are optional. Note that there must be at least one digit before the (optional) dot (period) character.

If the input contains a valid sequence of characters that make up a float, that sequence is converted to a Pfloat32 or Pfloat64, according to the type. For example, if you specify a Pa_float32, then the characters making up a float are read from the input and converted to an in-memory Pfloat32.
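
To make the accepted syntax concrete, here is a small C sketch that recognizes the form given above in an ASCII string. It is not the Pads read function: the name match_float_prefix is hypothetical, and the choice to return the length of the longest matching prefix (treating an 'e' with no following digits as not part of the number) is an assumption of this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not a Pads API): return the number of leading
 * characters of s that match
 *     [+|-]DIGITS[.][DIGITS][(e|E)[+|-]DIGITS]
 * or 0 if s does not start with a valid float. */
size_t match_float_prefix(const char *s)
{
    size_t i = 0;

    if (s[i] == '+' || s[i] == '-') {
        i++;                                   /* optional sign */
    }

    size_t int_start = i;
    while (isdigit((unsigned char)s[i])) {
        i++;
    }
    if (i == int_start) {
        return 0;                              /* at least one digit before the dot */
    }

    if (s[i] == '.') {
        i++;                                   /* optional dot */
        while (isdigit((unsigned char)s[i])) {
            i++;                               /* optional fraction digits */
        }
    }

    if (s[i] == 'e' || s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') {
            j++;
        }
        size_t exp_start = j;
        while (isdigit((unsigned char)s[j])) {
            j++;
        }
        if (j > exp_start) {
            i = j;                             /* exponent counts only if it has digits */
        }
    }
    return i;
}
```

With this sketch, "3.14", "12." and "-1.5E+10" match in full, while ".5" is rejected because no digit precedes the dot.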

4.9 Fixed Point Base Types

The following types encode a numerator value on the input stream in different formats, as described below. They all produce an in-memory Pfpoint value whose denominator is determined from the second type argument, d_exp, where the denominator is implicitly 10^d_exp and is not encoded on disk.

The legal range of values for d_exp depends on the target in-memory type:


Type	d_exp	Max denominator (min is 1)
`Pfpoint8` / `Pufpoint8`	0−2	`100`
`Pfpoint16` / `Pufpoint16`	0−4	`10,000`
`Pfpoint32` / `Pufpoint32`	0−9	`1,000,000,000`
`Pfpoint64` / `Pufpoint64`	0−19	`10,000,000,000,000,000,000`
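
The relationship between the numerator read from the input, d_exp, and the resulting value can be sketched in C. The struct fpoint32_sketch and the helper names below are hypothetical stand-ins used for illustration only; they are not the Pads definitions of the fixed-point types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit fixed-point value:
 * a numerator plus a denominator that is a power of ten. */
typedef struct {
    int32_t  num;
    uint32_t denom;
} fpoint32_sketch;

/* Compute 10^d_exp for the 32-bit case, where d_exp is in 0..9. */
uint32_t pow10_u32(unsigned d_exp)
{
    uint32_t d = 1;

    assert(d_exp <= 9);
    while (d_exp-- > 0) {
        d *= 10;
    }
    return d;
}

/* Pair a numerator read from the input with the implicit denominator. */
fpoint32_sketch make_fpoint32(int32_t numerator, unsigned d_exp)
{
    fpoint32_sketch fp;

    fp.num = numerator;
    fp.denom = pow10_u32(d_exp);
    return fp;
}

/* Convert to a double for display; this may lose precision. */
double fpoint32_to_double(fpoint32_sketch fp)
{
    return (double)fp.num / (double)fp.denom;
}
```

For example, a numerator of 12345 with d_exp equal to 2 yields a denominator of 100, which corresponds to the value 123.45.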

4.9.1 Serialized binary encoding

SBL	SBH
`Psbl_fpoint8(K, d_exp)`	`Psbh_fpoint8(K, d_exp)`
`Psbl_fpoint16(K, d_exp)`	`Psbh_fpoint16(K, d_exp)`
`Psbl_fpoint32(K, d_exp)`	`Psbh_fpoint32(K, d_exp)`
`Psbl_fpoint64(K, d_exp)`	`Psbh_fpoint64(K, d_exp)`
`Psbl_ufpoint8(K, d_exp)`	`Psbh_ufpoint8(K, d_exp)`
`Psbl_ufpoint16(K, d_exp)`	`Psbh_ufpoint16(K, d_exp)`
`Psbl_ufpoint32(K, d_exp)`	`Psbh_ufpoint32(K, d_exp)`
`Psbl_ufpoint64(K, d_exp)`	`Psbh_ufpoint64(K, d_exp)`

These types describe fixed-point numbers where the numerator is encoded in serialized binary form on the input stream. Serialized binary encodings are described above for the Psbl_ and Psbh_ integer types. Like those integer types, the number of bytes on the input is specified as the first type argument. The legal range of values for the number of bytes depends on target type, and follows the same rule specified for the Psbl_ and Psbh_ integer types.

Each type takes a second argument, d_exp, where the denominator value is implicitly 10^d_exp and is not encoded on disk. For example, Psbl_fpoint32(:3, 2:) specifies that a numerator is encoded on the input as three binary bytes with the low-order byte appearing first, where the resulting in-memory Pfpoint32 has the value read from the input as its numerator, and the number 10² (i.e., 100) as its denominator.
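
The byte ordering in this example can be sketched in C. The helpers sbl_decode_u32 and sbh_decode_u32 are hypothetical names, not Pads functions, and the sketch treats the bytes as an unsigned magnitude only; how the signed types interpret the bytes follows the rules given for the Psbl_ and Psbh_ integer types and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble num_bytes (1..4) bytes, low-order byte first. */
uint32_t sbl_decode_u32(const unsigned char *bytes, size_t num_bytes)
{
    uint32_t value = 0;

    assert(num_bytes >= 1 && num_bytes <= 4);
    for (size_t i = num_bytes; i > 0; i--) {
        value = (value << 8) | bytes[i - 1];   /* last byte is most significant */
    }
    return value;
}

/* Hypothetical helper: assemble num_bytes (1..4) bytes, high-order byte first. */
uint32_t sbh_decode_u32(const unsigned char *bytes, size_t num_bytes)
{
    uint32_t value = 0;

    assert(num_bytes >= 1 && num_bytes <= 4);
    for (size_t i = 0; i < num_bytes; i++) {
        value = (value << 8) | bytes[i];       /* first byte is most significant */
    }
    return value;
}
```

Under these assumptions, the three bytes 39 30 00 read low-order first give the numerator 12345 (hexadecimal 0x003039). With d_exp equal to 2, the denominator is 100, so the value represented is 123.45.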

4.9.2 EBC encoding

EBC
`Pebc_fpoint8`
`Pebc_fpoint16`
`Pebc_fpoint32`
`Pebc_fpoint64`
`Pebc_ufpoint8`
`Pebc_ufpoint16`
`Pebc_ufpoint32`
`Pebc_ufpoint64`

These types describe fixed-point numbers where the numerator is encoded as EBCDIC numeric digits on the input stream. This encoding is described above for the Pebc_ integer types. Like those integer types, the number of digits on the input is specified as the first type argument. The legal range of values for the number of digits depends on target type, and follows the same rule specified for the Pebc_ integer types.

Each type takes a second argument, d_exp, where the denominator value is implicitly 10^d_exp and is not encoded on disk. For example, Pebc_fpoint32(:3, 2:) specifies that a numerator is encoded on the input as three EBCDIC digits, where the resulting in-memory Pfpoint32 has the value read from the input as its numerator, and the number 10² (i.e., 100) as its denominator.

4.9.3 BCD encoding

BCD
`Pbcd_fpoint8`
`Pbcd_fpoint16`
`Pbcd_fpoint32`
`Pbcd_fpoint64`
`Pbcd_ufpoint8`
`Pbcd_ufpoint16`
`Pbcd_ufpoint32`
`Pbcd_ufpoint64`

These types describe fixed-point numbers where the numerator is encoded as BCD numeric digits on the input stream. This encoding is described above for the Pbcd_ integer types. Like those integer types, the number of digits on the input is specified as the first type argument. The legal range of values for the number of digits depends on target type, and follows the same rule specified for the Pbcd_ integer types.

Each type takes a second argument, d_exp, where the denominator value is implicitly 10^d_exp and is not encoded on disk. For example, Pbcd_fpoint32(:3, 2:) specifies that a numerator is encoded on the input as three BCD digits, where the resulting in-memory Pfpoint32 has the value read from the input as its numerator, and the number 10² (i.e., 100) as its denominator.
