About PADS

What is PADS?

PADS is a declarative data description language that allows data analysts to describe both the physical layout of ad hoc data sources and semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs to produce well-behaved formats such as XML or those required for loading relational databases, and tools for running XQueries over raw PADS data sources. The descriptions are concise enough to serve as "living" documentation while flexible enough to describe most of the ASCII, binary, and Cobol formats that we have seen in practice. The generated parsing library provides for robust, application-specific error handling.
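
To give a rough feel for the idea of a declarative description driving a parser, here is a minimal sketch in Python. It is not PADS syntax and does not use the PADS API: the pipe-delimited record format, the FIELDS table, and the parse_record function are all hypothetical, invented for illustration. The table states both the layout (field order and type) and a semantic constraint per field, and the generic parser reports errors instead of aborting, so the application can decide how to handle them.

```python
# Illustrative sketch only: this is NOT PADS syntax or the PADS API.
# FIELDS and parse_record are hypothetical names made up for this example.

# Declarative description: (field name, converter, semantic constraint).
FIELDS = [
    ("host", str, lambda v: len(v) > 0),
    ("status", int, lambda v: 100 <= v <= 599),
    ("bytes", int, lambda v: v >= 0),
]


def parse_record(line, sep="|"):
    """Parse one delimited record; return (values, errors) instead of raising."""
    values, errors = {}, []
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(FIELDS):
        errors.append(f"expected {len(FIELDS)} fields, got {len(parts)}")
    for (name, convert, check), raw in zip(FIELDS, parts):
        try:
            value = convert(raw)
        except ValueError:
            # Syntactic error: the raw text does not fit the declared type.
            errors.append(f"{name}: cannot convert {raw!r}")
            continue
        if not check(value):
            # Semantic error: the value parsed but violates its constraint.
            errors.append(f"{name}: constraint violated by {value!r}")
        values[name] = value
    return values, errors
```

Calling parse_record("example.org|200|512") yields a dictionary of three typed values and an empty error list, while a malformed line yields whatever fields could be recovered together with a list of error messages. The real PADS compiler generates far richer libraries from its descriptions; this sketch only mirrors the general shape of the approach.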

Who should use PADS?

PADS is suitable for describing ad hoc data formats in binary, ASCII, and EBCDIC encodings. From a PADS description, the PADS compiler generates a library for manipulating the associated data source. The PADS/ML compiler generates an ML library while PADS/C generates a C library. Data analysts can use the generated library directly, or they can use a suite of auxiliary tools to summarize the data, validate it, translate it into XML, or reformat it into a form suitable for loading into relational databases.

Who shouldn't use PADS?

PADS is not designed to parse XML data or data already in a relational database. Such data should be processed with XML or database-specific tools.

The PADS Team

Tufts University

Princeton University


PADS alumni:

  • Mark Daly
  • Zach DeVito
  • Pamela Dragosh
  • Mary Fernandez
  • Andrew Forrest
  • Joel Gottleib
  • Vikas Kedia
  • John Launchbury
  • Yitzhak Mandelbaum
  • Ricardo Medel
  • Frances Spalding
  • Peter White
  • Qian Xi
  • Xuan Zheng
  • Kenny Zhu

We always have a long list of things to do. If you want to help, don't be shy: find us on GitHub!


We would like to thank Bala Krishnamurthy, Andrew Hume, David Poole, and Oliver Spatscheck for informative discussions about particular forms of ad hoc data and potentially useful tools for manipulating that data.

We would like to thank Glenn Fowler and Phong Vo for their help in using the AST and SFIO libraries.

We would like to thank Diane Ristaino for artistic assistance.


The PADS project has been generously supported by AT&T. In addition, portions of the work have been supported by DARPA Grant No. FA8750-07-C-0014 and the National Science Foundation under Grants No. 0612147, 0633268, and 0615062. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA or the National Science Foundation.