SDTS Notes

By Bill Allen

The Spatial Data Transfer Standard (SDTS) is used by the U.S. Geological Survey (USGS) to provide many kinds of data including 7.5-min. 1:24000-scale digital elevation model (DEM) and digital line graph (DLG) data. An SDTS "transfer" consists of many files, mostly Data Descriptive Files with the .ddf file name extension, and often packed together into a single .tar.gz "tarball" file.

What follows are notes taken while reducing a stack of documentation into a digestible form while learning how to parse SDTS DEM transfers. What you see here now is only a beginning, and a lot more explanation and code will be coming.

The code examples are all done with Python 2.0, a freeware interpreted cross-platform language. See the related "Learn to Program!" page with info for people new to programming or new to Python. All of the author's Python code published here is placed freely into the public domain. See the "License" statement at the top of the "Code Samples" page, which is a central reference to all the code pieces available here.

Note: Nothing here yet relates specifically to Digital Line Graph (DLG) SDTS transfers.



SDTS DDF Files in Plain English

An SDTS transfer consists of a set of DDFs (Data Descriptive Files). Out of perhaps 18 DDFs in a DEM transfer, several must be read to be able to use an SDTS DEM's elevation data, though the data itself probably resides in only one DDF. Each DDF includes instructions on how its contents are to be read. Said lightly, your program has to know how to parse the file's bootstrap instructions to know how to parse the data decoding instructions to know how to parse the data! Fortunately, 1) it isn't quite as bad as it sounds, and 2) you and your program only have to sift through parts of some of the DDFs and can ignore the rest of the transfer.

All you need to know to start parsing a DDF file is this: At its most basic, every DDF consists of two or more variable-length records. Each record begins with five characters (bytes 0-4) which give that record's length. Since a DDF may include binary data, open it as a binary file.

Two ways to verify that you are actually reading a DDF and reading it correctly are...

See ddfex1.py (demo code, requires Python) for a minimal Python script that skims through a DDF just to show it can. If you run it on the cel0.ddf (cell zero) from a 30m SDTS DEM, your output will look something like this:
Scanning c:/python/demtest/1168CEL0.DDF
   1 : 001872L   0600061   23 4
   2 : 00809 D     00055   33 4
   3 : 00809 D     00055   33 4
 463 : 00815 D     00055   33 4
 464 : 00815 D     00055   33 4
 465 : 00815 D     00055   33 4
	End of file
Apparently an "L" in byte 6 comes from its being in the leader part of a Data Descriptive Record (DDR), and the "D" comes from Data Record (DR). Another possibility is "R", maybe for repeat.
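The record-skimming idea can be sketched in a few lines. This is not ddfex1.py itself; it is a minimal illustration written for present-day Python 3 rather than Python 2.0, the names iter_ddf_records and skim are made up for the purpose, and it assumes only what is stated above: every record starts with a five-digit length that counts the whole record, and every record is at least as long as its 24-byte leader.

```python
def iter_ddf_records(stream):
    """Yield each record of a DDF as a bytes object.

    `stream` must be a file opened in binary mode. The only format
    knowledge used is that bytes 0-4 hold the record length as digits.
    """
    while True:
        head = stream.read(5)
        if not head:
            return                          # clean end of file
        if len(head) < 5 or not head.isdigit():
            raise ValueError("not a DDF record length: %r" % head)
        length = int(head)
        if length < 24:
            raise ValueError("record shorter than its own leader")
        body = stream.read(length - 5)
        if len(body) != length - 5:
            raise ValueError("file ends in mid-record")
        yield head + body


def skim(stream):
    """Return (record number, leader text) pairs, numbered from 1 as in the listing above."""
    return [(number, record[:24].decode("ascii", "replace"))
            for number, record in enumerate(iter_ddf_records(stream), 1)]
```

Calling skim(open(path, "rb")) on a cel0.ddf should give one pair per record, which is enough to confirm that the lengths chain together correctly all the way to the end of the file.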

The old SDTS 30m DEM file used in many of these examples, for Ticaboo Mesa, Utah, like all other old SDTS files, is no longer free from the USGS Web site. The new 30m and 10m versions for Ticaboo are available free (subject to accessibility) from GIS Data Depot's Garfield County DEMs page (quad names Sp-W). You can extract the DDFs from it using pullSDTS, although ddfex1.py at the moment only works with the old 30m file (come back for an update).

Parsing is the process of breaking information components out of a file, much like a grammar student learns how to parse the meaning out of a sentence. Once you understand a file type's format, parsing becomes a routine reading-in procedure that you have written for your program, but it's an active intellectual pursuit until you have deduced the structure, including the exceptions a format is likely to throw at your program.

0 comes first: People new to programming must learn to think of item counts as starting with zero. A line 100 characters long starts with the 0th position and ends with the 99th. In an array of "raster data," such as image pixels or DEM points, the upper-leftmost element is in position 0,0 (that is, X=0, Y=0). Counting from zero may be hard to get used to, but it's much easier to program with once you get the hang of it.

DDF file naming: All DDF files in an SDTS transfer start with the same four characters (sometimes five?). These are not unique or apparently even significant characters. For instance, the old Ticaboo Mesa 30m Level 2 DEM used "1168." Only two quads south, the old 30m Level 1 DEM for Halls Crossing N.E. used the very same characters. So these two almost neighboring sets of DDFs cannot coexist in the same folder.


Data Descriptive Records

A DDF Data Descriptive Record (DDR) is composed of three sections: the leader, the directory, and the data descriptive area. You just met the leader with its record length and record type sections. Here is a diagram of the full DDR for a cel0.ddf carrying data for a 30m DEM. Without getting into what the "interchange level" code is about, just please note that some things explained here may not apply to a DDF that has a "1" or "3" in byte 5 instead of a "2" like you see here.


Diagram of an example DDF's first record, which is always a Data Descriptive Record (DDR). In this case, it is the DDR for a cel0.ddf with 30m DEM data.

A superscript 1 (¹) has been substituted in this and other illustrations here for the DDF-standard ASCII character "\31" (decimal 31, hex 1F) field terminator (FT), and superscript zero (°) for the "\30" (decimal 30, hex 1E) unit terminator (UT). In a DDR's data descriptive area, an exclamation mark (!) separates subfield labels.

Levels: The term "level" gets used for a lot of very different things in the SDTS standard. The example file is a USGS "Level 2 DEM," for instance, and here we see that it is in an "interchange level 2 SDTS transfer," but those two facts aren't directly related.

So what else does this DDR leader reveal?
Bytes 10-11 hold a number ("6") that specifies the length for this DDF's field control labels, of which you will find four in this record's data descriptive area: 0000;&, 0100;&, 1600;&, and 2600;& (there can be a lot more).
Bytes 12-16 hold a number that tells us this DDR's data descriptive area begins at byte 61.
Bytes 20-23 ("23 4") are the DDR directory entry map, but byte 22 isn't used. The map describes the layout of the record's DDR directory entries, which in turn describe the layout of the data descriptive area. Although the three values are given in the order A-B-C, what they represent lies in the order C-A-B.
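Here is a small sketch of pulling those leader values apart, again in Python 3. The function and dictionary key names are invented for this illustration (they are not from sdts.py), and a blank field is treated as zero, which is an assumption made so the same function also accepts the Data Record leaders shown in the listing earlier.

```python
def _num(field):
    """Convert an ASCII digit field to an int; a blank field counts as 0."""
    text = field.strip()
    return int(text) if text else 0


def parse_leader(record):
    """Pick apart the 24-byte leader at the start of a DDF record."""
    leader = record[:24]
    if len(leader) < 24:
        raise ValueError("a leader needs 24 bytes")
    return {
        "record_length": _num(leader[0:5]),           # bytes 0-4
        "interchange_level": leader[5:6].decode("ascii"),
        "leader_id": leader[6:7].decode("ascii"),     # "L", "D" or "R"
        "field_control_length": _num(leader[10:12]),  # bytes 10-11
        "base_address": _num(leader[12:17]),          # bytes 12-16
        "size_of_length": _num(leader[20:21]),        # entry map value A
        "size_of_position": _num(leader[21:22]),      # entry map value B
        "size_of_tag": _num(leader[23:24]),           # entry map value C
    }
```

For the example DDR leader this gives a base address of 61 and entry map sizes of 2, 3 and 4, so one directory entry is 4 + 2 + 3 = 9 bytes long, laid out tag first.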

Nothing directly gives the length of all the DDR directory entries, nor how many there are of them. However, we know the entries begin at byte 24, right after the 24-byte leader (bytes 0-23). And we know that this particular leader says both that the area following the directory begins at byte 61, and that the individual directory entries are 9 bytes long (2+3+4). So 61 - 24 = 37 bytes; the last of those is a \30 terminator that closes the directory, which leaves 36, and 36 / 9 = 4.

Let's look more closely:

dir. entry    data description unit
0000 21 000 - 0000;&1168CEL0.DDF¹¹°
0001 30 021 - 0100;&DDF RECORD IDENTIFIER¹¹°
CELL 38 051 - 1600;&Cell¹MODN!RCID!ROWI!COLI¹(A,3I)°
CVLS 37 089 - 2600;&Cell Values¹*ELEVATION¹(B(16))°
Notice that the third element in a directory entry tells where the data descriptive unit begins with reference to byte 0 of the data descriptive area, which is also byte 61 of the record. The first unit is the DDF file name. The next unit is for the record identifier in the data records.
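As a sketch of how a program might walk the directory and then slice out one unit, consider the following Python 3 functions. The names are hypothetical, and the code assumes that the directory runs from byte 24 up to a single terminator byte sitting just before the base address (byte 61 in the example).

```python
def parse_directory(record, base_address, size_of_length,
                    size_of_position, size_of_tag):
    """Return the directory as a list of (tag, length, position) tuples."""
    entry_size = size_of_tag + size_of_length + size_of_position
    directory = record[24:base_address - 1]     # drop the closing terminator
    if len(directory) % entry_size:
        raise ValueError("directory is not a whole number of entries")
    entries = []
    for start in range(0, len(directory), entry_size):
        entry = directory[start:start + entry_size]
        tag = entry[:size_of_tag].decode("ascii")
        length = int(entry[size_of_tag:size_of_tag + size_of_length])
        position = int(entry[size_of_tag + size_of_length:])
        entries.append((tag, length, position))
    return entries


def field_bytes(record, base_address, length, position):
    """Slice one unit out of the area that starts at base_address."""
    start = base_address + position
    return record[start:start + length]
```

Positions are relative to the base address, so the CVLS unit of the example DDR, at position 089, starts at record byte 61 + 89 = 150.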

All of these units have three fields separated by \31. In this particular DDF file, only the last two units use their second field (data field name or subfield labels) and third field (data format). Also notice that there is no mechanism to give the lengths and positions of these fields, so a parsing program must go through each unit byte-by-byte, watching for separators.

Here are the correlations again. The four-character directory entry code is called a "tag," and the six-character descriptive code is called a "field control." The field control's ending "00" (zero zero) is a required filler, while the first number is a "data structure code" and the second is a "data type code." The DDF standard says that which characters are used can vary, but the ";&" used here tell a DDF reader to use those two as printable representations of the field and unit terminators respectively. That is, ";" = \31 and "&" = \30.

entry   descriptive     use   
 0000   0000;&       file control field, apparently optional
 0001   0100;&       DDF (not SDTS) record identifier
 CELL   1600;&Cell   cell attributes 
 CVLS   2600;&Cell Values
The subfield labels are separated by an "!" exclamation mark. The formatting that follows the subfield labels tells how the data in those columns is to be read.
MODN   Module Name (a DDF file is a module in an SDTS transfer)
RCID   Record ID (SDTS)
ROWI   Row Index
COLI   Column Index
(A,3I) = alphanumeric field & 3 integer fields
*ELEVATION - "*" specifies that the field/column repeats
(B(16)) = signed 16-bit integer (one per column)
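Splitting one of these descriptive units into its pieces might look like the sketch below (Python 3). The function name and result keys are invented for illustration. It takes the page's \31 and \30 to be the decimal character codes 31 and 30 (hex 1F and 1E), and it uses split() to do the byte-by-byte hunt for separators.

```python
SEP = b"\x1f"   # the \31 separator (decimal 31)
END = b"\x1e"   # the \30 terminator (decimal 30)


def parse_description(unit, field_control_length=6):
    """Split one data descriptive unit into field control, name, labels, format."""
    control = unit[:field_control_length].decode("ascii")
    body = unit[field_control_length:]
    if body.endswith(END):
        body = body[:-1]
    parts = body.split(SEP)
    parts += [b""] * (3 - len(parts))           # pad when fields are missing
    name, labels, fmt = (part.decode("ascii") for part in parts[:3])
    return {
        "data_structure_code": control[0:1],
        "data_type_code": control[1:2],
        "name": name,
        "repeats": labels.startswith("*"),      # "*" marks a repeating field
        "labels": [label for label in labels.lstrip("*").split("!") if label],
        "format": fmt,
    }
```

Given the CELL unit from the example, this returns the name "Cell", the four labels MODN, RCID, ROWI and COLI, and the format string "(A,3I)". Interpreting that format string is a separate job not attempted here.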
So now we know that this DEM's data will come as records, each usually consisting of three fields: 1) The DDF record ID, which looks a lot like the DDR directory leader already discussed above. 2) Attributes for the data cell in this record. And 3) a cell of repeating elevations as signed 16-bit integers.

We also know that the second field is broken down into four subfields, one of which is an alphanumeric string of a length unspecified here, and the other three of which identify the SDTS record number and beginning XY row-column location. Something else we know is that a 16-bit value requires two bytes, and this is a good time to tell you that the SDTS standard requires big-endian numbers, more popularly known as "Mac format." This is fine on Motorola and Sun CPUs, but may have to be handled specially on Intel and Alpha CPUs.
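To make the byte order point concrete, here is a tiny Python 3 sketch (the function names are made up for this page). It decodes the same two bytes both ways, and it asks Python which order the machine it is running on uses.

```python
import sys


def decode_int16(pair, byteorder="big"):
    """Decode exactly two bytes as a signed 16-bit integer."""
    if len(pair) != 2:
        raise ValueError("need exactly two bytes")
    return int.from_bytes(pair, byteorder, signed=True)


def native_order_matches_sdts():
    """True on big-endian machines, where no byte swapping is needed."""
    return sys.byteorder == "big"
```

The byte pair hex 04 D2 is 1234 when read big-endian, as SDTS intends, but -11772 when read little-endian, which is the kind of nonsense elevation that signals a byte order mix-up.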

All together, we now know almost enough to proceed with reading this DEM's raster data. But where are the scale, units, quad name, georeferencing, datum, info date, void and fill values, etc.? They are all in other DDFs within the SDTS transfer, but let's finish first with this DDF before moving on to others. If we had read some of those first, as a DDF-reading program probably would, we would know already that this DEM consists of an array of elevation points 370 wide by 464 high, but we can also discover that with this DDF alone.

If by now it looks to you like the attempt failed at explaining "DDFs in plain English," you should try wading into the documents that had to be puzzled through to grasp enough key details to assemble this page's very (believe it or not) simplified explanation. You can expect that parts of this page will get some corrections and expansions with further experience and understanding. "Plain English" was promised, but not a limit on the amount of it.


Data Records

A DDF Data Record (DR) consists of three parts: the leader, the directory, and the data area. As noted above, all DDF records are variable in length, with the length given in the first five bytes of the record, and with a \30 for the record's last character.

Below is an illustration using the 1168CEL0.DDF from the Ticaboo Mesa USGS SDTS 30m Level 2 DEM.


Diagram of a DDF's Data Records (DR)--first, second, and last of 464, showing the leader, directory, and initial data area. These DRs are from a cel0.ddf with 30m DEM data.

The DR's 24-byte record leader should look quite familiar from studying DDRs in the previous section, only simpler. The number given at bytes 12-16, used in a DDR to tell where the data descriptive area begins, here in a DR tells where the data area begins. Like a DDR's data descriptive area, the DR's data area has its own positional referencing starting from 0 in mid-record (the blue position numbers in the illustration above). The entry map at bytes 20-23 tells how to read the DR's own directory, which in turn tells how to read the data area.

Notice that counts include \30 unit terminators and any subfield \31 separators. So, when the DR directory tells that the record's DEM elevation values fill 741 bytes, that is 740 bytes plus the unit/record \30 terminator. And, since we know from the DDF's DDR that these elevations are in 16-bit (two-byte) integers, we can find that 740 / 2 equals a horizontal row of 370 elevation values. Each row starts at the first column, which may hold true even with 10m DEMs, in which a row consists of around 1,100 integers. Also, we can see that this is a perfectly rectangular grid, so it includes empty sliver areas found at the four sides of most UTM DEMs, which are rotated to align with UTM "grid north."
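The arithmetic above translates directly into code. This Python 3 sketch (decode_elevation_field is a made-up name) takes one elevation field exactly as the directory delimits it, terminator included, and assumes the values are the two-byte signed big-endian integers that "(B(16))" declares.

```python
import struct


def decode_elevation_field(field):
    """Turn one elevation field (values plus trailing terminator) into a tuple of ints."""
    if not field.endswith(b"\x1e"):
        raise ValueError("field does not end with the \\30 terminator")
    data = field[:-1]                       # 741 bytes become 740
    count, leftover = divmod(len(data), 2)  # 740 / 2 = 370 values
    if leftover:
        raise ValueError("odd number of data bytes")
    return struct.unpack(">%dh" % count, data)
```

The ">" in the struct format asks for big-endian regardless of the CPU, so the same line works unchanged on Intel and Motorola machines.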

A curiosity is that the "counts start with zero" rule doesn't hold here. The DDF Data Record numbers do start at zero, but the SDTS record number and the data row and column counts for some reason all start at one.


Free the data!

Now that we have learned all that, we can proceed with writing a rudimentary program to methodically and carefully parse DDFs... Or, hey! we might notice that what we really wanted in the first place--the DEM elevations--resides in the cel0.ddf in a very simple form, ready to pluck out and ride away with. So, for the moment, let's set aside all that parsing-for-parsing stuff and just quickly rescue what we need most.

Looking at the example Data Record above, you can see that A) its data area always begins at record byte 55, B) the position where the elevation data begins in its data area is reported at record bytes 51-53, and C) it has 370 two-byte elevation values horizontally. You also can see that there are 464 rows of those values, but you might not know that ahead of time. It won't always be bytes 55, 51-53, etc., thus a reader will have to parse out that much info. And so you can whip up a little program that does the following:

  1. Read in the record leader and get the info to read in the data directory.
  2. Note the record length given at bytes 0-4, and note the last number in the directory (at bytes 51-53 in this case).
  3. Read in that number of bytes, which comprise the DDF record number and SDTS record attributes, and lose them.
  4. Read in the individual byte pairs and do something with them (scan for extents, put into an array, write out to a new file, whatever), or read in the whole row of byte-pairs in one step and immediately write them out to a new file in a second step.
  5. Read in the last byte of the record and check that it is a \30 unit terminator.
  6. Loop until finished.
That's a huge kludge, but sometimes programming isn't pretty, it's just to get a job done. This little exercise shows that the "big bad SDTS wolf" isn't always so hard to deal with. Still, a one-time special solution like this barely begins to prepare us for handling variations in DEM data that a more useful general utility might encounter.
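The six steps can be sketched as a single Python 3 generator. This is an illustration written for this explanation, not the ddfex2.py script, and the name iter_elevation_rows is made up. It leans on two assumptions: every data record carries its own leader and directory (byte 6 is "D", not "R"), and the elevations are the last field listed in each record's directory.

```python
def iter_elevation_rows(stream):
    """Yield the raw big-endian bytes of each elevation row in a cel0-style DDF."""
    while True:
        leader = stream.read(24)                 # step 1: the record leader
        if not leader:
            return                               # step 6: nothing left to read
        if len(leader) < 24:
            raise ValueError("file ends inside a leader")
        record_length = int(leader[0:5])         # step 2: bytes 0-4
        rest = stream.read(record_length - 24)
        if len(rest) != record_length - 24:
            raise ValueError("file ends in mid-record")
        if leader[6:7] == b"L":
            continue                             # the DDR holds no elevations
        base = int(leader[12:17])                # where the data area begins
        n_len = int(leader[20:21])
        n_pos = int(leader[21:22])
        n_tag = int(leader[23:24])
        record = leader + rest
        # The last directory entry sits just before the one-byte
        # terminator that precedes the data area.
        entry = record[base - 1 - (n_tag + n_len + n_pos):base - 1]
        length = int(entry[n_tag:n_tag + n_len])
        position = int(entry[n_tag + n_len:])    # bytes 51-53 in the example
        start = base + position                  # step 3: skip ID and attributes
        field = record[start:start + length]
        if field[-1:] != b"\x1e":                # step 5: check the \30 terminator
            raise ValueError("row does not end with the \\30 terminator")
        yield field[:-1]                         # step 4: the row of byte pairs
```

Each yielded row can be written straight to a file or unpacked into integers. For the example file, every row would be 740 bytes long.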

What can you do with these elevations? A good check that your coding worked, as well as a good end use for the data, is to output it to a 16-bit grayscale file that can be viewed in Photoshop and saved from there to other image formats that your target application will accept (TIFF, etc.). This can be as simple as shoving the elevation values out to a new .raw file as fast as they are read in from the DDF, and that's what we'll do here next.
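Shoving rows out to a headerless .raw file takes very little code. The Python 3 sketch below is an illustration only (write_raw is a made-up name, not part of ddfex2.py); it expects each row as a bytes object of big-endian byte pairs and reports the column and row counts that a raw-import dialog will ask for.

```python
def write_raw(rows, out_path):
    """Write rows of big-endian 16-bit values to a headerless file.

    `rows` is any iterable of bytes objects, one per DEM row.
    Returns (columns, row_count).
    """
    cols = None
    count = 0
    with open(out_path, "wb") as out:
        for row in rows:
            if cols is None:
                cols = len(row) // 2            # two bytes per elevation
            elif len(row) // 2 != cols:
                raise ValueError("row %d has a different width" % (count + 1))
            out.write(row)
            count += 1
    return (cols or 0), count
```

Because the bytes are copied through untouched, the file stays in the same big-endian order it had in the DDF.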

Our second SDTS programming example, ddfex2.py (DEM-reading demo code, requires Python), is a simple grab-and-run script for SDTS DDF DEM reading. It may be flexible enough to handle most 10m or 30m DEMs, so long as the elevation values are two-byte signed integers declared as "(B(16))" in the cel0.ddf's DDR. When you run ddfex2.py on a suitable 30m DEM DDF, you will see a result something like this in the Python command line interface:

Reading elevations: c:/python/demtest/1168CEL0.DDF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
	Cols = 370 Rows = 464 in 16-bit 1-channel "Mac" format
And you will find in the same folder a new "raw" binary file, in this case 1168CEL0.raw. You can bring that file directly into some programs such as Photoshop, where you will get an options dialog (see below), which should be filled in as the ddfex2 script informed you. There is no header (but it's easy to change the script to add your own header to hold whatever text you might like to keep with the image). What first appears in Photoshop is probably a pretty dull image. Use Image/Adjust/Equalize to bring it into a more visual range of grays, like you see here below.
RAW Options dialog seen in Photoshop 5.0.

A 2/3-size, 4-bit view of the 16-bit 370x464 image.
A much reduced view of the 16-bit image out of the ddfex2.py script run on a 30m DEM.
Notice the edge "slivers" from a slight rotation to UTM grid north.

You can save this file from Photoshop as a 16-bit grayscale TIFF and from there bring it into some other programs.


The sdts.py Module

PullSDTS version 2 has an sdts.py module that parses SDTS DDF files and allows PullSDTS to query and report on SDTS transfers, and to extract DEM data. It is set up to work from a transfer already extracted, or from a transfer still in TAR or tarball form. It allows SDTS transfers to be inspected and to have DEM data extracted without the user having to open the usual tar.gz archive.

To save time, memory, and disk space, the procedure used is to extract and work with only the files that are actually needed, starting with the *iden.ddf and *xref.ddf files to get a handle on the transfer type (DEM or DLG) and its location and identity. DEM data, for instance, isn't extracted and loaded unless the user clicks on the [View DEM] button to bring up a grayscale view generated from that data. (Any files temporarily extracted or created by PullSDTS are deleted when it is closed.)
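For comparison, here is how the same pull-only-what-you-need idea can be sketched with the standard-library tarfile module found in later Python versions (Python 2.0 did not include it). This is not how sdts.py does it, and read_tar_member is a hypothetical helper name; the point is only that one member can be read into memory without unpacking the rest.

```python
import tarfile


def read_tar_member(archive_path, suffix):
    """Return (name, bytes) for the first member whose name ends with `suffix`.

    Works on .tar and .tar.gz alike; nothing is written to disk.
    Returns None when no member matches.
    """
    suffix = suffix.lower()
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if member.isfile() and member.name.lower().endswith(suffix):
                return member.name, archive.extractfile(member).read()
    return None
```

A call such as read_tar_member(path, "iden.ddf") would fetch just the identification module, leaving the much larger cell file untouched until it is actually wanted.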

For most DDF files, the sdts.py module goes through the bootstrap procedures that load the file, parse its field structure, and then use that info to parse its data, which is all loaded, ready to query. The calling application creates an instance of sdtsClass, and then, as needed, the sdtsClass object is told to create instances of ddfClass, which has a getSelf() that does all the parsing and loading when invoked. With a DDF loaded, its data can be retrieved from single or multiple specific fields using the sdtsQuery() and related functions. (Writing a DDF parser is a great exercise in using and appreciating object-oriented programming.)

DEM data can be retrieved by using the ddfClass.getSelf() and sdtsQuery() functions, too, but it's a bit slow and overly memory intensive, so there is also a demSpeedReader() function that PullSDTS uses. It's pretty much just a fancier version of the DEM-reading kludge described in the preceding section.

There are some changes in how the new post-January 2001 SDTS DEMs are constructed (e.g., decimal meters instead of feet for 10m DEMs), and some errors in following the SDTS standard (e.g., a missing digit in DCDT dates, and a missing "*" in "*X!Y" repeating field statements). So far as known, the sdts.py module works with all of this.


Data dumping

How do you study the structure of files you don't yet know how to read, especially ones you can't just drop into a wordprocessor? A vital programming technique is to dump data out into a form that the programmer can study. In picking apart how TAR files work, even before starting in on learning the information presented above about how DDF files work, the two first steps were 1) finding online references about these two file formats, and 2) writing a rudimentary TAR dump utility.

A file format reference is usually more like a shortcut than a solution to getting a grip, because format specifications (even good ones that aren't Obtuse with a capital "O") are often burdened with trying to explain a lot more than you need to know. Especially initially, when you only want to understand general or particular instances, when you would settle quite happily just for clear insight into the examples before you. So nothing beats looking at the file guts themselves to see how things work, and then using this real world reference to help puzzle out whatever documentation you've been able to find.

The script tardump.py (demo code, requires Python) has the advantage of not just putting into view the innards of a TAR file, but also, in the single text file it outputs, all the many DDF components of an SDTS transfer.

As configured (but you can change the configuration), tarDump sends each TAR block to a plain-text file in groups of 50 characters, as lines that end with a "|" end-of-line marker. It dumps the TAR header block for each constituent file as well as that file's one or more data blocks, but it skips empty blocks, and it dumps only the first three blocks of a DEM or DLG SDTS transfer's main data (if that data is in a file named *CEL0.DDF or *LE01.DDF). The remaining portion of a block after a file ends can be filled with either \0 null bytes or useless and confusing garbage, so tarDump comes configured to mask that out. And non-ASCII bytes are converted to wordprocessor-safe symbolic ASCII codes, so you can safely examine the resulting .txt file in any program you like.
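The line-chopping part of such a dump can be sketched in a few lines of Python 3. This is not tarDump's actual code: dump_lines is a made-up name, and the substitutions are this sketch's own assumptions, borrowing the ¹ and ° stand-ins used in the illustrations on this page for the \31 and \30 characters and showing every other non-printable byte as a period.

```python
def dump_lines(block, width=50, eol="|"):
    """Render a block of bytes as printable lines of `width` characters."""
    def show(byte):
        if byte == 0x1F:
            return "\u00b9"             # superscript 1 stands in for \31
        if byte == 0x1E:
            return "\u00b0"             # degree sign stands in for \30
        if 32 <= byte < 127:
            return chr(byte)            # printable ASCII passes through
        return "."                      # anything else is masked
    text = "".join(show(byte) for byte in block)
    return [text[i:i + width] + eol for i in range(0, len(text), width)]
```

A 512-byte TAR block comes out as eleven lines at the default width, the last one shorter than the rest, and the "|" marker shows exactly where each line ends even when it finishes in spaces.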

When you run tarDump, Python's feedback will look like this,

Extracting TAR
    from c:/Python/demtest/30_2_1_1009294.tar.gz 
    to c:/temp/dems/
    Gzip file done.
Dumping c:/temp/dems/30_2_1_1009294.tar 
    to c:/temp/dems/9294dump.txt
    All done.
where the original file is a .tar.gz or _tar.gz. It also can work with a .tar file directly.

If you take a close look at the 46K tarDump output for the Ticaboo Mesa 30m DEM, 9294dump.txt, it may occur to you that we could write another script that would process this output for its scale and georeferencing information. For example, do a search for "UTM¹NAS¹12" and you will see where to look for a DEM's datum (NAS = NAS-C = NAD27) and UTM zone, or do a Find on "Lat::" to see what riches appear. There are some problems with mining the dump file this way, such as data falling across line or block boundaries, but you have a big clue now as to how one might go about writing a script to inspect a .tar.gz file's contents without extracting its files.


SDTS Resources

Writing a program to parse SDTS transfers is not an easy undertaking, as an SDTS/DDF reader works almost like a computer language interpreter. It bootstraps from reading in just 24 bytes to correctly loading the actual data. That data might come in huge amounts, but, if not, well, it's all handled much the same for small transfers, too. From the first 24-byte leader in a DDF file's first record, a reading program must parse a directory of data descriptions to be able to parse those descriptions. These in turn are used to learn how to generally parse the data records, which, when read in, must each be individually parsed with its own leader and directory to finally get at the data itself. There may be one or more DDFs with the transfer's main data, and there is always a tree of interrelated data across multiple DDFs, including especially scale and georeferencing details.

Most GIS programmers probably don't deal with all those details, however, as there is a public domain C++ class library for Win95/NT for reading and writing SDTS files, and also a no-longer-supported library of C functions for MS-DOS and Data General Aviion Unix.

To keep programming as cross-platform as possible, and since I'm not up myself for creating Python C modules, I instead have written a minimum SDTS parser for just the narrowly defined tasks I need in PullSDTS and DEMpy, mainly identifying and extracting DEM elevations. This parser's functions and classes are in the module, sdts.py. You can see how well the module works by running PullSDTS, and you can also import it into your own Python application to access SDTS files. (If you do import into your own script, note that it requires some functions that are in the pulldem.py and gztar.py modules.)

Here's where to learn more about the official SDTS standard:

You will see many references to various standards related to SDTS. The SDTS standard itself is built upon another standard, ISO 8211, which specifies the Data Descriptive File (DDF). ISO 8211 also has been used by some other countries for encoding their GIS data. More correctly, it is ISO/IEC 8211:1994 (which replaced ISO 8211:1985), and you may also see it referred to by the withdrawn U.S. FIPS PUB 123 standard. The first 1994 SDTS standard is called FIPS PUB 173-1, also known as FGDC (STD-002). The current 1998 SDTS standard, besides being just "the SDTS standard," is sometimes referred to as ANSI NCITS 320-1998.

Finally, if you would like to see an informal history of the SDTS standard and the circumstances under which it came about, the following may be of interest: www.socsci.umn.edu/~bongman/gisoc99/poore.htm.


SDTS Notes Page News

25 Sept. 2001: New section explaining briefly the sdts.py module.
16 Sept. 2001: Some changes begun to catch up with developments in SDTS file availability and with PullSDTS's new SDTS capabilities. (The sdts.py module was released a week ago, initially to inspect SDTS files in PullSDTS, with more capabilities to come.)
17 Feb. 2001: New section on how to safely inspect a TAR file and its DDF constituents using the new tarDump script.
9 Feb. 2001: The exploration of DDF Data Records was expanded today with basic code and an explanation about reading DEM-data DDFs.
7 Feb. 2001: "Data Records" subsection begun.
5 Feb. 2001: Section added explaining Data Descriptive Records.
3 Feb. 2001: Page first posted.
28 Jan. 2001: What would become this page begun as a section on the "Python Notes" page.

See also "What's New on the Project Site."


Revised: 25 Sep 01 rev 0
http://www.3dartist.com/WP/python/sdtsnotes.htm
This page, its text and illustrations, and its overall presentation are © COPYRIGHT 2001 Columbine, Inc. - ALL Rights Reserved
Do NOT copy or mirror this page. Feel free to link to it. Get the advantage of much remaining development to come here.
The author's code appearing on this page is free software placed into the public domain and may be used freely without restriction.
Any mentioned trademarks are the property of their respective owners.