Adopting Self-Describing Files

By David J. Greer
Robelle Consulting Ltd.
Unit 201, 15399-102A Ave.
Surrey, B.C. Canada V3R 7K1
Phone: (604) 582-1700
Fax: (604) 582-1799
http://www.robelle.com

Abstract

Query can generate output to self-describing (SD) files, but no HP products read these files except the old DSG and Listkeeper products. A self-describing file is a data file which stores a standard description of its own record format in its user labels. Thus, it is like a little stand-alone database (a trendy developer might call this object-oriented). If two software tools understand SD files, it becomes trivial to transfer data between them. A user can archive some data in an SD file and when it is restored five years later the SD file can tell what the data means. The author describes the internal format of SD files, gives examples on how to read and write SD files, and describes problems integrating SD files into a software tool.

Copyright Robelle Consulting Ltd. 1992

Permission is granted to reprint this document (but not for profit), provided that copyright notice is given.

Introduction

For years, Query's Save Command has been able to create a file that is self-describing. A self-describing file is one that contains the information about the fields in the file. Normal MPE and KSAM files are not self-describing. In general, we know nothing about the structure of the fields in each record.

Unfortunately, few software tools create or understand self-describing files. While Query can produce self-describing files, it cannot use them as input. Our product Suprtool can both create and understand self-describing files (including KSAM ones). In addition, Suprtool has a new self-describing format that removes some restrictions of the original self-describing structure.

In this article we will do the following:

o Describe the format of both the original self-describing file (this will be a summary of the information in Appendix E of the Query User Manual) and the new Robelle self-describing file.

o Show how to create a self-describing file.

o Give a programming example that can understand and provide a "form" listing of any self-describing file.

o Describe KSAM self-describing files.

o Speculate on what an "open system" self-describing file would look like.

Query Versus Robelle Self-Describing Files

Throughout this article we will refer to one of two types of self-describing files. The first kind is equivalent to the files produced by Query. These files are identified by the version number " A.00.00". The second kind was designed to overcome limitations of the original self-describing format. We identify the revised files by calling them Robelle self-describing files. They have a version number of " B.00.00".

Examples In This Article

Because we write code in SPL and SPLash!, we will give our examples in these programming languages. The only word of caution is to remember that SPL uses zero-based addressing for all its arrays.

MPE User Labels

User labels are an optional part of an MPE file. User labels are part of the file, but they are not part of the data (i.e., when reading the records in the file the user labels are ignored). User labels are a handy place to store extra information about a file. Unfortunately, MS-DOS and UNIX have no concept similar to MPE's user labels (see the section Future Self-Describing Formats for ideas for UNIX and MS-DOS).

The number of user labels must be specified when the file is created. On most versions of MPE, the only way to create a file with user labels is via the FOPEN intrinsic. Newer versions of MPE/iX allow the ULABEL= keyword on the Build Command to specify the number of user labels. Each user label is 256 bytes long and user labels are numbered from zero.
You access user labels by calling the FREADLABEL and FWRITELABEL intrinsics.

Identifying Self-Describing Files

An MPE file that is self-describing has a filecode of 1084. You will recognize these files by seeing "SD" in the CODE column of a :listf,2:

     FILENAME  CODE  ----------LOGICAL RECORD---------  ----SPACE----
                       SIZE  TYP      EOF     LIMIT R/B  SECTORS  X MX

     LOADFILE  SD      128W  FB        33     10000  35      256  1  *

Recognizing self-describing KSAM files is more difficult. KSAM sd-files do not have a special file code. Instead, you must look for a KSAM file with extra file labels. On MPE/iX, this is done with a :listf ,3 (on MPE V/E use Listdir.Pub.Sys).

What Is A Self-Describing File

A self-describing file stores information in the MPE file labels about the fields in each record of the file. File labels are like a special file within a file. An MPE file label is 256 bytes long and an MPE file is created with 0 to 256 file labels. The file labels are accessed via the Freadlabel and Fwritelabel intrinsics. User labels are numbered from zero. Customarily, tools that create self-describing files leave the first ten file labels (numbered 0 to 9) for user applications. The self-describing information is broken into two kinds of labels: the header label and field labels.

Header Label

If an MPE file has n file labels, they are numbered from 0 to n-1. The self-describing labels are always added after any file labels needed by the user. The last file label will be the sd-header label and the sd-field labels are arranged backwards from this label (n-2, n-3, ...). The format of the header label is similar, but not identical, for Query and Robelle self-describing files.

Query Header Label

The Query header label consists of the following fields:

version (X8). Always equal to " A.00.00" for Query self-describing files.

length (J1). The length of each record in the file in bytes. It appears to always be identical to the MPE record length of the file.

fields (J1).
The number of fields in each file record.

labels (J1). Number of labels used for field descriptions plus one for the header label. This is different from the number of MPE labels for the file.

fields'per'label (J1). Each field label contains one or more field descriptions. Do not assume a fixed number for this field -- you must check the value of this field.

size (J1). Length of each field descriptor in 16-bit words. Because MPE file labels are always 128 words long, the fields'per'label should always be 128 / size (integer division, so a size of 15 gives 8). Again, do not assume a fixed constant for the field descriptor size.

Robelle Header Label

The Robelle header label contains all of the fields of Query's header label with one change (the version number is different) and three additions:

1. The version number is " B.00.00" instead of " A.00.00" (note the space at the beginning).

2. There are three new fields for handling sort keys. These fields are identical to the fields that you would pass to Sortinit (in compatibility-mode):

sort'max'keys (J1). Maximum number of keys allowed in this sd-file. The sort'keys would be declared as:

     integer array sort'keys(0:sort'max'keys*3-1)

sort'num'keys (J1). The actual number of keys in the table. This value must range from zero to sort'max'keys-1.

sort'keys. The sort keys themselves using the same conventions as Sort/3000. The byte-offsets of each key start at one and not zero in the sort table. The byte offsets in each field entry remain the same (i.e., zero-based instead of one-based offsets). The sort key types correspond to those for the Sortinit intrinsic and not the newer HPSortinit.

SPL Layout of the Header Label

Here is the layout in SPL notation of the Robelle header label. Note that we use exactly the same layout for accessing Query header labels (we just ignore all sd'sort'... variables when accessing Query self-describing files). Each field descriptor is fifteen words long, but even the Robelle field descriptor only uses fourteen words.
We leave the last word unspecified (our code always sets the filler words to binary zeroes):

     sdheader.srcinc:

     integer array sd'header(0:sd'label'len);       { 0 : 127 }
     byte array sd'version(*)             = sd'header;
     integer array sd'reclength(*)        = sd'header(4);
     integer array sd'numfields(*)        = sd'header(5);
     integer array sd'numlabels(*)        = sd'header(6);
     integer array sd'fieldsperlabel(*)   = sd'header(7);
     integer array sd'entrylen(*)         = sd'header(8);
     integer array sd'sort'max'keys(*)    = sd'header(9);
     integer array sd'sort'num'keys(*)    = sd'header(10);
     integer array sd'sort'keys(*)        = sd'header(11);

Field Labels

Every self-describing file has one or more field labels of 256 bytes. Each field label has one or more field descriptors. The first fields in the file will be described in label N-2, the next set of fields in N-3, and so on. This is opposite to what you might expect. Self-describing files that Query produces have eight field descriptors per user label.

Query Field Labels

Query always produces self-describing files with 15 words reserved for each field descriptor. Each field is described as follows:

field'name (X16). The name of the field left-justified. Field names are in upper case.

field'type (J1). The type of the field taken from the following list:

      1. ASCII (type U and X).
      2. free form ASCII numbers.
      3. signed integer (type I).
      4. floating point real (type R).
      5. packed decimal (type P).
      6. COBOL computational (type J).
      7. unsigned integers (type K).
      8. zoned decimal (type Z).
      9. IEEE floating point (type E). This is a Robelle extension that applies to either " A.00.00" or " B.00.00" self-describing files.
     10. IMAGE compound field.

field'offset (J1). The offset of the field in bytes. The offset starts at zero.

field'length (J1). The length of the field in bytes.

reserved'space (4J1). Four words that are reserved for future use.

Robelle Field Labels

Many HP 3000 applications contain repeated fields.
Query self-describing files map all repeated fields into type "10", which is useless for applications that understand repeated fields. It would also be nice if additional user information, such as the number of decimal places or the format of a date, were available. The Robelle field descriptor provides for all of these by using three of the four words of reserved space. All fields up to field'length are the same as Query's (note especially that field'length is the total length of the field and not the length of one sub-field). These are the new fields:

field'repeat (J1). In IMAGE terms, this is known as the sub-count. For simple fields, field'repeat is one (and not zero).

field'decplaces (J1). Logical number of decimal places in the field. Zero means there are no decimal places. This field must be zero if the field'type is byte.

field'date'type (J1). Zero if the field is not a date. Otherwise, contains a constant that describes the format of the date. These constants are described below.

reserved'space (J1). One word that is reserved for future use.

Date Format

The date format is mapped into the data type and byte-length of the field. Here are the constants for each date format:

      1  yymmdd
      2  ddmmyy
      3  mmddyy
      4  yymm
      5  calendar (MPE intrinsic format)
      6  yyyymmdd
      7  ddmmyyyy
      8  mmddyyyy
      9  phdate (PowerHouse format)
     10  ask (ASK ManMan format)

SPL Layout of the Field Descriptor

Here is the layout in SPL notation of the Robelle field descriptor.
Note that we use exactly the same layout for accessing Query field descriptors (we ignore the repeat, decplaces, and date'type fields for Query self-describing files):

     sdfield.srcinc:

     integer array sd'field(0:sd'max'field'len);    { 0 : 14 }
     byte array sd'field'name(*)          = sd'field;
     integer array sd'field'type(*)       = sd'field(8);
     integer array sd'field'offset(*)     = sd'field(9);
     integer array sd'field'bytelen(*)    = sd'field(10);
     integer array sd'field'repeat(*)     = sd'field(11);
     integer array sd'field'decplaces(*)  = sd'field(12);
     integer array sd'field'date'type(*)  = sd'field(13);

Support Routines

To make our life easier, we have a standard include file with both variables and SPL/SPLash! subroutines that we use in many of our self-describing procedures:

     sdsubr.src:

     <<
        Standard variables and subroutines needed to access fields
        in a self-describing file.  This file must be included after
        all variable declarations in a procedure.
     >>

     integer
        file'userlabels
       ,file'foptions
       ,file'filecode
       ,current'labelnum
       ,sd'field'index
       ;

     integer array
        current'label(0:sd'label'len);

     subroutine file'error(local'filenum);
        value local'filenum;
        integer local'filenum;
     begin
        xfileinfo(local'filenum);
        goto error'exit;
     end'subr;

     subroutine read'label'error(local'filenum);
        value local'filenum;
        integer local'filenum;
     begin
        p "Unable to read label from self-describing file" err;
        file'error(local'filenum);
     end'subr;

     subroutine read'label(local'filenum,labelnum);
        value local'filenum, labelnum;
        integer local'filenum, labelnum;
     begin
        blank(current'label,sd'label'len);
        freadlabel(local'filenum
                  ,current'label
                  ,sd'label'len
                  ,labelnum
                  );
        if < then                 << CCL: the read failed >>
           read'label'error(local'filenum)
        else
        if > then                 << CCG: end-of-file, label never written >>
           if labelnum > file'userlabels then
           begin
              b'blank(outbuf,bl'outbuf);
              move outbuf := "Attempting to read label ";
              ascii(labelnum,10,outbuf'(26));
              say'errx(outbuf,35,bl'outbuf);
              read'label'error(local'filenum);
           end'if;
     end'subr;

     subroutine file'info(local'filenum);
        value local'filenum;
        integer local'filenum;
     begin
        fgetinfo(local'filenum      << filenum >>
                ,                   << filename >>
                ,file'foptions      << foptions >>
                ,                   << aoptions >>
                ,                   << recsize >>
                ,                   << devtype >>
                ,                   << ldnum >>
                ,                   << hdaddr >>
                ,file'filecode      << filecode >>
                ,                   << recpt >>
                ,                   << eof >>
                ,                   << flimit >>
                ,                   << logcount >>
                ,                   << physcount >>
                ,                   << blksize >>
                ,                   << extsize >>
                ,                   << numextents >>
                ,file'userlabels    << userlabels >>
                );
        if <> then
        begin
           p "Unable to fgetinfo on file" err;
           file'error(local'filenum);
        end'if;
     end'subr;

     logical subroutine get'field(local'filenum,offset);
        value local'filenum, offset;
        integer local'filenum, offset;
     begin
        get'field := false;
        if sd'field'index < sd'numfields then
        begin
           << Step back one label each time a label's descriptors are used up >>
           if (sd'field'index mod sd'fieldsperlabel) = 0 then
           begin
              current'labelnum := current'labelnum - 1;
              read'label(local'filenum,current'labelnum);
           end'if;
           offset := (sd'field'index mod sd'fieldsperlabel) * sd'entrylen;
           move sd'field := current'label(offset),(sd'entrylen);
           sd'field'index := sd'field'index + 1;
           get'field := true;
        end'if;
     end'subr;

File'Error

To make our life easier we will take a simple approach to file system errors. If any MPE file system intrinsic returns an error, we call the Robelle equivalent of the printfileinfo intrinsic and then we exit the Formselfdesc procedure. Yes, we use a goto in the file'error subroutine. This is a good example of where a goto enhances readability and reliability.

File'Info

We have developed a standard set of subroutines for working with self-describing files. The file'info subroutine initializes the file'userlabels, file'foptions, and file'filecode variables (declared as part of the sdsubr.src file).

Read'Label

It is important to understand the error checking in read'label. MPE user labels may be allocated space, but they might not actually be written. For example, after first creating a file with user labels, none of the user labels have actually been written to the file. If we get an end-of-file condition from Freadlabel, we ignore the error unless a programming bug has caused us to attempt to read a label that is greater than the number of user labels in the file.
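
The read'label error policy can also be sketched outside SPL. The following Python fragment is only an illustration under stated assumptions: the list of optional byte strings is a hypothetical in-memory stand-in for a file's user labels, and the names read_label, LabelReadError and LABEL_BYTES are invented for this sketch. None of them is an MPE or Robelle interface.

```python
from typing import List, Optional

LABEL_BYTES = 256  # each MPE user label is 256 bytes long


class LabelReadError(Exception):
    """Raised when the requested label number is out of range."""


def read_label(labels: List[Optional[bytes]], label_num: int) -> bytes:
    """Return one user label, tolerating labels that were never written.

    `labels` is a hypothetical model of a file's user labels: one slot
    per allocated label, with None meaning "allocated but not written".
    """
    user_labels = len(labels)
    # Like the SPL routine, only a label number greater than the label
    # count is an error. Rejecting negative numbers is an addition made
    # for this sketch.
    if label_num < 0 or label_num > user_labels:
        raise LabelReadError(f"Attempting to read label {label_num}")
    slot = labels[label_num] if label_num < user_labels else None
    if slot is None:
        # Corresponds to the ignored end-of-file condition: the caller
        # sees the blanks that the buffer was initialized with.
        return b" " * LABEL_BYTES
    return slot


# A file with 14 allocated labels where only the header was written.
labels: List[Optional[bytes]] = [None] * 14
labels[13] = b" B.00.00".ljust(LABEL_BYTES, b"\0")
print(read_label(labels, 13)[:8])
print(read_label(labels, 12) == b" " * LABEL_BYTES)
```

The design choice worth noting is that an unwritten label is not treated as a failure. A reader only fails when it asks for a label the file could never contain, which points to a bug in the caller rather than a problem with the file.
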

Get'Field

We will describe how the get'field subroutine works later in the section Understanding Self-Describing Information.

Am I A Self-describing File

We treat a file as self-describing in either of two cases:

1. The file has a filecode of 1084 and one or more MPE file labels.

2. The file is a KSAM file, has more than one label, we can read the last label, and the last label starts with either the string " A.00.00" or " B.00.00" (note the space at the beginning).

Here is a procedure that returns True if the passed filenum is a self-describing file:

     $page "sd'file"
     <<
        Return true if the passed file is self-describing.
     >>
     logical procedure sd'file(filenum);
        value filenum;
        integer filenum;
        option check 3;
     begin

     $include sdheader.srcinc
     $include sdfield.srcinc
     $include sdsubr.src

     $page "sd'file/mainline"

        sd'file := false;
        file'info(filenum);
        if file'filecode = 1084 and file'userlabels <> 0 then
           sd'file := true
        else
        if file'foptions.(2:3) = 1 or file'foptions.(2:3) = 3 then
           if file'userlabels > 1 then
           begin                  << KSAM file: check the last label >>
              read'label(filenum,file'userlabels-1);
              move sd'header := current'label,(sd'label'len);
              if sd'version = " A.00.00" or sd'version = " B.00.00" then
                 sd'file := true;
           end'else;

     error'exit:
     end'proc;

Creating A Self-Describing File

When we describe data structures we usually explain the input routine first and then the creation/output routine second. For self-describing files, it is easier to do it in the opposite order. We will show the structure of a simple self-describing file and then we will show the code that produced the self-describing label information for the file.

HowMessy is a Robelle program that reports on database efficiency. For years, this program has produced a report. Unfortunately, reports must be read by humans. It would make more sense for HowMessy to produce a self-describing MPE file with the efficiency information from one or more databases.
You could then use a tool that understood self-describing files to report and act on the information from the file produced by HowMessy. We will show all of the routines in HowMessy's self-describing module, but first we need to know the structure of the self-describing file.

HowMessy's Loadfile

HowMessy creates a self-describing file called Loadfile. This file has one record per database/dataset/search-field for one or more databases. Here is a "form" listing of the Loadfile:

     File: LOADFILE.GROUP.ACCT     (SD Version B.00.00)
        Entry:                     Offset
           DATABASE             X26      1
           DATASET              X16     27
           DATASETNUM           I1      43
           DATASETTYPE          X4      45
           CAPACITY             I2      49
           ENTRIES              I2      53
           LOADFACTOR           I2      57  << .2 >>
           SECONDARIES          I2      61  << .2 >>
           MAXBLOCKS            I2      65
           HIGHWATER            I2      69
           PATHSORT             X1      73
           PATHPRIMARY          X1      74
           BLOCKFACTOR          I1      75
           SEARCHFIELD          X16     77
           MAXCHAIN             I2      93
           AVECHAIN             I2      97  << .2 >>
           STDDEVIATION         I2     101  << .2 >>
           EXPECTEDBLOCKS       I2     105  << .2 >>
           AVERAGEBLOCKS        I2     109  << .2 >>
           INEFFICIENTPTRS      I2     113  << .2 >>
           ELONGATION           I2     117  << .2 >>
           FUTUREFIELDS         X136   121
        Limit: 10000  EOF: 33  Entry Length: 256  Blocking: 35

Global Equates

To simplify programming, we use global constants ("equates") that define specific attributes of Query and Robelle self-describing files. When reading a self-describing file, we don't need most of these constants, since the necessary numbers are provided in the self-describing file header. Here are the equates that we use when creating self-describing files:

     sdequate.srcinc:

     equate
        sd'max'field'len        = 15
       ,sd'label'len            = 128
       ,sd'max'fieldsperlabel   = 8
       ,sd'filler'labels        = 10
       ;

     equate
        sd'date'yymmdd          = 1
       ,sd'date'ddmmyy          = 2
       ,sd'date'mmddyy          = 3
       ,sd'date'yymm            = 4
       ,sd'date'calendar        = 5
       ,sd'date'yyyymmdd        = 6
       ,sd'date'ddmmyyyy        = 7
       ,sd'date'mmddyyyy        = 8
       ,sd'date'phdate          = 9
       ,sd'date'askdate         = 10
       ;

Computing the Number of Labels

Before opening the Loadfile, HowMessy must determine how many labels will be needed. The following routine is used by Robelle products to compute the number of user labels for a self-describing file.
Note that we continue the Query standard of reserving the first ten labels (numbered 0 to 9) for other uses:

     $page "sd'compute'labels"
     <<
        Compute how many labels an SD file should have, based only
        on the number of fields.  Includes the mysterious filler labels.
     >>
     integer procedure sd'compute'labels (numfields);
        value numfields;
        integer numfields;
        option check 3;
     begin
        sd'compute'labels :=
             (numfields-1+sd'max'fieldsperlabel) / sd'max'fieldsperlabel
           + 1                    << header label >>
           + sd'filler'labels;
     end'proc;

Opening the Loadfile

After computing the number of user labels, we can open a new MPE file called Loadfile. We designed the HowMessy Loadfile to have records 256 bytes long. To make our life easier, we have a few global equates in the HowMessy self-describing module that we'll use throughout the rest of the examples:

     $page "global equates and defines for the selfdesc module"

     equate
        wl'loadfile   = 128
       ,bl'loadfile   = wl'loadfile * 2
       ,bl'item'name  = 16
       ,max'field     = 22        ! fields in Loadfile
       ;

Here is the actual code to create the Loadfile:

     $page "sd'open"
     <<
        Open the Loadfile and initialize the self-describing
        information.
     >>
     logical procedure sd'open(outfile,loadfile'filenum);
        integer loadfile'filenum;   ! Note by reference -- returned
        integer array outfile;
        option check 3;
     begin

     $include localvar.srcinc

     byte array
        loadfile'filename(0:bl'local'filename)
        ;

        move loadfile'filename := "loadfile ";
        loadfile'filenum := fopen(loadfile'filename
                                 ,              << foptions    lv >>
                                 ,1             << aoptions    lv >>
                                 ,wl'loadfile   << recsize     iv >>
                                 ,              << device      ba >>
                                 ,              << formmsg     ba >>
                                 ,sd'compute'labels(max'field)
                                                << userlabels  iv >>
                                 ,35            << blockfactor iv >>
                                 ,              << numbuffers  iv >>
                                 ,10000d        << filesize    dv >>
                                 ,              << numextents  iv >>
                                 ,              << initialloc  iv >>
                                 ,1084          << filecode    iv >>
                                 );
        if loadfile'filenum = 0 then
        begin
           error(outfile,10);
           xfileinfo(loadfile'filenum);
        end'if
        else
           sd'open := true;
     end'proc;

Note that the filecode is 1084. For non-KSAM files, this is used to indicate a self-describing file.
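
The label arithmetic behind sd'compute'labels, and the backward placement of field labels, can be checked with a few lines of Python. This is a minimal sketch that assumes the equate values shown earlier (eight descriptors per label, ten filler labels). The function names compute_labels and field_location are hypothetical and exist only for this illustration.

```python
from typing import Tuple

MAX_FIELDS_PER_LABEL = 8   # sd'max'fieldsperlabel
FILLER_LABELS = 10         # sd'filler'labels (labels 0 to 9)


def compute_labels(num_fields: int) -> int:
    """Total user labels: filler labels, field labels and one header label."""
    field_labels = (num_fields - 1 + MAX_FIELDS_PER_LABEL) // MAX_FIELDS_PER_LABEL
    return field_labels + 1 + FILLER_LABELS


def field_location(total_labels: int, field_index: int) -> Tuple[int, int]:
    """Return (label number, slot within label) for a zero-based field index.

    The header is label total_labels - 1, so the first field label is
    total_labels - 2 and later fields move toward lower label numbers.
    """
    label_num = total_labels - 2 - field_index // MAX_FIELDS_PER_LABEL
    slot = field_index % MAX_FIELDS_PER_LABEL
    return label_num, slot


total = compute_labels(22)          # the Loadfile has 22 fields
print(total)                        # 14
print(field_location(total, 0))     # (12, 0)
print(field_location(total, 21))    # (10, 5)
```

For the 22-field Loadfile this gives 14 user labels: labels 0 to 9 are filler, labels 12, 11 and 10 hold the field descriptors in that order, and label 13 is the header.
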
When you do a :Listf of a self-describing file, MPE translates the "1084" filecode into "SD".

Writing the Self-Describing Labels

Having successfully opened a new self-describing file, it's time to write the self-describing information to the user labels. Remember that the last user label (N-1) contains the header information and the field labels are written in backward order (N-2, N-3, ...). Our routine to write the self-describing information to the Loadfile writes the field information first and then updates the header label as the last step:

     $page "sd'write'labels"
     <<
        Write out the labels of a self-describing file with the
        Loadfile fields.
     >>
     logical procedure sd'write'labels(outfile,filenum);
        value filenum;
        integer filenum;
        integer array outfile;
        option check 3;
     begin

     $include localvar.srcinc

     integer
        field'index
       ,field'offset
       ,labelnum
       ;

     $include sdheader.srcinc
     $include sdfield.srcinc

     integer array
        sd'label(0:sd'label'len);

     $include sdsubr.src

     $page "sd'write'labels/subroutines"

     subroutine write'label(labelnum);
        value labelnum;
        integer labelnum;
     begin
        fwritelabel(filenum,sd'label,sd'label'len,labelnum);
        if <> then
        begin
           error(outfile,12);
           file'error(filenum);
        end'if;
     end'subr;

     subroutine init'header;
     begin
        zero'buf(sd'header,sd'label'len);
        b'blank(sd'version,8);
        move sd'version := " B.00.00";
        sd'numfields := max'field;
        field'index := 0;
        sd'numlabels := sd'compute'labels(sd'numfields)
                        - sd'filler'labels;
        sd'fieldsperlabel:= sd'max'fieldsperlabel;
        sd'entrylen := sd'max'field'len;
     end'subr;

     subroutine init'all'labels(curr'label,num'labels);
        value curr'label, num'labels;
        integer curr'label, num'labels;
     begin
        while curr'label > num'labels do
        begin
           write'label(curr'label);
           curr'label := curr'label - 1;
        end'while;
     end'subr;

     subroutine put'field(name,bytelen,decplaces,type);
        value bytelen, type, decplaces;
        integer bytelen, decplaces, type;
        byte array name;
     begin
        zero'buf(sd'field,sd'max'field'len);
        move sd'field'name := name,(bl'item'name);
        sd'field'type := type;
        sd'field'offset := field'offset;
        sd'field'bytelen := bytelen;
        sd'field'repeat := 1;
        sd'field'decplaces := decplaces;   << needed for the << .2 >> entries >>
        move sd'label(sd'entrylen*sd'field'index) := sd'field,
                                                     (sd'entrylen);
        field'offset := field'offset + sd'field'bytelen;
        sd'field'index := sd'field'index + 1;
        if sd'field'index >= sd'fieldsperlabel then
        begin
           write'label(labelnum);
           labelnum := labelnum - 1;
           sd'field'index := 0;
           zero'buf(sd'label,sd'label'len);
        end'if;
        b'blank(name,bl'item'name);
     end'subr;

     $page "sd'write'labels/mainline"

        sd'write'labels := false;
        init'header;
        file'info(filenum);
        field'offset := 0;
        sd'field'index := 0;
        zero'buf(sd'label,sd'label'len);
        init'all'labels(file'userlabels-1,sd'numlabels);
        labelnum := file'userlabels - 2;
        sd'reclength := bl'loadfile;
        b'blank(inbuf,bl'inbuf);

        move inbuf' := "DATABASE ";         put'field(inbuf, 26,0,1);
        move inbuf' := "DATASET ";          put'field(inbuf, 16,0,1);
        move inbuf' := "DATASETNUM ";       put'field(inbuf,  2,0,3);
        move inbuf' := "DATASETTYPE ";      put'field(inbuf,  4,0,1);
        move inbuf' := "CAPACITY ";         put'field(inbuf,  4,0,3);
        move inbuf' := "ENTRIES ";          put'field(inbuf,  4,0,3);
        move inbuf' := "LOADFACTOR ";       put'field(inbuf,  4,2,3);
        move inbuf' := "SECONDARIES ";      put'field(inbuf,  4,2,3);
        move inbuf' := "MAXBLOCKS ";        put'field(inbuf,  4,0,3);
        move inbuf' := "HIGHWATER ";        put'field(inbuf,  4,0,3);
        move inbuf' := "PATHSORT ";         put'field(inbuf,  1,0,1);
        move inbuf' := "PATHPRIMARY ";      put'field(inbuf,  1,0,1);
        move inbuf' := "BLOCKFACTOR ";      put'field(inbuf,  2,0,3);
        move inbuf' := "SEARCHFIELD ";      put'field(inbuf, 16,0,1);
        move inbuf' := "MAXCHAIN ";         put'field(inbuf,  4,0,3);
        move inbuf' := "AVECHAIN ";         put'field(inbuf,  4,2,3);
        move inbuf' := "STDDEVIATION ";     put'field(inbuf,  4,2,3);
        move inbuf' := "EXPECTEDBLOCKS ";   put'field(inbuf,  4,2,3);
        move inbuf' := "AVERAGEBLOCKS ";    put'field(inbuf,  4,2,3);
        move inbuf' := "INEFFICIENTPTRS ";  put'field(inbuf,  4,2,3);
        move inbuf' := "ELONGATION ";       put'field(inbuf,  4,2,3);
        move inbuf' := "FUTUREFIELDS ";     put'field(inbuf,136,0,1);

        if sd'field'index <> 0 then
           write'label(labelnum);

        move sd'label := sd'header,(sd'label'len);
        write'label(file'userlabels - 1);
        sd'write'labels := true;

     error'exit:
     end'proc;

Init'Header

We start our procedure by initializing most of the fields in the header label. We zero out the header label and then we fill in the variables of the header label. This file has fields with an implied decimal point, so we want to use the Robelle format of self-describing files (version number " B.00.00"). The number of fields is taken from our global equate. The number of self-describing labels is our computed number less the ten overhead labels. The number of fields in each label and the length of each field description are taken from global equates that match the values used by Query. We also initialize the field'index variable which is used as an index into a single label buffer (varies from 0 to sd'fieldsperlabel - 1).

File'Info

To make our code more general-purpose, we will not assume anything about the HowMessy Loadfile format (this also makes it easier to change later). Instead, we call Fgetinfo to obtain the number of labels in our file, so that we know exactly where the last label is. The file'info subroutine initializes the file'userlabels variable (declared in the sdsubr.src file) with the number of user labels in our file.

Init'All'Labels

To be on the safe side, we initialize all user labels in our file with binary zeroes. Note that our write'label subroutine uses a procedure global array for writing (declared as sd'label in the listing above). We initialize this buffer to binary zeroes and then repeatedly write it out to all of the self-describing labels. We don't touch the initial ten labels reserved for other use.

Put'Field

This subroutine handles all of the details of adding a new field to our self-describing Loadfile. It initializes a new field record, moves this field record to the appropriate place in the label buffer, and finally writes out labels as we overflow sd'fieldsperlabel.
Each of our fields has a name (we move the name to inbuf and initialize inbuf to blanks after adding the field), a byte length, the implied number of decimal places, and a type (either byte or integer for our file). Note that the put'field subroutine looks after computing the byte offset of each field by incrementing a counter.

We initialize each field record with binary zeroes (just to be safe). We then fill in each portion of field information. We then move our field record to the label buffer at the correct offset. This works well in SPL/SPLash!, but is more of a problem in C or Pascal. In these languages, we would create a record/structure that was an array of field records and index into the structure using the current field index. As we filled up the structure, we would write out a new user label to our self-describing file.

Finishing Up

After adding all the fields, we have to see if there is one label record that has not been written to the Loadfile. If so, we write it out. Finally, the header record is written out. We do this last, since some of the variables used by put'field were ones from the header record.

Closing the Self-Describing File

Our next routine closes the self-describing file and handles any errors from Fclose. Pretty straightforward MPE programming:

     $page "sd'close"
     <<
        Close the loadfile and check for duplicate output files.
     >>
     logical procedure sd'close(outfile,loadfile'filenum);
        value loadfile'filenum;
        integer loadfile'filenum;
        integer array outfile;
        option check 3;
     begin

     $include localvar.srcinc

        sd'close := false;
        fclose(loadfile'filenum,2,0);   ! Save temp
        if <> then
        begin
           error(outfile,11);
           xfileinfo(loadfile'filenum);
        end'if
        else
           sd'close := true;
     end'proc;

Providing a Shell

To make life easier in HowMessy, we provide one routine for the main module to call. This routine purges any existing temporary Loadfile, creates our new Loadfile, writes out the self-describing information, and saves the Loadfile.
The controlling HowMessy routine then reopens Loadfile with write-access (this may seem inefficient, but HowMessy is written in both SPL/SPLash! and HP Pascal, so it was easier to organize the code this way):

     $page "sdcreate"
     <<
        Create Loadfile with all the self-describing information.
        We purge any existing file called Loadfile, create a temporary
        one, and then fill in the labels.
     >>
     integer procedure sdcreate(outfile);
        integer array outfile;
        option check 3;
     begin

     $include localvar.srcinc
     $include mpecmd.srcinc

     integer
        loadfile'filenum
        ;

     logical subroutine purge'loadfile;
     begin
        purge'loadfile := false;
        say'str "purge loadfile,temp";
        say'add rtn;
        if mpecmd'execute(outbuf,mpe'print'buffer) then
           purge'loadfile := true
        else
           error(outfile,9);
     end'subr;

     $page "sdcreate/mainline"

        sdcreate := 0;
        if purge'loadfile then
           if sd'open(outfile,loadfile'filenum) then
              if sd'write'labels(outfile,loadfile'filenum) then
                 if sd'close(outfile,loadfile'filenum) then
                    sdcreate := 1;
     end'proc;

Understanding Self-Describing Information

Our HowMessy example showed the form of the Loadfile using a format similar to the one Query uses, but the input was an MPE self-describing file instead of an IMAGE dataset. Here is our example form again:

     File: LOADFILE.GROUP.ACCT     (SD Version B.00.00)
        Entry:                     Offset
           DATABASE             X26      1
           DATASET              X16     27
           DATASETNUM           I1      43
           DATASETTYPE          X4      45
           CAPACITY             I2      49
           ENTRIES              I2      53
           LOADFACTOR           I2      57  << .2 >>
           SECONDARIES          I2      61  << .2 >>
           MAXBLOCKS            I2      65
           HIGHWATER            I2      69
           PATHSORT             X1      73
           PATHPRIMARY          X1      74
           BLOCKFACTOR          I1      75
           SEARCHFIELD          X16     77
           MAXCHAIN             I2      93
           AVECHAIN             I2      97  << .2 >>
           STDDEVIATION         I2     101  << .2 >>
           EXPECTEDBLOCKS       I2     105  << .2 >>
           AVERAGEBLOCKS        I2     109  << .2 >>
           INEFFICIENTPTRS      I2     113  << .2 >>
           ELONGATION           I2     117  << .2 >>
           FUTUREFIELDS         X136   121
        Limit: 10000  EOF: 33  Entry Length: 256  Blocking: 35

Formselfdesc Procedure

We have developed a stand-alone procedure for producing this output for a self-describing file.
The following is the source code that we use:

     $page "formselfdesc"
     <<
        If the passed filenum is a self-describing file, print a
        description of the fields in the file on $stdlist.
     >>
     logical procedure formselfdesc(sd'filenum);
        value sd'filenum;
        integer sd'filenum;
        option check 3;
     begin

     $include localvar.srcinc
     $include sdequate.srcinc

     integer
        file'code
       ,file'userlabels
       ,file'foptions
       ,current'labelnum
       ,field'index
       ,file'recsize
       ,file'blkfac
       ;

     double
        file'eof
       ,file'limit
       ;

     byte array
        filename(0:bl'local'filename)
        ;

     $include sdheader.srcinc
     $include sdfield.srcinc

     integer array
        current'label(0:sd'label'len);

     subroutine file'error;
     begin
        xfileinfo(sd'filenum);
        goto error'exit;
     end'subr;

     subroutine read'label(labelnum);
        value labelnum;
        integer labelnum;
     begin
        freadlabel(sd'filenum,current'label,sd'label'len,labelnum);
        if <> then
        begin
           p "Unable to read label from self-describing file" err;
           file'error;
        end'if;
     end'subr;

     subroutine file'info(blksize);
        value blksize;
        integer blksize;
     begin
        b'blank(filename,bl'local'filename);
        fgetinfo(sd'filenum         << filenum >>
                ,filename           << filename >>
                ,file'foptions      << foptions >>
                ,                   << aoptions >>
                ,file'recsize       << recsize >>
                ,                   << devtype >>
                ,                   << ldnum >>
                ,                   << hdaddr >>
                ,file'code          << filecode >>
                ,                   << recpt >>
                ,file'eof           << eof >>
                ,file'limit         << flimit >>
                ,                   << logcount >>
                ,                   << physcount >>
                ,blksize            << blksize >>
                ,                   << extsize >>
                ,                   << numextents >>
                ,file'userlabels    << userlabels >>
                );
        if <> then
        begin
           p "Unable to fgetinfo on file" err;
           file'error;
        end'if;
        if file'recsize <> 0 then
           file'blkfac := blksize / file'recsize
        else
           file'blkfac := 1;
        << A negative recsize is in bytes, a positive one in words >>
        if file'recsize < 0 then
           file'recsize := -file'recsize
        else
           file'recsize := file'recsize * 2;
     end'subr;

     logical subroutine file'is'sd;
     begin
        file'is'sd := false;
        file'info(0);
        if file'code = 1084 and file'userlabels <> 0 then
           file'is'sd := true
        else
        if file'foptions.(2:3) = 1 or file'foptions.(2:3) = 3 then
           if file'userlabels > 1 then
           begin                  << KSAM file: check the last label >>
              read'label(file'userlabels-1);
              move sd'header := current'label,(sd'label'len);
              if sd'version = " A.00.00" or sd'version = " B.00.00" then
                 file'is'sd := true;
           end'else;
     end'subr;

     logical subroutine get'field(offset);
        value offset;
        integer offset;
     begin
get'field := false; if field'index < sd'numfields then begin if (field'index mod sd'fieldsperlabel) = 0 then begin current'labelnum := current'labelnum - 1; read'label(current'labelnum); end'if; offset := (field'index mod sd'fieldsperlabel) * sd'entrylen; move sd'field := current'label(offset),(sd'entrylen); field'index := field'index + 1; get'field := true; end'if; end'subr; <> subroutine print'outbuf(len); value len; integer len; begin len := bl'outbuf; while len > 0 and outbuf'(len-1) = " " do len := len - 1; print(outbuf,-len,0); end'subr; <> subroutine print'header(len); value len; integer len; begin b'blank(outbuf,bl'outbuf); move outbuf'(4) := "File: "; move outbuf'(10) := filename,(bl'local'filename); len := bl'outbuf; while len > 0 and outbuf'(len-1) = " " do len := len - 1; len := len + 5; len := len + move outbuf'(len) := "(SD Version"; len := len + move outbuf'(len) := sd'version,(8); len := len + move outbuf'(len) := ")"; print'outbuf(0); b'blank(outbuf,bl'outbuf); move outbuf'(7) := "Entry:"; move outbuf'(34) := "Offset"; print(outbuf,-50,0); end'subr; <> subroutine print'trailer(len); value len; integer len; begin b'blank(outbuf,bl'outbuf); len := move outbuf' := " "; len := len + move outbuf'(len) := "Limit: "; len := len + dascii(file'limit,10,outbuf'(len)); len := len + move outbuf'(len) := " EOF: "; len := len + dascii(file'eof,10,outbuf'(len)); len := len + move outbuf'(len) := " Entry Length: "; len := len + ascii(file'recsize,10,outbuf'(len)); len := len + move outbuf'(len) := " Blocking: "; len := len + ascii(file'blkfac,10,outbuf'(len)); print(outbuf,-len,0); end'subr; <> subroutine format'field'type; begin if 0 <= sd'field'type <= 9 then case sd'field'type of begin <<0>> move outbuf'(31) := "?"; <<1>> move outbuf'(31) := "X"; <<2>> move outbuf'(31) := "?"; <<3>> move outbuf'(31) := "I"; <<4>> move outbuf'(31) := "R"; <<5>> move outbuf'(31) := "P"; <<6>> move outbuf'(31) := "J"; <<7>> move outbuf'(31) := "K"; <<8>> move outbuf'(31) := "Z"; 
<<9>> move outbuf'(31) := "E"; end'case else move outbuf'(31) := "?"; end'subr; <> logical subroutine field'is'sorted(sort'index); value sort'index; integer sort'index; begin field'is'sorted := false; if sd'field'offset + 1 = sd'sort'keys(sort'index*3) and sd'field'bytelen = sd'sort'keys(sort'index*3+1) then field'is'sorted := true; end'subr; <> subroutine format'sort'key(sort'index); value sort'index; integer sort'index; begin sort'index := 0; while sort'index < sd'sort'num'keys do begin if field'is'sorted(sort'index) then begin move outbuf'(42) := "<> subroutine format'date'type; begin if 1 <= sd'field'date'type <= 10 then case sd'field'date'type of begin <<0>> ; <<1>> move outbuf'(56) := "<>"; <<2>> move outbuf'(56) := "<>"; <<3>> move outbuf'(56) := "<>"; <<4>> move outbuf'(56) := "<>"; <<5>> move outbuf'(56) := "<>"; <<6>> move outbuf'(56) := "<>"; <<7>> move outbuf'(56) := "<>"; <<8>> move outbuf'(56) := "<>"; <<9>> move outbuf'(56) := "<>"; <<10>>move outbuf'(56) := "<>"; end'case; end'subr; <> subroutine format'decplaces; begin if sd'field'decplaces > 0 then begin move outbuf'(56) := "<< ."; ascii(sd'field'decplaces,10,outbuf'(60)); move outbuf'(63) := ">>"; end'if; end'subr; <> subroutine print'field'desc(field'repeat); value field'repeat; integer field'repeat; begin b'blank(outbuf,bl'outbuf); move outbuf'(10) := sd'field'name,(16); if sd'version = " B.00.00" then begin field'repeat := sd'field'repeat; sd'field'bytelen := sd'field'bytelen / field'repeat; end'if else field'repeat := 1; if field'repeat <> 1 then ascii(field'repeat,-10,outbuf'(30)); format'field'type; if sd'field'type = 3 or <> sd'field'type = 4 or <> sd'field'type = 7 then <> ascii(sd'field'bytelen/2,10,outbuf'(32)) else if sd'field'type = 5 then <> ascii(sd'field'bytelen*2,10,outbuf'(32)) else ascii(sd'field'bytelen,10,outbuf'(32)); ascii(sd'field'offset+1,-10,outbuf'(39)); if sd'version = " B.00.00" then begin format'sort'key(0); format'date'type; format'decplaces; end'if; print'outbuf(0); 
     end'subr;

     $page "formselfdesc/mainline"

     formselfdesc := false;
     if sd'filenum <> 0 then
     begin
        if file'is'sd then
        begin
           read'label(file'userlabels-1);
           move sd'header := current'label,(sd'label'len);
           current'labelnum := file'userlabels - 1;
           print'header(0);
           field'index := 0;
           while get'field(0) do
              print'field'desc(0);
           print'trailer(0);
           formselfdesc := true;
        end'if;
     end'if;

     error'exit:

     end'proc;

A Different Logical Structure

Our HowMessy/Loadfile example had a number of separate procedures. Our Formselfdesc procedure is self-contained, but we will describe each subroutine in this procedure.

Formselfdesc Variables

We include our standard files for the global self-describing equates, header layout, and field layout. We also have a number of local variables that are used for indexing through the field labels and other variables needed to enhance the output listing (e.g., the number of records in the self-describing file).

File'Is'Sd

This is our standard sd'file procedure, rewritten to work as a stand-alone SPL subroutine. File'Is'Sd looks after calling the file'info subroutine, which calls Fgetinfo. We initialize a number of variables during the file'info call. Some of these are used for obtaining self-describing information and some are used to enhance the format of our form output (e.g., the filename and the file limit).

Read'Label

The basic strategy we use in this routine is to read a specific label into a buffer called current'label. We then move this label to the appropriate self-describing header or field buffer. Read'label is careful to check for file system errors and abort if it finds any.

Get'Field

This subroutine is the key to understanding self-describing files. When get'field is called, the last label of the file has been read into the sd'header record. The variable field'index is initialized to zero and is used as a counter of the field descriptions processed so far. Each call to get'field returns one field description in the sd'field record.
When first called, current'labelnum contains the number of the last label (the label count minus one, since MPE numbers labels starting at zero). We check to see if we need to read in a new label with the statement:

     if (field'index mod sd'fieldsperlabel) = 0 then

Note that we use sd'fieldsperlabel as the divisor. This is the value from our sd'header record and not the equate that we use when creating self-describing files. Because get'field decrements current'labelnum before each read, the field labels are read backwards, starting with the label just before the header. Get'Field assumes that the current label record is in the buffer current'label. Each user label contains one or more field descriptions (in most cases there are eight per label). We compute the offset in the label where the current field description starts and then we move the field description from current'label to our sd'field record.

Print'Field'Desc

This routine looks after printing out the description of one field. We use the same routine whether we are dealing with Query (" A.00.00") or Robelle (" B.00.00") self-describing files. We do have to adjust the byte length for " B.00.00" self-describing fields, so that the output looks similar to what Query would produce for an IMAGE dataset. Note how the format'field'type routine handles IEEE floating point for either type of self-describing file. For " B.00.00" self-describing files, we can produce extra information. This is handled by the format'sort'key, format'date'type, and format'decplaces routines (which are only called for " B.00.00" self-describing files).

Format'Sort'Key

The sort information is stored in the sd'header record as an offset, a length, and a type. There is no direct way for us to tell that a field is sorted. Instead, we index through all of the sort keys checking if the sort key matches the current field definition (there might not be a match). We use the index into the sort information as our key to print for the user.

Field'Is'Sorted

To make our code clearer, we encapsulate the code for checking if a specific sort key matches the current field in a subroutine.
By giving this subroutine a descriptive name, we make the intent of the format'sort'key routine clearer. Our field'is'sorted routine checks that the offset (adjusted appropriately for one-based and zero-based offsets) and the byte length of the field and the sort key match. We decided to ignore the data type (the sd'type and the sort'type have different values).

Summary

It's harder to understand self-describing files than it is to create them. When creating self-describing files you often only use a few of the self-describing features, but when understanding them there are no features that you can leave out.

KSAM Self-Describing Files

Self-describing KSAM files are a little trickier to deal with. The 1084 filecode used for self-describing MPE files doesn't work well for KSAM. It is more difficult to create a new KSAM file, since all of the key information must be passed to Fopen. Here are a few hints for creating and understanding self-describing KSAM files.

SD (1084) Filecode

You can create a KSAM file with a filecode of 1084, but the resulting :listf,2 gives no hint that the file is a KSAM file. Here's an example Build Command of a compatibility-mode KSAM file with a filecode of 1084 and the resulting :listf,2.

     :run ksamutil.pub.sys
     >build file1;rec=-80,16,f,ascii;keyfile=file1key; &
     key=i,6,2;code=1084
     >exit
     :listf file1@,2

     FILENAME  CODE  ----------LOGICAL RECORD---------  ----SPACE----
                       SIZE  TYP      EOF    LIMIT R/B  SECTORS  X MX

     FILE1     SD       80B  FA         0     1023  16       48  1  *
     FILE1KEY  KSAMK   128W  FB        98       98   1      112  1  8

Notice how there is no way to identify file1 as being a KSAM file. For this reason, we don't use the 1084 filecode on self-describing KSAM files.

Creating KSAM Self-describing Files

We use three steps to create self-describing KSAM files:

1. Compute the number of labels (used in the KSAMUTIL or MPE/iX build command). You could use our sd'compute'labels subroutine or you can compute the number of labels as the truncated value of:

        labels = (#fields + 7) / 8 + 11

2. Build your KSAM/V file with KSAMUTIL and specify Labels=[the number computed above]. For KSAM/XL, use the Build Command with the userlabel keyword ;ULABEL=x (where x is the number computed above).

3. Fopen the file as an old file with write access.

The most difficult part is computing the number of labels. For example, if we have eight fields:

     Labels = (8 + 7) / 8 + 11 = 12

MPE V/E:

     :run ksamutil.pub.sys
     >build file2;rec=-80,16,f,ascii;keyfile=file2k; &
     key=i,6,2,,duplicate;labels=12
     >exit

MPE/iX:

     :build file2;rec=-80,16,f,ascii;key=(i,6,2,dup); &
     ksamxl;ulabel=12

Understanding KSAM Self-Describing Files

Our sd'file routine returns true if a given file is self-describing. If you examine the code in this routine carefully, you'll see that for KSAM files we have the statements:

     if file'foptions.(2:3) = 1 or
        file'foptions.(2:3) = 3 then
        if file'userlabels > 1 then

Note that we check for more than one user label. Why don't we check for more than zero user labels? All self-describing files must have at least two labels (one for the header information and one or more for the field information). When we first implemented our sd'file routine we only checked for more than zero user labels. What we found was that many users had accidentally built KSAM files with one user label (which was almost always empty). We have no idea why this seemed to be so common, but by checking for at least two labels we eliminated a lot of KSAM files that were not self-describing.

Future Self-Describing Formats

We were motivated to create the new Robelle format self-describing files in order to provide a better interface between our product Suprtool and ASKPlus from ARES of France. Pierre Senant of ARES is the R&D Manager and the two of us worked out the " B.00.00" self-describing format (actually we forced most of the format on poor Pierre). ARES have been doing significant R&D work on UNIX and a portable version of ASKPlus.
As an example of how far we can go with self-describing information, here is an extract of Pierre's design for a UNIX implementation of self-describing files.

SDASK Files

A C/ISAM file is composed of two files, a data file and an index file. An SDASK file defines another file called a 'label file' which contains the complete description of the data file. Like MPE self-describing files, an SDASK file is composed of a header portion and a description of each field. The data file and the label file must be located in the same directory.

Header Format

Pierre's header contains a lot more information than our MPE header label. Here are the parts of the header:

* Version number.
* File code (1085).
* Checksum (currently unused).
* Number of fields per record.
* Record length (in bytes).
* Number of records.
* Number of sort keys.
* Password.
* Total field area length (in the SDASK file).
* File type (flat, C/ISAM, KSAM, Unibol, ...).
* Data file name.
* Unibol area (for migration from IBM/36 to UNIX).
* Filter: a logical expression defining a condition that must be True for an entry to be taken into account. Originally developed for Unibol files, but this feature can be used for any other system.

Field Description

Each field description is variable length:

* Field type (U, X, I, J, K, P, R). Additional information for Ascii fields is: Roman-8, PC-8, ANSI-8, Mac-Apple, EBCDIC, and ISO7-1 ... ISO7-13. For Integer fields, there is additional information for Intel versus HP. For Real fields, there is additional information for IEEE versus Classic.
* Length (in bytes).
* Offset.
* Scale (number of decimal places).
* Repeat factor.
* Flags:
  * Null value allowed. If this flag is True, each entry in the data file is preceded by a bitmap field. Each bit indicates whether the corresponding field value is Null or not.
  * Hidden field.
  * Key.
  * Duplicate key allowed.
* Field name length.
* Field name.
* Title length.
* Title.
* Edit mask length.
* Edit mask.
* Key file name length.
* Key file (reserved for future implementation on MS-Dos).

Sort Information

Sort descriptors are also variable length:

* Expression length.
* Sort expression (in ASKPlus syntax). For example, cust-name cust-zipcode cat cust-address
* Flag: ascending/descending.

Conclusion

Self-describing files are a great idea. As users, we almost always create MPE and KSAM files with a fixed record structure in mind. By default, this record structure is lost when we build a file. With self-describing files, we can retain the structure of our files.

A Final Example

The HP 3000 has a rich set of tools based on IMAGE. One reason that so many good tools could be written for IMAGE was the DBINFO intrinsic. This intrinsic let any program discover the structure of an IMAGE database. Self-describing files provide the same flexibility for MPE and KSAM files. In this example, we show how two tools can be combined by using self-describing files. Our HowMessy program reports on database efficiency. While doing so it creates a self-describing file with the statistics for a database. Once you have this file, it's possible to use Suprtool to check for certain boundary cases. For example,

     :run howmessy.pub.robelle         {create "loadfile"}
     Enter database: test.suprtest

HowMessy creates the self-describing file called Loadfile (with the structure that we've shown previously). We now use Suprtool to create a file of all detail datasets that are more than 85% full and have a capacity greater than one:

     :run suprtool.pub.robelle
     >input loadfile
     >if datasettype = "D" and &       {detail dataset}
         capacity > 1 and &
         loadfactor > 85.00            {more than 85% full}
     >output loaddetl,link             {create SD file}
     >exit

At Robelle, we would use our Xpress electronic mail system to mail the loaddetl file to the system manager. Another alternative would be to extract the database and dataset names and use them to create a batch job to automatically increase the capacity of detail datasets more than 85% full.
The possibilities are endless, but only because HowMessy could provide information to Suprtool via the self-describing file.

Software Tools

Few software tools are capable of creating or understanding self-describing files. This is a shame, since self-describing files are a powerful data structure. One reason that so few tools handle self-describing files is that documentation on self-describing files has been non-existent. I hope that by publishing this description and the programming examples in this paper, more vendors and users will start creating and accepting self-describing files.
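For readers who want to experiment with the label layout away from an HP 3000, the label arithmetic used by the get'field subroutine can be sketched in a few lines of Python. This is only an illustration of the indexing, not part of Formselfdesc: the function name locate_field and its parameter names are hypothetical, and the sample values (12 user labels, 8 fields per label, 16-word entries, which assumes eight entries exactly fill a 128-word label) are assumptions chosen for the example.

```python
def locate_field(field_index, user_labels, fields_per_label, entry_len):
    """Return (label_number, offset) for one field description.

    Mirrors the indexing in get'field: the header is in the last
    user label, and field descriptions are read from the labels
    before it, working backwards. The offset is in the same units
    as entry_len (words in the SPL code). All names here are
    hypothetical; only the arithmetic comes from the article.
    """
    if user_labels < 2:
        # A self-describing file needs a header label plus
        # at least one field label.
        raise ValueError("need at least two user labels")

    header_label = user_labels - 1          # labels are numbered from zero
    label_number = header_label - 1 - field_index // fields_per_label
    if label_number < 0:
        raise ValueError("field index runs past the first label")

    offset = (field_index % fields_per_label) * entry_len
    return label_number, offset


# Assumed sample values: 12 labels, 8 fields per label, 16-word entries.
for index in (0, 7, 8):
    print(index, locate_field(index, 12, 8, 16))
```

With these assumed values, fields 0 through 7 come from label 10 and field 8 starts at offset zero of label 9, which is the same backwards walk that get'field performs each time field'index reaches a multiple of sd'fieldsperlabel.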