How Do I?

Import Data Using Suprtool

One of the more recent questions that has come up lately is How Do I Import data with Suprtool. Let's say we have the following data layout, Image, Eloquence or SD file, it doesn't matter. The purpose of this small white paper would be to extract data to a ".csv" file, and then import that same data back. Closing the loop so to speak.

To begin our project we will extract the data from a sample data source and output to an SD file and then use STExport to create a CSV file. >get d-inventory
>out dinv,link
>xeq
IN=13, OUT=13. CPU-Sec=1. Wall-Sec=1.

:run stexport.pub.robelle
$in dinv
$out dinvcsv
$xeq
In=13. Out=13. CPU-Sec=1. Wall-Sec=1.
$listf dinvcsv,2

ACCOUNT=  GREEN       GROUP=  NEIL

FILENAME CODE ------------LOGICAL RECORD----------- ----SPACE----
SIZE TYP EOF LIMIT R/B SECTORS #X MX

DINVCSV 152B VA 13 13 1 16 1 *
So at this point we know have a file, typically how a customer would want to import using Suprtool for adding to a database etc. Note that Suprtool does not handle variable-length files so the first step is to convert the Variable length file to fixed length.

Step One: Convert to Fixed Length

On MPE you can convert using Fcopy: /file dinvcsvf;rec=-152,1,f,ascii
/fcopy from=dinvcsv; to=*dinvcsvf;new
On HP-UX you can use Qedit: /t dinvcsv
/set keep var off
/k dinvcsvf
Or Awk: (Thanks to Barry Lake from Allegro for fixify.sh)

#!/bin/sh
#
# Script to turn a typical bytestream file (variable length
# records) into a file with fixed length records by padding
# records with spaces as needed. The resulting record length
# will be that of the longest record in the file.

export INFILE=$1
OUTFILE=$(mktemp)

# Step 1: Find the length of the longest record in the input file
#         file and store it in a variable for use in the next step.

export MAXLEN=$(awk 'BEGIN { len = 0; }
  { if (length > len)
       len = length; }
  END { print len; }' $INFILE )

echo Maximum record length in \"$INFILE\" is $MAXLEN.

# Step 2: Get the MAXLEN variable; use it to create a string
#         of that many blanks; then use that to pad each input
#         record as needed to make it the same length as the
#         longest record.

awk 'BEGIN { "echo $MAXLEN" | getline maxlen; spaces = "";
             for(i=0; i$OUTFILE

# Step 3: awk can't edit a file in place so its output was
#         written to a new file which we then pour back into the
#         original file. Note: we could just as easily have done
#         cp $OUTFILE $INFILE or mv $OUTFILE $INFILE, but in that
#         case we'd lose the original file's creator and
#         permission bits.

cat $OUTFILE > $INFILE
rm $OUTFILE

You can run the above script:
$./fixify.sh datafile

At this point Suprtool can now read what is essentially variable-length data in a fixed length file.

Now remember what a typical CSV file looks like! Byte fields are surronded by quotes and separated or delimited with commas:

159,19910827,1,50532001,5053,9449,"Test index","Test index","Test index","Test index"

Step Two: Split out the Fields

The first step is to separate out each field from the record based on the delimeter, which in this case is the comma. I use byte lengths for each number field based on the rules for output ,ascii table.

Duplicated below:

  I1 J1 06 bytes      
  I2 J2 11 bytes
  I3 J3 16 bytes      
  I4 J4 20 bytes
  K1    05 bytes      
  K2    10 bytes
  Zn    n+1 bytes     
  Pn    n bytes

in dinvcsvf
{define targets}
def bin-x,1,6
def last-x,1,11
def qty-x,1,11
def prod-x,1,9
def supp-x,1,9
def cost-x,1,8
def desc1-x,1,22 {Note room for quotes}
def desc2-x,1,22
def desc3-x,1,22
def desc4-x,1,22

{define source}
def record,1,152

{extract using split }

ext bin-x=$split(record,first,",")
ext last-x=$split(record,",",1,",")
ext qty-x=$split(record,",",2,",")
ext prod-x=$split(record,",",3,",")
ext supp-x=$split(record,",",4,",")
ext cost-x=$split(record,",",5,",")
ext desc1-x=$split(record,",",6,",")
ext desc2-x=$split(record,",",7,",")
ext desc3-x=$split(record,",",8,",")
ext desc4-x=$trim($split(record,",",9,last))

{where?}
out myfile,link
xeq

Step Three: Closing the Loop

Now we just need to "close the loop" and extract the individual byte type fields into their appropriate data types. In this step we also "clean" the data of the double quotes.

Keep in mind that the data format that we want:

The defines for the above are in the section starting with the comment {Actual targets}

in myfile

{re-define number in display format}
def bin-z,bin-x,display
def last-z,last-x,display
def qty-z,qty-x,display
def prod-z,prod-x,display
def supp-z,supp-x,display
def cost-z,cost-x,display

{Actual targets}

{number}
def bin ,1,2,integer
def last,1,4,double
def qty ,1,4,double
def prod,1,8,display
def supp,1,8,display
def cost,1,4,packed

{bytes}
def desc1,1,20
def desc2,1,20
def desc3,1,20
def desc4,1,20

ext bin=$number(bin-z)
ext last=$number(last-z)
ext qty=$number (qty-z)
ext prod=$number(prod-z)
ext supp=$number(supp-z)
ext cost=$number(cost-z)

clean '"'
ext desc1=$trim($clean(desc1-x))
ext desc2=$trim($clean(desc2-x))
ext desc3=$trim($clean(desc3-x))
ext desc4=$trim($clean(desc4-x))

out loop,link
xeq

The SD file (loop) is now in the same format and layout of the original Dataset extraction prior to the STExport task. We've now closed the loop.

>form loop
    File: LOOP.NEIL.GREEN     (SD Version B.00.00)
       Entry:                     Offset
          BIN                  I1      1
          LAST                 I2      3
          QTY                  I2      7
          PROD                 Z8     11
          SUPP                 Z8     19
          COST                 P8     27
          DESC1                X20    31
          DESC2                X20    51
          DESC3                X20    71
          DESC4                X20    91
    Limit: 13  EOF: 13  Entry Length: 110  Blocking: 37

So to summarize the first step is to split out the data into separate fields with byte data types. Then to convert by re-defining with a new name to reference the data as display and use $clean and $number to extract into the final targets.

Generate Three Random Characters

A customer recently asked how they could use Suprtool to generate some random data. The answer is you can't. But Barry Durand came up with a reasonable Unix shell script, using something he found on the net.

Here is a script that called rand3:

a[0]=a; a[1]=b; a[2]=c; a[3]=d; a[4]=e; a[5]=f; a[6]=g; a[7]=h a[8]=i;
a[9]=j; a[10]=k;a[11]=l; a[12]=m; a[13]=n; a[14]=o; a[15]=p a[16]=q;
a[17]=r; a[18]=s; a[19]=t; a[20]=u; a[21]=v; a[22]=w a[23]=x; a[24]=y;
a[25]=z; a[26]=1; a[27]=2; a[28]=3; a[29]=4 a[30]=5;
a[31]=6; a[32]=7; a[33]=8; a[34]=9; a[35]=0
echo ${a[$RANDOM%36]}${a[$RANDOM%36]}${a[$RANDOM%36]}

./rand3.sh
gtj
./rand3.sh
7o2

Remove Trailing spaces

Someone asked how to remove trailing spaces on HP-UX. Here is one way:

pr -t INFILE >INFILE.spaces

Solving Floating Point Exception Errors

One of the more common tech calls we get is: my job aborted with Floating Exception(coredump) what is wrong?

First what is a Floating Point Exception.

A Floating point exception is an HP-UX specific arithmetic trap which occurs when the system encounters a character that it cannot deal with when converting to/from Floating Point. These errors occur in Suprtool and STExport for HP-UX, mainly because Suprlink just matches bytes, it doesn't try to transform the data in any manner. It is the transformation or coercion from one form to another where these FPEs occur. In Suprtool, it could be from nearly any type to nearly any other type. An FPE in Suprtool is typically a coercion error from one type TO floating point/display or packed.

Why?

This article will attempt to show how to find these errors and common reasons for these errors. The most significant and really the main reason for a coercion error is that we have bad data for the area defined for a given field that is being coerced. Now essentially there is only one reason for an FPE and that is that the low level routine that converts data from one type to another encounters a character that it cannot handle.

One of two possibilities exist:

  1. You've defined either the record size or the field position incorrectly such that you are pointing to the wrong spot.
  2. You really have bad data.

Most of the time we see that there is an incorrect definition of a field or record.

Wrong record size on Input file

in baddisp,rec 80,lf
def a,1,5,display
>if a=65312
>ext a
>out *
>xeq
65312
IN=6, OUT=1. CPU-Sec=1. Wall-Sec=1.
If we do essentially the same job, but get the record size wrong by one, we see the dreaded:

>in baddisp,rec 81,lf
>def a,1,5,display
>if a=65312
>out *
>xeq
Floating exception(coredump)

Remember that on Unix files are just a series of bytes, there is no record structure. So in the case above we read the first record and it lines up ok, however, the next record will start at the wrong spot and the coercion will fail.

Bad Field Definition:

Consider the following data: more baddisp
01234
21222
00345
12345
54321xx
65312xx

Note that the last two records have characters right beside them so if we get the definition of the beginning display field incorrect we will end up trying to interpret the record with the 4321x as display data, which will of course fail.

Now we show an incorrect field definition: >in baddisp,rec 80,lf
>def a,2,5,display {obviously wrong}
>if a<>65312
>out *
>ext a
>xeq
1234
1222
0345
2345
Floating exception(coredump)

We actually have bad data!

Once we've investigated the layout for the record and for the field there is the possibility that we may have bad data. Hidden characters or escape sequences or more often than not nulls. In Suprtool you can list the data in Hex format and look for Null (00), Tab (09), CR (0D) or Line Feed (0A). These are common "bad" characters found in fields. You can use the $clean function to clean out this data, or you would have to trace the bad data to the source.

How to find what is happening?

Where do I look first?

In any given task, really the first place to look is at the fields referenced in the if command. Concentrate on the Display and packed fields referenced in the if command.

You can see each record as it is read by using the out * command instead of output to some file. (Hint: Out * just means out to stdlist so it may not be useful if you have lots of fields). The List command to a file may be more useful, you could then tail the file to see where the problem was.

It is advisable at first to limit the number of records with the numrecs command while investigating these errors. Typically if a record definition and/or field definition are incorrect, then the abort will occur in the first few records so numrecs 10 command could save you waiting for a long listing to go to the screen.

It might not be the current task

It is important to note that the problem may not be in the current task. Consider again the following data: more baddisp
01234
21222
00345
12345
54321xx
65312xx
In the first step below we just read the data and put into a self-describing file: >in baddisp,rec 80,lf
>def a,2,5,display
>ext a
>out mydata,link
>xeq
IN=6, OUT=6. CPU-Sec=1. Wall-Sec=1.
Now in a later step we may reference the bad field and the abort will then occur. So sometimes you have to trace back a step to the one that created the file you are reading: >in mydata
>form
File: mydata (SD Version B.00.00) Has linefeeds
Entry: Offset
A Z5 1
Entry Length: 5 Blocking: 1
>if a<>65213
>list
>xeq
>IN mydata (0) >OUT $NULL (0)
A = 1234?

>IN mydata (1) >OUT $NULL (1)
A = 1222?

>IN mydata (2) >OUT $NULL (2)
A = 345?

>IN mydata (3) >OUT $NULL (3)
A = 2345?
Floating exception(coredump)

Note that the "?" is a clue that you have a bad definition as Suprtool can handle spaces after numbers, but it aborts when it encounters the "x". Hopefully this gives you information on what a Floating exception is.

We are working on catching SIGFPE and trying to give more information about the circumstances in a future version of Suprtool, but for now the above are some of the techniques that you would likely need to use to find the bad data or bad definitions.

Comparing Files in Groups

I've been working on projects that requires that I have two copies of some of our source code modules and I want to compare all the files for the given groups to monitor the changes.

Here is an MPE/Qedit command file that will compare all the files in two groups. Keep in mind that this is dependent on the groups having the same number of files in the group and the same filenames.

parm group_one group_two
purge fg1,temp >$null
purge fg2,temp >$null
purge fgout,temp >$null
echo Comparing !group_one against !group_two
listf @.!group_one,6 >fg1
listf @.!group_two,6 >fg2
build fgout;rec=-256,1,f,ascii;disc=20000;temp
file fgout;acc=append
/set totals off { No totals listed }
/tq fg1,text
/set length 256
/set right 256
/appendq "~" @
/addq last=fg2
/lsortq all
/glueq "~" @
/changeq 1 "compare " @
/changeq "~"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/appendq ",tempout >$null" @
/appendq "~:if compareoutcount <> 0 then" @
/divideq "~" @
/changeq "~"" @
/appendq "~:fcopy from=tempout; to=*fgout >$null" "compareoutcount"
/divideq "~" @
/changeq "~"" @
/appendq "~:endif" "fgout"
/divideq "~" @
/changeq "~"" @
/useq *
reset fgout
/textq fgout,yes
/listq $r @
/set totals on
The command file essentially builds a list of compare commands and then if differences are found in the files, then the compare output is copied to a single file, which is listed on my attached printer at the end.

This is specific to my needs but is easily customizable. Feel free to send any comments or improvements to neil@robelle.com.

Control Where Temp Sort Space is Allocated

Occasionally we get calls and e-mails asking about sort scratch space on HP-UX and where Suprtool puts its Sort Scratch space. This is usally due to some job failing due to an out of disc spaces on a particular volume.

By default, Suprtool typically puts the temporary space it needs on /var/tmp. If this volume runs out of space when sorting data you may receive an error similar to this:

Error:  Failure in ROBSORT'INPUT routine
ROBSORT'INPUT Error 12
Unable to write to the RSORTSCR file (length = 27657)
No space left on device


Error:  ROBSORT'END Failed
ROBSORT'END Error 19
File system error
Error 0
You can control what directory/volumeset that Suprtool puts its temp sort scratch files by setting the TMPDIR variable prior to running Suprtool.
export TMPDIR=/home/bigsortspace
The above will cause Suprtool to create Suprtool temp file in the directory space /home/bigsortspace.