One of the more recent questions that has come up lately is How Do I Import data with Suprtool. Let's say we have the following data layout, Image, Eloquence or SD file, it doesn't matter. The purpose of this small white paper would be to extract data to a ".csv" file, and then import that same data back. Closing the loop so to speak.
To begin our project we will extract the data from a sample data source and
output to an SD file and then use STExport to create a CSV file.
>get d-inventory
>out dinv,link
>xeq
IN=13, OUT=13. CPU-Sec=1. Wall-Sec=1.
:run stexport.pub.robelle
$in dinv
$out dinvcsv
$xeq
In=13. Out=13. CPU-Sec=1. Wall-Sec=1.
$listf dinvcsv,2
ACCOUNT= GREEN GROUP= NEILSo at this point we know have a file, typically how a customer would want to import using Suprtool for adding to a database etc. Note that Suprtool does not handle variable-length files so the first step is to convert the Variable length file to fixed length.
FILENAME CODE ------------LOGICAL RECORD----------- ----SPACE----
SIZE TYP EOF LIMIT R/B SECTORS #X MX
DINVCSV 152B VA 13 13 1 16 1 *
On MPE you can convert using Fcopy:
/file dinvcsvf;rec=-152,1,f,ascii
On HP-UX you can use Qedit:
/fcopy from=dinvcsv; to=*dinvcsvf;new
/t dinvcsv
Or Awk: (Thanks to Barry Lake from Allegro for fixify.sh)
/set keep var off
/k dinvcsvf
#!/bin/sh
#
# Script to turn a typical bytestream file (variable length
# records) into a file with fixed length records by padding
# records with spaces as needed. The resulting record length
# will be that of the longest record in the file.
export INFILE=$1
OUTFILE=$(mktemp)
# Step 1: Find the length of the longest record in the input file
# file and store it in a variable for use in the next step.
export MAXLEN=$(awk 'BEGIN { len = 0; }
{ if (length > len)
len = length; }
END { print len; }' $INFILE )
echo Maximum record length in \"$INFILE\" is $MAXLEN.
# Step 2: Get the MAXLEN variable; use it to create a string
# of that many blanks; then use that to pad each input
# record as needed to make it the same length as the
# longest record.
awk 'BEGIN { "echo $MAXLEN" | getline maxlen; spaces = "";
for(i=0; i$OUTFILE
# Step 3: awk can't edit a file in place so its output was
# written to a new file which we then pour back into the
# original file. Note: we could just as easily have done
# cp $OUTFILE $INFILE or mv $OUTFILE $INFILE, but in that
# case we'd lose the original file's creator and
# permission bits.
cat $OUTFILE > $INFILE
rm $OUTFILE
You can run the above script:
$./fixify.sh datafile
At this point Suprtool can now read what is essentially variable-length data in a fixed length file.
Now remember what a typical CSV file looks like! Byte fields are surronded by quotes and separated or delimited with commas:
159,19910827,1,50532001,5053,9449,"Test index","Test index","Test index","Test index"
The first step is to separate out each field from the record based on the delimeter, which in this case is the comma. I use byte lengths for each number field based on the rules for output ,ascii table.
Duplicated below:
I1 J1 06 bytes I2 J2 11 bytes I3 J3 16 bytes I4 J4 20 bytes K1 05 bytes K2 10 bytes Zn n+1 bytes Pn n bytes
in dinvcsvf
{define targets}
def bin-x,1,6
def last-x,1,11
def qty-x,1,11
def prod-x,1,9
def supp-x,1,9
def cost-x,1,8
def desc1-x,1,22 {Note room for quotes}
def desc2-x,1,22
def desc3-x,1,22
def desc4-x,1,22
{define source}
def record,1,152
{extract using split }
ext bin-x=$split(record,first,",")
ext last-x=$split(record,",",1,",")
ext qty-x=$split(record,",",2,",")
ext prod-x=$split(record,",",3,",")
ext supp-x=$split(record,",",4,",")
ext cost-x=$split(record,",",5,",")
ext desc1-x=$split(record,",",6,",")
ext desc2-x=$split(record,",",7,",")
ext desc3-x=$split(record,",",8,",")
ext desc4-x=$trim($split(record,",",9,last))
{where?}
out myfile,link
xeq
Now we just need to "close the loop" and extract the individual byte type fields into their appropriate data types. In this step we also "clean" the data of the double quotes.
Keep in mind that the data format that we want:
The defines for the above are in the section starting with the comment {Actual targets}
in myfile
{re-define number in display format}
def bin-z,bin-x,display
def last-z,last-x,display
def qty-z,qty-x,display
def prod-z,prod-x,display
def supp-z,supp-x,display
def cost-z,cost-x,display
{Actual targets}
{number}
def bin ,1,2,integer
def last,1,4,double
def qty ,1,4,double
def prod,1,8,display
def supp,1,8,display
def cost,1,4,packed
{bytes}
def desc1,1,20
def desc2,1,20
def desc3,1,20
def desc4,1,20
ext bin=$number(bin-z)
ext last=$number(last-z)
ext qty=$number (qty-z)
ext prod=$number(prod-z)
ext supp=$number(supp-z)
ext cost=$number(cost-z)
clean '"'
ext desc1=$trim($clean(desc1-x))
ext desc2=$trim($clean(desc2-x))
ext desc3=$trim($clean(desc3-x))
ext desc4=$trim($clean(desc4-x))
out loop,link
xeq
The SD file (loop) is now in the same format and layout of the original Dataset extraction prior to the STExport task. We've now closed the loop.
>form loop
File: LOOP.NEIL.GREEN (SD Version B.00.00)
Entry: Offset
BIN I1 1
LAST I2 3
QTY I2 7
PROD Z8 11
SUPP Z8 19
COST P8 27
DESC1 X20 31
DESC2 X20 51
DESC3 X20 71
DESC4 X20 91
Limit: 13 EOF: 13 Entry Length: 110 Blocking: 37
So to summarize the first step is to split out the data into separate fields with byte data types. Then to convert by re-defining with a new name to reference the data as display and use $clean and $number to extract into the final targets.
A customer recently asked how they could use Suprtool to generate some random data. The answer is you can't. But Barry Durand came up with a reasonable Unix shell script, using something he found on the net.
Here is a script that called rand3:
a[0]=a; a[1]=b; a[2]=c; a[3]=d; a[4]=e; a[5]=f; a[6]=g; a[7]=h a[8]=i;
a[9]=j; a[10]=k;a[11]=l; a[12]=m; a[13]=n; a[14]=o; a[15]=p a[16]=q;
a[17]=r; a[18]=s; a[19]=t; a[20]=u; a[21]=v; a[22]=w a[23]=x; a[24]=y;
a[25]=z; a[26]=1; a[27]=2; a[28]=3; a[29]=4 a[30]=5;
a[31]=6; a[32]=7; a[33]=8; a[34]=9; a[35]=0
echo ${a[$RANDOM%36]}${a[$RANDOM%36]}${a[$RANDOM%36]}
./rand3.sh gtj ./rand3.sh 7o2
Someone asked how to remove trailing spaces on HP-UX. Here is one way:
pr -t INFILE >INFILE.spaces
One of the more common tech calls we get is: my job aborted with Floating Exception(coredump) what is wrong?
A Floating point exception is an HP-UX specific arithmetic trap which occurs when the system encounters a character that it cannot deal with when converting to/from Floating Point. These errors occur in Suprtool and STExport for HP-UX, mainly because Suprlink just matches bytes, it doesn't try to transform the data in any manner. It is the transformation or coercion from one form to another where these FPEs occur. In Suprtool, it could be from nearly any type to nearly any other type. An FPE in Suprtool is typically a coercion error from one type TO floating point/display or packed.
One of two possibilities exist:
Most of the time we see that there is an incorrect definition of a field or record.
in baddisp,rec 80,lf
If we do essentially the same job, but get the record size wrong by one, we
see the dreaded:
def a,1,5,display
>if a=65312
>ext a
>out *
>xeq
65312
IN=6, OUT=1. CPU-Sec=1. Wall-Sec=1.
>in baddisp,rec 81,lf
>def a,1,5,display
>if a=65312
>out *
>xeq
Floating exception(coredump)
Remember that on Unix files are just a series of bytes, there is no record structure. So in the case above we read the first record and it lines up ok, however, the next record will start at the wrong spot and the coercion will fail.
more baddisp
01234
21222
00345
12345
54321xx
65312xx
Note that the last two records have characters right beside them so if we get the definition of the beginning display field incorrect we will end up trying to interpret the record with the 4321x as display data, which will of course fail.
Now we show an incorrect field definition:
>in baddisp,rec 80,lf
>def a,2,5,display {obviously wrong}
>if a<>65312
>out *
>ext a
>xeq
1234
1222
0345
2345
Floating exception(coredump)
You can see each record as it is read by using the out * command instead of output to some file. (Hint: Out * just means out to stdlist so it may not be useful if you have lots of fields). The List command to a file may be more useful, you could then tail the file to see where the problem was.
It is advisable at first to limit the number of records with the numrecs command while investigating these errors. Typically if a record definition and/or field definition are incorrect, then the abort will occur in the first few records so numrecs 10 command could save you waiting for a long listing to go to the screen.
more baddisp
01234
21222
00345
12345
54321xx
65312xx
In the first step below we just read the data and put into a self-describing
file:
>in baddisp,rec 80,lf
>def a,2,5,display
>ext a
>out mydata,link
>xeq
IN=6, OUT=6. CPU-Sec=1. Wall-Sec=1.
Now in a later step we may reference the bad field and the abort will
then occur. So sometimes you have to trace back a step to the one that
created the file you are reading:
>in mydata
>form
File: mydata (SD Version B.00.00) Has linefeeds
Entry: Offset
A Z5 1
Entry Length: 5 Blocking: 1
>if a<>65213
>list
>xeq
>IN mydata (0) >OUT $NULL (0)
A = 1234?
>IN mydata (1) >OUT $NULL (1)
A = 1222?
>IN mydata (2) >OUT $NULL (2)
A = 345?
>IN mydata (3) >OUT $NULL (3)
A = 2345?
Floating exception(coredump)
Note that the "?" is a clue that you have a bad definition as Suprtool can handle spaces after numbers, but it aborts when it encounters the "x". Hopefully this gives you information on what a Floating exception is.
We are working on catching SIGFPE and trying to give more information about the circumstances in a future version of Suprtool, but for now the above are some of the techniques that you would likely need to use to find the bad data or bad definitions.
I've been working on projects that requires that I have two copies of some of our source code modules and I want to compare all the files for the given groups to monitor the changes.
Here is an MPE/Qedit command file that will compare all the files in two groups. Keep in mind that this is dependent on the groups having the same number of files in the group and the same filenames.
parm group_one group_two
The command file essentially builds a list of compare commands and then
if differences are found in the files, then the compare output is copied
to a single file, which is listed on my attached printer at the end.
purge fg1,temp >$null
purge fg2,temp >$null
purge fgout,temp >$null
echo Comparing !group_one against !group_two
listf @.!group_one,6 >fg1
listf @.!group_two,6 >fg2
build fgout;rec=-256,1,f,ascii;disc=20000;temp
file fgout;acc=append
/set totals off { No totals listed }
/tq fg1,text
/set length 256
/set right 256
/appendq "~" @
/addq last=fg2
/lsortq all
/glueq "~" @
/changeq 1 "compare " @
/changeq "~"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/changeq " ,"," @
/appendq ",tempout >$null" @
/appendq "~:if compareoutcount <> 0 then" @
/divideq "~" @
/changeq "~"" @
/appendq "~:fcopy from=tempout; to=*fgout >$null" "compareoutcount"
/divideq "~" @
/changeq "~"" @
/appendq "~:endif" "fgout"
/divideq "~" @
/changeq "~"" @
/useq *
reset fgout
/textq fgout,yes
/listq $r @
/set totals on
This is specific to my needs but is easily customizable. Feel free to send any comments or improvements to neil@robelle.com.
By default, Suprtool typically puts the temporary space it needs on /var/tmp. If this volume runs out of space when sorting data you may receive an error similar to this:
Error: Failure in ROBSORT'INPUT routine ROBSORT'INPUT Error 12 Unable to write to the RSORTSCR file (length = 27657) No space left on device Error: ROBSORT'END Failed ROBSORT'END Error 19 File system error Error 0You can control what directory/volumeset that Suprtool puts its temp sort scratch files by setting the TMPDIR variable prior to running Suprtool.
export TMPDIR=/home/bigsortspaceThe above will cause Suprtool to create Suprtool temp file in the directory space /home/bigsortspace.
|
|
|---|