Thursday, January 5, 2012

Name Names and Describe Descriptions

I hate to have to spell out something so fundamental, but...  here goes.  Spell out, at least in abbreviated form, what the data is, what it was, and what you did to it.  Example:

Here's a simple run-of-the-mill seismic dataset found on the web somewhere:

l11f1.sgy

At least they didn't call it l11f1.segy or l11f1.SEG-Y.  The standard is *.sgy.

If you found this dataset on a tape or disk someplace all by itself and had no idea what l11f1.sgy meant, what's your next step?  Well, the extension's *.sgy, so you'd assume it's in SEG-Y format.  Next, look at the EBCDIC header.  Many times it looks like this one:


C 1 CLIENT U.S.G.S.              COMPANY  WHFC                  CREW NO 0
C 2 LINE l11f1.     AREA  Columbia River       MAP ID None
C 3 REEL NO 1         DAY-START OF REEL 263 YEAR 2000 OBSERVER
C 4 INSTRUMENT:  Triton        MODEL ISIS       SERIAL NO
C 5 DATA TRACES/RECORD 1      AUXILIARY TRACES/RECORD         CDP FOLD
C 6 SAMPLE INTERVAL 258     SAMPLES/TRACE  2050 BITS/IN      BYTES/SAMPLE  2
C 7 RECORDING FORMAT 3      FORMAT THIS REEL        MEASUREMENT SYSTEM
C 8 SAMPLE CODE: Short Integers
C 9 GAIN TYPE:
C10 FILTERS: ALIAS     HZ  NOTCH     HZ  BAND           HZ  SLOPE        DB/OCT
C11 SOURCE:                 NUMBER/POINT         POINT INTERVAL
C12 PATTERN:                               LENGTH        WIDTH
C13 SWEEP:           HZ          HZ  LENGTH 531  MS  CHANNEL NO     TYPE
C14 TAPER:                    MS                   MS  TYPE
C15 SPREAD:
C16 GEOPHONES:
C17 PATTERN:
C18 TRACES SORTED BY: RECORD
C19 AMPLITUDE RECOVERY:
C20 MAP PROJECTION
C21 PROCESSING:
C22 ACOUSTIC SOURCE: SIS-1000                 FIRE RATE: 530 ms  SECS
C23
C24
C25
C26
C27
C28
C29
C30
C31
C32
C33
C34
C35
C36
C37
C38
C39
C40 END EBCDIC

Okay it's:
  • Client:  USGS
  • Company:  WHFC
  • Crew:  0 (this means nothing)
  • Line: l11f1
  • Area:  Columbia River
  • Record Length:  531 MS (the LENGTH field in C13)
  • Sample Interval:  258 (microseconds, if it follows the SEG-Y convention)
  • Instrument:  Triton
  • Model:  Isis
  • Reel:  1, day-start of reel 263 (not noted in binary header)
  • Year:  2000
  • Acoustic Source:  SIS-1000
  • Fire Rate:   530 MS
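
If you don't have seismic software handy, a header like the one above can usually be dumped straight from the command line.  This is a minimal sketch, assuming the first 3200 bytes really are EBCDIC (some files store them as plain ASCII) and that your dd supports conv=ascii.  The function name show_text_header is made up for this example:

```shell
#!/bin/bash
# Print the 3200-byte SEG-Y textual header as 40 lines of 80 characters.
# Assumption: the header is EBCDIC.  If it is already ASCII, drop conv=ascii.
show_text_header() {
  local segy="$1"
  # Take exactly one 3200-byte block, translate EBCDIC to ASCII,
  # then break the unbroken text into 80-column card images.
  dd if="$segy" bs=3200 count=1 conv=ascii 2>/dev/null | fold -w 80
  echo   # fold leaves the last line without a newline
}
```

Something like show_text_header l11f1.sgy would then print the C 1 to C40 lines.  If what comes out is gibberish, the header is probably ASCII already.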

Okay, that's all fine and good for somebody who's spent 3 months struggling to get this data correct.  But your masterpiece is not a masterpiece until the rest of the world has seen it and understood it.  Looking at the data, it looks stacked.  It "might" be 2D because the line name in C2 matches the filename (3D data usually just notes the first inline, which is okay in some circumstances).  We know it's marine data (sort of) because it mentions the Columbia River.  After that, you're at a dead end.

Next we have to look at the trace headers.  We have little or nothing to go on for the X & Ys (byte locations 73-88).  Are those Xs and Ys, or are they lats and longs in some weird system we have to figure out?  Where in God's name is the Columbia River?  If I google it and come up with something that looks like the numbers that are in the X & Ys, are they gonna match?  Probably not.

There's some 10-digit number in byte locations 181-184.  I have no idea what that's supposed to represent, or what's in 193 (it looks like a dupe of the water depth that's in 61-68).
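
For poking at those trace header bytes yourself, here is a rough sketch using od.  It assumes big-endian integers, fixed-length traces, and no extended textual headers, so the first trace header starts at byte 3600.  The function names read_be_int and show_trace_coords are invented for this example.  In SEG-Y rev 1, bytes 71-72 hold a scalar for the coordinates in 73-88 (positive means multiply, negative means divide), bytes 89-90 say whether they are lengths (1) or seconds of arc (2), and 181-184 is nominally the CDP X coordinate, though nothing forces whoever wrote the file to use it that way:

```shell
#!/bin/bash
# Read a big-endian signed integer from a file.
#   $1 = file, $2 = zero-based byte offset, $3 = width in bytes (2 or 4)
read_be_int() {
  local file="$1" offset="$2" nbytes="$3" hex value
  hex=$(od -An -tx1 -j "$offset" -N "$nbytes" "$file" | tr -d ' \n')
  [ "${#hex}" -eq $((nbytes * 2)) ] || return 1   # ran off the end of the file
  value=$((16#$hex))
  # Values with the top bit set are negative in two's complement.
  if [ "$value" -ge $((1 << (nbytes * 8 - 1))) ]; then
    value=$((value - (1 << (nbytes * 8))))
  fi
  echo "$value"
}

# Print the coordinate-related fields of one trace header.
#   $1 = file, $2 = trace number (1-based), $3 = samples per trace,
#   $4 = bytes per sample
# Assumption: fixed-length traces, first trace header at byte 3600.
show_trace_coords() {
  local file="$1" trace="$2" samples="$3" bps="$4"
  local start=$((3600 + (trace - 1) * (240 + samples * bps)))
  echo "Coordinate scalar (71-72):  $(read_be_int "$file" $((start + 70)) 2)"
  echo "Source X (73-76):  $(read_be_int "$file" $((start + 72)) 4)"
  echo "Source Y (77-80):  $(read_be_int "$file" $((start + 76)) 4)"
  echo "Group X (81-84):  $(read_be_int "$file" $((start + 80)) 4)"
  echo "Group Y (85-88):  $(read_be_int "$file" $((start + 84)) 4)"
  echo "Coordinate units (89-90):  $(read_be_int "$file" $((start + 88)) 2)"
  echo "Bytes 181-184:  $(read_be_int "$file" $((start + 180)) 4)"
}
```

For the file above, going by the 2050 samples and 2 bytes per sample in the EBCDIC header, the call would be show_trace_coords l11f1.sgy 1 2050 2.  It tells you what the numbers are, not what they mean, which is the whole problem.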

My point is:  Explain what's where!  If you spent hours and hours figuring out this magnificent number, please explain what it is and what it's there for.  If I spent months looking at this data, I'd know what it is too, but I have a deadline to figure this stuff out.  This data will stick around MUCH much longer than you will.  You'd better do good housekeeping.

In the EBCDIC header - please describe:
  • Where the data you're presenting is located
  • Who it's for
  • When it was done
  • What the final process you're presenting is
  • What steps you took getting there
  • What I will need to load the data (what I can expect the coordinates to be in: projected, etc.)
  • Is there a grid?  If so, what is it?
  • What the significant non-standard literals are
  • What the byte locations for your data are
  • Whether it's 2D or 3D
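
Getting those answers into the header doesn't take much.  Here's a minimal sketch that turns a plain text file of notes into a 3200-byte EBCDIC block: 40 lines, each padded or cut to 80 columns.  The function name make_text_header and the file names are made up for this example, and it assumes your dd supports conv=ebcdic:

```shell
#!/bin/bash
# Build a 3200-byte EBCDIC textual header from a plain text notes file.
#   $1 = notes file (one header line per line, ideally starting "C 1", "C 2", ...)
#   $2 = output file for the 3200-byte header block
make_text_header() {
  local notes="$1" out="$2"
  # Pad or truncate every line to 80 columns, fill up to 40 lines with blanks,
  # then translate the 3200 characters from ASCII to EBCDIC.
  awk 'NR <= 40 { printf "%-80.80s", $0 }
       END { for (i = NR + 1; i <= 40; i++) printf "%-80s", "" }' "$notes" |
    dd conv=ebcdic of="$out" 2>/dev/null
}
```

After make_text_header notes.txt header.ebc, something like dd if=header.ebc of=copy.sgy bs=3200 count=1 conv=notrunc would overwrite the first 3200 bytes of copy.sgy in place.  Do that to a copy, never to the only version of the data.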

Finally, the dataset naming convention (the bare minimum):

  • areaname_linename_processname.sgy

Nobody cares if you have a 300 character file name.  People are going to care if you have a file which means nothing to somebody who's not part of your inner circle.
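
If you want the convention enforced instead of remembered, a few lines of bash will do it.  This is only a sketch: the function name segy_name is made up, and swapping spaces and other odd characters for hyphens is my own assumption, chosen so the underscores stay unambiguous as field separators:

```shell
#!/bin/bash
# Build areaname_linename_processname.sgy from three descriptive strings.
segy_name() {
  local part name=""
  for part in "$1" "$2" "$3"; do
    # Anything other than letters, digits, dots and hyphens becomes a hyphen,
    # so an underscore only ever appears between the three fields.
    part=$(printf '%s' "$part" | tr -c 'A-Za-z0-9.-' '-')
    name="${name:+${name}_}${part}"
  done
  echo "${name}.sgy"
}
```

For example, segy_name "Columbia River" l11f1 stack gives Columbia-River_l11f1_stack.sgy ("stack" here is just a hypothetical process name).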

Tuesday, January 3, 2012

The Son of the Return of the SEG-Y Binary Header

A little more but a little more readable:


#!/bin/bash
echo -n "SEG-Y file name:  "
read -r _segyin

if [ ! -r "$_segyin" ]; then
  echo "Cannot read $_segyin" >&2
  exit 1
fi


# Binary header fields as hex strings.  Offsets are decimal and zero-based,
# so SEG-Y bytes 3201-3204 start at offset 3200.  hexdump -C prints the bytes
# in file order, which is big-endian, the SEG-Y byte order.
_jobid=$(hexdump -C -s 3200 -n 4 "$_segyin" | awk '{print $2 $3 $4 $5}')
_linenumber=$(hexdump -C -s 3204 -n 4 "$_segyin" | awk '{print $2 $3 $4 $5}')
_reelnumber=$(hexdump -C -s 3208 -n 4 "$_segyin" | awk '{print $2 $3 $4 $5}')
_tracesperrec=$(hexdump -C -s 3212 -n 2 "$_segyin" | awk '{print $2 $3}')
_auxtraces=$(hexdump -C -s 3214 -n 2 "$_segyin" | awk '{print $2 $3}')
_sampleinterval=$(hexdump -C -s 3216 -n 2 "$_segyin" | awk '{print $2 $3}')
_origsampleinterval=$(hexdump -C -s 3218 -n 2 "$_segyin" | awk '{print $2 $3}')
_samplespertrace=$(hexdump -C -s 3220 -n 2 "$_segyin" | awk '{print $2 $3}')
_filesize=$(stat -c%s "$_segyin")
_formatcode=$(hexdump -C -s 3224 -n 2 "$_segyin" | awk '{print $2 $3}')
_tracesortcode=$(hexdump -C -s 3228 -n 2 "$_segyin" | awk '{print $2 $3}')
_feetmeters=$(hexdump -C -s 3254 -n 2 "$_segyin" | awk '{print $2 $3}')


# Convert the hex strings to decimal.
let decjobid=0x$_jobid
let declinenumber=0x$_linenumber
let decreelnumber=0x$_reelnumber
let dectracesperrec=0x$_tracesperrec
let decauxtraces=0x$_auxtraces
let decsampleinterval=0x$_sampleinterval
let decorigsampleinterval=0x$_origsampleinterval
let decsamplespertrace=0x$_samplespertrace
let dectracesortcode=0x$_tracesortcode
let decformatcode=0x$_formatcode
let decfeetmeters=0x$_feetmeters


# Bytes per sample depend on the format code.
case $decformatcode in
  3) _bytespersample=2 ;;
  8) _bytespersample=1 ;;
  *) _bytespersample=4 ;;
esac


# Each trace is a 240-byte header plus its samples; the file starts with
# 3600 bytes of textual and binary header.
let _bytespertrace=($decsamplespertrace*$_bytespersample)+240
let _minus=$_filesize-3600
let _numtraces=$_minus/$_bytespertrace


echo '************************'
echo 'Number of traces:  '$_numtraces


# Only print the fields that are actually filled in.
if [ "$decjobid" != 0 ]; then
  echo 'Job Id:  '$decjobid
fi
if [ "$declinenumber" != 0 ]; then
  echo 'Line number:  '$declinenumber
fi
if [ "$decreelnumber" != 0 ]; then
  echo 'Reel number:  '$decreelnumber
fi
if [ "$dectracesperrec" != 0 ]; then
  echo 'Traces per record:  '$dectracesperrec
fi
if [ "$decauxtraces" != 0 ]; then
  echo 'Number of aux traces per record:  '$decauxtraces
fi
if [ "$decsampleinterval" != 0 ]; then
  echo 'Sample interval:  '$decsampleinterval
fi
if [ "$decorigsampleinterval" != 0 ]; then
  echo 'Original sample interval:  '$decorigsampleinterval
fi
if [ "$decsamplespertrace" != 0 ]; then
  echo 'Samples per trace:  '$decsamplespertrace
fi


case $decformatcode in
  1) echo 'Format code:  4-byte IBM Floating Point' ;;
  2) echo 'Format code:  4-byte, twos complement integer' ;;
  3) echo 'Format code:  2-byte, twos complement integer' ;;
  4) echo 'Format code:  4-byte, fixed point with gain' ;;
  5) echo 'Format code:  4-byte IEEE floating point' ;;
  8) echo 'Format code:  1-byte, twos complement integer' ;;
esac


case $dectracesortcode in
  1) echo 'Trace sort code:  1 as recorded, no sorting' ;;
  2) echo 'Trace sort code:  2 CDP ensemble' ;;
  3) echo 'Trace sort code:  3 Single fold continuous profile' ;;
  4) echo 'Trace sort code:  4 Horizontally stacked' ;;
  5) echo 'Trace sort code:  5 Common source point' ;;
  6) echo 'Trace sort code:  6 Common receiver point' ;;
  7) echo 'Trace sort code:  7 Common offset point' ;;
  8) echo 'Trace sort code:  8 Common mid-point' ;;
  9) echo 'Trace sort code:  9 Common conversion point' ;;
esac
case $decfeetmeters in
  1) echo 'Measurement system:  Meters' ;;
  2) echo 'Measurement system:  Feet' ;;
esac
echo 'Dec filesize:  '$_filesize' bytes'
echo '************************'



You should get something that looks like this:

SEG-Y file name:  102.sgy
************************
Number of traces:  2272
Line number:  102
Reel number:  1
Traces per record:  1
Sample interval:  2000
Samples per trace:  1024
Format code:  4-byte IBM Floating Point
Measurement system:  Meters
Dec filesize:  9854992 bytes
************************

or on the rare occasion:


SEG-Y file name:  R_22.sgy
************************
Number of traces:  10118
Job Id:  666
Line number:  22
Reel number:  1
Traces per record:  1
Sample interval:  4000
Original sample interval:  2000
Samples per trace:  2001
Format code:  2-byte, twos complement integer
Trace sort code:  4 Horizontally stacked
Measurement system:  Meters
Dec filesize:  42924156 bytes
************************

La dee da!!
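
The trace count in the script comes from plain file-size arithmetic: subtract the 3600 header bytes, then divide by 240 plus the samples per trace times the bytes per sample.  That same arithmetic doubles as a sanity check, because with fixed-length traces the division should come out even.  A minimal sketch, with a made-up function name and assuming no extended textual headers:

```shell
#!/bin/bash
# Work out the trace count from the file size and fail if it doesn't divide evenly.
#   $1 = file size in bytes, $2 = samples per trace, $3 = bytes per sample
count_traces() {
  local filesize="$1" samples="$2" bytes_per_sample="$3"
  local trace_bytes=$((samples * bytes_per_sample + 240))
  local data_bytes=$((filesize - 3600))
  if [ $((data_bytes % trace_bytes)) -ne 0 ]; then
    # Leftover bytes usually mean a wrong samples-per-trace or format code,
    # variable-length traces, or extra headers.
    echo "size mismatch: $((data_bytes % trace_bytes)) bytes left over" >&2
    return 1
  fi
  echo $((data_bytes / trace_bytes))
}
```

A hypothetical 10000-byte file with 100 four-byte samples per trace gives count_traces 10000 100 4, which prints 10.  If the function complains, the header values you fed it probably aren't the right ones.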


Sunday, January 1, 2012

More SEG-Y Command Line Stuff: Binary header

A really sleazy bash script to read SEG-Y binary header data.  Basically a file stuffed full of command line stuff.

#!/bin/bash
echo -n "SEG-Y file name:  "
read -r _segyin


# Offsets are decimal and zero-based: SEG-Y bytes 3201-3204 start at 3200.
_jobid=$(hexdump -C -s 3200 -n 4 "$_segyin" | awk '{print $2 $3 $4 $5}')
_linenumber=$(hexdump -C -s 3204 -n 4 "$_segyin" | awk '{print $2 $3 $4 $5}')
_reelnumber=$(hexdump -C -s 3208 -n 4 "$_segyin" | awk '{print $2 $3 $4 $5}')
_tracesperrec=$(hexdump -C -s 3212 -n 2 "$_segyin" | awk '{print $2 $3}')
_auxtraces=$(hexdump -C -s 3214 -n 2 "$_segyin" | awk '{print $2 $3}')
_sampleinterval=$(hexdump -C -s 3216 -n 2 "$_segyin" | awk '{print $2 $3}')
_samplespertrace=$(hexdump -C -s 3220 -n 2 "$_segyin" | awk '{print $2 $3}')
_filesize=$(stat -c%s "$_segyin")
_formatcode=$(hexdump -C -s 3224 -n 2 "$_segyin" | awk '{print $2 $3}')


# Convert the big-endian hex strings to decimal.
let decjobid=0x$_jobid
let declinenumber=0x$_linenumber
let decreelnumber=0x$_reelnumber
let dectracesperrec=0x$_tracesperrec
let decauxtraces=0x$_auxtraces
let decsampleinterval=0x$_sampleinterval
let decsamplespertrace=0x$_samplespertrace
let decformatcode=0x$_formatcode


echo 'Job Id:  '$decjobid
echo 'Line number:  '$declinenumber
echo 'Reel number:  '$decreelnumber
echo 'Traces per record:  '$dectracesperrec
echo 'Number of aux traces per record:  '$decauxtraces
echo 'Sample interval:  '$decsampleinterval
echo 'Samples per trace:  '$decsamplespertrace
echo 'Format code:  '$decformatcode
echo 'Dec filesize:  '$_filesize