Thursday, January 5, 2012

Name Names and Describe Descriptions

I hate to have to spell out something so fundamental, but... here it goes.  Abbreviate what the data is, what it was, and what you did to it.

Here's a simple, run-of-the-mill seismic dataset found on the web somewhere:

l11f1.sgy

At least they didn't call it l11f1.segy or l11f1.SEG-Y.  The standard is *.sgy.

If you found this dataset on a tape or disk someplace, all by itself, and had no idea what l11f1.sgy meant, what's your next step?  Well, the extension is *.sgy, so you'd assume it's in SEG-Y format.  Next, look at the EBCDIC header.  Many times it looks like this one:


C 1 CLIENT U.S.G.S.              COMPANY  WHFC                  CREW NO 0
C 2 LINE l11f1.     AREA  Columbia River       MAP ID None
C 3 REEL NO 1         DAY-START OF REEL 263 YEAR 2000 OBSERVER
C 4 INSTRUMENT:  Triton        MODEL ISIS       SERIAL NO
C 5 DATA TRACES/RECORD 1      AUXILIARY TRACES/RECORD         CDP FOLD
C 6 SAMPLE INTERVAL 258     SAMPLES/TRACE  2050 BITS/IN      BYTES/SAMPLE  2
C 7 RECORDING FORMAT 3      FORMAT THIS REEL        MEASUREMENT SYSTEM
C 8 SAMPLE CODE: Short Integers
C 9 GAIN TYPE:
C10 FILTERS: ALIAS     HZ  NOTCH     HZ  BAND           HZ  SLOPE        DB/OCT
C11 SOURCE:                 NUMBER/POINT         POINT INTERVAL
C12 PATTERN:                               LENGTH        WIDTH
C13 SWEEP:           HZ          HZ  LENGTH 531  MS  CHANNEL NO     TYPE
C14 TAPER:                    MS                   MS  TYPE
C15 SPREAD:
C16 GEOPHONES:
C17 PATTERN:
C18 TRACES SORTED BY: RECORD
C19 AMPLITUDE RECOVERY:
C20 MAP PROJECTION
C21 PROCESSING:
C22 ACOUSTIC SOURCE: SIS-1000                 FIRE RATE: 530 ms  SECS
C23
C24
C25
C26
C27
C28
C29
C30
C31
C32
C33
C34
C35
C36
C37
C38
C39
C40 END EBCDIC
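
If you want to pull that header out yourself, here's a minimal sketch in Python. It assumes the file starts with the usual 3200-byte textual header (40 cards of 80 characters) and that the text is EBCDIC, decoded here with Python's built-in cp037 codec. Some files are written in plain ASCII instead, so treat the encoding as a guess. The function name read_textual_header is mine, not part of any library, and the demo bytes are built in memory rather than read from the real l11f1.sgy.

```python
import io

CARD_COUNT = 40
CARD_WIDTH = 80
HEADER_BYTES = CARD_COUNT * CARD_WIDTH  # 3200 bytes

def read_textual_header(stream, encoding="cp037"):
    """Return the 40 header cards as a list of strings (trailing blanks removed)."""
    raw = stream.read(HEADER_BYTES)
    if len(raw) < HEADER_BYTES:
        raise ValueError("too short to hold a 3200-byte textual header")
    text = raw.decode(encoding)
    return [text[i:i + CARD_WIDTH].rstrip()
            for i in range(0, HEADER_BYTES, CARD_WIDTH)]

# Demo: fake a header in memory using two cards from the listing above.
card1 = "C 1 CLIENT U.S.G.S."
card2 = "C 2 LINE l11f1.     AREA  Columbia River       MAP ID None"
blank = " " * (CARD_WIDTH * 38)
demo_bytes = (card1.ljust(CARD_WIDTH) + card2.ljust(CARD_WIDTH) + blank).encode("cp037")

cards = read_textual_header(io.BytesIO(demo_bytes))
for card in cards[:2]:
    print(card)
```

On a real file you'd pass open("l11f1.sgy", "rb") instead of the in-memory stream.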

Okay, it's:
  • Client:  USGS
  • Company:  WHFC
  • Crew:  0 (this means nothing)
  • Line: l11f1
  • Area:  Columbia River
  • Record Length:  531 MS (the LENGTH value on card C13)
  • Sample Rate:  258
  • Instrument:  Triton
  • Model:  Isis
  • Reel:  1, starting day 263 (not noted in binary header)
  • Year:  2000
  • Acoustic Source:  SIS-1000
  • Fire Rate:   530 MS
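
One quick sanity check you can run on numbers like these: multiply the sample interval by the samples per trace and see whether the result lands near the stated lengths. This sketch assumes the 258 on card C6 is in microseconds, which is the usual SEG-Y convention but is not actually stated in this header. The function name is just for illustration.

```python
def record_length_ms(sample_interval_us, samples_per_trace):
    """Trace length in milliseconds, given the sample interval in microseconds."""
    return sample_interval_us * samples_per_trace / 1000.0

# Values from card C6: SAMPLE INTERVAL 258, SAMPLES/TRACE 2050.
length = record_length_ms(258, 2050)
print(length)
```

Under that assumption the trace comes out just under 529 ms, which is at least in the same neighborhood as the 530 ms fire rate on card C22.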

Okay, that's all fine and good for somebody who's spent 3 months struggling to get this data correct.  But your masterpiece is not a masterpiece until the rest of the world has seen it and understood it.  Looking at the data, it looks stacked.  It "might" be 2D because the line name in C2 matches the filename (3D data usually just notes the first inline, which is okay in some circumstances).  We know it's marine data (sort of) because it mentions the Columbia River.  After that, you're at a dead end.

Next we have to look at the trace headers.  We have little or nothing to go on for the X and Y values (byte locations 73-88).  Are those Xs and Ys, or are they lats and longs in some weird system we have to figure out?  Where in God's name is the Columbia River?  If I google it and come up with something that looks like the numbers in the X and Y fields, are they gonna match?  Probably not.
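
For what it's worth, here's roughly how I'd pull those fields out of a 240-byte trace header in Python. It's a sketch that assumes the standard SEG-Y layout: a coordinate scalar in bytes 71-72, source X/Y and group X/Y as four 4-byte integers in bytes 73-88, and a coordinate units code in bytes 89-90, all big-endian. Whether a given file actually follows that layout is exactly the thing the header should tell you. The function names and the demo numbers are made up for illustration; they are not values from l11f1.sgy.

```python
import struct

# Units codes as I understand the SEG-Y rev 1 convention.
UNITS = {1: "length (meters or feet)", 2: "seconds of arc",
         3: "decimal degrees", 4: "degrees, minutes, seconds"}

def apply_scalar(value, scalar):
    """Positive scalar multiplies, negative scalar divides, zero means no scaling."""
    if scalar > 0:
        return value * scalar
    if scalar < 0:
        return value / -scalar
    return value

def read_coordinates(trace_header):
    """Decode bytes 71-90 of a 240-byte trace header (1-based byte numbering)."""
    scalar, sx, sy, gx, gy, units = struct.unpack(">h4ih", trace_header[70:90])
    return {
        "source_x": apply_scalar(sx, scalar),
        "source_y": apply_scalar(sy, scalar),
        "group_x": apply_scalar(gx, scalar),
        "group_y": apply_scalar(gy, scalar),
        "units": UNITS.get(units, "unknown code %d" % units),
    }

# Demo with invented numbers: scalar -100 means "divide by 100".
demo_header = bytearray(240)
struct.pack_into(">h4ih", demo_header, 70,
                 -100, 50012345, 510067890, 50012345, 510067890, 1)
coords = read_coordinates(bytes(demo_header))
print(coords)
```

Even with this in hand, the units code only tells you "length" versus "angle". The projection and datum still have to come from a human who wrote them down.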

There's some 10-digit number in byte locations 181-184.  I have no idea what that's supposed to represent, or what's in 193 (it looks like a dupe of the water depth that's in 61-68).
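
When I'm poking at mystery fields like these, a tiny helper that takes the 1-based byte locations straight from the documentation saves a lot of off-by-one grief. This is a sketch, assuming big-endian signed integers (the usual case for SEG-Y trace headers). The helper name field and the demo values are invented; in particular, treating 193 as a 2-byte field is a guess.

```python
def field(trace_header, first_byte, last_byte, signed=True):
    """Read an integer from 1-based, inclusive byte locations, big-endian."""
    raw = trace_header[first_byte - 1:last_byte]
    return int.from_bytes(raw, byteorder="big", signed=signed)

# Demo header with invented contents.
demo = bytearray(240)
demo[180:184] = (1234567890).to_bytes(4, "big", signed=True)  # bytes 181-184
demo[60:64] = (42).to_bytes(4, "big", signed=True)            # bytes 61-64
demo[192:194] = (42).to_bytes(2, "big", signed=True)          # bytes 193-194

mystery = field(demo, 181, 184)
depth = field(demo, 61, 64)
maybe_dupe = field(demo, 193, 194)
print(mystery, depth, maybe_dupe, depth == maybe_dupe)
```

That tells you what the numbers are. It still doesn't tell you what they mean, which is the whole problem.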

My point is:  Explain what's where!  If you spent hours and hours figuring out this magnificent number, please explain what it is and what it's there for.  If I spent months looking at this data, I'd know what it is too, but I have a deadline to figure this stuff out.  This data will stick around MUCH, much longer than you will.  You'd better do good housekeeping.

In the EBCDIC header - please describe:
  • Where the data you're presenting is
  • Who it's for
  • When it was done
  • What is the final process you're presenting
  • What steps did you take getting there
  • What will I need to load the data (what coordinate system can I expect: projected, etc.)
  • Is there a grid?  If so, what is it?
  • What are the significant non-standard literals
  • What are the byte locations for your data
  • Is it 2D/3D
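
To make the list above concrete, here's a minimal sketch of building a descriptive textual header in Python. It assumes the usual 40 cards of 80 characters, each starting with "C" and the card number, encoded as EBCDIC with the cp037 codec. The function name and the card wording are my own invention, and anything in angle brackets is a placeholder you'd fill in, not real information about this dataset.

```python
def build_textual_header(lines, encoding="cp037"):
    """Pack up to 40 description lines into a 3200-byte card-image header."""
    if len(lines) > 40:
        raise ValueError("a textual header holds at most 40 cards")
    cards = []
    for number in range(1, 41):
        body = lines[number - 1] if number <= len(lines) else ""
        card = "C%2d %s" % (number, body)
        if len(card) > 80:
            raise ValueError("card %d is longer than 80 characters" % number)
        cards.append(card.ljust(80))
    return "".join(cards).encode(encoding)

description = [
    "AREA: Columbia River   LINE: l11f1   CLIENT: U.S.G.S.",
    "ACQUIRED: 2000   DATA TYPE: <2D or 3D>",
    "FINAL PROCESS: <what this file is>",
    "PROCESSING STEPS: <what you did to get here>",
    "COORDINATES: <projection, datum, units>",
    "BYTE LOCATIONS: <field> = <bytes> for every non-standard header word",
]
header = build_textual_header(description)
print(len(header))
print(header.decode("cp037")[:80].rstrip())
```

The length checks are the point of the design: a card that silently runs past 80 characters shifts everything after it, and the next reader gets garbage.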

Finally, the dataset naming convention (the bare minimum):

  • areaname_linename_processname.sgy

Nobody cares if you have a 300-character file name.  People are going to care if you have a file name which means nothing to somebody who's not part of your inner circle.
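
If you want to enforce the areaname_linename_processname.sgy pattern instead of trusting people to type it, a few lines of Python will do. This is only a sketch: the function name is made up, and turning spaces and punctuation inside each part into hyphens is my own choice so that the underscores stay unambiguous as separators. The process name "stack" in the demo is a guess based on how the data looks, not something the file states.

```python
import re

def build_dataset_name(area, line, process, extension="sgy"):
    """Join area, line and process names into area_line_process.sgy."""
    parts = []
    for label, value in (("area", area), ("line", line), ("process", process)):
        # Collapse anything that is not a letter or digit into a single hyphen.
        cleaned = re.sub(r"[^A-Za-z0-9]+", "-", value.strip()).strip("-")
        if not cleaned:
            raise ValueError("the %s name must not be empty" % label)
        parts.append(cleaned)
    return "_".join(parts) + "." + extension

name = build_dataset_name("Columbia River", "l11f1", "stack")
print(name)
```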
