Re: F77 indexed file support

public inbox for fortran@gcc.gnu.org

From: Roland Hughes <roland@logikalsolutions.com>
To: Arjen Markus <arjen.markus895@gmail.com>,
	Thomas Koenig <tkoenig@netcologne.de>
Cc: "fortran@gcc.gnu.org" <fortran@gcc.gnu.org>
Subject: Re: F77 indexed file support
Date: Wed, 8 Mar 2023 07:31:07 -0600
Message-ID: <836b7c0f-4a1f-a92f-d835-8207da0f58cc@logikalsolutions.com>
In-Reply-To: <CAMCbSMq=RBzPL-szSfqHscbB8AiKCU1MpX5-GZE73G1qYMzwhA@mail.gmail.com>

Hello Arjen,

Thanks for your reply.

You are confusing RMS Files-11 file versioning with Indexing.

Sorry, this got away from me. Once I started, I couldn't stop.

Real computers, no matter who made them or what OS they ran, all 
provided at least one type of indexed file. These were business-class 
platforms. Even scientific users needed a business-class platform.

Organization types:

*DIRECT*

This was also called a "Hash File" on many platforms. It was a glorified 
sequential file you accessed by RECORD NUMBER. Sometimes the data was 
stored and accessed as an on-disk array. Most implementations used some 
kind of hash algorithm to turn a key into a record number.
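To make the "hash file" idea concrete, here is a minimal in-memory sketch in C. It assumes a hypothetical file of NSLOTS fixed-size record slots, an 8-byte character key, FNV-1a as the hash, and linear probing to resolve collisions. Those are illustrative choices, not what any particular vendor shipped, and hash_put/hash_get are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 8   /* hypothetical number of fixed-size record slots */
#define KEYLEN 8   /* fixed 8-byte character key */

struct slot {
    int used;              /* 0 = empty, 1 = occupied */
    char key[KEYLEN];
    int value;             /* stand-in for the rest of the record */
};

/* FNV-1a over the fixed-length key; any decent hash would do. */
static uint32_t hash_key(const char key[KEYLEN]) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < KEYLEN; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;
    }
    return h;
}

/* Store (or overwrite) a record. On a collision, probe the next slot.
   Returns the slot (record number) used, or -1 if the file is full. */
static int hash_put(struct slot file[NSLOTS], const char key[KEYLEN], int value) {
    assert(key != NULL);
    uint32_t start = hash_key(key) % NSLOTS;
    for (int i = 0; i < NSLOTS; i++) {
        int n = (int)((start + i) % NSLOTS);
        if (!file[n].used || memcmp(file[n].key, key, KEYLEN) == 0) {
            file[n].used = 1;
            memcpy(file[n].key, key, KEYLEN);
            file[n].value = value;
            return n;
        }
    }
    return -1;
}

/* Hash to the home slot and probe forward until the key or an empty slot. */
static int hash_get(const struct slot file[NSLOTS], const char key[KEYLEN], int *value) {
    uint32_t start = hash_key(key) % NSLOTS;
    for (int i = 0; i < NSLOTS; i++) {
        int n = (int)((start + i) % NSLOTS);
        if (!file[n].used)
            return 0;                       /* empty slot: key not present */
        if (memcmp(file[n].key, key, KEYLEN) == 0) {
            *value = file[n].value;
            return 1;
        }
    }
    return 0;
}
```

Note that hash_get stops at the first empty slot, so physically blanking a slot could hide records that had probed past it; flagging deleted records instead avoids that.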

On the PC platform, a somewhat crippled version of this was the file 
format used by DBase and the other Xbase tools of the day. Each "index" 
was stored in its own file. The platform could only have one "index" 
open at a time, so inserts/additions to the data were only reflected in 
that file. You had to PACK the data file and REINDEX each time you 
opened a different "index" because that was the only way to be certain 
they were in sync.

https://www.theminimumyouneedtoknow.com/xbase_book.html

If you really want to know more, you can download that book from the 
xBaseJ project on SourceForge. I wrote it and donated it to the project.

No matter what the platform, DIRECT access had the problem of deletions. 
They were only "marked" as deleted within the data. You had to PACK a 
file to remove them. Lazy developers used bad hash algos, so you also 
had to deal with collisions and missing data.
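A rough sketch of mark-then-PACK on an array of fixed-size records. The struct rec layout and the function names are hypothetical; the one-byte flag (space = live, '*' = deleted) mirrors the per-record deletion marker in dBASE-style .DBF files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size record with a leading deletion flag. */
struct rec {
    char deleted;   /* ' ' = live, '*' = deleted */
    int id;
};

/* "Deleting" only flags the record; it keeps its slot in the file. */
static void mark_deleted(struct rec *r) {
    r->deleted = '*';
}

/* PACK: slide live records down over deleted ones and return the new
   record count. Every record after a deleted one gets a new record number. */
static size_t pack(struct rec *recs, size_t n) {
    assert(recs != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].deleted != '*')
            recs[out++] = recs[i];
    }
    return out;
}
```

Because PACK changes the record numbers of everything after a deleted record, any separate index file still holds stale record numbers afterwards, which is why a REINDEX has to follow.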

Today we have amazing hash algos. Even in commercial relational database 
systems, certain indexes/keys are implemented via hashing because it is 
the fastest method for look-up-only access.

DIRECT access files are still popular on the lesser platforms because, 
if you have a logically contiguous sequential file and mandate 
fixed-size records, the C/C++ fseek() function basically lets you 
implement one: record N starts at byte offset (N - 1) * record length.
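A minimal sketch of that fseek() approach, assuming a binary file of fixed-size records with no header and 1-based record numbers like Fortran's REC= specifier. read_record and write_record are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Read record recno (1-based) of recl bytes into buf. Returns 1 on success. */
static int read_record(FILE *fp, long recno, void *buf, size_t recl) {
    assert(recno >= 1 && recl > 0);
    if (fseek(fp, (recno - 1) * (long)recl, SEEK_SET) != 0)
        return 0;
    return fread(buf, recl, 1, fp) == 1;
}

/* Overwrite (or append) record recno in place. The stream must be open
   for update, e.g. fopen(path, "r+b"). Returns 1 on success. */
static int write_record(FILE *fp, long recno, const void *buf, size_t recl) {
    assert(recno >= 1 && recl > 0);
    if (fseek(fp, (recno - 1) * (long)recl, SEEK_SET) != 0)
        return 0;
    return fwrite(buf, recl, 1, fp) == 1;
}
```

Because each call seeks before it reads or writes, mixing reads and writes on the same update stream is safe; tmpfile() works well for a scratch file.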

*ISAM*

I had to spend two weeks in COBOL I drawing on paper and chalkboard 
until I understood how this worked. While the VAX claimed support for it 
(to make it easier to port IBM software to the VAX), I never encountered 
anyone using it. Even this short description will be more than anyone 
wishes to know about it.

Indexed Sequential Access Method was really only used on platforms that 
allocated disk storage in Volumes, Cylinders, and Tracks. A certain 
number of tracks (or cylinders) were allocated to the "index area." This 
was assumed to be at the top of the file. There, key values were 
grouped into partially filled "buckets" along with their data track numbers.

The Primary Data Area would be allocated a certain number of tracks, 
cylinders, or even volumes (a volume is an entire disk spindle).

The Overflow Data Area got allocated tracks, cylinders, and potentially 
volumes as well.

We will skip the discussion of "bucket splits." Buckets were fixed 
size; you tried to keep them 50% full or less. Finding a record meant a 
linear search of the index area, reading the first entry of each bucket 
until you found one greater than the key you were looking for. That 
meant your value was in the previous bucket, if it existed at all.

Assuming your key was in said bucket, you then read the track in the 
data area and chewed through it record by record (records could be 
packed), looking for the record with your key or one greater (lesser, 
if you were using a descending index).

If you went through one or more tracks and came up empty-handed, your 
search was not over: it then had to perform a sequential search of the 
entire overflow area.
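The ISAM lookup just described can be sketched as below. This is a simplified in-memory model under stated assumptions: integer keys, an ascending index, one data track per index bucket (only its first entry is kept), an unsorted overflow area, and no bucket splits. The types and isam_find are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TRACK_CAP 4            /* hypothetical records per data track */

struct track {                 /* one track of the prime data area */
    size_t n;
    int keys[TRACK_CAP];       /* ascending key order */
};

struct index_entry {           /* first entry of an index bucket */
    int low_key;               /* lowest key covered by this bucket */
    size_t track;              /* data track number it points to */
};

/* Returns 1 and sets *where ('P' prime area, 'O' overflow) if found, else 0. */
static int isam_find(const struct index_entry *idx, size_t nidx,
                     const struct track *tracks,
                     const int *overflow, size_t nover,
                     int key, char *where) {
    assert(nidx > 0);
    /* 1. Linear scan of the index area: stop before the first bucket whose
          low key is greater than ours; that previous bucket is the candidate. */
    size_t b = 0;
    while (b + 1 < nidx && idx[b + 1].low_key <= key)
        b++;
    /* 2. Chew through the candidate data track until we hit the key or pass it. */
    if (key >= idx[0].low_key) {
        const struct track *t = &tracks[idx[b].track];
        for (size_t i = 0; i < t->n && t->keys[i] <= key; i++) {
            if (t->keys[i] == key) { *where = 'P'; return 1; }
        }
    }
    /* 3. Not in the prime area: sequential search of the whole overflow area. */
    for (size_t i = 0; i < nover; i++) {
        if (overflow[i] == key) { *where = 'O'; return 1; }
    }
    return 0;
}
```

Step 3 is the expensive part: every miss in the prime area costs a full pass over the overflow area, which is one reason these files needed periodic reorganizing.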

All of this pain I just shared with you was handled by the OS. Your 
COBOL, FORTRAN, BASIC, DIBOL, etc. program simply did READ RECORD blah 
KEY EQ more-blah.

If you were coding in Assembly, you had to handle this pain yourself, 
especially if your OS didn't provide a system services library to do 
most of it for you.

Deletions were simply flagged as deleted. You had to REORG these files 
to empty out the overflow area, reclaim deleted data space, and clean up 
the buckets.

*VSAM*

Virtual Storage Access Method.

It has only Index Areas (plural) and Data Areas. Each "bucket" is at 
least one disk block in size, and at its end it contains a link to the 
next bucket in its area. These buckets can be scattered anywhere on 
disk. If you have bound volumes, or any other OS ability to group 
multiple disks into one logical disk, they can be on any of those 
spindles. The same goes for data areas.

You have the option of processing this file sequentially by doing a 
keyed hit to the first record of some index and then reading 
sequentially. It will return records in key sequence until end of file.
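A much-simplified model of reading such a file in key sequence: buckets sit at scattered hypothetical "disk addresses" (array slots), keys are sorted within each bucket, and each bucket carries the address of the next one. For brevity the keyed hit is done by walking from the first bucket rather than through an index; struct bucket and read_from are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_CAP 4          /* hypothetical records per bucket */

struct bucket {
    size_t n;
    int keys[BUCKET_CAP];     /* ascending within the bucket */
    int next;                 /* "disk address" of the next bucket in key order, -1 at end */
};

/* Position at the first key >= start_key, then read sequentially in key
   order by following the bucket links. Copies up to max keys into out
   and returns how many were copied. */
static size_t read_from(const struct bucket *disk, int first, int start_key,
                        int *out, size_t max) {
    assert(out != NULL || max == 0);
    size_t count = 0;
    for (int b = first; b != -1 && count < max; b = disk[b].next) {
        for (size_t i = 0; i < disk[b].n && count < max; i++) {
            if (disk[b].keys[i] >= start_key)
                out[count++] = disk[b].keys[i];
        }
    }
    return count;
}
```

The physical order of the buckets does not matter; only the links do, which is what lets buckets live anywhere on disk.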

Index buckets are required to hold the address of the actual data 
bucket of each record. Records could span blocks/buckets, and they 
could be packed.

Deleted records were actually deleted. If block/bucket spanning was 
enabled, you paid a bit of overhead while things shuffled around.

The file system keeps track of the lowest and highest key value for each 
index as well as the bucket count for the index. A binary search is done 
to find the correct index bucket.
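The bucket search can be sketched as a binary search over the lowest key held by each index bucket, with the index-wide lowest/highest keys used to reject out-of-range keys cheaply. This layout (struct index_info and its fields) and find_bucket are assumptions for illustration, not the actual on-disk format of VSAM or RMS.

```c
#include <assert.h>
#include <stddef.h>

struct index_info {
    int low_key, high_key;     /* lowest and highest key in the whole index */
    size_t nbuckets;           /* bucket count */
    const int *bucket_low;     /* lowest key held by each index bucket, ascending */
};

/* Binary search for the index bucket that could hold key.
   Returns the bucket number, or -1 if key is outside [low_key, high_key]. */
static long find_bucket(const struct index_info *ix, int key) {
    assert(ix->nbuckets > 0);
    if (key < ix->low_key || key > ix->high_key)
        return -1;                        /* cheap rejection using the stored bounds */
    size_t lo = 0, hi = ix->nbuckets;     /* answer lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->bucket_low[mid] <= key)
            lo = mid;                     /* key is in mid or a later bucket */
        else
            hi = mid;                     /* key is before mid */
    }
    return (long)lo;
}
```

With N index buckets this touches about log2(N) buckets instead of the ISAM-style linear scan of every bucket's first entry.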

Again, COBOL, FORTRAN, BASIC, DIBOL, etc. programmers just did their 
language's version of READ RECORD blah KEY EQ more-blah. The OS did 
everything else.

Assembler programmers on platforms that didn't have system services to 
call had to do all of this on their own.

We all had to take Assembler so we all had to learn this stuff.

Sorry, once that started spilling out I couldn't stop it.

Roland

On 3/7/2023 11:32 PM, Arjen Markus wrote:
> I have never worked much with VAXes, but I do remember that VAX used a 
> file system where you made a new version of a file and the older 
> versions were automatically kept. I guess that is the purpose of the 
> INDEXED organisation. It is not so much a limitation of gfortran that 
> it does not support this, but a consequence of the operating system's 
> completely different view on files and file management.
>
> Regards,
>
> Arjen
>
> Op di 7 mrt 2023 om 23:58 schreef Thomas Koenig via Fortran 
> <fortran@gcc.gnu.org>:
>
>     Hi Roland,
>
>     >   210  OPEN (UNIT=K_DRAW_CHAN,
>     >       1        FILE=DRAWING_DATA,
>     >       2        STATUS='OLD',
>     >       3        ORGANIZATION='INDEXED',
>
>     I'd never heard of that one up to now.
>
>     >       4        ACCESS='KEYED',
>     >       5        RECORDTYPE='FIXED',
>     >       6        FORM='UNFORMATTED',
>     >       7        RECL=K_DRAWING_RECORD_SIZE/4,
>     >       8        CARRIAGECONTROL='FORTRAN',
>     >       9        KEY=(1:8:CHARACTER),
>     >       1        DISP='KEEP',
>     >       2        IOSTAT=L_DRAW_ERR,
>     >       3        ERR=999)
>     >
>     > The ORGANIZATION='INDEXED' is key.
>     >
>     > GnuCOBOL
>     >
>     > https://gnucobol.sourceforge.io/
>     >
>     > uses the BerkleyDB (sp?) library so the standard COBOL indexed file
>     > support from the big computers can at least be mimicked.
>     >
>     > I'm searching everywhere and I cannot find Gnu Fortran (any flavor)
>     > having an ORGANIZATION clause in the OPEN().
>
>     ORGANIZATION is not an extension that gfortran supports.
>     ifort, which traces its lineage back to VMS Fortran, supports
>     ORGANIZATION, but not 'INDEXED', according to
>
>     https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference/file-operation-i-o-statements/open-statement-specifiers/open-organization-specifier.html
>
>     This is likely a Fortran interface to a VMS speciality; the older
>     operating systems had stuff like that.  UNIX did away with all
>     the record-orientation (I also remember VSAM and ISAM data sets
>     on old IBM mainframes) and UNIX and derivatives, and Windows, now
>     just offers the "stream of bytes" model.
>
>     So, if you need the functionality, you will have to implement it
>     yourself, possibly via a database.
>
>     Best regards
>
>             Thomas
>
-- 
Roland Hughes, President
Logikal Solutions
(630)-205-1593  (cell)
http://www.theminimumyouneedtoknow.com
http://www.infiniteexposure.net
http://www.johnsmith-book.com

Thread overview: 9+ messages
2023-03-07 22:18 Roland Hughes
2023-03-07 22:58 ` Thomas Koenig
2023-03-08  5:32   ` Arjen Markus
2023-03-08 13:31     ` Roland Hughes [this message]
2023-03-08  7:57 ` Bernhard Reutner-Fischer
2023-03-08 13:32   ` Roland Hughes
2023-03-08 14:30     ` Arjen Markus
2023-03-08 15:19       ` Roland Hughes
2023-03-09  8:09         ` Arjen Markus
