From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gsl-discuss-return-5602-listarch-gsl-discuss=sources.redhat.com@sourceware.org>
Received: (qmail 19144 invoked by alias); 7 Jan 2010 03:29:21 -0000
Received: (qmail 19130 invoked by uid 22791); 7 Jan 2010 03:29:19 -0000
X-SWARE-Spam-Status: No, hits=-3.1 required=5.0
	tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from mail.phy.duke.edu (HELO mail.phy.duke.edu) (152.3.182.2)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 07 Jan 2010 03:29:13 +0000
Received: from localhost (localhost [127.0.0.1])
	by mail.phy.duke.edu (Postfix) with ESMTP id 01493780B4;
	Wed,  6 Jan 2010 22:29:05 -0500 (EST)
Received: from mail.phy.duke.edu ([127.0.0.1])
	by localhost (mail.phy.duke.edu [127.0.0.1]) (amavisd-new, port 10026)
	with LMTP id wRG46gbJUHbH; Wed,  6 Jan 2010 22:29:04 -0500 (EST)
Received: from lilith.rgb.private.net (cpe-069-134-079-008.nc.res.rr.com [69.134.79.8])
	by mail.phy.duke.edu (Postfix) with ESMTP id B4322780B2;
	Wed,  6 Jan 2010 22:29:04 -0500 (EST)
Date: Thu, 07 Jan 2010 03:29:00 -0000
From: "Robert G. Brown" <rgb@phy.duke.edu>
To: Gerard Jungman <jungman@lanl.gov>
cc: GSL Discuss Mailing List <gsl-discuss@sourceware.org>
Subject: Re: gsl container designs
In-Reply-To: <1262829020.27244.361.camel@manticore.lanl.gov>
Message-ID: <alpine.LFD.2.00.1001062138040.3462@localhost>
References: <1259110486.3028.69.camel@manticore.lanl.gov>  <m3fx7q4mdr.wl%bjg@network-theory.co.uk> <4B4477B8.50305@iki.fi>  <alpine.LFD.2.00.1001061011530.3462@localhost> <1262829020.27244.361.camel@manticore.lanl.gov>
User-Agent: Alpine 2.00 (LFD 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Mailing-List: contact gsl-discuss-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gsl-discuss.sourceware.org>
List-Subscribe: <mailto:gsl-discuss-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gsl-discuss/>
List-Post: <mailto:gsl-discuss@sourceware.org>
List-Help: <mailto:gsl-discuss-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gsl-discuss-owner@sourceware.org
X-SW-Source: 2010-q1/txt/msg00006.txt.bz2

On Wed, 6 Jan 2010, Gerard Jungman wrote:

> In fact, I don't even see a need to sacrifice efficiency. I think

I recall from my benchmarking days that -- depending on compiler --
there is a small dereferencing penalty for packed matrices (vectors
packed into dereferencing **..* pointers) compared to doing the offset
arithmetic via brute force inline or via a macro.  I ran two different
implementations of stream (one with direct pointer traversal of the
vectors, one with vector offsets repacked into **matrices to do the
tests.  The size of this penalty did decrease as gcc advanced; I haven't
run the benchmark recently and don't know how large it currently is.  It
was never so large that it stopped me from using repacked pointers for
code clarity.

> this goal is directly achievable at essentially optimal efficiency.
> My attitude about C is that it is all about lining up the bytes.
> A good C design is just some syntactic layer on top of a
> rudimentary layer, where-in you are careful to line up all
> the bytes. After that, optimal efficiency is pretty much
> automatic. C is very friendly, once you are careful to
> line everything up. I hope you see my small but non-vacuous
> technical meaning.

Agreed.  And I can see no reason that modern compilers may not have
eliminated most of the dereferencing penalty, especially if one aligns
things carefully.  However, some of my beowulfish friends who work(ed)
for compiler companies argued that one of Fortran's longstanding
advantages in numerical code efficiency is simply because of the fact
that matrices in Fortran are written in stone and so compiler writers
can optimize the hell out of them.  C pointers, OTOH -- well, one can
make triangular arrays, rectangular arrays, one can start the indices at
0 or 1 or -100 or 47 as you like and run to wherever you like.  How can
anyone pre-optimize a compiler to handle all of that?

Truthfully, I can't see why they CAN'T, but I'm not a compiler writer
and empirically there has been a small penalty between C in general and
fortran and C with direct addressing and C with dereferenced addressing
as well.  IIRC there is one more layer of address resolution in indirect
dereferencing, which is the fundamental origin of the latter.

>> What I want is extremely simple:
>>
>> * A way to construct/allocate a matrix of arbitrary dimension (out to some
>> (un)reasonable max, e.g. 10
>
> Check.
>
>
>> but I want to be able to use it
>> just like m[i][j][k]...[r][s] when all is said and done.
>
> This is the only part which I wonder about. I thought about this
> before and came to the conclusion that it is not easy to get right.
> Probably possible, but very tricky.

I've written code to do so many times -- Numerical Recipes provides the
basic idea in its matrix-packing routines.  Simply allocate e.g.

double **..*m,*v;

Then malloc v long enough to hold the actual data.  Then malloc m to
hold the first index, malloc vectors long enough to hold the second
index and assign them to these indices in a first-index loop (recurse
until the next to last index) and finally pack the appropriately offset
addresses from the actual vector v into the next to last index so that
the last index actually traverses v.  In this scheme the last index is
always the linear/fast index in the actual data vector, the other
indices are all there to permit the compiler to handle the
dereferencing.

To free the matrix, one has to work backwards -- free each successive
index one malloc'd from the right back to the left (you need the
leftmost indices to know the addresses to free at each level), and
finally free m and v themselves.  All very tedious and mechanical, but
all fully automatable and debuggable as well.  The biggest problem with
NR is that each type has an individual allocation/free command, and it
only goes to 2 dimensions.  To make a single routine pair handle more
dimensions and arbitrary types is more work and requires void typing and
casting (and/or switches to handle the unique aspects of individual
tyees internally).

> The most annoying aspect of this is that these little problems
> are trivial in C++. But maybe just beyond the capability of C.

How could that be?  People write operating systems in C.  I hardly think
that allocating sensible matrices is beyond it.

I'm not trying to be argumentative or flame, by the way.  Could you be
more explicit about why packing pointers to permit the dereferencing of
an arbitrary matrix is technically difficult (as opposed to merely
tedious) in C and would be easier in C++ (which I thought discouraged
the use of pointers altogether)?

>> * The specific dimension bounds need to be user specified.  If I want to
>> make a 6 dimensional triple triangular array to hold e.g. 3j symbols, I
>> don't want to have to allocate a rectangular array most of which is
>> empty wasted space.
>
> See the earlier comment by Tuomo Keskitalo, about banded/packed storage.
> Not something I have thought about yet, but probably not hard.

I'm not sure about "banded", but packed is (as noted) simple.  It is
just a matter of picking the lengths of the successive ***, **, * length
vectors one mallocs to fill the toplevel e.g. **** pointer, plus the use
of appropriate offsets.  In particular, NR contains examples of how to
do e.g.:

   **dmatrix(l1,u1,l2,u2)

for arbitrary lower/upper indices in the two slots (as well as the
associated free) and it is straightforward to generalize the basic
algorithm to higher dimensions and allow for arbitrary data types.  I
don't think twice about allocating a vector of structs of more or less
arbitrary length (including structs that contain vectors or matrices
themselves) and dereferencing it via a packed, cast or typed, matrix.

This is one of the ADVANTAGES of C -- one can just do what one wants to
do in the most natural way via pointers, mallocs and suitable typedefs
and/or structs.  Without a safety net, admittedly, and you're
responsible for both code integrity and efficiency with little compiler
help, but OTOH you can quite easily visualize what the underlying
assembler is going to (approximately) look like, so one CAN tweak it in
reasonable ways for efficiency without resorting to actual assembler.

>> * Access to the underlying actual data vector (non-opaque, in other
>> words) so it can be passed to e.g. ODE solvers and the like that can
>> only be sensibly written in a general way for a vector.
>
> Check.
>
>
>> My problem in this regard with the GSL is that so far, the matrix and
>> vector constructs are pretty much useless -- they are far away from the
>> standard way one usually does this sort of thing in C using pointers or
>> just allocating a vector or matrix.
>
> I find them useless as well, for exactly these reasons.
>
>
>> One of the ADVANTAGES of C is that it permits one to work pointer magic
>> to improve the readability (and sometimes execution efficiency) of code.
>> Opaque types and set/get calls are dark evil in both regards.
>
> Right. The layer which defines the "types" should be essentially
> syntactic, just there to make it easier to organize/understand
> your code. The underlying implementation should be a brutally
> simple pile of bytes, and it should be visible.

Amen, my brother.  C is a THIN veneer of upper-level language
sensibilities on top of raw assembler, and its ability to work "close to
the metal" is its power.  This power should be preserved as much as
possible in the GSL, in the process of providing a human readable
interface.  I actually wrote all of this, in a more or less GSL--like
form, several years ago.  Some of the code was ugly, but it worked
charmlike, at the cost of requiring a cast of a void type to tell the
compiler how to dereference the outermost (actual) vector index.
Internally, all the dereferencing pointer arithmetic is more or less the
same for a void (universal) type, but the pointer into the actual data vector 
has to be advanced with the correct stride, and the compiler has to know 
what that stride should be.

I can probably dig this code out and post it here (or post a link to it)
if you want to look at it.  I think I went up to 9 or 10 dimensions (way
more than I needed) and did a lot of things via brute force to make the
purely mechanical recursion obvious; I planned an eventual rewrite into
a more compact form that I never got around to.

    rgb

>
>
> --
> G. Jungman
>
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu