gsl container designs

public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed

* gsl container designs
@ 2009-11-25  0:55 Gerard Jungman
  2009-11-29 21:04 ` Brian Gough
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Gerard Jungman @ 2009-11-25  0:55 UTC (permalink / raw)
  To: gsl-discuss

[-- Attachment #1: Type: text/plain, Size: 266 bytes --]

Here are header files for a couple different approaches to containers.
I didn't bother with any implementations; it seems obvious how to
implement most of these functions.

The designs are not complete, but they express most of
the important stuff.

--
G. Jungman

[-- Attachment #2: gsl-container-designs.tar.gz --]
[-- Type: application/x-compressed-tar, Size: 11583 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2009-11-25  0:55 gsl container designs Gerard Jungman
@ 2009-11-29 21:04 ` Brian Gough
  2009-12-04 18:48 ` Brian Gough
  2010-01-06 12:04 ` Tuomo Keskitalo
  2 siblings, 0 replies; 21+ messages in thread
From: Brian Gough @ 2009-11-29 21:04 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

At Tue, 24 Nov 2009 17:54:46 -0700,
Gerard Jungman wrote:
> Here are header files for a couple different approaches to containers.
> I didn't bother with any implementations; it seems obvious how to
> implement most of these functions.
> 
> The designs are not complete, but they express most of
> the important stuff.

Thanks, I will study them this week.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2009-11-25  0:55 gsl container designs Gerard Jungman
  2009-11-29 21:04 ` Brian Gough
@ 2009-12-04 18:48 ` Brian Gough
  2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
  2010-01-06 11:45   ` gsl container designs Tuomo Keskitalo
  2010-01-06 12:04 ` Tuomo Keskitalo
  2 siblings, 2 replies; 21+ messages in thread
From: Brian Gough @ 2009-12-04 18:48 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

At Tue, 24 Nov 2009 17:54:46 -0700,
Gerard Jungman wrote:
> Here are header files for a couple different approaches to containers.
> I didn't bother with any implementations; it seems obvious how to
> implement most of these functions.
> 
> The designs are not complete, but they express most of
> the important stuff.

Thanks for the document, I have studied the designs this week.  It
seems that changing to design 1 / 1u / 2 would be trading one set of
problems for another.  Looking at each case, the change doesn't seem
sufficient to justify the compatibility cost.

Regarding the introductory points.

2a+b) While we may have had some kind of goal of supporting C++ I
don't think it's worth encouraging that today as the gap between the
languages has widened so much since the start of the project.

3) Non-levelised types.  These seem to be the price for type safety.
In terms of the look/feel, expressions like &row.vector and
&column.vector don't seem too unnatural to me.

4) I think this can be fixed in the current framework.

5) Since row-major was in the original design it's too much of a break
to change that.

I think the most realistic way forward is to add a multiarray type,
with rank-1, rank-2 and rank-N versions, in the existing framework and
use non-levelised types for const/non-const views to preserve
type-safety.

-- 
Brian Gough

^ permalink raw reply	[flat|nested] 21+ messages in thread

* using GSL with C++ (was Re: gsl container designs)
  2009-12-04 18:48 ` Brian Gough
@ 2009-12-09 21:04   ` James Amundson
  2009-12-09 21:14     ` Jochen Küpper
                       ` (4 more replies)
  2010-01-06 11:45   ` gsl container designs Tuomo Keskitalo
  1 sibling, 5 replies; 21+ messages in thread
From: James Amundson @ 2009-12-09 21:04 UTC (permalink / raw)
  To: gsl-discuss; +Cc: Brian Gough, Gerard Jungman

On 12/04/2009 12:36 PM, Brian Gough wrote:
> At Tue, 24 Nov 2009 17:54:46 -0700,
> Gerard Jungman wrote:
>    
>> Here are header files for a couple different approaches to containers.
> Regarding the introductory points.
>
> 2a+b) While we may have had some kind of goal of supporting C++ I
> don't think it's worth encouraging that today as the gap between the
> languages has widened so much since the start of the project.
>    

I, for one, use GSL almost exclusively from C++. When I'm not using C++, 
I am usually using Python. How are other readers of this list using GSL? 
Are you using a pure C environment, or do you use C++ and/or other 
languages?

I have no interest in debating the relative merits of various languages; 
I am just curious to know what is happening in practice.

--Jim Amundson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: using GSL with C++ (was Re: gsl container designs)
  2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
@ 2009-12-09 21:14     ` Jochen Küpper
  2009-12-09 21:54     ` Jari Häkkinen
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Jochen Küpper @ 2009-12-09 21:14 UTC (permalink / raw)
  To: gsl-discuss

On 09.12.2009, at 22:02, James Amundson wrote:

> On 12/04/2009 12:36 PM, Brian Gough wrote:
>> 

>> 2a+b) While we may have had some kind of goal of supporting C++ I
>> don't think it's worth encouraging that today as the gap between the
>> languages has widened so much since the start of the project.  
> 
> I, for one, use GSL almost exclusively from C++. When I'm not using C++, I am usually using Python. How are other readers of this list using GSL? Are you using a pure C environment, or do you use C++ and/or other languages?


C++ and sometimes Python (here I more often use scipy than pygsl)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: using GSL with C++ (was Re: gsl container designs)
  2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
  2009-12-09 21:14     ` Jochen Küpper
@ 2009-12-09 21:54     ` Jari Häkkinen
  2009-12-10 11:46     ` Brian Gough
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Jari Häkkinen @ 2009-12-09 21:54 UTC (permalink / raw)
  Cc: gsl-discuss

I use GSL exclusively from C++. I cannot remember when I wrote C ... 
probably 12-13 years has passed since I did my last (latest?) C 
consultancy gig.

Jari

James Amundson wrote:
> On 12/04/2009 12:36 PM, Brian Gough wrote:
>> At Tue, 24 Nov 2009 17:54:46 -0700,
>> Gerard Jungman wrote:
>>   
>>> Here are header files for a couple different approaches to containers.
>> Regarding the introductory points.
>>
>> 2a+b) While we may have had some kind of goal of supporting C++ I
>> don't think it's worth encouraging that today as the gap between the
>> languages has widened so much since the start of the project.
>>    
>
> I, for one, use GSL almost exclusively from C++. When I'm not using 
> C++, I am usually using Python. How are other readers of this list 
> using GSL? Are you using a pure C environment, or do you use C++ 
> and/or other languages?
>
> I have no interest in debating the relative merits of various 
> languages; I am just curious to know what is happening in practice.
>
> --Jim Amundson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: using GSL with C++ (was Re: gsl container designs)
  2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
  2009-12-09 21:14     ` Jochen Küpper
  2009-12-09 21:54     ` Jari Häkkinen
@ 2009-12-10 11:46     ` Brian Gough
  2009-12-10 12:09       ` Brian Gough
  2009-12-10 13:42     ` Robert G. Brown
  2009-12-10 15:15     ` Kevin H. Hobbs
  4 siblings, 1 reply; 21+ messages in thread
From: Brian Gough @ 2009-12-10 11:46 UTC (permalink / raw)
  To: James Amundson; +Cc: gsl-discuss

At Wed, 09 Dec 2009 15:02:20 -0600,
James Amundson wrote:
> I, for one, use GSL almost exclusively from C++. When I'm not using C++, 
> I am usually using Python. How are other readers of this list using GSL? 
> Are you using a pure C environment, or do you use C++ and/or other 
> languages?

What types of functions do you mainly use?  The impedance mismatch for
the "higher-order" functions, e.g. using functions arguments and void
* parameters, seems to be the problematic area to me. 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: using GSL with C++ (was Re: gsl container designs)
  2009-12-10 11:46     ` Brian Gough
@ 2009-12-10 12:09       ` Brian Gough
  0 siblings, 0 replies; 21+ messages in thread
From: Brian Gough @ 2009-12-10 12:09 UTC (permalink / raw)
  To: gsl-discuss

At Thu, 10 Dec 2009 11:44:54 +0000,
Brian Gough wrote:
> What types of functions do you mainly use?  The impedance mismatch for
> the "higher-order" functions...

Specifically in C++, where there are better ways to do it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: using GSL with C++ (was Re: gsl container designs)
  2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
                       ` (2 preceding siblings ...)
  2009-12-10 11:46     ` Brian Gough
@ 2009-12-10 13:42     ` Robert G. Brown
  2009-12-10 21:44       ` James Bergstra
  2009-12-10 15:15     ` Kevin H. Hobbs
  4 siblings, 1 reply; 21+ messages in thread
From: Robert G. Brown @ 2009-12-10 13:42 UTC (permalink / raw)
  To: James Amundson; +Cc: gsl-discuss, Brian Gough, Gerard Jungman

On Wed, 9 Dec 2009, James Amundson wrote:

> On 12/04/2009 12:36 PM, Brian Gough wrote:
>> At Tue, 24 Nov 2009 17:54:46 -0700,
>> Gerard Jungman wrote:
>> 
>>> Here are header files for a couple different approaches to containers.
>> Regarding the introductory points.
>> 
>> 2a+b) While we may have had some kind of goal of supporting C++ I
>> don't think it's worth encouraging that today as the gap between the
>> languages has widened so much since the start of the project.
>> 
>
> I, for one, use GSL almost exclusively from C++. When I'm not using C++, I am 
> usually using Python. How are other readers of this list using GSL? Are you 
> using a pure C environment, or do you use C++ and/or other languages?
>
> I have no interest in debating the relative merits of various languages; I am 
> just curious to know what is happening in practice.

I use C only.  Consequently I come down in favor of using casts to type
e.g. generalized tensor forms, because pointer-based indirection can
lead to reasonably efficient, highly readable code (and because the GSL
really needs a generalized tensor form -- the computational universe of
interest to many scientists is not just one or two dimensional,
rectangular, with indices starting at 0 or 1.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: using GSL with C++ (was Re: gsl container designs)
  2009-12-10 13:42     ` Robert G. Brown
@ 2009-12-10 21:44       ` James Bergstra
  0 siblings, 0 replies; 21+ messages in thread
From: James Bergstra @ 2009-12-10 21:44 UTC (permalink / raw)
  To: Robert G. Brown; +Cc: gsl-discuss

On Thu, Dec 10, 2009 at 8:42 AM, Robert G. Brown <rgb@phy.duke.edu> >
I use C only.  Consequently I come down in favor of using casts to
type
> e.g. generalized tensor forms, because pointer-based indirection can
> lead to reasonably efficient, highly readable code (and because the GSL
> really needs a generalized tensor form -- the computational universe of
> interest to many scientists is not just one or two dimensional,
> rectangular, with indices starting at 0 or 1.
>
>   rgb
+1

I mainly use python, numpy, & scipy now. But one of the main reasons
for switching from gsl (aside from the python language) was natural
support and syntax that numpy provides for n-dimensional arrays.  I
typically use arrays with n <= 5.

James
-- 
http://www-etud.iro.umontreal.ca/~bergstrj

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: using GSL with C++ (was Re: gsl container designs)
  2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
                       ` (3 preceding siblings ...)
  2009-12-10 13:42     ` Robert G. Brown
@ 2009-12-10 15:15     ` Kevin H. Hobbs
  4 siblings, 0 replies; 21+ messages in thread
From: Kevin H. Hobbs @ 2009-12-10 15:15 UTC (permalink / raw)
  To: gsl-discuss

[-- Attachment #1: Type: text/plain, Size: 382 bytes --]

On 12/09/2009 04:02 PM, James Amundson wrote:
> How are other readers of this list using GSL? Are you using a pure C
> environment, or do you use C++ and/or other languages?

I use GSL almost exclusively from C.

I'm usually running lots of ODE simulations where the parameters are
tuned by an MPI based evolutionary algorithm.

From time to time I'll use GSL from C++.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2009-12-04 18:48 ` Brian Gough
  2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
@ 2010-01-06 11:45   ` Tuomo Keskitalo
  2010-01-06 15:47     ` Robert G. Brown
  2010-01-07 18:29     ` Brian Gough
  1 sibling, 2 replies; 21+ messages in thread
From: Tuomo Keskitalo @ 2010-01-06 11:45 UTC (permalink / raw)
  To: Brian Gough; +Cc: Gerard Jungman, GSL Discuss Mailing List

Hi,

On 12/04/2009 08:36 PM, Brian Gough wrote:

> At Tue, 24 Nov 2009 17:54:46 -0700,
> Gerard Jungman wrote:
>> Here are header files for a couple different approaches to containers.
>> I didn't bother with any implementations; it seems obvious how to
>> implement most of these functions.
>>
>> The designs are not complete, but they express most of
>> the important stuff.
> 
> Thanks for the document, I have studied the designs this week.  It
> seems that changing to design 1 / 1u / 2 would be trading one set of
> problems for another.  Looking at each case, the change doesn't seem
> sufficient to justify the compatibility cost.

Do you mean the compatibility to GSL 1 types by the compatibility cost? 
When talking about GSL 2 I don't think we should give too much value to 
maintaining backwards compatibility. GSL 1 is not going to cease to 
exist, and people who have tied themselves deeply to GSL 1 data 
structures can continue to use it.

> 3) Non-levelised types.  These seem to be the price for type safety.
> In terms of the look/feel, expressions like &row.vector and
> &column.vector don't seem too unnatural to me.

Here is a crazy idea: If we take Gerards design 1, would it be too 
absurd to use a horrible big wrapper struct like

typedef struct {
   gsl_marray *m;
   gsl_const_marray *cm;
   gsl_marray_1 *m1;
   gsl_const_marray_1 *cm1;
   gsl_marray_2 *m2;
   gsl_const_marray_2 *cm2;
   gsl_marray_3 *m3;
   gsl_const_marray_3 *cm3;
   gsl_vector *vec;
   gsl_const_vector *cvec;
   gsl_matrix *mat;
   gsl_const_matrix *cmat;
   ...
} gsl_container;

and make everything a gsl_container, and then always use them like
&a.mat or &b.vec? Maybe the const types could be in another 
gsl_const_container struct..

One downside is that this would effectively mask the type of a or b in 
program code (gsl_container a; // is it a vector, matrix, or marray?), 
which might make reading code a pain..

-- 
Tuomo.Keskitalo@iki.fi
http://iki.fi/tuomo.keskitalo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2010-01-06 11:45   ` gsl container designs Tuomo Keskitalo
@ 2010-01-06 15:47     ` Robert G. Brown
  2010-01-07  1:50       ` Gerard Jungman
  2010-01-07 18:29     ` Brian Gough
  1 sibling, 1 reply; 21+ messages in thread
From: Robert G. Brown @ 2010-01-06 15:47 UTC (permalink / raw)
  To: Tuomo Keskitalo; +Cc: Brian Gough, Gerard Jungman, GSL Discuss Mailing List

On Wed, 6 Jan 2010, Tuomo Keskitalo wrote:

> Here is a crazy idea: If we take Gerards design 1, would it be too absurd to 
> use a horrible big wrapper struct like
>
> typedef struct {
>  gsl_marray *m;
>  gsl_const_marray *cm;
>  gsl_marray_1 *m1;
>  gsl_const_marray_1 *cm1;
>  gsl_marray_2 *m2;
>  gsl_const_marray_2 *cm2;
>  gsl_marray_3 *m3;
>  gsl_const_marray_3 *cm3;
>  gsl_vector *vec;
>  gsl_const_vector *cvec;
>  gsl_matrix *mat;
>  gsl_const_matrix *cmat;
>  ...
> } gsl_container;
>
> and make everything a gsl_container, and then always use them like
> &a.mat or &b.vec? Maybe the const types could be in another 
> gsl_const_container struct..
>
> One downside is that this would effectively mask the type of a or b in 
> program code (gsl_container a; // is it a vector, matrix, or marray?), which 
> might make reading code a pain..

IMO the primary reason for "fixing" vector/matrix objects in the GSL is
to enable the single vector that represents the actual data in the
object to be presented to the user in a matrix-intuitive way, even at
some cost in code efficiency (and to try to minimize that cost or
provide further benefits to counterbalance it by enabling efficient
linear algebra routines to be matched to it, for example).

Any of us who have coded for a decade or two know perfectly well how to
address a vector via explicit displacement arithmetic, burying horrible
constructs in one's code to compute the displacement of an i,j,k element
in a vector containin aa "3d matrix".  Or, one can use C pointer
constructs that are equally unreadable for the same general purpose.  We
have also had to DEBUG this sort of thing, which is extremely
unpleasant, or understand somebody else's displacement code, which is
often unpleasant to the point of being nightmarish.

What I want is extremely simple:

* A way to construct/allocate a matrix of arbitrary dimension (out to some
(un)reasonable max, e.g. 10, likely to be higher than anybody needs for
coding purposes at least at this point, and deconstruct/free the matrix
when I'm done with it, ideally with a single call in each case, ideally
for any type of variable.  I'm perfectly comfortable with requiring a
pointer and cast to establish the type, but I want to be able to use it
just like m[i][j][k]...[r][s] when all is said and done.

* The specific dimension bounds need to be user specified.  If I want to
make a 6 dimensional triple triangular array to hold e.g. 3j symbols, I
don't want to have to allocate a rectangular array most of which is
empty wasted space.  I'm not talking about sparse matrix here, just
non-rectangular arrays.  Angular momentum indices are perfect (and
common) examples.

* Access to the underlying actual data vector (non-opaque, in other
words) so it can be passed to e.g. ODE solvers and the like that can
only be sensibly written in a general way for a vector.  I want to be
able to pass the ODE solver the vector and its dimension (and perhaps
an auxiliary similar vector for the derivatives with similar packing
and indexing), but I want to be able to write the ODE deriv routine 
using the readable matrix form.

None of this is particularly difficult to accomplish.  I do it all the
time in my own code.  In the end, one can integrate sets of ODEs where
the derivatives are indexed by angular momentum l,m, where the deriviative
expression contains sums over l',m', l",m" with precomputed tables of
e.g. Gaunt numbers, and where the derivative expression is HUMAN
READABLE and directly comparable to an expression in a physics paper or
textbook.

My problem in this regard with the GSL is that so far, the matrix and
vector constructs are pretty much useless -- they are far away from the
standard way one usually does this sort of thing in C using pointers or
just allocating a vector or matrix.

One of the ADVANTAGES of C is that it permits one to work pointer magic
to improve the readability (and sometimes execution efficiency) of code.
Opaque types and set/get calls are dark evil in both regards.  I just
want to be able to do (double) I[l][m][l'][m'][l"][m"]aand/or (int)
epsilon[i][j][k] transparently and reasonably efficiently, because both
of these occur all the time in real numerical code tht I've written.  I
can do it myself and still use the GSL, and have in the past done so.
For ODE solvers this isn't much of a problem.  For linear algebra,
however, it starts being a problem.  Then issues like column-major
become important and have to be solved in a standardized way.  This
would make even my own code more efficient, as well, as most of what I do
with the I() matrix above is basically matrix multiplication or
contraction -- summing out shared indices -- and linear algebra routines
that operate on correctly allocated structures can be 2x-3x more
efficient than ones that sum across big steps in the underlying vectors.

    rgb

>
> -- 
> Tuomo.Keskitalo@iki.fi
> http://iki.fi/tuomo.keskitalo
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2010-01-06 15:47     ` Robert G. Brown
@ 2010-01-07  1:50       ` Gerard Jungman
  2010-01-07  3:29         ` Robert G. Brown
  0 siblings, 1 reply; 21+ messages in thread
From: Gerard Jungman @ 2010-01-07  1:50 UTC (permalink / raw)
  To: GSL Discuss Mailing List

On Wed, 2010-01-06 at 10:47 -0500, Robert G. Brown wrote:

First, I want to say that I agree with everything you
say. These are exactly the problems that we need to solve.

> IMO the primary reason for "fixing" vector/matrix objects in the GSL is
> to enable the single vector that represents the actual data in the
> object to be presented to the user in a matrix-intuitive way, even at
> some cost in code efficiency (and to try to minimize that cost or
> provide further benefits to counterbalance it by enabling efficient
> linear algebra routines to be matched to it, for example).

In fact, I don't even see a need to sacrifice efficiency. I think 
this goal is directly achievable at essentially optimal efficiency.
My attitude about C is that it is all about lining up the bytes.
A good C design is just some syntactic layer on top of a
rudimentary layer, where-in you are careful to line up all
the bytes. After that, optimal efficiency is pretty much
automatic. C is very friendly, once you are careful to
line everything up. I hope you see my small but non-vacuous
technical meaning.

> What I want is extremely simple:
> 
> * A way to construct/allocate a matrix of arbitrary dimension (out to some
> (un)reasonable max, e.g. 10

Check.

> but I want to be able to use it
> just like m[i][j][k]...[r][s] when all is said and done.

This is the only part which I wonder about. I thought about this
before and came to the conclusion that it is not easy to get right.
Probably possible, but very tricky.

The most annoying aspect of this is that these little problems
are trivial in C++. But maybe just beyond the capability of C.

> * The specific dimension bounds need to be user specified.  If I want to
> make a 6 dimensional triple triangular array to hold e.g. 3j symbols, I
> don't want to have to allocate a rectangular array most of which is
> empty wasted space.

See the earlier comment by Tuomo Keskitalo, about banded/packed storage.
Not something I have thought about yet, but probably not hard.

> * Access to the underlying actual data vector (non-opaque, in other
> words) so it can be passed to e.g. ODE solvers and the like that can
> only be sensibly written in a general way for a vector.

Check.

> My problem in this regard with the GSL is that so far, the matrix and
> vector constructs are pretty much useless -- they are far away from the
> standard way one usually does this sort of thing in C using pointers or
> just allocating a vector or matrix.

I find them useless as well, for exactly these reasons.

> One of the ADVANTAGES of C is that it permits one to work pointer magic
> to improve the readability (and sometimes execution efficiency) of code.
> Opaque types and set/get calls are dark evil in both regards.

Right. The layer which defines the "types" should be essentially
syntactic, just there to make it easier to organize/understand
your code. The underlying implementation should be a brutally
simple pile of bytes, and it should be visible.

--
G. Jungman

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2010-01-07  1:50       ` Gerard Jungman
@ 2010-01-07  3:29         ` Robert G. Brown
       [not found]           ` <4a00655d1001062110m139c0a8tf2eae7de67da8f6f@mail.gmail.com>
  2010-01-08 20:08           ` Gerard Jungman
  0 siblings, 2 replies; 21+ messages in thread
From: Robert G. Brown @ 2010-01-07  3:29 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: GSL Discuss Mailing List

On Wed, 6 Jan 2010, Gerard Jungman wrote:

> In fact, I don't even see a need to sacrifice efficiency. I think

I recall from my benchmarking days that -- depending on compiler --
there is a small dereferencing penalty for packed matrices (vectors
packed into dereferencing **..* pointers) compared to doing the offset
arithmetic via brute force inline or via a macro.  I ran two different
implementations of stream (one with direct pointer traversal of the
vectors, one with vector offsets repacked into **matrices to do the
tests.  The size of this penalty did decrease as gcc advanced; I haven't
run the benchmark recently and don't know how large it currently is.  It
was never so large that it stopped me from using repacked pointers for
code clarity.

> this goal is directly achievable at essentially optimal efficiency.
> My attitude about C is that it is all about lining up the bytes.
> A good C design is just some syntactic layer on top of a
> rudimentary layer, where-in you are careful to line up all
> the bytes. After that, optimal efficiency is pretty much
> automatic. C is very friendly, once you are careful to
> line everything up. I hope you see my small but non-vacuous
> technical meaning.

Agreed.  And I can see no reason that modern compilers may not have
eliminated most of the dereferencing penalty, especially if one aligns
things carefully.  However, some of my beowulfish friends who work(ed)
for compiler companies argued that one of Fortran's longstanding
advantages in numerical code efficiency is simply because of the fact
that matrices in Fortran are written in stone and so compiler writers
can optimize the hell out of them.  C pointers, OTOH -- well, one can
make triangular arrays, rectangular arrays, one can start the indices at
0 or 1 or -100 or 47 as you like and run to wherever you like.  How can
anyone pre-optimize a compiler to handle all of that?

Truthfully, I can't see why they CAN'T, but I'm not a compiler writer
and empirically there has been a small penalty between C in general and
fortran and C with direct addressing and C with dereferenced addressing
as well.  IIRC there is one more layer of address resolution in indirect
dereferencing, which is the fundamental origin of the latter.

>> What I want is extremely simple:
>>
>> * A way to construct/allocate a matrix of arbitrary dimension (out to some
>> (un)reasonable max, e.g. 10
>
> Check.
>
>
>> but I want to be able to use it
>> just like m[i][j][k]...[r][s] when all is said and done.
>
> This is the only part which I wonder about. I thought about this
> before and came to the conclusion that it is not easy to get right.
> Probably possible, but very tricky.

I've written code to do so many times -- Numerical Recipes provides the
basic idea in its matrix-packing routines.  Simply allocate e.g.

double **..*m,*v;

Then malloc v long enough to hold the actual data.  Then malloc m to
hold the first index, malloc vectors long enough to hold the second
index and assign them to these indices in a first-index loop (recurse
until the next to last index) and finally pack the appropriately offset
addresses from the actual vector v into the next to last index so that
the last index actually traverses v.  In this scheme the last index is
always the linear/fast index in the actual data vector, the other
indices are all there to permit the compiler to handle the
dereferencing.

To free the matrix, one has to work backwards -- free each successive
index one malloc'd from the right back to the left (you need the
leftmost indices to know the addresses to free at each level), and
finally free m and v themselves.  All very tedious and mechanical, but
all fully automatable and debuggable as well.  The biggest problem with
NR is that each type has an individual allocation/free command, and it
only goes to 2 dimensions.  To make a single routine pair handle more
dimensions and arbitrary types is more work and requires void typing and
casting (and/or switches to handle the unique aspects of individual
tyees internally).

> The most annoying aspect of this is that these little problems
> are trivial in C++. But maybe just beyond the capability of C.

How could that be?  People write operating systems in C.  I hardly think
that allocating sensible matrices is beyond it.

I'm not trying to be argumentative or flame, by the way.  Could you be
more explicit about why packing pointers to permit the dereferencing of
an arbitrary matrix is technically difficult (as opposed to merely
tedious) in C and would be easier in C++ (which I thought discouraged
the use of pointers altogether)?

>> * The specific dimension bounds need to be user specified.  If I want to
>> make a 6 dimensional triple triangular array to hold e.g. 3j symbols, I
>> don't want to have to allocate a rectangular array most of which is
>> empty wasted space.
>
> See the earlier comment by Tuomo Keskitalo, about banded/packed storage.
> Not something I have thought about yet, but probably not hard.

I'm not sure about "banded", but packed is (as noted) simple.  It is
just a matter of picking the lengths of the successive ***, **, * length
vectors one mallocs to fill the toplevel e.g. **** pointer, plus the use
of appropriate offsets.  In particular, NR contains examples of how to
do e.g.:

   **dmatrix(l1,u1,l2,u2)

for arbitrary lower/upper indices in the two slots (as well as the
associated free) and it is straightforward to generalize the basic
algorithm to higher dimensions and allow for arbitrary data types.  I
don't think twice about allocating a vector of structs of more or less
arbitrary length (including structs that contain vectors or matrices
themselves) and dereferencing it via a packed, cast or typed, matrix.

This is one of the ADVANTAGES of C -- one can just do what one wants to
do in the most natural way via pointers, mallocs and suitable typedefs
and/or structs.  Without a safety net, admittedly, and you're
responsible for both code integrity and efficiency with little compiler
help, but OTOH you can quite easily visualize what the underlying
assembler is going to (approximately) look like, so one CAN tweak it in
reasonable ways for efficiency without resorting to actual assembler.

>> * Access to the underlying actual data vector (non-opaque, in other
>> words) so it can be passed to e.g. ODE solvers and the like that can
>> only be sensibly written in a general way for a vector.
>
> Check.
>
>
>> My problem in this regard with the GSL is that so far, the matrix and
>> vector constructs are pretty much useless -- they are far away from the
>> standard way one usually does this sort of thing in C using pointers or
>> just allocating a vector or matrix.
>
> I find them useless as well, for exactly these reasons.
>
>
>> One of the ADVANTAGES of C is that it permits one to work pointer magic
>> to improve the readability (and sometimes execution efficiency) of code.
>> Opaque types and set/get calls are dark evil in both regards.
>
> Right. The layer which defines the "types" should be essentially
> syntactic, just there to make it easier to organize/understand
> your code. The underlying implementation should be a brutally
> simple pile of bytes, and it should be visible.

Amen, my brother.  C is a THIN veneer of upper-level language
sensibilities on top of raw assembler, and its ability to work "close to
the metal" is its power.  This power should be preserved as much as
possible in the GSL, in the process of providing a human readable
interface.  I actually wrote all of this, in a more or less GSL--like
form, several years ago.  Some of the code was ugly, but it worked
charmlike, at the cost of requiring a cast of a void type to tell the
compiler how to dereference the outermost (actual) vector index.
Internally, all the dereferencing pointer arithmetic is more or less the
same for a void (universal) type, but the pointer into the actual data vector 
has to be advanced with the correct stride, and the compiler has to know 
what that stride should be.

I can probably dig this code out and post it here (or post a link to it)
if you want to look at it.  I think I went up to 9 or 10 dimensions (way
more than I needed) and did a lot of things via brute force to make the
purely mechanical recursion obvious; I planned an eventual rewrite into
a more compact form that I never got around to.

    rgb

>
>
> --
> G. Jungman
>
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

^ permalink raw reply	[flat|nested] 21+ messages in thread

[parent not found: <4a00655d1001062110m139c0a8tf2eae7de67da8f6f@mail.gmail.com>]

* Re: gsl container designs
       [not found]           ` <4a00655d1001062110m139c0a8tf2eae7de67da8f6f@mail.gmail.com>
@ 2010-01-07  5:46             ` Rhys Ulerich
  2010-01-07 13:22               ` Robert G. Brown
  0 siblings, 1 reply; 21+ messages in thread
From: Rhys Ulerich @ 2010-01-07  5:46 UTC (permalink / raw)
  To: gsl-discuss

> I recall from my benchmarking days that -- depending on compiler --
> there is a small dereferencing penalty for packed matrices (vectors
> packed into dereferencing **..* pointers) compared to doing the offset
> arithmetic via brute force inline or via a macro.
> ......
> I haven't
> run the benchmark recently and don't know how large it currently is.  It
> was never so large that it stopped me from using repacked pointers for
> code clarity..

Mostly unscientific, but worth tossing into the mix:

Using Intel 10.1 compilers on a fairly recent AMD chip, 100,000 iterations
of doing the nested pointers approach is neck-and-neck with index arithmetic
on a 10x10 double matrix.  For the 100x100 case it takes 1.3 times longer
to iterate using the nested pointers.  Work in the inner loop "compute
kernel" is
*= against a constant scalar.  Optimization flags on -O3.  I've seen similar
behavior on recent GNU compilers.

I'm happy to provide the test code if anyone's interested.

- Rhys

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2010-01-07  5:46             ` Rhys Ulerich
@ 2010-01-07 13:22               ` Robert G. Brown
  0 siblings, 0 replies; 21+ messages in thread
From: Robert G. Brown @ 2010-01-07 13:22 UTC (permalink / raw)
  To: Rhys Ulerich; +Cc: gsl-discuss

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1760 bytes --]

On Wed, 6 Jan 2010, Rhys Ulerich wrote:

>> I recall from my benchmarking days that -- depending on compiler --
>> there is a small dereferencing penalty for packed matrices (vectors
>> packed into dereferencing **..* pointers) compared to doing the offset
>> arithmetic via brute force inline or via a macro.
>> ......
>> I haven't
>> run the benchmark recently and don't know how large it currently is. Â It
>> was never so large that it stopped me from using repacked pointers for
>> code clarity..
>
> Mostly unscientific, but worth tossing into the mix:
>
> Using Intel 10.1 compilers on a fairly recent AMD chip, 100,000 iterations
> of doing the nested pointers approach is neck-and-neck with index arithmetic
> on a 10x10 double matrix. Â For the 100x100 case it takes 1.3 times longer
> to iterate using the nested pointers. Â Work in the inner loop "compute
> kernel" is
> *= against a constant scalar. Â Optimization flags on -O3. Â I've seen similar
> behavior on recent GNU compilers.

That sounds partly like a cache effect -- 10x10 almost certainly stays
in L1, 100x100 won't fit.  My own experience is similar, although I
don't recall the multiplier being as large as 1.3 (but then, I was doing
stream and stream-like tests with very large vectors, which means that
one spends more time in a vector streaming mode and minimizes
cache-thrashing when turning corners).  And my memory could be faulty --
I'm an old guy, after all, early Alzheimers...;-)

    rgb

>
> I'm happy to provide the test code if anyone's interested.
>
> - Rhys
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2010-01-07  3:29         ` Robert G. Brown
       [not found]           ` <4a00655d1001062110m139c0a8tf2eae7de67da8f6f@mail.gmail.com>
@ 2010-01-08 20:08           ` Gerard Jungman
  1 sibling, 0 replies; 21+ messages in thread
From: Gerard Jungman @ 2010-01-08 20:08 UTC (permalink / raw)
  To: GSL Discuss Mailing List

On Wed, 2010-01-06 at 22:29 -0500, Robert G. Brown wrote:

> However, some of my beowulfish friends who work(ed)
> for compiler companies argued that one of Fortran's longstanding
> advantages in numerical code efficiency is simply because of the fact
> that matrices in Fortran are written in stone and so compiler writers
> can optimize the hell out of them.  C pointers, OTOH -- well,

I think the main difference is that fortran can assume no aliasing,
which allows some extra optimization that C cannot do. The new 
'restrict' keyword is supposed to help with this, but we'll see.

> I've written code to do so many times -- Numerical Recipes provides the
> basic idea in its matrix-packing routines.  Simply allocate e.g.
> 
> double **..*m,*v;

I understand the mechanics of it. I'm not sure what my concern was;
I thought about this a few months ago and decided there was some
semantic problem. But it looks like the **..*m has exactly the
same life-cycle and maintenance as *v, so I'm not sure what I
was thinking. It might come back to me.

> I can probably dig this code out and post it here (or post a link to it)
> if you want to look at it.  I think I went up to 9 or 10 dimensions (way
> more than I needed) and did a lot of things via brute force to make the
> purely mechanical recursion obvious; I planned an eventual rewrite into
> a more compact form that I never got around to.

Sure. Feel free to post it. I like to look at everything and
cherry-pick the best parts from other people's stuff. Makes
things a lot easier.

--
G. Jungman

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2010-01-06 11:45   ` gsl container designs Tuomo Keskitalo
  2010-01-06 15:47     ` Robert G. Brown
@ 2010-01-07 18:29     ` Brian Gough
  1 sibling, 0 replies; 21+ messages in thread
From: Brian Gough @ 2010-01-07 18:29 UTC (permalink / raw)
  To: Tuomo Keskitalo; +Cc: Gerard Jungman, GSL Discuss Mailing List

At Wed, 06 Jan 2010 13:44:56 +0200,
Tuomo Keskitalo wrote:
> Do you mean the compatibility to GSL 1 types by the compatibility cost? 
> When talking about GSL 2 I don't think we should give too much value to 
> maintaining backwards compatibility. GSL 1 is not going to cease to 
> exist, and people who have tied themselves deeply to GSL 1 data 
> structures can continue to use it.

I'd say it's a significant benefit if people don't have to modify
their code. 

-- 
Brian Gough

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2009-11-25  0:55 gsl container designs Gerard Jungman
  2009-11-29 21:04 ` Brian Gough
  2009-12-04 18:48 ` Brian Gough
@ 2010-01-06 12:04 ` Tuomo Keskitalo
  2010-01-06 19:57   ` Gerard Jungman
  2 siblings, 1 reply; 21+ messages in thread
From: Tuomo Keskitalo @ 2010-01-06 12:04 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

Hello,

On 11/25/2009 02:54 AM, Gerard Jungman wrote:

> Here are header files for a couple different approaches to containers.
> I didn't bother with any implementations; it seems obvious how to
> implement most of these functions.
> 
> The designs are not complete, but they express most of
> the important stuff.

Your design 1 looks very interesting. I don't think your designs would 
support banded and packed matrices, or would it? (see the last pages of 
"Legacy BLAS: C Interface to the Legacy BLAS" at 
http://www.netlib.org/blas/blast-forum/)

I could not think of anything why your design 1 would not work for me. 
The index mapping is particularly elegant!

-- 
Tuomo.Keskitalo@iki.fi
http://iki.fi/tuomo.keskitalo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: gsl container designs
  2010-01-06 12:04 ` Tuomo Keskitalo
@ 2010-01-06 19:57   ` Gerard Jungman
  0 siblings, 0 replies; 21+ messages in thread
From: Gerard Jungman @ 2010-01-06 19:57 UTC (permalink / raw)
  To: gsl-discuss

On Wed, 2010-01-06 at 14:04 +0200, Tuomo Keskitalo wrote:

> Your design 1 looks very interesting. I don't think your designs would 
> support banded and packed matrices, or would it? (see the last pages of 
> "Legacy BLAS: C Interface to the Legacy BLAS" at 
> http://www.netlib.org/blas/blast-forum/)

Thanks. It's been a long time since I looked at that document.

You are right; I have only considered the dense case,
purely from laziness. However, I think the banded and
packed cases can be handled similarly. One needs only
to define the indexing schemes to layer on top of
the (essentially universal) data structs. Finally,
add a syntactic layer, to inject appropriate names,
analogous to the syntactic layer that declares
vectors, matrices, etc.

At least, I think it is not conceptually difficult.
Maybe I am missing something.

--
G. Jungman

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-01-08 20:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-25  0:55 gsl container designs Gerard Jungman
2009-11-29 21:04 ` Brian Gough
2009-12-04 18:48 ` Brian Gough
2009-12-09 21:04   ` using GSL with C++ (was Re: gsl container designs) James Amundson
2009-12-09 21:14     ` Jochen Küpper
2009-12-09 21:54     ` Jari Häkkinen
2009-12-10 11:46     ` Brian Gough
2009-12-10 12:09       ` Brian Gough
2009-12-10 13:42     ` Robert G. Brown
2009-12-10 21:44       ` James Bergstra
2009-12-10 15:15     ` Kevin H. Hobbs
2010-01-06 11:45   ` gsl container designs Tuomo Keskitalo
2010-01-06 15:47     ` Robert G. Brown
2010-01-07  1:50       ` Gerard Jungman
2010-01-07  3:29         ` Robert G. Brown
     [not found]           ` <4a00655d1001062110m139c0a8tf2eae7de67da8f6f@mail.gmail.com>
2010-01-07  5:46             ` Rhys Ulerich
2010-01-07 13:22               ` Robert G. Brown
2010-01-08 20:08           ` Gerard Jungman
2010-01-07 18:29     ` Brian Gough
2010-01-06 12:04 ` Tuomo Keskitalo
2010-01-06 19:57   ` Gerard Jungman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).