public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed
* Re: containers tentative design summary
@ 2009-11-20  9:37 Justin Lenzo
  0 siblings, 0 replies; 23+ messages in thread
From: Justin Lenzo @ 2009-11-20  9:37 UTC (permalink / raw)
  To: gsl-discuss

> At Thu, 29 Oct 2009 17:40:30 -0400,
> James Bergstra wrote:
> > If the purpose is to protect the user from accidentally messing around
> > with data then, as Gerard suggests, maybe we shouldn't bother.  This
> > is not a battle that we can win in C.  Good naming conventions for
> > functions, which indicate the arguments that will be modified, is the
> > most that a C library is expected to provide.
>
> The purpose is to make programs safer, rather than provide any hints
> for optimisation.  The current system is type-safe and gives a
> "discarding const" compiler warning if people try to pass const
> objects to functions as non-const arguments - these seem like useful
> features.  It's not clear to me what the actual benefit would be if we
> only had non-const vectors and matrices.
>

What if you went with a structure like the following:

typedef struct {
  const double *data;
  double *wdata;
  ...
} gsl_vector;

When a gsl_vector is allocated, the array is attached to 'wdata' and
aliased to 'data'.  When a read-only vector view is made, it returns a
new instance of the gsl_vector datatype where the 'data' and other
members are copied over, but sets 'wdata' to NULL.  Basically, you
would be allowing the same datatype to cover allocated vectors,
read-write views, and read-only views.  It seems to me this achieves
the goal of protecting underlying read-only data, as long as your
willing to check for NULL pointers in the routines that write to a
vector.  I don't know anything about the security implications, so
forgive me if this is a dangerous approach.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-11-15  9:13       ` Tuomo Keskitalo
  2009-11-15 16:44         ` Jonathan Underwood
@ 2009-11-16 11:56         ` Brian Gough
  1 sibling, 0 replies; 23+ messages in thread
From: Brian Gough @ 2009-11-16 11:56 UTC (permalink / raw)
  To: gsl-discuss

At Sun, 15 Nov 2009 11:12:55 +0200,
Tuomo Keskitalo wrote:
> Currently GSL uses C in a type-safe manner which forces somewhat 
> complicated APIs for everyone but enables users to find some lethal 
> bugs. More user-friendly APIs would allow people to silently break their 
> programs if they are not careful. And there is no compromise. Does this 
> summarize the situation?

I think that's a fair summary.  The philosophy in GSL has always been
to try to make things safe by default (for example,  the error
handler), because writing correct numerical programs is already
difficult enough without silent errors.
 
-- 
Brian Gough

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-11-15 16:44         ` Jonathan Underwood
@ 2009-11-15 18:41           ` Robert G. Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Robert G. Brown @ 2009-11-15 18:41 UTC (permalink / raw)
  To: Jonathan Underwood; +Cc: gsl-discuss

On Sun, 15 Nov 2009, Jonathan Underwood wrote:

> I think it would be better if GSL, being written in C,  were to
> present the most functional interface at the expense of sacrificing
> type safety, and higher level language bindings could then wrap that
> API and do type checking as needed. As gerard says, C isn't about type
> safety.

I agree.  A good C programmer is comfortable with casts and knows
perfectly well that it is up to them to track them.  And the compiler
(give a chance) will usually at least warn you about apparent
inconsistency.

    rgb

>
> Jonathan
>

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-11-15  9:13       ` Tuomo Keskitalo
@ 2009-11-15 16:44         ` Jonathan Underwood
  2009-11-15 18:41           ` Robert G. Brown
  2009-11-16 11:56         ` Brian Gough
  1 sibling, 1 reply; 23+ messages in thread
From: Jonathan Underwood @ 2009-11-15 16:44 UTC (permalink / raw)
  To: gsl-discuss

2009/11/15 Tuomo Keskitalo <Tuomo.Keskitalo@iki.fi>:
> Hello,
>
> On 11/14/2009 04:42 PM, Brian Gough wrote:
>
>> At Mon, 09 Nov 2009 16:07:43 -0700,
>> Gerard Jungman wrote:
>>>
>>> On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
>>>>
>>>> Ok, I have read the paper now.  I do think the practice of casting
>>>> described there is rather dated.  When people had no viable
>>>> alternative to C, they had to resort to such tricks.  It is not
>>>> something that should be encouraged today -- programs should either be
>>>> written safely, following the rules of type-checking in C, or be
>>>> written in another language.
>>
>> From 1.3 million lines of code they describe only one method which
>> does not use casts, which is the one we use.  I don't think we are
>> going to find anything that is better than the current method for
>> views.
>
> Apparently this is a fundamental question.
>
> Currently GSL uses C in a type-safe manner which forces somewhat complicated
> APIs for everyone but enables users to find some lethal bugs. More
> user-friendly APIs would allow people to silently break their programs if
> they are not careful. And there is no compromise. Does this summarize the
> situation?
>
> I am sure there are users for both approaches. I don't know.. Maybe GSL
> should be the safe, strict library, and there should be another scientific
> library in C which aims towards interoperability between data, libraries and
> languages? This would split forces, but both projects would also benefit
> from each other.
>


I think it would be better if GSL, being written in C,  were to
present the most functional interface at the expense of sacrificing
type safety, and higher level language bindings could then wrap that
API and do type checking as needed. As gerard says, C isn't about type
safety.

Jonathan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-11-14 15:25     ` Brian Gough
@ 2009-11-15  9:13       ` Tuomo Keskitalo
  2009-11-15 16:44         ` Jonathan Underwood
  2009-11-16 11:56         ` Brian Gough
  0 siblings, 2 replies; 23+ messages in thread
From: Tuomo Keskitalo @ 2009-11-15  9:13 UTC (permalink / raw)
  To: Brian Gough, Gerard Jungman, gsl-discuss

Hello,

On 11/14/2009 04:42 PM, Brian Gough wrote:

> At Mon, 09 Nov 2009 16:07:43 -0700,
> Gerard Jungman wrote:
>> On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
>>> Ok, I have read the paper now.  I do think the practice of casting
>>> described there is rather dated.  When people had no viable
>>> alternative to C, they had to resort to such tricks.  It is not
>>> something that should be encouraged today -- programs should either be
>>> written safely, following the rules of type-checking in C, or be
>>> written in another language.
> 
> From 1.3 million lines of code they describe only one method which
> does not use casts, which is the one we use.  I don't think we are
> going to find anything that is better than the current method for
> views.

Apparently this is a fundamental question.

Currently GSL uses C in a type-safe manner which forces somewhat 
complicated APIs for everyone but enables users to find some lethal 
bugs. More user-friendly APIs would allow people to silently break their 
programs if they are not careful. And there is no compromise. Does this 
summarize the situation?

I am sure there are users for both approaches. I don't know.. Maybe GSL 
should be the safe, strict library, and there should be another 
scientific library in C which aims towards interoperability between 
data, libraries and languages? This would split forces, but both 
projects would also benefit from each other.

Anyway, I am interested to see what Gerard comes up with.

-- 
Tuomo.Keskitalo@iki.fi
http://iki.fi/tuomo.keskitalo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-11-09 23:06   ` Gerard Jungman
@ 2009-11-14 15:25     ` Brian Gough
  2009-11-15  9:13       ` Tuomo Keskitalo
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Gough @ 2009-11-14 15:25 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

At Mon, 09 Nov 2009 16:07:43 -0700,
Gerard Jungman wrote:
> On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
> > Ok, I have read the paper now.  I do think the practice of casting
> > described there is rather dated.  When people had no viable
> > alternative to C, they had to resort to such tricks.  It is not
> > something that should be encouraged today -- programs should either be
> > written safely, following the rules of type-checking in C, or be
> > written in another language.
> 
> How does this comment help us in designing a C library?

From 1.3 million lines of code they describe only one method which
does not use casts, which is the one we use.  I don't think we are
going to find anything that is better than the current method for
views.

> > Our approach is actually described in the paper under the "first
> > member" section, in the &(cp.p) example -- although they don't point
> > out that it's the only one that doesn't require a cast and can be
> > checked by the compiler, unlike all the others.
> 
> How does this solve the const-ness problem?

The gsl const view  is defined as

  typedef struct
  {
    gsl_vector vector;
  } _gsl_vector_const_view;

  typedef const _gsl_vector_const_view gsl_vector_const_view;

so &view.vector pointers can only be passed to functions accepting
const gsl_vector * not gsl_vector *.


-- 
Brian Gough

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-11-09 20:41 ` Brian Gough
@ 2009-11-09 23:06   ` Gerard Jungman
  2009-11-14 15:25     ` Brian Gough
  0 siblings, 1 reply; 23+ messages in thread
From: Gerard Jungman @ 2009-11-09 23:06 UTC (permalink / raw)
  To: gsl-discuss

On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
> 
> Ok, I have read the paper now.  I do think the practice of casting
> described there is rather dated.  When people had no viable
> alternative to C, they had to resort to such tricks.  It is not
> something that should be encouraged today -- programs should either be
> written safely, following the rules of type-checking in C, or be
> written in another language.

How does this comment help us in designing a C library?


> Our approach is actually described in the paper under the "first
> member" section, in the &(cp.p) example -- although they don't point
> out that it's the only one that doesn't require a cast and can be
> checked by the compiler, unlike all the others.

How does this solve the const-ness problem?

--
G. Jungman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-11-03 19:44 Gerard Jungman
@ 2009-11-09 20:41 ` Brian Gough
  2009-11-09 23:06   ` Gerard Jungman
  0 siblings, 1 reply; 23+ messages in thread
From: Brian Gough @ 2009-11-09 20:41 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

At Tue, 03 Nov 2009 12:45:49 -0700,
Gerard Jungman wrote:
> Actually, it endorses a "C" practice. It's "bad"-ness is
> a theological issue. See the attached paper by Siff et AL.,

Ok, I have read the paper now.  I do think the practice of casting
described there is rather dated.  When people had no viable
alternative to C, they had to resort to such tricks.  It is not
something that should be encouraged today -- programs should either be
written safely, following the rules of type-checking in C, or be
written in another language.

Our approach is actually described in the paper under the "first
member" section, in the &(cp.p) example -- although they don't point
out that it's the only one that doesn't require a cast and can be
checked by the compiler, unlike all the others.

> There are other ways to design containers. Some ways
> involve radical change to the GSL container methodology.
> For example, the const-ness of the data pointer could be
> explicitly divorced from the layout meta-data which constitutes
> the rest of the container.

That is more interesting theoretically.  From a practical point of
view, I think we have to restrict ourselves to solutions where the
data pointer and the metadata are a single object though.

-- 
Brian Gough

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
@ 2009-11-03 19:44 Gerard Jungman
  2009-11-09 20:41 ` Brian Gough
  0 siblings, 1 reply; 23+ messages in thread
From: Gerard Jungman @ 2009-11-03 19:44 UTC (permalink / raw)
  To: gsl-discuss


On Fri, 2009-10-30 at 16:03 +0000, Brian Gough wrote:
> 
> I don't think this interface is safe to use in a normal manner it it
> requires an explicit cast to pass a non-const object to a function
> accepting the const version.

You really do want to use a strongly-typed language.
But C is not it.

> Any error in the argument of an explicit
> cast is undetectable.

If the container stuff is designed properly,
then it hardly matters, because it just works
anyway, for any container type for which the
operation is meaningful.

If your world were limited to the vector and
const_vector types, as in the example, it wouldn't
matter what you did, the result would obviously be correct.

If, on the other hand, you try to jam your externally-defined
square peg into the GSL round hole, it could be bad. So?
How is this surprising?

If you think about the full container design, you will see that
it is possible to design a rational and useful system, where the
types fully inter-operate. The basis would be a generic multi-array
implementation. The vector example given was purely to address the
const-ness problem, and I think it does that.

It's not enough to scare people with notions of the wrong-cast
bogeyman. You should explain how these things can happen. When
you face the fear directly, you may realize it is not such a
big deal after all.


> We should not require the use of explicit casts
> for common operations, it would endorse a bad practice.

Actually, it endorses a "C" practice. It's "bad"-ness is
a theological issue. See the attached paper by Siff et al.,
[sorry, I tried to attach the paper, but the list won't
 accept a message with an attached pdf; the paper is
 "Coping with Type Casts in C", 1999 Bell-Labs report]
where this construct is labeled "physical sub-typing".
They give data on actual usage in the world, see Table 1.

The whole business is far from perfect. But, again, if you
want to do things in C, you need tricks, because the
language has no support for expressing relationships
between types.


In my opinion, the idea of a container library in C is
somewhat brain-damaged anyway. But here we are...


There are other ways to design containers. Some ways
involve radical change to the GSL container methodology.
For example, the const-ness of the data pointer could be
explicitly divorced from the layout meta-data which constitutes
the rest of the container. After all, the headaches with const-ness
come from the simple fact that the data pointer is embedded in the
struct and it conflates the semantics in un-desirable ways.

Example separation:

  typedef struct {
    size_t size;
    int    stride;
  } gsl_vector_layout;

  gsl_fnc_on_stuff(double * d, const gsl_vector_layout * l);
  gsl_fnc_on_const_stuff(const double * d, const gsl_vector_layout * l);


There are many ways to skin the cat. But a change like this
would effect all gsl interfaces involving container types.
It also has other noticeable imperfections.

Choose the form of the destroyer...

--
G. Jungman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-29 21:40             ` James Bergstra
@ 2009-10-30 16:54               ` Brian Gough
  0 siblings, 0 replies; 23+ messages in thread
From: Brian Gough @ 2009-10-30 16:54 UTC (permalink / raw)
  To: James Bergstra; +Cc: Gerard Jungman, gsl-discuss

At Thu, 29 Oct 2009 17:40:30 -0400,
James Bergstra wrote:
> If the purpose is to protect the user from accidentally messing around
> with data then, as Gerard suggests, maybe we shouldn't bother.  This
> is not a battle that we can win in C.  Good naming conventions for
> functions, which indicate the arguments that will be modified, is the
> most that a C library is expected to provide.

The purpose is to make programs safer, rather than provide any hints
for optimisation.  The current system is type-safe and gives a
"discarding const" compiler warning if people try to pass const
objects to functions as non-const arguments - these seem like useful
features.  It's not clear to me what the actual benefit would be if we
only had non-const vectors and matrices.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-29 20:41           ` Gerard Jungman
  2009-10-29 21:40             ` James Bergstra
@ 2009-10-30 16:54             ` Brian Gough
  1 sibling, 0 replies; 23+ messages in thread
From: Brian Gough @ 2009-10-30 16:54 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

At Thu, 29 Oct 2009 14:41:46 -0600,
Gerard Jungman wrote:
> You will never succeed in making an interface "safe" in
> this sense. However, you _can_ make an interface intuitive
> and safe to use in a normal manner. I don't think you
> can ask any more of C.

I don't think this interface is safe to use in a normal manner it it
requires an explicit cast to pass a non-const object to a function
accepting the const version.  Any error in the argument of an explicit
cast is undetectable.  We should not require the use of explicit casts
for common operations, it would endorse a bad practice.

-- 
Brian Gough


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-29 20:41           ` Gerard Jungman
@ 2009-10-29 21:40             ` James Bergstra
  2009-10-30 16:54               ` Brian Gough
  2009-10-30 16:54             ` Brian Gough
  1 sibling, 1 reply; 23+ messages in thread
From: James Bergstra @ 2009-10-29 21:40 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

Taking a step back, why do we need to have const and non-const
versions of the struct?  I think we can get away with just having a
non-const version.

If the purpose of using const float * things is to permit compiler
optimizations, then this detail can be hidden at the level of using
gsl_vector.  Const-ness can be limited to use in functions that take
raw types (float *, int, etc.) and actually do algorithmic work.  Like
cblas functions.  C will automatically upcast from non-const pointers
to const pointers by the usual rules.

If the purpose is to protect the user from accidentally messing around
with data then, as Gerard suggests, maybe we shouldn't bother.  This
is not a battle that we can win in C.  Good naming conventions for
functions, which indicate the arguments that will be modified, is the
most that a C library is expected to provide.

Providing only non-const versions of these structs makes a lot of
problems disappear right?

All matrices are born non-const.

James

On Thu, Oct 29, 2009 at 4:41 PM, Gerard Jungman <jungman@lanl.gov> wrote:
> On Thu, 2009-10-29 at 16:51 +0000, Brian Gough wrote:
>> At Tue, 27 Oct 2009 17:09:09 -0600,
>> >   /* claims not to twiddle v.data[] */
>> >   gsl_some_typical_const_function((const gsl_const_vector *) &v);
>> >
>>
>> The problem with anything involving explicit casts is that we lose
>> type-safety.  If v is not a vector there's no way to detect the error,
>> which closes the hole of const-related errors but opens another one.
>
> General statement: C is a weakly-typed language.
>
> Consequence: We cannot prevent people from loading the gun,
> pointing it at their heads, and pulling the trigger. They
> are always free to do this. In fact, it is a tenet of C
> that you should trust people to do what they need to do.
>
> You will never succeed in making an interface "safe" in
> this sense. However, you _can_ make an interface intuitive
> and safe to use in a normal manner. I don't think you
> can ask any more of C.
>
>
> Tangential but powerful argument: I talked to Tanmoy about it.
> He considers this normal and useful. The benefits outweigh the
> defects. "Just do it and stop worrying about it" is an exact quote.
>
> Everybody who understands C, including the standards committee,
> knows that const-ness is screwed up, because of the way it was
> tacked on to the old language. This is one of a small number of
> recognized ways to get around the defects in the language.
>
> --
> G. Jungman
>
>
>



-- 
http://www-etud.iro.umontreal.ca/~bergstrj

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-29 18:06         ` Brian Gough
@ 2009-10-29 20:41           ` Gerard Jungman
  2009-10-29 21:40             ` James Bergstra
  2009-10-30 16:54             ` Brian Gough
  0 siblings, 2 replies; 23+ messages in thread
From: Gerard Jungman @ 2009-10-29 20:41 UTC (permalink / raw)
  To: gsl-discuss

On Thu, 2009-10-29 at 16:51 +0000, Brian Gough wrote:
> At Tue, 27 Oct 2009 17:09:09 -0600,
> >   /* claims not to twiddle v.data[] */
> >   gsl_some_typical_const_function((const gsl_const_vector *) &v);
> > 
> 
> The problem with anything involving explicit casts is that we lose
> type-safety.  If v is not a vector there's no way to detect the error,
> which closes the hole of const-related errors but opens another one.

General statement: C is a weakly-typed language.

Consequence: We cannot prevent people from loading the gun,
pointing it at their heads, and pulling the trigger. They
are always free to do this. In fact, it is a tenet of C
that you should trust people to do what they need to do.

You will never succeed in making an interface "safe" in
this sense. However, you _can_ make an interface intuitive
and safe to use in a normal manner. I don't think you
can ask any more of C.


Tangential but powerful argument: I talked to Tanmoy about it.
He considers this normal and useful. The benefits outweigh the
defects. "Just do it and stop worrying about it" is an exact quote.

Everybody who understands C, including the standards committee,
knows that const-ness is screwed up, because of the way it was
tacked on to the old language. This is one of a small number of
recognized ways to get around the defects in the language.

--
G. Jungman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-27 23:06       ` Gerard Jungman
       [not found]         ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
@ 2009-10-29 18:06         ` Brian Gough
  2009-10-29 20:41           ` Gerard Jungman
  1 sibling, 1 reply; 23+ messages in thread
From: Brian Gough @ 2009-10-29 18:06 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

At Tue, 27 Oct 2009 17:09:09 -0600,
>   /* claims not to twiddle v.data[] */
>   gsl_some_typical_const_function((const gsl_const_vector *) &v);
> 

The problem with anything involving explicit casts is that we lose
type-safety.  If v is not a vector there's no way to detect the error,
which closes the hole of const-related errors but opens another one.

-- 
Brian Gough

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
       [not found]         ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
@ 2009-10-27 23:49           ` Gerard Jungman
  0 siblings, 0 replies; 23+ messages in thread
From: Gerard Jungman @ 2009-10-27 23:49 UTC (permalink / raw)
  To: gsl-discuss

On Tue, 2009-10-27 at 19:28 -0400, James Bergstra wrote:
> Sorry to bug you, but could you point me to a link that would explain
> why you put both of those structures into the union?

[ I assume you meant to send this to the list; so I have replied there ]


Here is an interesting (and very long) reference:

http://www.coding-guidelines.com/cbook/cbook1_2.pdf

The relevant section is 6.5.2.3, the discussion around sentences
1037, 1038 of the standard. Sentence 1037 introduces the union
construct. Sentence 1038 is a definition for
"common initial sequence", which is used in 1037.

Of course, we all believe it works right without the union.
But the presence of the union seems to guarantee that it
works right (by the new standard; the old standard is
apparently mute). I may be misreading this, and the
union may not actually be necessary, but there is
some confusion about this, if you search the web,
including a couple of formal "defect reports" for
the standard, regarding these sentences.

--
G. Jungman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-23 21:28     ` Brian Gough
@ 2009-10-27 23:06       ` Gerard Jungman
       [not found]         ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
  2009-10-29 18:06         ` Brian Gough
  0 siblings, 2 replies; 23+ messages in thread
From: Gerard Jungman @ 2009-10-27 23:06 UTC (permalink / raw)
  To: gsl-discuss

[-- Attachment #1: Type: text/plain, Size: 722 bytes --]

On Fri, 2009-10-23 at 14:58 +0100, Brian Gough wrote: 
> 
> I think this is an interesting example.  How would these two classes
> work in practice in C? For example, how would one pass a non-const
> matrix (taken as a view of a non-const multi-array) to a function
> taking a const matrix argument.  Dealing with the interaction between
> const and non-const arguments is a fundamental issue.

> The challenge for any scheme is getting reasonable const behavior in
> C.  If that problem can be solved better then everything else follows
> more easily.


Ok. The attachment is the obvious solution. I don't want to have
theological wars about this, but it seems to me like it is
the "C way". Let's debate.

--
G. Jungman


[-- Attachment #2: foobar.c --]
[-- Type: text/x-csrc, Size: 1438 bytes --]

#include <stdlib.h>


/* The union guarantees that the structs are aligned, in C99.
 *
 * In C90, standard does not discuss the issue, but all
 * compilers known to man align them anyway. The C99
 * guarantee seems to be an attempt to standardize
 * that behaviour.
 *
 * The union type is never actually used, unless somebody
 * can think of a reason to use it...
 */
union gsl_vector_union_t
{
  struct gsl_vector_struct_t
  {
    double * data;
    size_t   size;
    int      stride;
  } as_vector;

  struct gsl_const_vector_struct_t
  {
    const double * data;
    size_t   size;
    int      stride;
  } as_const_vector;
};


typedef  struct gsl_vector_struct_t        gsl_vector;
typedef  struct gsl_const_vector_struct_t  gsl_const_vector;



void gsl_some_typical_function(const gsl_vector * v)
{
  /* do something which might twiddle v.data[] */
}

void gsl_some_typical_const_function(const gsl_const_vector * v)
{
  /* do something which does not twiddle v.data[] */
}



int main()
{
  gsl_vector v;
  gsl_const_vector cv;


  /* might be twiddling v.data[] */
  gsl_some_typical_function(&v);

  /* claims not to twiddle cv.data[] */
  gsl_some_typical_const_function(&cv);

  /* claims not to twiddle v.data[] */
  gsl_some_typical_const_function((const gsl_const_vector *) &v);

  /* a misuse; but this sort of thing can never be prevented anyway */
  gsl_some_typical_function((const gsl_vector *) &cv);

  return 0;
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-05 23:00   ` Gerard Jungman
  2009-10-05 23:45     ` James Bergstra
       [not found]     ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
@ 2009-10-23 21:28     ` Brian Gough
  2009-10-27 23:06       ` Gerard Jungman
  2 siblings, 1 reply; 23+ messages in thread
From: Brian Gough @ 2009-10-23 21:28 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

At Mon, 05 Oct 2009 16:56:07 -0600,
Gerard Jungman wrote:
> On Mon, 2009-10-05 at 10:50 -0400, James Bergstra wrote:
> > I'm a bit rusty with my C structs... but you would need two distinct
> > static classes to have const and non-const data pointers for your view
> > right?
> 
> Yes, sorry. I left that out.

I think this is an interesting example.  How would these two classes
work in practice in C? For example, how would one pass a non-const
matrix (taken as a view of a non-const multi-array) to a function
taking a const matrix argument.  Dealing with the interaction between
const and non-const arguments is a fundamental issue.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
       [not found]     ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
@ 2009-10-06 20:02       ` Gerard Jungman
  0 siblings, 0 replies; 23+ messages in thread
From: Gerard Jungman @ 2009-10-06 20:02 UTC (permalink / raw)
  To: gsl-discuss

On Tue, 2009-10-06 at 13:37 +0100, Jonathan Underwood wrote:
> 
> I'm not sure if I fully follow this, but I wonder if the numpy concept
> of "broadcasting" is useful here? More info:
> 
> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting

Ok. This could be useful. I will study it and see if it helps.
If it turns out to be too dynamic an idea for efficient C, then
I'm not sure if we can do anything with it. But who knows...


> In general, I can't help thinking that it would be a great advantage
> if compatibility with the underlying data structures used in numpy was
> achieved during this redesign (numpy is mostly implemented in C and is
> BSD licensed).

I'm looking at this right now:

 /usr/lib64/python2.6/site-packages/numpy/core/include/numpy/ndarrayobject.h

Certainly it has the same basic functionality, with a shape described
by dimensions and strides in each index place. Some of the other stuff,
like byte-order worries, I imagine we have to ignore. That is presumably
beyond the scope of GSL.

Can we be precise about the meaning of "compatibility"? This is the
kind of thing we need to discuss. I would definitely like to see
GSL play better with others.

There are other tools out there as well. I don't follow the
history of Numeric and NumPy and ScientificPython, blah blah.
It all looks a little disorganized to me. People who know
about these things, and have opinions, should tell us what
we should be doing.

--
G. Jungman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-05 23:45     ` James Bergstra
@ 2009-10-06 19:59       ` Gerard Jungman
  0 siblings, 0 replies; 23+ messages in thread
From: Gerard Jungman @ 2009-10-06 19:59 UTC (permalink / raw)
  To: gsl-discuss

On Mon, 2009-10-05 at 19:45 -0400, James Bergstra wrote:

> I'm thinking of something like
> 
> struct marray { int rank; int * dims; int * strides; dtype * base; };
> marray_add_inplace( const marray a, const marray b, marray c);  //N.B.
> the passing on the stack
> 
> Inside that function, we can check for special cases of interest and
> optimize them as necessary.

This is not so clear. You can't have a test inside the indexing
functions; that would obviously kill you. So we still need a way
to pick the correct indexing function at compile-time; I think
this means that we need the fixed-rank types. And the indexing
calculation is best done with 'dims' and 'strides' on the stack,
rather than through the pointer indirection.

Maybe you are suggesting the above as an implementation detail,
to be wrapped by the fixed-rank types? That would allow for
compile-time selection of the indexing, but it would still
probably force an indirection through those pointers.


> One disadvantage of such a structure I think you'll agree, is the
> internal pointers, which interfere with convenient stack allocation.

Yes. I rejected internal pointers for that reason.


> A somewhat ugly alternative (which I am nonetheless suggesting) would
> be the following...
> 
> { int rank; dtype * base; int dim0, dim1, dim2; int str0, str1, str2;
> int * extra_dims; int * extra_strides; };
> 
> For low-rank multiarrays, this structure can be manipulated on the
> stack, so it is possible to write expressions like
> 
> marray_add_inplace(marray_reshape2(a, 5, 50), marray_slice(b, 17),
> marray_transpose(c));  // add a reshaped view of a to the 17th
> matrix-slice of b, storing result in a transposed view of c.
> 
> At the same time, arbitrarily high ranks are supported if the user is
> willing to put up with the syntax of heap-based manipulation.

This is closer to what I have done, although I just separated the
fixed-rank and the arbitrary rank (heap-based) cases. In the end,
that seemed like the cleanest choice.


> (Of course we can haggle over how many extra ranks can be built into
> the static part of the structure.)

The way I have done it, I avoid this question. Here are some snippets
from what I have, just so you get an idea of how I am thinking about
it. I'll work downwards, from the multi-arrays, down to the
implementation details.

First, we must have separate types for the fixed rank cases, in order
to insure that we select the correct indexing function at compile-time.
So...

  typedef struct { ... } gsl_marray_1;  // rank 1
  typedef struct { ... } gsl_marray_2;  // rank 2
  typedef struct { ... } gal_marray_3;  // rank 3
  etc.


The basic ingredient for indexing is a dimension and stride
in an index place. So...

  typedef struct
  {
    size_t  dimens; // dimension in this index place
    size_t  stride; // addressing stride for this index
  }
  gsl_marray_idx;


Given this, the marray declarations can look like:

  typedef struct
  {
    int            rank;
    double *       data;
    gsl_marray_idx idx[1];
    ...
  }
  gsl_marray_1

  typedef struct
  {
    int            rank;
    double *       data;
    gsl_marray_idx idx[2];
    ...
  }
  gsl_marray_2

  etc.


Now, at the level of implementations, we write everything in
terms of (rank, gsl_marray_idx[]), avoiding an intermediate type.
We can freely mix generic versions of functions and specializations.
Generic versions handle the general case, for operations which are
not performance critical (like formatted i/o and stuff...).
Specializations handle the things that matter, like indexing.

Examples:

  // some "low-level" io implementation function;
  // handles the access generically
  gsl_marray_fread_raw(FILE * f, double * data, int rank,
                       const gsl_marray_idx idx[]);


  // indexing (offset calculation) requires specialization
  inline
  size_t
  gsl_marray_idx_1_offset(const gsl_marray_idx idx[], const int index[])
  { return index[0]*idx[0].stride; }

  inline
  size_t
  gsl_marray_idx_2_offset(const gsl_marray_idx idx[], const int index[])
  { return index[0]*idx[0].stride + index[1]*idx[1].stride; }

  etc.

  inline
  gsl_marray_1_offset(const gsl_marray_1 * ma, const int index[])
  { return gsl_marray_idx_1_offset(ma->idx, index); }


You get the idea. These are taken (almost) verbatim from the code
that I have written so far. Finally, for the generic case we have
something like:

  typedef struct
  {
    int              rank;
    gsl_marray_idx * idx;
    double *         data;
    ...
  };

with the obvious heap allocating constructor, etc. The same
low-level generic functions as above are also used to implement
the class-specific functions here.

Anyway, the whole thing seems to be layered properly, and there
are no "tricky" bits. No rocket science. Just following my nose.


I will shortly produce a complete package for people to examine. But
you know how it is. As David Byrne once sang, "you can't see it 'til
it's finished...".

In the meantime, keep suggesting stuff. Discussion always helps.

--
G. Jungman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-05 23:00   ` Gerard Jungman
@ 2009-10-05 23:45     ` James Bergstra
  2009-10-06 19:59       ` Gerard Jungman
       [not found]     ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
  2009-10-23 21:28     ` Brian Gough
  2 siblings, 1 reply; 23+ messages in thread
From: James Bergstra @ 2009-10-05 23:45 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

On Mon, Oct 5, 2009 at 6:56 PM, Gerard Jungman <jungman@lanl.gov> wrote:
> On Mon, 2009-10-05 at 10:50 -0400, James Bergstra wrote:
>> Two comments:
>>
>> I'm a bit rusty with my C structs... but you would need two distinct
>> static classes to have const and non-const data pointers for your view
>> right?
>
> Yes, sorry. I left that out.
>
>
>> Also, it sounds like writing code that will work for a tensor of any
>> rank (e.g. add two tensors together) might be either tedious or
>> impossible.  I recognize that part of the problem is the lack of
>> templating and polymorphism, but it would at least be comforting to
>> see just how bad the situation is via a use case or two in the design
>> documentation.   I (naively?) fear that to get good performance will
>> require a whole library of functions for even the most basic of
>> operations.
>>
>> gsl_marray_add_1_0( gsl_marray_1, double );
>> gsl_marray_add_1_1( gsl_marray_1, gsl_marray_1);
>> gsl_marray_add_1_2( gsl_marray_1, gsl_marray_2);
>> gsl_marray_add_2_2(... )
>> ...
>>
>> gsl_marray_sub_1_0( ... )
>>
>> Maybe a system of macros could be designed to help here, but it sounds
>> like it will never be as easy as writing a couple of for-loops.
>
>
> First, I want to be sure that we distinguish the multi-array
> case from the vector/matrix/tensor case. Algebra for vectors and
> matrices is well-defined and already on place; we only need to
> re-write some wrappers, etc. It is handled by BLAS calls. As I
> mentioned, this places a constraint on matrices, that the
> fast traversals be of unit stride. It seems like we just
> have to live with that constraint, for the matrix type.
>
> Addition, subtraction, scaling of multi-arrays is not hard
> because it is only defined within the same rank. So there
> is only a linear complexity, and no combinatorial explosion
> in the interfaces, for these operations. Of course, there
> are issues of shape conformance at run-time, but that is
> also true for vectors and matrices.
>
> That leaves multi-array algebra as an open question. By algebra,
> I mean technically the "cross-rank" operations, which form
> some kind of general multi-linear algebra. Sounds pretty
> hairy, as you suspect.
>

Another idea here would be to implement general-cases in terms of
structures with a dynamic rank right?  The rank does not need to be
part of the struct, but can be a variable.

I'm thinking of something like

struct marray { int rank; int * dims; int * strides; dtype * base; };
marray_add_inplace( const marray a, const marray b, marray c);  //N.B.
the passing on the stack

Inside that function, we can check for special cases of interest and
optimize them as necessary.
One disadvantage of such a structure I think you'll agree, is the
internal pointers, which interfere with convenient stack allocation.

A somewhat ugly alternative (which I am nonetheless suggesting) would
be the following...

{ int rank; dtype * base; int dim0, dim1, dim2; int str0, str1, str2;
int * extra_dims; int * extra_strides; };

For low-rank multiarrays, this structure can be manipulated on the
stack, so it is possible to write expressions like

marray_add_inplace(marray_reshape2(a, 5, 50), marray_slice(b, 17),
marray_transpose(c));  // add a reshaped view of a to the 17th
matrix-slice of b, storing result in a transposed view of c.

At the same time, arbitrarily high ranks are supported if the user is
willing to put up with the syntax of heap-based manipulation.

(Of course we can haggle over how many extra ranks can be built into
the static part of the structure.)

James
-- 
http://www-etud.iro.umontreal.ca/~bergstrj

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-05 14:50 ` James Bergstra
@ 2009-10-05 23:00   ` Gerard Jungman
  2009-10-05 23:45     ` James Bergstra
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Gerard Jungman @ 2009-10-05 23:00 UTC (permalink / raw)
  To: gsl-discuss

On Mon, 2009-10-05 at 10:50 -0400, James Bergstra wrote:
> Two comments:
> 
> I'm a bit rusty with my C structs... but you would need two distinct
> static classes to have const and non-const data pointers for your view
> right?

Yes, sorry. I left that out.


> Also, it sounds like writing code that will work for a tensor of any
> rank (e.g. add two tensors together) might be either tedious or
> impossible.  I recognize that part of the problem is the lack of
> templating and polymorphism, but it would at least be comforting to
> see just how bad the situation is via a use case or two in the design
> documentation.   I (naively?) fear that to get good performance will
> require a whole library of functions for even the most basic of
> operations.
> 
> gsl_marray_add_1_0( gsl_marray_1, double );
> gsl_marray_add_1_1( gsl_marray_1, gsl_marray_1);
> gsl_marray_add_1_2( gsl_marray_1, gsl_marray_2);
> gsl_marray_add_2_2(... )
> ...
> 
> gsl_marray_sub_1_0( ... )
> 
> Maybe a system of macros could be designed to help here, but it sounds
> like it will never be as easy as writing a couple of for-loops.


First, I want to be sure that we distinguish the multi-array
case from the vector/matrix/tensor case. Algebra for vectors and
matrices is well-defined and already on place; we only need to
re-write some wrappers, etc. It is handled by BLAS calls. As I
mentioned, this places a constraint on matrices, that the
fast traversals be of unit stride. It seems like we just
have to live with that constraint, for the matrix type.

Addition, subtraction, scaling of multi-arrays is not hard
because it is only defined within the same rank. So there
is only a linear complexity, and no combinatorial explosion
in the interfaces, for these operations. Of course, there
are issues of shape conformance at run-time, but that is
also true for vectors and matrices.

That leaves multi-array algebra as an open question. By algebra,
I mean technically the "cross-rank" operations, which form
some kind of general multi-linear algebra. Sounds pretty
hairy, as you suspect.

First Idea: In fact, none of these operations are required
from the library. If you have a good (fast) indexing scheme,
then the user can implement whatever they want. This is the
boost solution; boost::multi_array has no support for algebra
operations. So we just punt on this. This was my implicit
choice in the design summary.

Second Idea: Implement as much as seems reasonable, in a way which
is equivalent to what a user would do, with some loops. I am not
sure that efficiency is an issue, since I don't see how you
could do better than some loops, in the general case. Higher
efficiency can be obtained for the vector and matrix types,
using a good BLAS, but then it should be clear that the
vector and matrix types are what you want, and not the
general multi-array types.

Third Idea: Figure out a way to generate everything automatically.
Hmmm. Not likely. And the interface explosion would be ridiculous.


Finally, we come to tensors. As defined in the design summary,
tensors are multi-index objects with the same dimension at
each place. We definitely do want to support the usual
tensor algebra operations: tensor product and contraction.

The question seems to be: How much simplicity do we gain in
going from the general multi-array case to the restricted
tensor case? If the interfaces for doing the tensor stuff
are no less complicated than the general case, then we
might as well go back and solve it for general
multi-arrays.


In any case, I think multi-array support would be a good thing,
even without multi-linear algebra. Fixing up vectors and matrices
so that the view types make sense is also a good thing. These
are mostly orthogonal tasks; I just want to get them both
clear in my head so I understand the limited ways in which
they will interact. Right now I think the interaction is
limited to some light-weight conversion functions between
them and a common slicing mechanism.

Tensors are another mostly orthogonal task. Again, they will
benefit from the generic slicing mechanism, and there will be
some light-weight conversion functions. But we can solve the
problems here as we come to them.


Does this make sense?

--
G. Jungman



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: containers tentative design summary
  2009-10-05 10:12 Gerard Jungman
@ 2009-10-05 14:50 ` James Bergstra
  2009-10-05 23:00   ` Gerard Jungman
  0 siblings, 1 reply; 23+ messages in thread
From: James Bergstra @ 2009-10-05 14:50 UTC (permalink / raw)
  To: Gerard Jungman; +Cc: gsl-discuss

Two comments:

I'm a bit rusty with my C structs... but you would need two distinct
static classes to have const and non-const data pointers for your view
right?

Also, it sounds like writing code that will work for a tensor of any
rank (e.g. add two tensors together) might be either tedious or
impossible.  I recognize that part of the problem is the lack of
templating and polymorphism, but it would at least be comforting to
see just how bad the situation is via a use case or two in the design
documentation.   I (naively?) fear that to get good performance will
require a whole library of functions for even the most basic of
operations.

gsl_marray_add_1_0( gsl_marray_1, double );
gsl_marray_add_1_1( gsl_marray_1, gsl_marray_1);
gsl_marray_add_1_2( gsl_marray_1, gsl_marray_2);
gsl_marray_add_2_2(... )
...

gsl_marray_sub_1_0( ... )

Maybe a system of macros could be designed to help here, but it sounds
like it will never be as easy as writing a couple of for-loops.

James

On Sun, Oct 4, 2009 at 10:05 PM, Gerard Jungman <jungman@cybermesa.com> wrote:
> A summary of the tentative design for containers. Code will follow,
> as soon as a few things are sorted out. (See previous post on
> questions about block/vector/matrix).
>
> --
> G. Jungman
>
>



-- 
http://www-etud.iro.umontreal.ca/~bergstrj

^ permalink raw reply	[flat|nested] 23+ messages in thread

* containers tentative design summary
@ 2009-10-05 10:12 Gerard Jungman
  2009-10-05 14:50 ` James Bergstra
  0 siblings, 1 reply; 23+ messages in thread
From: Gerard Jungman @ 2009-10-05 10:12 UTC (permalink / raw)
  To: gsl-discuss

[-- Attachment #1: Type: text/plain, Size: 184 bytes --]

A summary of the tentative design for containers. Code will follow,
as soon as a few things are sorted out. (See previous post on
questions about block/vector/matrix).

--
G. Jungman


[-- Attachment #2: gsl-containers-design-summary.txt --]
[-- Type: text/plain, Size: 11270 bytes --]

** Multi-array.

   A multi-array is a data structure managing a reference to
   a data segment (think double * as an origin) and enabling
   indexed access to the data. Indexing is controlled by
   an indexing map, which translates indices into an
   offset in the underlying data segment. The number
   of indices (dimension of the domain of the indexing map),
   is called the rank of the indexing map.

   The data segment may be externally controlled, or internally
   controlled by the multi-array. In essence, "control" means
   control of the allocation/deallocation process for the segment.

   No notion of view is needed. When the data is internally
   controlled, we can think of the multi-array as "original".
   When the data is externally controlled, we can think of
   the multi-array as a view. But this is not a precise way
   of thinking. More precisely, the life-cycle of the data
   is simply independent of the indexing semantics, and the
   need for a separate notion of view evaporates.

   This lack of a separate view type is not shared by some other
   designs. For example, boost introduces a distinct reference
   type boost::multi_array_ref, in addition to boost:multi_array.
   The reason for this choice is not clear. It may be for absolutely
   optimal efficiency. Certain choices are easier to make if one
   separates these types; however, my current feeling is that
   the extra type is not justified.

   If we decide to introduce separate view types later, we must
   insure that it is done in a way which does not force other
   interfaces in the library to grow combinatorially. And
   without doubt, the extra types must have the same generic
   interfaces as the non-view types; this is one area where
   GSL fails miserably.



** Indexing map.

   An indexing map translates an index vector to an offset. It is
   specified by a shape. At a minimum, shapes include the dimension
   of each index (number of indices addressed). When a shape allows
   an index in place i to have only one value, that index place
   is called "invariate". If it can have no values (null set), it is
   called "null". Otherwise (the normal case) it is called "variate".

   The number of indices (dimension of the domain of the indexing map),
   is called the "static rank" or simply "rank" of the map. The
   number of variate indices in a given map is called the
   "dynamic rank" or "effective rank" of the map.

   Examples using C-array notation:

     double x[3][4][5];  // static rank = 3, dynamic rank = 3

     double x[3][1][5];  // static rank = 3, dynamic rank = 2
                         // [3] and [5] are variate, [1] is invariate



** Structure of indexing maps.

   We choose the most general affine form for indexing maps.
   Therefore, the offset is calculated as
   
     offset = sum(n, index[n] * stride[n])

   The data segment is stored as a pointer to the origin of
   the data. Therefore, the most general map calculation
   takes the affine form

     location = origin + sum(n, index[n] * stride[n]).



** Slicing as composition of indexing maps.

   Let i[N] be an element of the index set for a rank N indexing map.
   Let j[M] be an element of the index set for a rank M indexing map.

   Let delta[N][M] be an integer-valued matrix and b[N] an
   integer-valued vector. Then the following formula which
   maps M-dimensional indices to N-dimensional indices
   defines a rank M index map from any rank-N index map.

     i[n] = b[n] + sum(m, delta[n][m] * j[m])

   Notice the functoriality. "Old" indices are determined in
   terms of "new" indices. In other language, the new indices
   are pulled-back to the old context. This is the correct
   functoriality to allow arbitrary chained composition of
   index maps, and therefore slices of slices, to any order.

   It should be clear that the index map generated this way
   is always affine, and all affine indexing maps can be
   generated in this way.

   Therefore, a general slice of an affine indexing map
   is specified by a vector b[] and matrix delta[][], of
   the indicated dimensions.

   The reader can convince himself that any form of strided
   access is supported in this way. For example, it is equally
   easy to slice out the super-diagonal of a matrix as it is to
   slice out a row or column, or for that matter to slice out
   any regularly strided sub-matrix, etc.

   The underlying direct data access, by computation of an
   offset, is itself an affine indexing map, specialized to
   the case where the "old" index is one-dimensional. So
   the design space of affine indexing maps is closed.



** Multi-array implementation requirements.

 * Indexing calculations must be as fast as possible. The indexing
   arithmetic should be inlined when possible. The shape data
   structure should enable the fastest possible computation,
   consistent with general strided access.

   This suggests that different rank objects be of different static
   types, since this is the most straight-forward way to insure that
   the correct indexing calculation is chosen at compile-time. This
   design corresponds to the usual designs (such as in GSL), where
   vectors and matrices (for example) are different static types.
   This choice also corresponds to the design of boost::multi_array,
   where the rank is a template parameter and therefore a
   compile-time constant which generates a separate type.

 * The shape of a multi-array should be dynamic. This corresponds
   to the usual design, such as in GSL, where the dimensions of 
   a matrix (for example) are run-time constants, not compile-time
   constants.

 * It should be possible to construct multi-arrays without
   heap allocations. For example:

     double data[64];
     gsl_marray_2 ma_8x8 = gsl_marray_2_build(data, 8, 8);

   Placing such objects on the stack has good implications
   for efficiency and is very natural from the client's point
   of view. For example, in C++, one simply writes

    std::vector<double> v;

   which places the object on the stack as an automatic variable.



** Multi-array implementation choices. These are open for debate,
   but these are the choices I have made. No loss of functional
   generality is implied by these choices, only some loss of
   syntactic generality.

 * Multi-array indexing conforms precisely to C-style array
   indexing. Shapes are completely specified by dimensions and
   strides. This means that indices always "start" at 0, and
   end at dimension-1 (inclusive).

 * The notion of storage order does not enter. The "fast" index
   is always to the right, conforming to C array semantics. See
   the discussion of matrices below.



** Vector.

   A vector manages a data segment and supports indexed access
   to the data through a rank one indexing map. However,
   vectors may be required to satisfy extra constraints, not
   compatible with rank one multi-arrays. Also, they may provide
   other capability, such as indexing which "starts" at some
   non-zero value. Personally, I have no desire for this extra
   sort of functionality. It would only add overhead to an
   otherwise clean design. My first inlination is to
   implement vectors as follows:

       typedef  gsl_marray_1  gsl_vector; // DONE

   However, the reality may not be quite as simple as this in
   the end. However it is done, I think it is a good idea
   to preserve the name "vector" in the design, although
   it will probably imply some significant boiler-plate
   in function declarations, since C does not allow us
   to simply inherit the member function namespace
   of another class. SIGH.



** Matrix.

   Matrices are clearly different from rank two multi-arrays.
   There are necessary extra capabilities, and there are also
   constraints, in order to make them conform to the notions
   used in linear algebra packages.

   The extra capability corresponds to the identification of "row"
   and "column" indices. Multi-array semantics are governed by data
   locality, so the notion of "fast" index is immutable. But a matrix
   may either identify its rows with fast index traversal (C order)
   or its columns with fast index traversal (Fortran order). Matrices
   will need to remember which and to adjust their indexing computations
   accordingly.

   The extra constraint on matrices, in comparison to multi-arrays,
   is the need for the fast index to traverse the data with stride 1.
   For example, BLAS interfaces allow for vectors of arbitrary
   stride, but do not allow for a matrix "fast index" stride.

   Example:
     Let A be a 3x3 matrix. Let B1 be a 2x2 matrix defined as a
     view (sub-matrix) of A defined by the x's in the following
     diagram:
     
       x x -
       - - -
       x x -

     Then B1 is a matrix, in the BLAS sense. It can be passed
     to BLAS functions without further reshaping. To be exact,
     it would be passed with origin equal to the given origin
     and with tda = 6 and dimensions (2,2).

     Let B2 be the 2x2 sub-matrix view defined by the following
     diagram.
     
      x - x
      - - -
      x - x

     Then B2 is _not_ a BLAS compatible matrix, since there is no
     parameter in the BLAS interfaces which allows for the non-unit
     stride in the fast index. Some reshaping, involving data
     motion/copy, would be required to pass this view to BLAS
     functions.

   Therefore, I do not think that generic rank two multi-arrays can
   be identified with matrices. Rather, a lightweight conversion
   from rank-two multi-arrays to matrices must be provided, and this
   conversion must fail at run-time if the multi-array fast index
   stride is not unity.



** Tensor

  No explicit notion of tensor currently exists, beyond that
  expressed in the tensor/ extension package for GSL, by
  Jordi Burguet-Castell.

  The Burguet-Castell package does not meet the requirements
  layed out above, mainly because it looks and works too much
  like the GSL types, having been modelled on them for obvious
  reasons. In particular, it allows the rank to be determined
  at run-time. This may or may not be desirable, depending on
  how we want tensors to interact withy multi-arrays.

  However, we draw some motivation from the Burguet-Castell package.
  A tensor is defined to be a multi-array where the shape is constrained
  to have all dimensions equal.

  Like in the case of matrices, we encounter some small technical
  problems in the identification with multi-arrays. Since the
  shape of a multi-array is dynamic, the compatibility of
  a given multi-array with the tensor concept is question
  that must be deferred to run-time.

  The question of the role of multi-arrays in the implementation
  of tensors remains open, as it does for matrices. But we do
  require a certain level of interoperability, perhaps expressed
  as light-weight conversion functions which fail to convert a
  multi-array to a tensor if the shape is not consistent. This
  seems like the most likely solution. It also allows the tensor
  implementation to be independent of the high-level multi-array
  interfaces, which strikes me as the right way to do it, from
  both a general design point of view and an efficiency point of
  view.


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-11-20  9:37 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-20  9:37 containers tentative design summary Justin Lenzo
  -- strict thread matches above, loose matches on Subject: below --
2009-11-03 19:44 Gerard Jungman
2009-11-09 20:41 ` Brian Gough
2009-11-09 23:06   ` Gerard Jungman
2009-11-14 15:25     ` Brian Gough
2009-11-15  9:13       ` Tuomo Keskitalo
2009-11-15 16:44         ` Jonathan Underwood
2009-11-15 18:41           ` Robert G. Brown
2009-11-16 11:56         ` Brian Gough
2009-10-05 10:12 Gerard Jungman
2009-10-05 14:50 ` James Bergstra
2009-10-05 23:00   ` Gerard Jungman
2009-10-05 23:45     ` James Bergstra
2009-10-06 19:59       ` Gerard Jungman
     [not found]     ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
2009-10-06 20:02       ` Gerard Jungman
2009-10-23 21:28     ` Brian Gough
2009-10-27 23:06       ` Gerard Jungman
     [not found]         ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
2009-10-27 23:49           ` Gerard Jungman
2009-10-29 18:06         ` Brian Gough
2009-10-29 20:41           ` Gerard Jungman
2009-10-29 21:40             ` James Bergstra
2009-10-30 16:54               ` Brian Gough
2009-10-30 16:54             ` Brian Gough

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).