* containers tentative design summary
@ 2009-10-05 10:12 Gerard Jungman
2009-10-05 14:50 ` James Bergstra
0 siblings, 1 reply; 30+ messages in thread
From: Gerard Jungman @ 2009-10-05 10:12 UTC (permalink / raw)
To: gsl-discuss
[-- Attachment #1: Type: text/plain, Size: 184 bytes --]
A summary of the tentative design for containers. Code will follow,
as soon as a few things are sorted out. (See previous post on
questions about block/vector/matrix).
--
G. Jungman
[-- Attachment #2: gsl-containers-design-summary.txt --]
[-- Type: text/plain, Size: 11270 bytes --]
** Multi-array.
A multi-array is a data structure managing a reference to
a data segment (think double * as an origin) and enabling
indexed access to the data. Indexing is controlled by
an indexing map, which translates indices into an
offset in the underlying data segment. The number
of indices (dimension of the domain of the indexing map),
is called the rank of the indexing map.
The data segment may be externally controlled, or internally
controlled by the multi-array. In essence, "control" means
control of the allocation/deallocation process for the segment.
No notion of view is needed. When the data is internally
controlled, we can think of the multi-array as "original".
When the data is externally controlled, we can think of
the multi-array as a view. But this is not a precise way
of thinking. More precisely, the life-cycle of the data
is simply independent of the indexing semantics, and the
need for a separate notion of view evaporates.
This lack of a separate view type is not shared by some other
designs. For example, boost introduces a distinct reference
type boost::multi_array_ref, in addition to boost:multi_array.
The reason for this choice is not clear. It may be for absolutely
optimal efficiency. Certain choices are easier to make if one
separates these types; however, my current feeling is that
the extra type is not justified.
If we decide to introduce separate view types later, we must
insure that it is done in a way which does not force other
interfaces in the library to grow combinatorially. And
without doubt, the extra types must have the same generic
interfaces as the non-view types; this is one area where
GSL fails miserably.
** Indexing map.
An indexing map translates an index vector to an offset. It is
specified by a shape. At a minimum, shapes include the dimension
of each index (number of indices addressed). When a shape allows
an index in place i to have only one value, that index place
is called "invariate". If it can have no values (null set), it is
called "null". Otherwise (the normal case) it is called "variate".
The number of indices (dimension of the domain of the indexing map),
is called the "static rank" or simply "rank" of the map. The
number of variate indices in a given map is called the
"dynamic rank" or "effective rank" of the map.
Examples using C-array notation:
double x[3][4][5]; // static rank = 3, dynamic rank = 3
double x[3][1][5]; // static rank = 3, dynamic rank = 2
// [3] and [5] are variate, [1] is invariate
** Structure of indexing maps.
We choose the most general affine form for indexing maps.
Therefore, the offset is calculated as
offset = sum(n, index[n] * stride[n])
The data segment is stored as a pointer to the origin of
the data. Therefore, the most general map calculation
takes the affine form
location = origin + sum(n, index[n] * stride[n]).
** Slicing as composition of indexing maps.
Let i[N] be an element of the index set for a rank N indexing map.
Let j[M] be an element of the index set for a rank M indexing map.
Let delta[N][M] be an integer-valued matrix and b[N] an
integer-valued vector. Then the following formula which
maps M-dimensional indices to N-dimensional indices
defines a rank M index map from any rank-N index map.
i[n] = b[n] + sum(m, delta[n][m] * j[m])
Notice the functoriality. "Old" indices are determined in
terms of "new" indices. In other language, the new indices
are pulled-back to the old context. This is the correct
functoriality to allow arbitrary chained composition of
index maps, and therefore slices of slices, to any order.
It should be clear that the index map generated this way
is always affine, and all affine indexing maps can be
generated in this way.
Therefore, a general slice of an affine indexing map
is specified by a vector b[] and matrix delta[][], of
the indicated dimensions.
The reader can convince himself that any form of strided
access is supported in this way. For example, it is equally
easy to slice out the super-diagonal of a matrix as it is to
slice out a row or column, or for that matter to slice out
any regularly strided sub-matrix, etc.
The underlying direct data access, by computation of an
offset, is itself an affine indexing map, specialized to
the case where the "old" index is one-dimensional. So
the design space of affine indexing maps is closed.
** Multi-array implementation requirements.
* Indexing calculations must be as fast as possible. The indexing
arithmetic should be inlined when possible. The shape data
structure should enable the fastest possible computation,
consistent with general strided access.
This suggests that different rank objects be of different static
types, since this is the most straight-forward way to insure that
the correct indexing calculation is chosen at compile-time. This
design corresponds to the usual designs (such as in GSL), where
vectors and matrices (for example) are different static types.
This choice also corresponds to the design of boost::multi_array,
where the rank is a template parameter and therefore a
compile-time constant which generates a separate type.
* The shape of a multi-array should be dynamic. This corresponds
to the usual design, such as in GSL, where the dimensions of
a matrix (for example) are run-time constants, not compile-time
constants.
* It should be possible to construct multi-arrays without
heap allocations. For example:
double data[64];
gsl_marray_2 ma_8x8 = gsl_marray_2_build(data, 8, 8);
Placing such objects on the stack has good implications
for efficiency and is very natural from the client's point
of view. For example, in C++, one simply writes
std::vector<double> v;
which places the object on the stack as an automatic variable.
** Multi-array implementation choices. These are open for debate,
but these are the choices I have made. No loss of functional
generality is implied by these choices, only some loss of
syntactic generality.
* Multi-array indexing conforms precisely to C-style array
indexing. Shapes are completely specified by dimensions and
strides. This means that indices always "start" at 0, and
end at dimension-1 (inclusive).
* The notion of storage order does not enter. The "fast" index
is always to the right, conforming to C array semantics. See
the discussion of matrices below.
** Vector.
A vector manages a data segment and supports indexed access
to the data through a rank one indexing map. However,
vectors may be required to satisfy extra constraints, not
compatible with rank one multi-arrays. Also, they may provide
other capability, such as indexing which "starts" at some
non-zero value. Personally, I have no desire for this extra
sort of functionality. It would only add overhead to an
otherwise clean design. My first inlination is to
implement vectors as follows:
typedef gsl_marray_1 gsl_vector; // DONE
However, the reality may not be quite as simple as this in
the end. However it is done, I think it is a good idea
to preserve the name "vector" in the design, although
it will probably imply some significant boiler-plate
in function declarations, since C does not allow us
to simply inherit the member function namespace
of another class. SIGH.
** Matrix.
Matrices are clearly different from rank two multi-arrays.
There are necessary extra capabilities, and there are also
constraints, in order to make them conform to the notions
used in linear algebra packages.
The extra capability corresponds to the identification of "row"
and "column" indices. Multi-array semantics are governed by data
locality, so the notion of "fast" index is immutable. But a matrix
may either identify its rows with fast index traversal (C order)
or its columns with fast index traversal (Fortran order). Matrices
will need to remember which and to adjust their indexing computations
accordingly.
The extra constraint on matrices, in comparison to multi-arrays,
is the need for the fast index to traverse the data with stride 1.
For example, BLAS interfaces allow for vectors of arbitrary
stride, but do not allow for a matrix "fast index" stride.
Example:
Let A be a 3x3 matrix. Let B1 be a 2x2 matrix defined as a
view (sub-matrix) of A defined by the x's in the following
diagram:
x x -
- - -
x x -
Then B1 is a matrix, in the BLAS sense. It can be passed
to BLAS functions without further reshaping. To be exact,
it would be passed with origin equal to the given origin
and with tda = 6 and dimensions (2,2).
Let B2 be the 2x2 sub-matrix view defined by the following
diagram.
x - x
- - -
x - x
Then B2 is _not_ a BLAS compatible matrix, since there is no
parameter in the BLAS interfaces which allows for the non-unit
stride in the fast index. Some reshaping, involving data
motion/copy, would be required to pass this view to BLAS
functions.
Therefore, I do not think that generic rank two multi-arrays can
be identified with matrices. Rather, a lightweight conversion
from rank-two multi-arrays to matrices must be provided, and this
conversion must fail at run-time if the multi-array fast index
stride is not unity.
** Tensor
No explicit notion of tensor currently exists, beyond that
expressed in the tensor/ extension package for GSL, by
Jordi Burguet-Castell.
The Burguet-Castell package does not meet the requirements
layed out above, mainly because it looks and works too much
like the GSL types, having been modelled on them for obvious
reasons. In particular, it allows the rank to be determined
at run-time. This may or may not be desirable, depending on
how we want tensors to interact withy multi-arrays.
However, we draw some motivation from the Burguet-Castell package.
A tensor is defined to be a multi-array where the shape is constrained
to have all dimensions equal.
Like in the case of matrices, we encounter some small technical
problems in the identification with multi-arrays. Since the
shape of a multi-array is dynamic, the compatibility of
a given multi-array with the tensor concept is question
that must be deferred to run-time.
The question of the role of multi-arrays in the implementation
of tensors remains open, as it does for matrices. But we do
require a certain level of interoperability, perhaps expressed
as light-weight conversion functions which fail to convert a
multi-array to a tensor if the shape is not consistent. This
seems like the most likely solution. It also allows the tensor
implementation to be independent of the high-level multi-array
interfaces, which strikes me as the right way to do it, from
both a general design point of view and an efficiency point of
view.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-05 10:12 containers tentative design summary Gerard Jungman
@ 2009-10-05 14:50 ` James Bergstra
2009-10-05 23:00 ` Gerard Jungman
2009-10-05 23:04 ` some general questions Gerard Jungman
0 siblings, 2 replies; 30+ messages in thread
From: James Bergstra @ 2009-10-05 14:50 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
Two comments:
I'm a bit rusty with my C structs... but you would need two distinct
static classes to have const and non-const data pointers for your view
right?
Also, it sounds like writing code that will work for a tensor of any
rank (e.g. add two tensors together) might be either tedious or
impossible. I recognize that part of the problem is the lack of
templating and polymorphism, but it would at least be comforting to
see just how bad the situation is via a use case or two in the design
documentation. I (naively?) fear that to get good performance will
require a whole library of functions for even the most basic of
operations.
gsl_marray_add_1_0( gsl_marray_1, double );
gsl_marray_add_1_1( gsl_marray_1, gsl_marray_1);
gsl_marray_add_1_2( gsl_marray_1, gsl_marray_2);
gsl_marray_add_2_2(... )
...
gsl_marray_sub_1_0( ... )
Maybe a system of macros could be designed to help here, but it sounds
like it will never be as easy as writing a couple of for-loops.
James
On Sun, Oct 4, 2009 at 10:05 PM, Gerard Jungman <jungman@cybermesa.com> wrote:
> A summary of the tentative design for containers. Code will follow,
> as soon as a few things are sorted out. (See previous post on
> questions about block/vector/matrix).
>
> --
> G. Jungman
>
>
--
http://www-etud.iro.umontreal.ca/~bergstrj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-05 14:50 ` James Bergstra
@ 2009-10-05 23:00 ` Gerard Jungman
2009-10-05 23:45 ` James Bergstra
` (2 more replies)
2009-10-05 23:04 ` some general questions Gerard Jungman
1 sibling, 3 replies; 30+ messages in thread
From: Gerard Jungman @ 2009-10-05 23:00 UTC (permalink / raw)
To: gsl-discuss
On Mon, 2009-10-05 at 10:50 -0400, James Bergstra wrote:
> Two comments:
>
> I'm a bit rusty with my C structs... but you would need two distinct
> static classes to have const and non-const data pointers for your view
> right?
Yes, sorry. I left that out.
> Also, it sounds like writing code that will work for a tensor of any
> rank (e.g. add two tensors together) might be either tedious or
> impossible. I recognize that part of the problem is the lack of
> templating and polymorphism, but it would at least be comforting to
> see just how bad the situation is via a use case or two in the design
> documentation. I (naively?) fear that to get good performance will
> require a whole library of functions for even the most basic of
> operations.
>
> gsl_marray_add_1_0( gsl_marray_1, double );
> gsl_marray_add_1_1( gsl_marray_1, gsl_marray_1);
> gsl_marray_add_1_2( gsl_marray_1, gsl_marray_2);
> gsl_marray_add_2_2(... )
> ...
>
> gsl_marray_sub_1_0( ... )
>
> Maybe a system of macros could be designed to help here, but it sounds
> like it will never be as easy as writing a couple of for-loops.
First, I want to be sure that we distinguish the multi-array
case from the vector/matrix/tensor case. Algebra for vectors and
matrices is well-defined and already on place; we only need to
re-write some wrappers, etc. It is handled by BLAS calls. As I
mentioned, this places a constraint on matrices, that the
fast traversals be of unit stride. It seems like we just
have to live with that constraint, for the matrix type.
Addition, subtraction, scaling of multi-arrays is not hard
because it is only defined within the same rank. So there
is only a linear complexity, and no combinatorial explosion
in the interfaces, for these operations. Of course, there
are issues of shape conformance at run-time, but that is
also true for vectors and matrices.
That leaves multi-array algebra as an open question. By algebra,
I mean technically the "cross-rank" operations, which form
some kind of general multi-linear algebra. Sounds pretty
hairy, as you suspect.
First Idea: In fact, none of these operations are required
from the library. If you have a good (fast) indexing scheme,
then the user can implement whatever they want. This is the
boost solution; boost::multi_array has no support for algebra
operations. So we just punt on this. This was my implicit
choice in the design summary.
Second Idea: Implement as much as seems reasonable, in a way which
is equivalent to what a user would do, with some loops. I am not
sure that efficiency is an issue, since I don't see how you
could do better than some loops, in the general case. Higher
efficiency can be obtained for the vector and matrix types,
using a good BLAS, but then it should be clear that the
vector and matrix types are what you want, and not the
general multi-array types.
Third Idea: Figure out a way to generate everything automatically.
Hmmm. Not likely. And the interface explosion would be ridiculous.
Finally, we come to tensors. As defined in the design summary,
tensors are multi-index objects with the same dimension at
each place. We definitely do want to support the usual
tensor algebra operations: tensor product and contraction.
The question seems to be: How much simplicity do we gain in
going from the general multi-array case to the restricted
tensor case? If the interfaces for doing the tensor stuff
are no less complicated than the general case, then we
might as well go back and solve it for general
multi-arrays.
In any case, I think multi-array support would be a good thing,
even without multi-linear algebra. Fixing up vectors and matrices
so that the view types make sense is also a good thing. These
are mostly orthogonal tasks; I just want to get them both
clear in my head so I understand the limited ways in which
they will interact. Right now I think the interaction is
limited to some light-weight conversion functions between
them and a common slicing mechanism.
Tensors are another mostly orthogonal task. Again, they will
benefit from the generic slicing mechanism, and there will be
some light-weight conversion functions. But we can solve the
problems here as we come to them.
Does this make sense?
--
G. Jungman
^ permalink raw reply [flat|nested] 30+ messages in thread
* some general questions
2009-10-05 14:50 ` James Bergstra
2009-10-05 23:00 ` Gerard Jungman
@ 2009-10-05 23:04 ` Gerard Jungman
2009-10-06 16:21 ` Brian Gough
1 sibling, 1 reply; 30+ messages in thread
From: Gerard Jungman @ 2009-10-05 23:04 UTC (permalink / raw)
To: gsl-discuss
Mainly directed at Brian...
** Random questions which touch on GSL as a whole.
Q1. Regarding header files: Why are the struct declarations inside
__BEGIN__DECLS / __END_DECLS pairs? I think they could be outside.
I have in mind some ideas for better interaction with C++, which
we can discuss later. In any case, if they can go outside (and
I'm pretty sure they can), then they should. There's nothing
"extern C" about a struct declaration. POD is POD.
Language lawyers should correct me if I'm wrong. But I don't
think it has anything to do with languages, so much as the
assumptions of binary compatibility inherent in linking
across the C / C++ barrier.
Q2. How is the global variable gsl_check_range supposed to work?
It doesn't seem to be used in any way. Is it just cruft?
Q3. Why do we still have the 'const' qualifiers on by-value parameters
in some header files? I remember it had something to do with the
behaviour of a brain-damaged compiler (microsoft?) ten years ago.
But that was ten years ago. Let's clean that up. What does
the standard say?
Q4. Why are the dependencies for including "source" files in the
templatized world broken? Updating a "templatized" source
file does not force a recompile; obviously it should.
Q5. Can we extend the "templatizing" scheme to generate
declarations too? Of course, if it obscures the
header files, then it is not acceptable.
Q6. More a statement than a question: We should be more explicit
about the levelization of the design. This means expressing
the dependence of components clearly. For example, matrix
depends on vector, yet there is nothing in the build or in
the structure of the library which makes this clear.
Everything depends on error handling. Some things depend on
ieee. Some things are almost standing alone. We currently
have no meaningful notion of sub-libraries or component
dependency.
Simple observations, for people who don't get it:
* There are too many "things" in the root directory,
both files and directories.
* There are loose header files in the root directory.
Their role cannot be drawn from context when they
are floating in the open sea.
* There are actual source files
(gsl-histogram.c, gsl-randist.c, version.c)
in the root directory. Blechhh.
Let's clean house.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-05 23:00 ` Gerard Jungman
@ 2009-10-05 23:45 ` James Bergstra
2009-10-06 19:59 ` Gerard Jungman
[not found] ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
2009-10-23 21:28 ` Brian Gough
2 siblings, 1 reply; 30+ messages in thread
From: James Bergstra @ 2009-10-05 23:45 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
On Mon, Oct 5, 2009 at 6:56 PM, Gerard Jungman <jungman@lanl.gov> wrote:
> On Mon, 2009-10-05 at 10:50 -0400, James Bergstra wrote:
>> Two comments:
>>
>> I'm a bit rusty with my C structs... but you would need two distinct
>> static classes to have const and non-const data pointers for your view
>> right?
>
> Yes, sorry. I left that out.
>
>
>> Also, it sounds like writing code that will work for a tensor of any
>> rank (e.g. add two tensors together) might be either tedious or
>> impossible. I recognize that part of the problem is the lack of
>> templating and polymorphism, but it would at least be comforting to
>> see just how bad the situation is via a use case or two in the design
>> documentation. I (naively?) fear that to get good performance will
>> require a whole library of functions for even the most basic of
>> operations.
>>
>> gsl_marray_add_1_0( gsl_marray_1, double );
>> gsl_marray_add_1_1( gsl_marray_1, gsl_marray_1);
>> gsl_marray_add_1_2( gsl_marray_1, gsl_marray_2);
>> gsl_marray_add_2_2(... )
>> ...
>>
>> gsl_marray_sub_1_0( ... )
>>
>> Maybe a system of macros could be designed to help here, but it sounds
>> like it will never be as easy as writing a couple of for-loops.
>
>
> First, I want to be sure that we distinguish the multi-array
> case from the vector/matrix/tensor case. Algebra for vectors and
> matrices is well-defined and already on place; we only need to
> re-write some wrappers, etc. It is handled by BLAS calls. As I
> mentioned, this places a constraint on matrices, that the
> fast traversals be of unit stride. It seems like we just
> have to live with that constraint, for the matrix type.
>
> Addition, subtraction, scaling of multi-arrays is not hard
> because it is only defined within the same rank. So there
> is only a linear complexity, and no combinatorial explosion
> in the interfaces, for these operations. Of course, there
> are issues of shape conformance at run-time, but that is
> also true for vectors and matrices.
>
> That leaves multi-array algebra as an open question. By algebra,
> I mean technically the "cross-rank" operations, which form
> some kind of general multi-linear algebra. Sounds pretty
> hairy, as you suspect.
>
Another idea here would be to implement general-cases in terms of
structures with a dynamic rank right? The rank does not need to be
part of the struct, but can be a variable.
I'm thinking of something like
struct marray { int rank; int * dims; int * strides; dtype * base; };
marray_add_inplace( const marray a, const marray b, marray c); //N.B.
the passing on the stack
Inside that function, we can check for special cases of interest and
optimize them as necessary.
One disadvantage of such a structure I think you'll agree, is the
internal pointers, which interfere with convenient stack allocation.
A somewhat ugly alternative (which I am nonetheless suggesting) would
be the following...
{ int rank; dtype * base; int dim0, dim1, dim2; int str0, str1, str2;
int * extra_dims; int * extra_strides; };
For low-rank multiarrays, this structure can be manipulated on the
stack, so it is possible to write expressions like
marray_add_inplace(marray_reshape2(a, 5, 50), marray_slice(b, 17),
marray_transpose(c)); // add a reshaped view of a to the 17th
matrix-slice of b, storing result in a transposed view of c.
At the same time, arbitrarily high ranks are supported if the user is
willing to put up with the syntax of heap-based manipulation.
(Of course we can haggle over how many extra ranks can be built into
the static part of the structure.)
James
--
http://www-etud.iro.umontreal.ca/~bergstrj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: some general questions
2009-10-05 23:04 ` some general questions Gerard Jungman
@ 2009-10-06 16:21 ` Brian Gough
2009-10-06 20:32 ` Gerard Jungman
0 siblings, 1 reply; 30+ messages in thread
From: Brian Gough @ 2009-10-06 16:21 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
At Mon, 05 Oct 2009 17:06:15 -0600,
Gerard Jungman wrote:
> Q1. Regarding header files: Why are the struct declarations inside
> __BEGIN__DECLS / __END_DECLS pairs? I think they could be outside.
> I have in mind some ideas for better interaction with C++, which
> we can discuss later. In any case, if they can go outside (and
> I'm pretty sure they can), then they should. There's nothing
> "extern C" about a struct declaration.
I'm 99% sure you are right. Other than slight inelegance, is there
any specific problem with them being inside? Presumably it does not
prevent them being used in C++.
> Q2. How is the global variable gsl_check_range supposed to work?
> It doesn't seem to be used in any way. Is it just cruft?
It is used in the non-inline version of the inline functions, so that
checking can still be controlled if the compiler uses the static
versions of the functions for some reason. See build.h which is
included in files defining the static versions of the functions.
> Q3. Why do we still have the 'const' qualifiers on by-value parameters
> in some header files? I remember it had something to do with the
> behaviour of a brain-damaged compiler (microsoft?) ten years ago.
> But that was ten years ago. Let's clean that up. What does
> the standard say?
On some functions we wanted to declare a variable const just for
safety in the implementation of the function (this is not strictly
necessary of course). The only way to do that is to put const on the
argument in the definition -- and there are a number of compilers
which complain if it is not on the prototype as well.
I believe some compilers do actually handle the case of a const scalar
argument more efficiently by keeping it in a register (in fact I think
gcc does this now in some cases), so one could argue that we should do
that in more cases.
> Q4. Why are the dependencies for including "source" files in the
> templatized world broken? Updating a "templatized" source
> file does not force a recompile; obviously it should.
In configure.ac we have
AM_INIT_AUTOMAKE([gnu no-dependencies])
We didn't use them originally because they weren't very reliable in
old versions of Automake. We could turn them back on now if you want.
> Q5. Can we extend the "templatizing" scheme to generate
> declarations too? Of course, if it obscures the
> header files, then it is not acceptable.
The problem would be that it would make them unreadable to the user
(they would have to figure out how the macros work to read the
prototypes). Also it might hinder any automated tools which look at
the headers (e.g. for auto-completion of function names etc).
What we could do is adopt a more standardised way of generating the
header files, rather than using an undocumented perl script. That
would mean using m4 or something like that.
> Q6. More a statement than a question: We should be more explicit
> about the levelization of the design. This means expressing
> the dependence of components clearly. For example, matrix
> depends on vector, yet there is nothing in the build or in
> the structure of the library which makes this clear.
> Everything depends on error handling. Some things depend on
> ieee. Some things are almost standing alone. We currently
> have no meaningful notion of sub-libraries or component
> dependency.
A starting point would be to generate a dependency graph using one of
the automated tools for this - for example, GNU cflow.
> Simple observations, for people who don't get it:
>
> * There are too many "things" in the root directory,
> both files and directories.
I have no problem if you want to move some of these into a
subdirectory, e.g. examples/ or sys/ etc
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-05 23:45 ` James Bergstra
@ 2009-10-06 19:59 ` Gerard Jungman
0 siblings, 0 replies; 30+ messages in thread
From: Gerard Jungman @ 2009-10-06 19:59 UTC (permalink / raw)
To: gsl-discuss
On Mon, 2009-10-05 at 19:45 -0400, James Bergstra wrote:
> I'm thinking of something like
>
> struct marray { int rank; int * dims; int * strides; dtype * base; };
> marray_add_inplace( const marray a, const marray b, marray c); //N.B.
> the passing on the stack
>
> Inside that function, we can check for special cases of interest and
> optimize them as necessary.
This is not so clear. You can't have a test inside the indexing
functions; that would obviously kill you. So we still need a way
to pick the correct indexing function at compile-time; I think
this means that we need the fixed-rank types. And the indexing
calculation is best done with 'dims' and 'strides' on the stack,
rather than through the pointer indirection.
Maybe you are suggesting the above as an implementation detail,
to be wrapped by the fixed-rank types? That would allow for
compile-time selection of the indexing, but it would still
probably force an indirection through those pointers.
> One disadvantage of such a structure I think you'll agree, is the
> internal pointers, which interfere with convenient stack allocation.
Yes. I rejected internal pointers for that reason.
> A somewhat ugly alternative (which I am nonetheless suggesting) would
> be the following...
>
> { int rank; dtype * base; int dim0, dim1, dim2; int str0, str1, str2;
> int * extra_dims; int * extra_strides; };
>
> For low-rank multiarrays, this structure can be manipulated on the
> stack, so it is possible to write expressions like
>
> marray_add_inplace(marray_reshape2(a, 5, 50), marray_slice(b, 17),
> marray_transpose(c)); // add a reshaped view of a to the 17th
> matrix-slice of b, storing result in a transposed view of c.
>
> At the same time, arbitrarily high ranks are supported if the user is
> willing to put up with the syntax of heap-based manipulation.
This is closer to what I have done, although I just separated the
fixed-rank and the arbitrary rank (heap-based) cases. In the end,
that seemed like the cleanest choice.
> (Of course we can haggle over how many extra ranks can be built into
> the static part of the structure.)
The way I have done it, I avoid this question. Here are some snippets
from what I have, just so you get an idea of how I am thinking about
it. I'll work downwards, from the multi-arrays, down to the
implementation details.
First, we must have separate types for the fixed rank cases, in order
to insure that we select the correct indexing function at compile-time.
So...
typedef struct { ... } gsl_marray_1; // rank 1
typedef struct { ... } gsl_marray_2; // rank 2
typedef struct { ... } gal_marray_3; // rank 3
etc.
The basic ingredient for indexing is a dimension and stride
in an index place. So...
typedef struct
{
size_t dimens; // dimension in this index place
size_t stride; // addressing stride for this index
}
gsl_marray_idx;
Given this, the marray declarations can look like:
typedef struct
{
int rank;
double * data;
gsl_marray_idx idx[1];
...
}
gsl_marray_1
typedef struct
{
int rank;
double * data;
gsl_marray_idx idx[2];
...
}
gsl_marray_2
etc.
Now, at the level of implementations, we write everything in
terms of (rank, gsl_marray_idx[]), avoiding an intermediate type.
We can freely mix generic versions of functions and specializations.
Generic versions handle the general case, for operations which are
not performance critical (like formatted i/o and stuff...).
Specializations handle the things that matter, like indexing.
Examples:
// some "low-level" io implementation function;
// handles the access generically
gsl_marray_fread_raw(FILE * f, double * data, int rank,
const gsl_marray_idx idx[]);
// indexing (offset calculation) requires specialization
inline
size_t
gsl_marray_idx_1_offset(const gsl_marray_idx idx[], const int index[])
{ return index[0]*idx[0].stride; }
inline
size_t
gsl_marray_idx_2_offset(const gsl_marray_idx idx[], const int index[])
{ return index[0]*idx[0].stride + index[1]*idx[1].stride; }
etc.
inline
gsl_marray_1_offset(const gsl_marray_1 * ma, const int index[])
{ return gsl_marray_idx_1_offset(ma->idx, index); }
You get the idea. These are taken (almost) verbatim from the code
that I have written so far. Finally, for the generic case we have
something like:
typedef struct
{
int rank;
gsl_marray_idx * idx;
double * data;
...
};
with the obvious heap allocating constructor, etc. The same
low-level generic functions as above are also used to implement
the class-specific functions here.
Anyway, the whole thing seems to be layered properly, and there
are no "tricky" bits. No rocket science. Just following my nose.
I will shortly produce a complete package for people to examine. But
you know how it is. As David Byrne once sang, "you can't see it 'til
it's finished...".
In the meantime, keep suggesting stuff. Discussion always helps.
--
G. Jungman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
[not found] ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
@ 2009-10-06 20:02 ` Gerard Jungman
0 siblings, 0 replies; 30+ messages in thread
From: Gerard Jungman @ 2009-10-06 20:02 UTC (permalink / raw)
To: gsl-discuss
On Tue, 2009-10-06 at 13:37 +0100, Jonathan Underwood wrote:
>
> I'm not sure if I fully follow this, but I wonder if the numpy concept
> of "broadcasting" is useful here? More info:
>
> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting
Ok. This could be useful. I will study it and see if it helps.
If it turns out to be too dynamic an idea for efficient C, then
I'm not sure if we can do anything with it. But who knows...
> In general, I can't help thinking that it would be a great advantage
> if compatibility with the underlying data structures used in numpy was
> achieved during this redesign (numpy is mostly implemented in C and is
> BSD licensed).
I'm looking at this right now:
/usr/lib64/python2.6/site-packages/numpy/core/include/numpy/ndarrayobject.h
Certainly it has the same basic functionality, with a shape described
by dimensions and strides in each index place. Some of the other stuff,
like byte-order worries, I imagine we have to ignore. That is presumably
beyond the scope of GSL.
Can we be precise about the meaning of "compatibility"? This is the
kind of thing we need to discuss. I would definitely like to see
GSL play better with others.
There are other tools out there as well. I don't follow the
history of Numeric and NumPy and ScientificPython, blah blah.
It all looks a little disorganized to me. People who know
about these things, and have opinions, should tell us what
we should be doing.
--
G. Jungman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: some general questions
2009-10-06 16:21 ` Brian Gough
@ 2009-10-06 20:32 ` Gerard Jungman
2009-10-10 14:45 ` Tuomo Keskitalo
2009-10-14 20:00 ` Brian Gough
0 siblings, 2 replies; 30+ messages in thread
From: Gerard Jungman @ 2009-10-06 20:32 UTC (permalink / raw)
To: gsl-discuss
On Tue, 2009-10-06 at 17:02 +0100, Brian Gough wrote:
> At Mon, 05 Oct 2009 17:06:15 -0600,
> Gerard Jungman wrote:
> > Q1. Regarding header files: Why are the struct declarations inside
> > __BEGIN__DECLS / __END_DECLS pairs? I think they could be outside.
> > I have in mind some ideas for better interaction with C++, which
> > we can discuss later. In any case, if they can go outside (and
> > I'm pretty sure they can), then they should. There's nothing
> > "extern C" about a struct declaration.
>
> I'm 99% sure you are right. Other than slight inelegance, is there
> any specific problem with them being inside? Presumably it does not
> prevent them being used in C++.
I am toying with an idea like
#ifdef __cplusplus
struct thingy
{
thingy() : x(0), y(0) { }
int x;
int y;
};
#else
typedef struct
{
int x;
int y;
}
thingy;
#endif
The goal being to insure that thingy's are properly default
constructed in the C++ world. But I haven't tested it yet.
I'm not even sure it it compiles properly. But you get the
idea.
> > Q2. How is the global variable gsl_check_range supposed to work?
> > It doesn't seem to be used in any way. Is it just cruft?
>
> It is used in the non-inline version of the inline functions, so that
> checking can still be controlled if the compiler uses the static
> versions of the functions for some reason. See build.h which is
> included in files defining the static versions of the functions.
Let me see if I understand this. There are two independent
definitions of the GSL_RANGE_COND macro, one which knows
about gsl_check_range, and one which doesn't. GSL header
files with inline function definitions use the version
which does not know about gsl_check_range, through its
definition in gsl_inline.h. C implementation files use
the one which does know about gsl_check_range, through
its definition in build.h. Is this right?
It's confusing to me, but never mind that. What I don't
understand is how you prevent them from colliding in the
presence of both definitions. Obviously I understand
that build.h #undef's it and redefines it. I'm not
asking a mechanical question. I mean to ask, why
do they need to be the same name?
> > Q3. Why do we still have the 'const' qualifiers on by-value parameters
> > in some header files? I remember it had something to do with the
> > behaviour of a brain-damaged compiler (microsoft?) ten years ago.
> > But that was ten years ago. Let's clean that up. What does
> > the standard say?
>
> On some functions we wanted to declare a variable const just for
> safety in the implementation of the function (this is not strictly
> necessary of course). The only way to do that is to put const on the
> argument in the definition --
Yes, I know.
> and there are a number of compilers
> which complain if it is not on the prototype as well.
This is the part I am asking about. I remember when that happened,
and somebody reported it. This was years ago. And I thought we
concluded that such compilers were brain-damaged, but we would
just go ahead and placate them. Q: Are such compilers still
brain-damaged, are they actually doing the right thing (I can't
imagine), or what?
> > Q4. Why are the dependencies for including "source" files in the
> > templatized world broken? Updating a "templatized" source
> > file does not force a recompile; obviously it should.
>
> In configure.ac we have
>
> AM_INIT_AUTOMAKE([gnu no-dependencies])
Ah. I see.
> We didn't use them originally because they weren't very reliable in
> old versions of Automake. We could turn them back on now if you want.
I think this has bugged me other times as well, though I
couldn't put my finger on what was wrong. I just kept
making clean and rebuilding. Well, does it work better
now? If so, then let's turn it on.
> > Q5. Can we extend the "templatizing" scheme to generate
> > declarations too? Of course, if it obscures the
> > header files, then it is not acceptable.
>
> The problem would be that it would make them unreadable to the user
> (they would have to figure out how the macros work to read the
> prototypes). Also it might hinder any automated tools which look at
> the headers (e.g. for auto-completion of function names etc).
It's not obvious that they would be unreadable. What I'm asking
is, can you think of some clever way to make them both readable
and generic?
> What we could do is adopt a more standardised way of generating the
> header files, rather than using an undocumented perl script. That
> would mean using m4 or something like that.
Speak no further about m4.
> > Q6. More a statement than a question: We should be more explicit
> > about the levelization of the design. This means expressing
> > the dependence of components clearly. For example, matrix
> > depends on vector, yet there is nothing in the build or in
> > the structure of the library which makes this clear.
> > Everything depends on error handling. Some things depend on
> > ieee. Some things are almost standing alone. We currently
> > have no meaningful notion of sub-libraries or component
> > dependency.
>
> A starting point would be to generate a dependency graph using one of
> the automated tools for this - for example, GNU cflow.
I don't think this is a problem for machine solution. It's up to
us to decide how the thing is organized. What depends on what,
which parts are foundational, and which parts are built on
top of those? Then we should organize the code so it
expresses that.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: some general questions
2009-10-06 20:32 ` Gerard Jungman
@ 2009-10-10 14:45 ` Tuomo Keskitalo
2009-10-24 11:16 ` Brian Gough
2009-10-14 20:00 ` Brian Gough
1 sibling, 1 reply; 30+ messages in thread
From: Tuomo Keskitalo @ 2009-10-10 14:45 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
Hello,
your ideas seem quite promising, I look forward to see your code. :-)
On 10/06/2009 11:34 PM, Gerard Jungman wrote:
>>> Q6. More a statement than a question: We should be more explicit
>>> about the levelization of the design. This means expressing
>>> the dependence of components clearly. For example, matrix
...
> I don't think this is a problem for machine solution. It's up to
> us to decide how the thing is organized. What depends on what,
> which parts are foundational, and which parts are built on
> top of those? Then we should organize the code so it
> expresses that.
I run a few ad hoc shell commands described below, to find the
interdependencies within current GSL directories. Even though the
results are probably not complete nor fully correct, this might provide
a starting point for the organization?
-----
Lowest level ("many" dependencies): blas, block, cblas, complex, vector,
err, ieee_utils
Intermediate level (at least one other directory seems to use these):
eigen, histogram, integration, linalg, min, permutation, poly, randist,
rng, roots, sort, statistics
Highest level (no other directory seems to use these): bspline, cdf,
cheb, combination, const, deriv, dht, diff, fit, fft, interpolation,
monte, multifit, multimin, multiroots, ntuple, ode-initval, qrng, siman,
specfunc, sum, wavelet
-----
I got to this result by running the following commands with zsh in
gsl-1.13 root directory, and by ogling the resulting
gsl-dependencies.txt file.
for f in `find -type d`; do; for g in `grep include $f/*.h $f/*.c`; do;
echo "$f : $g" >> out; done; done;
grep ': <' <out | sort | uniq | egrep -v 'doc/examples' >
gsl-dependencies-all.txt
for f in `find -type d | perl -pe 's/\.\///' | sort`; do; echo "$f is
used by:" >> gsl-dependencies.txt; grep $f gsl-dependencies-all.txt |
egrep -v "^\.\/$f" >> gsl-dependencies.txt; echo ""
>>gsl-dependencies.txt; done;
--
Tuomo.Keskitalo@iki.fi
http://iki.fi/tuomo.keskitalo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: some general questions
2009-10-06 20:32 ` Gerard Jungman
2009-10-10 14:45 ` Tuomo Keskitalo
@ 2009-10-14 20:00 ` Brian Gough
2009-10-15 18:39 ` Tuomo Keskitalo
1 sibling, 1 reply; 30+ messages in thread
From: Brian Gough @ 2009-10-14 20:00 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
[Sorry for the delay in replying, I was moving house.]
At Tue, 06 Oct 2009 14:34:52 -0600,
Gerard Jungman wrote:
> Let me see if I understand this. There are two independent
> definitions of the GSL_RANGE_COND macro, one which knows
> about gsl_check_range, and one which doesn't. GSL header
> files with inline function definitions use the version
> which does not know about gsl_check_range, through its
> definition in gsl_inline.h. C implementation files use
> the one which does know about gsl_check_range, through
> its definition in build.h. Is this right?
Yes, that is correct.
> It's confusing to me, but never mind that. What I don't
> understand is how you prevent them from colliding in the
> presence of both definitions. Obviously I understand
> that build.h #undef's it and redefines it. I'm not
> asking a mechanical question. I mean to ask, why
> do they need to be the same name?
We now compile all the non-inline versions of functions directly from
their inline definitions in the header files. This avoids any
inconsistencies (previously we had separate implementations).
GSL_RANGE_COND is redefined in the non-inline version to allow range
checking via gsl_check_range so the user can at least manually make it
consistent with their choice of whether or not to use range checking
in the inline versions.
> This is the part I am asking about. I remember when that happened,
> and somebody reported it. This was years ago. And I thought we
> concluded that such compilers were brain-damaged, but we would
> just go ahead and placate them. Q: Are such compilers still
> brain-damaged, are they actually doing the right thing (I can't
> imagine), or what?
I don't have my copy of the standard at the moment. Maybe somebody
could look it up.
> I think this has bugged me other times as well, though I
> couldn't put my finger on what was wrong. I just kept
> making clean and rebuilding. Well, does it work better
> now? If so, then let's turn it on.
Ok, I have just done that.
> It's not obvious that they would be unreadable. What I'm asking
> is, can you think of some clever way to make them both readable
> and generic?
Not really. Suppose someone knows (partially) the name of the
function or some other part of the prototype and greps the header
files for it -- if it is declared via a macro they will not find it.
> > A starting point would be to generate a dependency graph using one of
> > the automated tools for this - for example, GNU cflow.
>
> I don't think this is a problem for machine solution. It's up to
> us to decide how the thing is organized. What depends on what,
> which parts are foundational, and which parts are built on
> top of those? Then we should organize the code so it
> expresses that.
I was just thinking that it's not always obvious what the dependencies
are. For example, the special functions depend on linear algebra
(through the Mathieu function, which needs to compute some
eigenvalues) and there may be some other cases like that.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: some general questions
2009-10-14 20:00 ` Brian Gough
@ 2009-10-15 18:39 ` Tuomo Keskitalo
0 siblings, 0 replies; 30+ messages in thread
From: Tuomo Keskitalo @ 2009-10-15 18:39 UTC (permalink / raw)
To: Brian Gough; +Cc: Gerard Jungman, gsl-discuss
Hello,
On 10/14/2009 10:57 PM, Brian Gough wrote:
>> I don't think this is a problem for machine solution. It's up to
>> us to decide how the thing is organized. What depends on what,
>> which parts are foundational, and which parts are built on
>> top of those? Then we should organize the code so it
>> expresses that.
>
> I was just thinking that it's not always obvious what the dependencies
> are. For example, the special functions depend on linear algebra
> (through the Mathieu function, which needs to compute some
> eigenvalues) and there may be some other cases like that.
ode-initval is another example. None of the explicit methods are
dependent on linear algebra, while implicit methods are. Would we need
to split the code in this case?
--
Tuomo.Keskitalo@iki.fi
http://iki.fi/tuomo.keskitalo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-05 23:00 ` Gerard Jungman
2009-10-05 23:45 ` James Bergstra
[not found] ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
@ 2009-10-23 21:28 ` Brian Gough
2009-10-27 23:06 ` Gerard Jungman
2 siblings, 1 reply; 30+ messages in thread
From: Brian Gough @ 2009-10-23 21:28 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
At Mon, 05 Oct 2009 16:56:07 -0600,
Gerard Jungman wrote:
> On Mon, 2009-10-05 at 10:50 -0400, James Bergstra wrote:
> > I'm a bit rusty with my C structs... but you would need two distinct
> > static classes to have const and non-const data pointers for your view
> > right?
>
> Yes, sorry. I left that out.
I think this is an interesting example. How would these two classes
work in practice in C? For example, how would one pass a non-const
matrix (taken as a view of a non-const multi-array) to a function
taking a const matrix argument. Dealing with the interaction between
const and non-const arguments is a fundamental issue.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: some general questions
2009-10-10 14:45 ` Tuomo Keskitalo
@ 2009-10-24 11:16 ` Brian Gough
0 siblings, 0 replies; 30+ messages in thread
From: Brian Gough @ 2009-10-24 11:16 UTC (permalink / raw)
To: Tuomo Keskitalo; +Cc: gsl-discuss
At Sat, 10 Oct 2009 17:45:24 +0300,
Tuomo Keskitalo wrote:
> I got to this result by running the following commands with zsh in
> gsl-1.13 root directory, and by ogling the resulting
> gsl-dependencies.txt file.
I think one can get the complete call tree at the directory-level by
building the library (without inline functions) and using commands
like
nm --defined-only integration/.libs/libgslintegration.a
nm --undefined-only integration/.libs/libgslintegration.a
to extract the definitions and calls.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-23 21:28 ` Brian Gough
@ 2009-10-27 23:06 ` Gerard Jungman
[not found] ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
2009-10-29 18:06 ` Brian Gough
0 siblings, 2 replies; 30+ messages in thread
From: Gerard Jungman @ 2009-10-27 23:06 UTC (permalink / raw)
To: gsl-discuss
[-- Attachment #1: Type: text/plain, Size: 722 bytes --]
On Fri, 2009-10-23 at 14:58 +0100, Brian Gough wrote:
>
> I think this is an interesting example. How would these two classes
> work in practice in C? For example, how would one pass a non-const
> matrix (taken as a view of a non-const multi-array) to a function
> taking a const matrix argument. Dealing with the interaction between
> const and non-const arguments is a fundamental issue.
> The challenge for any scheme is getting reasonable const behavior in
> C. If that problem can be solved better then everything else follows
> more easily.
Ok. The attachment is the obvious solution. I don't want to have
theological wars about this, but it seems to me like it is
the "C way". Let's debate.
--
G. Jungman
[-- Attachment #2: foobar.c --]
[-- Type: text/x-csrc, Size: 1438 bytes --]
#include <stdlib.h>
/* The union guarantees that the structs are aligned, in C99.
*
* In C90, standard does not discuss the issue, but all
* compilers known to man align them anyway. The C99
* guarantee seems to be an attempt to standardize
* that behaviour.
*
* The union type is never actually used, unless somebody
* can think of a reason to use it...
*/
union gsl_vector_union_t
{
struct gsl_vector_struct_t
{
double * data;
size_t size;
int stride;
} as_vector;
struct gsl_const_vector_struct_t
{
const double * data;
size_t size;
int stride;
} as_const_vector;
};
typedef struct gsl_vector_struct_t gsl_vector;
typedef struct gsl_const_vector_struct_t gsl_const_vector;
void gsl_some_typical_function(const gsl_vector * v)
{
/* do something which might twiddle v.data[] */
}
void gsl_some_typical_const_function(const gsl_const_vector * v)
{
/* do something which does not twiddle v.data[] */
}
int main()
{
gsl_vector v;
gsl_const_vector cv;
/* might be twiddling v.data[] */
gsl_some_typical_function(&v);
/* claims not to twiddle cv.data[] */
gsl_some_typical_const_function(&cv);
/* claims not to twiddle v.data[] */
gsl_some_typical_const_function((const gsl_const_vector *) &v);
/* a misuse; but this sort of thing can never be prevented anyway */
gsl_some_typical_function((const gsl_vector *) &cv);
return 0;
}
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
[not found] ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
@ 2009-10-27 23:49 ` Gerard Jungman
0 siblings, 0 replies; 30+ messages in thread
From: Gerard Jungman @ 2009-10-27 23:49 UTC (permalink / raw)
To: gsl-discuss
On Tue, 2009-10-27 at 19:28 -0400, James Bergstra wrote:
> Sorry to bug you, but could you point me to a link that would explain
> why you put both of those structures into the union?
[ I assume you meant to send this to the list; so I have replied there ]
Here is an interesting (and very long) reference:
http://www.coding-guidelines.com/cbook/cbook1_2.pdf
The relevant section is 6.5.2.3, the discussion around sentences
1037, 1038 of the standard. Sentence 1037 introduces the union
construct. Sentence 1038 is a definition for
"common initial sequence", which is used in 1037.
Of course, we all believe it works right without the union.
But the presence of the union seems to guarantee that it
works right (by the new standard; the old standard is
apparently mute). I may be misreading this, and the
union may not actually be necessary, but there is
some confusion about this, if you search the web,
including a couple of formal "defect reports" for
the standard, regarding these sentences.
--
G. Jungman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-27 23:06 ` Gerard Jungman
[not found] ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
@ 2009-10-29 18:06 ` Brian Gough
2009-10-29 20:41 ` Gerard Jungman
1 sibling, 1 reply; 30+ messages in thread
From: Brian Gough @ 2009-10-29 18:06 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
At Tue, 27 Oct 2009 17:09:09 -0600,
> /* claims not to twiddle v.data[] */
> gsl_some_typical_const_function((const gsl_const_vector *) &v);
>
The problem with anything involving explicit casts is that we lose
type-safety. If v is not a vector there's no way to detect the error,
which closes the hole of const-related errors but opens another one.
--
Brian Gough
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-29 18:06 ` Brian Gough
@ 2009-10-29 20:41 ` Gerard Jungman
2009-10-29 21:40 ` James Bergstra
2009-10-30 16:54 ` Brian Gough
0 siblings, 2 replies; 30+ messages in thread
From: Gerard Jungman @ 2009-10-29 20:41 UTC (permalink / raw)
To: gsl-discuss
On Thu, 2009-10-29 at 16:51 +0000, Brian Gough wrote:
> At Tue, 27 Oct 2009 17:09:09 -0600,
> > /* claims not to twiddle v.data[] */
> > gsl_some_typical_const_function((const gsl_const_vector *) &v);
> >
>
> The problem with anything involving explicit casts is that we lose
> type-safety. If v is not a vector there's no way to detect the error,
> which closes the hole of const-related errors but opens another one.
General statement: C is a weakly-typed language.
Consequence: We cannot prevent people from loading the gun,
pointing it at their heads, and pulling the trigger. They
are always free to do this. In fact, it is a tenet of C
that you should trust people to do what they need to do.
You will never succeed in making an interface "safe" in
this sense. However, you _can_ make an interface intuitive
and safe to use in a normal manner. I don't think you
can ask any more of C.
Tangential but powerful argument: I talked to Tanmoy about it.
He considers this normal and useful. The benefits outweigh the
defects. "Just do it and stop worrying about it" is an exact quote.
Everybody who understands C, including the standards committee,
knows that const-ness is screwed up, because of the way it was
tacked on to the old language. This is one of a small number of
recognized ways to get around the defects in the language.
--
G. Jungman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-29 20:41 ` Gerard Jungman
@ 2009-10-29 21:40 ` James Bergstra
2009-10-30 16:54 ` Brian Gough
2009-10-30 16:54 ` Brian Gough
1 sibling, 1 reply; 30+ messages in thread
From: James Bergstra @ 2009-10-29 21:40 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
Taking a step back, why do we need to have const and non-const
versions of the struct? I think we can get away with just having a
non-const version.
If the purpose of using const float * things is to permit compiler
optimizations, then this detail can be hidden at the level of using
gsl_vector. Const-ness can be limited to use in functions that take
raw types (float *, int, etc.) and actually do algorithmic work. Like
cblas functions. C will automatically upcast from non-const pointers
to const pointers by the usual rules.
If the purpose is to protect the user from accidentally messing around
with data then, as Gerard suggests, maybe we shouldn't bother. This
is not a battle that we can win in C. Good naming conventions for
functions, which indicate the arguments that will be modified, is the
most that a C library is expected to provide.
Providing only non-const versions of these structs makes a lot of
problems disappear right?
All matrices are born non-const.
James
On Thu, Oct 29, 2009 at 4:41 PM, Gerard Jungman <jungman@lanl.gov> wrote:
> On Thu, 2009-10-29 at 16:51 +0000, Brian Gough wrote:
>> At Tue, 27 Oct 2009 17:09:09 -0600,
>> > /* claims not to twiddle v.data[] */
>> > gsl_some_typical_const_function((const gsl_const_vector *) &v);
>> >
>>
>> The problem with anything involving explicit casts is that we lose
>> type-safety. If v is not a vector there's no way to detect the error,
>> which closes the hole of const-related errors but opens another one.
>
> General statement: C is a weakly-typed language.
>
> Consequence: We cannot prevent people from loading the gun,
> pointing it at their heads, and pulling the trigger. They
> are always free to do this. In fact, it is a tenet of C
> that you should trust people to do what they need to do.
>
> You will never succeed in making an interface "safe" in
> this sense. However, you _can_ make an interface intuitive
> and safe to use in a normal manner. I don't think you
> can ask any more of C.
>
>
> Tangential but powerful argument: I talked to Tanmoy about it.
> He considers this normal and useful. The benefits outweigh the
> defects. "Just do it and stop worrying about it" is an exact quote.
>
> Everybody who understands C, including the standards committee,
> knows that const-ness is screwed up, because of the way it was
> tacked on to the old language. This is one of a small number of
> recognized ways to get around the defects in the language.
>
> --
> G. Jungman
>
>
>
--
http://www-etud.iro.umontreal.ca/~bergstrj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-29 21:40 ` James Bergstra
@ 2009-10-30 16:54 ` Brian Gough
0 siblings, 0 replies; 30+ messages in thread
From: Brian Gough @ 2009-10-30 16:54 UTC (permalink / raw)
To: James Bergstra; +Cc: Gerard Jungman, gsl-discuss
At Thu, 29 Oct 2009 17:40:30 -0400,
James Bergstra wrote:
> If the purpose is to protect the user from accidentally messing around
> with data then, as Gerard suggests, maybe we shouldn't bother. This
> is not a battle that we can win in C. Good naming conventions for
> functions, which indicate the arguments that will be modified, is the
> most that a C library is expected to provide.
The purpose is to make programs safer, rather than provide any hints
for optimisation. The current system is type-safe and gives a
"discarding const" compiler warning if people try to pass const
objects to functions as non-const arguments - these seem like useful
features. It's not clear to me what the actual benefit would be if we
only had non-const vectors and matrices.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-10-29 20:41 ` Gerard Jungman
2009-10-29 21:40 ` James Bergstra
@ 2009-10-30 16:54 ` Brian Gough
1 sibling, 0 replies; 30+ messages in thread
From: Brian Gough @ 2009-10-30 16:54 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
At Thu, 29 Oct 2009 14:41:46 -0600,
Gerard Jungman wrote:
> You will never succeed in making an interface "safe" in
> this sense. However, you _can_ make an interface intuitive
> and safe to use in a normal manner. I don't think you
> can ask any more of C.
I don't think this interface is safe to use in a normal manner it it
requires an explicit cast to pass a non-const object to a function
accepting the const version. Any error in the argument of an explicit
cast is undetectable. We should not require the use of explicit casts
for common operations, it would endorse a bad practice.
--
Brian Gough
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
@ 2009-11-20 9:37 Justin Lenzo
0 siblings, 0 replies; 30+ messages in thread
From: Justin Lenzo @ 2009-11-20 9:37 UTC (permalink / raw)
To: gsl-discuss
> At Thu, 29 Oct 2009 17:40:30 -0400,
> James Bergstra wrote:
> > If the purpose is to protect the user from accidentally messing around
> > with data then, as Gerard suggests, maybe we shouldn't bother. This
> > is not a battle that we can win in C. Good naming conventions for
> > functions, which indicate the arguments that will be modified, is the
> > most that a C library is expected to provide.
>
> The purpose is to make programs safer, rather than provide any hints
> for optimisation. The current system is type-safe and gives a
> "discarding const" compiler warning if people try to pass const
> objects to functions as non-const arguments - these seem like useful
> features. It's not clear to me what the actual benefit would be if we
> only had non-const vectors and matrices.
>
What if you went with a structure like the following:
typedef struct {
const double *data;
double *wdata;
...
} gsl_vector;
When a gsl_vector is allocated, the array is attached to 'wdata' and
aliased to 'data'. When a read-only vector view is made, it returns a
new instance of the gsl_vector datatype where the 'data' and other
members are copied over, but sets 'wdata' to NULL. Basically, you
would be allowing the same datatype to cover allocated vectors,
read-write views, and read-only views. It seems to me this achieves
the goal of protecting underlying read-only data, as long as your
willing to check for NULL pointers in the routines that write to a
vector. I don't know anything about the security implications, so
forgive me if this is a dangerous approach.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-11-15 9:13 ` Tuomo Keskitalo
2009-11-15 16:44 ` Jonathan Underwood
@ 2009-11-16 11:56 ` Brian Gough
1 sibling, 0 replies; 30+ messages in thread
From: Brian Gough @ 2009-11-16 11:56 UTC (permalink / raw)
To: gsl-discuss
At Sun, 15 Nov 2009 11:12:55 +0200,
Tuomo Keskitalo wrote:
> Currently GSL uses C in a type-safe manner which forces somewhat
> complicated APIs for everyone but enables users to find some lethal
> bugs. More user-friendly APIs would allow people to silently break their
> programs if they are not careful. And there is no compromise. Does this
> summarize the situation?
I think that's a fair summary. The philosophy in GSL has always been
to try to make things safe by default (for example, the error
handler), because writing correct numerical programs is already
difficult enough without silent errors.
--
Brian Gough
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-11-15 16:44 ` Jonathan Underwood
@ 2009-11-15 18:41 ` Robert G. Brown
0 siblings, 0 replies; 30+ messages in thread
From: Robert G. Brown @ 2009-11-15 18:41 UTC (permalink / raw)
To: Jonathan Underwood; +Cc: gsl-discuss
On Sun, 15 Nov 2009, Jonathan Underwood wrote:
> I think it would be better if GSL, being written in C, were to
> present the most functional interface at the expense of sacrificing
> type safety, and higher level language bindings could then wrap that
> API and do type checking as needed. As gerard says, C isn't about type
> safety.
I agree. A good C programmer is comfortable with casts and knows
perfectly well that it is up to them to track them. And the compiler
(give a chance) will usually at least warn you about apparent
inconsistency.
rgb
>
> Jonathan
>
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-11-15 9:13 ` Tuomo Keskitalo
@ 2009-11-15 16:44 ` Jonathan Underwood
2009-11-15 18:41 ` Robert G. Brown
2009-11-16 11:56 ` Brian Gough
1 sibling, 1 reply; 30+ messages in thread
From: Jonathan Underwood @ 2009-11-15 16:44 UTC (permalink / raw)
To: gsl-discuss
2009/11/15 Tuomo Keskitalo <Tuomo.Keskitalo@iki.fi>:
> Hello,
>
> On 11/14/2009 04:42 PM, Brian Gough wrote:
>
>> At Mon, 09 Nov 2009 16:07:43 -0700,
>> Gerard Jungman wrote:
>>>
>>> On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
>>>>
>>>> Ok, I have read the paper now. I do think the practice of casting
>>>> described there is rather dated. When people had no viable
>>>> alternative to C, they had to resort to such tricks. It is not
>>>> something that should be encouraged today -- programs should either be
>>>> written safely, following the rules of type-checking in C, or be
>>>> written in another language.
>>
>> From 1.3 million lines of code they describe only one method which
>> does not use casts, which is the one we use. I don't think we are
>> going to find anything that is better than the current method for
>> views.
>
> Apparently this is a fundamental question.
>
> Currently GSL uses C in a type-safe manner which forces somewhat complicated
> APIs for everyone but enables users to find some lethal bugs. More
> user-friendly APIs would allow people to silently break their programs if
> they are not careful. And there is no compromise. Does this summarize the
> situation?
>
> I am sure there are users for both approaches. I don't know.. Maybe GSL
> should be the safe, strict library, and there should be another scientific
> library in C which aims towards interoperability between data, libraries and
> languages? This would split forces, but both projects would also benefit
> from each other.
>
I think it would be better if GSL, being written in C, were to
present the most functional interface at the expense of sacrificing
type safety, and higher level language bindings could then wrap that
API and do type checking as needed. As gerard says, C isn't about type
safety.
Jonathan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-11-14 15:25 ` Brian Gough
@ 2009-11-15 9:13 ` Tuomo Keskitalo
2009-11-15 16:44 ` Jonathan Underwood
2009-11-16 11:56 ` Brian Gough
0 siblings, 2 replies; 30+ messages in thread
From: Tuomo Keskitalo @ 2009-11-15 9:13 UTC (permalink / raw)
To: Brian Gough, Gerard Jungman, gsl-discuss
Hello,
On 11/14/2009 04:42 PM, Brian Gough wrote:
> At Mon, 09 Nov 2009 16:07:43 -0700,
> Gerard Jungman wrote:
>> On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
>>> Ok, I have read the paper now. I do think the practice of casting
>>> described there is rather dated. When people had no viable
>>> alternative to C, they had to resort to such tricks. It is not
>>> something that should be encouraged today -- programs should either be
>>> written safely, following the rules of type-checking in C, or be
>>> written in another language.
>
> From 1.3 million lines of code they describe only one method which
> does not use casts, which is the one we use. I don't think we are
> going to find anything that is better than the current method for
> views.
Apparently this is a fundamental question.
Currently GSL uses C in a type-safe manner which forces somewhat
complicated APIs for everyone but enables users to find some lethal
bugs. More user-friendly APIs would allow people to silently break their
programs if they are not careful. And there is no compromise. Does this
summarize the situation?
I am sure there are users for both approaches. I don't know.. Maybe GSL
should be the safe, strict library, and there should be another
scientific library in C which aims towards interoperability between
data, libraries and languages? This would split forces, but both
projects would also benefit from each other.
Anyway, I am interested to see what Gerard comes up with.
--
Tuomo.Keskitalo@iki.fi
http://iki.fi/tuomo.keskitalo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-11-09 23:06 ` Gerard Jungman
@ 2009-11-14 15:25 ` Brian Gough
2009-11-15 9:13 ` Tuomo Keskitalo
0 siblings, 1 reply; 30+ messages in thread
From: Brian Gough @ 2009-11-14 15:25 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
At Mon, 09 Nov 2009 16:07:43 -0700,
Gerard Jungman wrote:
> On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
> > Ok, I have read the paper now. I do think the practice of casting
> > described there is rather dated. When people had no viable
> > alternative to C, they had to resort to such tricks. It is not
> > something that should be encouraged today -- programs should either be
> > written safely, following the rules of type-checking in C, or be
> > written in another language.
>
> How does this comment help us in designing a C library?
From 1.3 million lines of code they describe only one method which
does not use casts, which is the one we use. I don't think we are
going to find anything that is better than the current method for
views.
> > Our approach is actually described in the paper under the "first
> > member" section, in the &(cp.p) example -- although they don't point
> > out that it's the only one that doesn't require a cast and can be
> > checked by the compiler, unlike all the others.
>
> How does this solve the const-ness problem?
The gsl const view is defined as
typedef struct
{
gsl_vector vector;
} _gsl_vector_const_view;
typedef const _gsl_vector_const_view gsl_vector_const_view;
so &view.vector pointers can only be passed to functions accepting
const gsl_vector * not gsl_vector *.
--
Brian Gough
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-11-09 20:41 ` Brian Gough
@ 2009-11-09 23:06 ` Gerard Jungman
2009-11-14 15:25 ` Brian Gough
0 siblings, 1 reply; 30+ messages in thread
From: Gerard Jungman @ 2009-11-09 23:06 UTC (permalink / raw)
To: gsl-discuss
On Fri, 2009-11-06 at 14:42 +0000, Brian Gough wrote:
>
> Ok, I have read the paper now. I do think the practice of casting
> described there is rather dated. When people had no viable
> alternative to C, they had to resort to such tricks. It is not
> something that should be encouraged today -- programs should either be
> written safely, following the rules of type-checking in C, or be
> written in another language.
How does this comment help us in designing a C library?
> Our approach is actually described in the paper under the "first
> member" section, in the &(cp.p) example -- although they don't point
> out that it's the only one that doesn't require a cast and can be
> checked by the compiler, unlike all the others.
How does this solve the const-ness problem?
--
G. Jungman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
2009-11-03 19:44 containers tentative design summary Gerard Jungman
@ 2009-11-09 20:41 ` Brian Gough
2009-11-09 23:06 ` Gerard Jungman
0 siblings, 1 reply; 30+ messages in thread
From: Brian Gough @ 2009-11-09 20:41 UTC (permalink / raw)
To: Gerard Jungman; +Cc: gsl-discuss
At Tue, 03 Nov 2009 12:45:49 -0700,
Gerard Jungman wrote:
> Actually, it endorses a "C" practice. It's "bad"-ness is
> a theological issue. See the attached paper by Siff et AL.,
Ok, I have read the paper now. I do think the practice of casting
described there is rather dated. When people had no viable
alternative to C, they had to resort to such tricks. It is not
something that should be encouraged today -- programs should either be
written safely, following the rules of type-checking in C, or be
written in another language.
Our approach is actually described in the paper under the "first
member" section, in the &(cp.p) example -- although they don't point
out that it's the only one that doesn't require a cast and can be
checked by the compiler, unlike all the others.
> There are other ways to design containers. Some ways
> involve radical change to the GSL container methodology.
> For example, the const-ness of the data pointer could be
> explicitly divorced from the layout meta-data which constitutes
> the rest of the container.
That is more interesting theoretically. From a practical point of
view, I think we have to restrict ourselves to solutions where the
data pointer and the metadata are a single object though.
--
Brian Gough
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: containers tentative design summary
@ 2009-11-03 19:44 Gerard Jungman
2009-11-09 20:41 ` Brian Gough
0 siblings, 1 reply; 30+ messages in thread
From: Gerard Jungman @ 2009-11-03 19:44 UTC (permalink / raw)
To: gsl-discuss
On Fri, 2009-10-30 at 16:03 +0000, Brian Gough wrote:
>
> I don't think this interface is safe to use in a normal manner it it
> requires an explicit cast to pass a non-const object to a function
> accepting the const version.
You really do want to use a strongly-typed language.
But C is not it.
> Any error in the argument of an explicit
> cast is undetectable.
If the container stuff is designed properly,
then it hardly matters, because it just works
anyway, for any container type for which the
operation is meaningful.
If your world were limited to the vector and
const_vector types, as in the example, it wouldn't
matter what you did, the result would obviously be correct.
If, on the other hand, you try to jam your externally-defined
square peg into the GSL round hole, it could be bad. So?
How is this surprising?
If you think about the full container design, you will see that
it is possible to design a rational and useful system, where the
types fully inter-operate. The basis would be a generic multi-array
implementation. The vector example given was purely to address the
const-ness problem, and I think it does that.
It's not enough to scare people with notions of the wrong-cast
bogeyman. You should explain how these things can happen. When
you face the fear directly, you may realize it is not such a
big deal after all.
> We should not require the use of explicit casts
> for common operations, it would endorse a bad practice.
Actually, it endorses a "C" practice. It's "bad"-ness is
a theological issue. See the attached paper by Siff et al.,
[sorry, I tried to attach the paper, but the list won't
accept a message with an attached pdf; the paper is
"Coping with Type Casts in C", 1999 Bell-Labs report]
where this construct is labeled "physical sub-typing".
They give data on actual usage in the world, see Table 1.
The whole business is far from perfect. But, again, if you
want to do things in C, you need tricks, because the
language has no support for expressing relationships
between types.
In my opinion, the idea of a container library in C is
somewhat brain-damaged anyway. But here we are...
There are other ways to design containers. Some ways
involve radical change to the GSL container methodology.
For example, the const-ness of the data pointer could be
explicitly divorced from the layout meta-data which constitutes
the rest of the container. After all, the headaches with const-ness
come from the simple fact that the data pointer is embedded in the
struct and it conflates the semantics in un-desirable ways.
Example separation:
typedef struct {
size_t size;
int stride;
} gsl_vector_layout;
gsl_fnc_on_stuff(double * d, const gsl_vector_layout * l);
gsl_fnc_on_const_stuff(const double * d, const gsl_vector_layout * l);
There are many ways to skin the cat. But a change like this
would effect all gsl interfaces involving container types.
It also has other noticeable imperfections.
Choose the form of the destroyer...
--
G. Jungman
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2009-11-20 9:37 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-05 10:12 containers tentative design summary Gerard Jungman
2009-10-05 14:50 ` James Bergstra
2009-10-05 23:00 ` Gerard Jungman
2009-10-05 23:45 ` James Bergstra
2009-10-06 19:59 ` Gerard Jungman
[not found] ` <645d17210910060537s762d6323pfd2bec8590ad28e9@mail.gmail.com>
2009-10-06 20:02 ` Gerard Jungman
2009-10-23 21:28 ` Brian Gough
2009-10-27 23:06 ` Gerard Jungman
[not found] ` <7f1eaee30910271628h70785125m68e47c7a7b5c25b7@mail.gmail.com>
2009-10-27 23:49 ` Gerard Jungman
2009-10-29 18:06 ` Brian Gough
2009-10-29 20:41 ` Gerard Jungman
2009-10-29 21:40 ` James Bergstra
2009-10-30 16:54 ` Brian Gough
2009-10-30 16:54 ` Brian Gough
2009-10-05 23:04 ` some general questions Gerard Jungman
2009-10-06 16:21 ` Brian Gough
2009-10-06 20:32 ` Gerard Jungman
2009-10-10 14:45 ` Tuomo Keskitalo
2009-10-24 11:16 ` Brian Gough
2009-10-14 20:00 ` Brian Gough
2009-10-15 18:39 ` Tuomo Keskitalo
2009-11-03 19:44 containers tentative design summary Gerard Jungman
2009-11-09 20:41 ` Brian Gough
2009-11-09 23:06 ` Gerard Jungman
2009-11-14 15:25 ` Brian Gough
2009-11-15 9:13 ` Tuomo Keskitalo
2009-11-15 16:44 ` Jonathan Underwood
2009-11-15 18:41 ` Robert G. Brown
2009-11-16 11:56 ` Brian Gough
2009-11-20 9:37 Justin Lenzo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).