From: Gerard Jungman <jungman@lanl.gov>
To: GSL Discuss Mailing List <gsl-discuss@sourceware.org>
Subject: Re: GSL 2.0 roadmap (one man's view)
Date: Thu, 27 Aug 2009 23:13:00 -0000
Message-ID: <1251414939.23092.82.camel@manticore.lanl.gov>
In-Reply-To: <1251414774.23092.80.camel@manticore.lanl.gov>
[-- Attachment #1: Type: text/plain, Size: 1 bytes --]
[-- Attachment #2: gsl-2.0-outline-2009.08.27.txt --]
[-- Type: text/plain, Size: 15443 bytes --]
Requirements (a very incomplete list)
-------------------------------------
* A controlled shift in the C interfaces. Too much interface change
may scare off 1.x users. But there must be change...
* Better integration with existing toolsets.
- vector / matrix models
- linear algebra (!)
* Better overall organization, expression of module dependencies, etc.
* GSL should be a technology leader. That is part of the point of the
GPL licensing. If GSL were just another of many essentially equivalent
libraries, then it would make more sense for it to be LGPL or even BSD.
To justify GPL licensing, which coerces people into our open-source
world, we had better have something unique and excellent.
Taking Stock of 1.x
===================
** Overall Design
* Split into "independent sublibraries".
The stated design goal of independent sublibraries was abandoned
in all but name quite early in the project. Several obvious problems
with a naive interpretation of "independent" occur:
- Some sublibraries depend on other "foundational" sublibraries;
such dependencies are natural and desirable. But with no effective
explicit way to express these dependencies, the overall organization
degenerates to a monolith.
- The packaging environment does not _easily_ support independence.
This is related to the lack of an effective expression of
dependencies among sublibraries, as noted above. It is also
related to practical problems in supporting a hierarchical
build with the autotools setup. Witness the apparent need to
create the gsl/ directory of links to header files as the
first step in the build hackery.
Consequences: Various practical problems, such as the organization
of header files, the grotesque link lines (a real problem
at one point), and general difficulty in comprehension of
the structure of the project.
Requirement: A natural, effective, and explicit way to express
dependencies, which can be exported to the build
and packaging systems.
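
One way such an explicit expression of dependencies might look is sketched
below: a small manifest from which a build order is derived mechanically.
The manifest format and every name in it (sublib, manifest, build_order)
are hypothetical. The entries are taken from the "sublibs that use ..."
lists later in this outline, restricted to four sublibraries for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical manifest: each sublibrary names what it depends on. */
typedef struct {
    const char *name;
    const char *deps[4];   /* NULL-terminated, at most 3 entries */
} sublib;

static const sublib manifest[] = {
    { "eigen",    { "linalg", "blas", NULL } },
    { "multifit", { "linalg", "blas", NULL } },
    { "linalg",   { "blas", NULL } },
    { "blas",     { NULL } },
};
enum { NLIBS = sizeof manifest / sizeof manifest[0] };

/* Look up a sublibrary by name; returns its index or -1. */
static int find(const char *name)
{
    for (int i = 0; i < NLIBS; ++i)
        if (strcmp(manifest[i].name, name) == 0) return i;
    return -1;
}

/* Depth-first visit: emit a library only after everything it depends on.
   state: 0 = unvisited, 1 = in progress, 2 = emitted. */
static int visit(int i, int *state, const char **out, int *n)
{
    if (state[i] == 2) return 0;
    if (state[i] == 1) return -1;          /* dependency cycle */
    state[i] = 1;
    for (const char *const *d = manifest[i].deps; *d; ++d) {
        int j = find(*d);
        if (j < 0 || visit(j, state, out, n) != 0) return -1;
    }
    state[i] = 2;
    out[(*n)++] = manifest[i].name;
    return 0;
}

/* Fill out[] with a build order (dependencies first). Returns the count,
   or -1 on a cycle or an undeclared dependency. */
int build_order(const char **out)
{
    int state[NLIBS] = { 0 };
    int n = 0;
    for (int i = 0; i < NLIBS; ++i)
        if (visit(i, state, out, &n) != 0) return -1;
    return n;
}
```

Reading the result backwards gives a link-line order (dependents before
their dependencies), which is the kind of information a packaging system
could consume directly instead of having it maintained by hand.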
* One Language Only
The benefits of a one-language only design are described in the
design document. They are fairly obvious and depend on the fact
that C is the "universal" system language.
Nevertheless, there are large costs associated with this choice.
Among these are the following.
- Important "legacy" tools which are implemented in other languages
are unavailable to GSL. This creates a major deficiency in areas
such as linear algebra. Some holes were partially plugged with
native GSL implementations, but this can never be an acceptable
solution, either from a software design standpoint or from an
end-user standpoint. From a software-design standpoint, GSL fails
to gain from the maturity and the ongoing development in these
external tools. From an end-user standpoint, the lack of performance
has caused many users to abandon these aspects of GSL. Further
discussion of these points occurs in specialized sections below,
as appropriate.
- Many new developments in methodology are occurring outside C, notably
in C++. As long as such developments were essentially experimental and
unfocused, they could be ignored; one could argue that this was indeed
the case a decade ago, when GSL was initiated. But this can no longer
be argued. The performance of these new tools exceeds (often by a
large factor) the performance of the GSL tools; some are
well-established, readily available, and of clear utility.
Again, many end-users have abandoned the corresponding areas of GSL.
- When a user abandons some core area of GSL for other implementations,
they tend to abandon other aspects as well, since GSL interfaces are not
always friendly to other data models. In many cases it would be
impossible to make them so, due to insuperable inter-language barriers.
GSL should not continue to ignore these developments. At the very least
it should allow for the existence of these external developments, by
paying very close attention to inter-language issues.
These issues are related to the design goal of "naturalness", stated in
the design document. Quote:
"If there is something which is unnatural in C and has to be simulated
then we avoid using it."
It may be coming to pass that some/many useful methods and designs for a
numerical library are now "unnatural in C" by construction, since the performant
and well-designed tools are no longer being created in C. For example, it
may be the case that problem domains such as linear algebra are no longer
"natural in C". Whither GSL in such a world?
* Error Handling
GSL implements a very rudimentary type of error-handling, which is
quite appropriate for small projects. The "register handler, fail-by-default"
model works for a small library with a "low stack depth" usage model. By this,
I mean that library functions are called by clients, but not by other library
functions; in such a case, it is easy for the client to interpret the errors.
But GSL does not conform to this "low stack depth" usage model, and the
fail-by-default model can be confusing for clients, when the failure
occurs at depths that they cannot control.
Although it may not seem like a pressing issue, this monolithic
error-handling design seems to be another symptom of the lack of
explicit models for interdependence in GSL. It should be revisited.
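
A minimal sketch of the "register handler, fail-by-default" model and of
the stack-depth problem described above. All names (lib_set_error_handler,
lib_inner, lib_outer, recording_handler) are hypothetical; this mimics the
model and is not GSL's actual API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; this mimics the model, not GSL's API. */
typedef void (*lib_error_handler)(const char *reason, const char *func, int code);

enum { LIB_SUCCESS = 0, LIB_EDOM = 1 };

/* Fail-by-default: with no handler registered, any error aborts. */
static void default_handler(const char *reason, const char *func, int code)
{
    fprintf(stderr, "error %d in %s: %s\n", code, func, reason);
    abort();
}

static lib_error_handler current_handler = default_handler;

/* Register a new handler (NULL restores the default); returns the old one. */
lib_error_handler lib_set_error_handler(lib_error_handler h)
{
    lib_error_handler old = current_handler;
    current_handler = h ? h : default_handler;
    return old;
}

/* Inner routine: the client never calls this directly. */
static int lib_inner(double x, double *out)
{
    if (x < 0.0) {
        current_handler("negative argument", "lib_inner", LIB_EDOM);
        return LIB_EDOM;
    }
    *out = x * 0.5;
    return LIB_SUCCESS;
}

/* Outer routine: what the client calls. A failure here originates one
   level deeper than anything the client can see or control. */
int lib_outer(double a, double b, double *out)
{
    double t;
    int status = lib_inner(a - b, &t);
    if (status != LIB_SUCCESS) return status;
    *out = t + 1.0;
    return LIB_SUCCESS;
}

/* A client handler that records the error instead of aborting. */
const char *last_func = NULL;
int last_code = LIB_SUCCESS;

void recording_handler(const char *reason, const char *func, int code)
{
    (void)reason;
    last_func = func;
    last_code = code;
}
```

With recording_handler registered, a call such as lib_outer(1.0, 3.0, &r)
reports an error attributed to lib_inner, a function the client never
called. With the default handler the same call aborts the process. Both
outcomes illustrate why the model becomes confusing once library functions
call each other.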
** Build System
The autotools build system can be an impediment to overall design.
As discussed above, the GSL build architecture, based on a typical
autotools single-project setup, could not be coerced into supporting
notions of dependence/independence for sublibraries. Rather, these
dependencies exist in spite of the build system.
** Complex Numbers
GSL implements complex numbers. This functionality may be superfluous at
this time, depending on the status of C99 features in the compilers of
interest.
Beyond this observation, there are some thorny issues with complex numbers,
when one considers inter-language issues. As argued in several places here,
GSL cannot afford to ignore these issues. Users are often turning to basic
functionality implemented in other languages (linear algebra!), and
inter-operability at the level of data interchange is an absolute
necessity. An array of complex values computed in GSL should be
appropriate for passage through an inter-language interface.
The two languages of most concern are fortran and c++. In fortran, complex
numbers are a language feature, and in c++ they are a library feature.
In both cases, the representation is, in principle, implementation-
dependent. These problems have been addressed by other projects;
effective solutions exist for the interface to fortran. It seems
likely that the c-c++ interface problem is solvable, especially
within the context of a consistent build with a single compiler.
Mixing compilers may be more difficult, but should be solvable
in principle.
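
On the C side, at least, C99 pins the layout down: each complex type has
the same representation and alignment as an array of two elements of the
corresponding real type, real part first. The sketch below (function names
are hypothetical) fills an array using C99 complex arithmetic and then
reads the same storage as interleaved (re, im) pairs, which is the form in
which such an array would typically be handed across a language boundary.
Whether the receiving fortran or c++ side agrees on that layout remains an
assumption to be checked per compiler.

```c
#include <complex.h>
#include <stddef.h>

/* Fill z[k] = k + i*(2k) using C99 complex arithmetic. */
void fill_ramp(double complex *z, size_t n)
{
    for (size_t k = 0; k < n; ++k)
        z[k] = (double)k + 2.0 * (double)k * I;
}

/* Sum the imaginary parts by treating the same storage as interleaved
   (re, im) pairs of doubles. C99 guarantees this view of the data. */
double sum_imag_interleaved(const double complex *z, size_t n)
{
    const double *p = (const double *)z;
    double s = 0.0;
    for (size_t k = 0; k < n; ++k)
        s += p[2 * k + 1];
    return s;
}
```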
** Vector and Matrix Data Structures [ vector/, matrix/, block/ ]
The basic features of the GSL vector and matrix data structures
are reasonable, as an implementation of dense storage for basic
data types.
The important notion of slicing is (partially) implemented in GSL
in terms of the "view" concept. One can construct submatrices as
views of given matrices, change the stride of vector data by
creating vector views, etc. But there are clear flaws in the
design. The design does not express the obvious idea that
a "view" is itself a "thing", simply because the view classes
do not have an inheritance relationship to the main classes.
In fact, there is a problem with the functoriality of the design,
because there is an obvious logical sense in which a matrix is
itself a view type, using a view notion corresponding to the
default strides, but the design seems to "point the other way".
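
To make the "matrix is itself a view" idea concrete, here is a minimal
sketch of a design that points the other way: a single descriptor type,
with an owning matrix being nothing more than a view that has the default
row stride. The type and function names (mview, mview_sub, mview_at, ...)
are hypothetical and are not a proposal for actual GSL declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical design: one descriptor type for everything. A "matrix"
   is a view whose row stride equals its column count and which owns
   its storage. */
typedef struct {
    double *data;       /* first element of this view */
    size_t rows, cols;
    size_t row_stride;  /* elements between consecutive rows */
    int owner;          /* nonzero if this view must free data */
} mview;

/* Allocate a zero-filled matrix; data is NULL if allocation fails. */
mview mview_alloc(size_t rows, size_t cols)
{
    mview m = { calloc(rows * cols, sizeof(double)), rows, cols, cols, 1 };
    return m;
}

/* A sub-block has the same type, so every routine accepts it unchanged. */
mview mview_sub(mview m, size_t r0, size_t c0, size_t rows, size_t cols)
{
    assert(r0 + rows <= m.rows && c0 + cols <= m.cols);
    mview s = { m.data + r0 * m.row_stride + c0, rows, cols, m.row_stride, 0 };
    return s;
}

/* Element access returns a pointer, usable for both reading and writing. */
double *mview_at(mview m, size_t i, size_t j)
{
    assert(i < m.rows && j < m.cols);
    return m.data + i * m.row_stride + j;
}

/* Written once against mview; works for whole matrices and sub-blocks. */
double mview_trace(mview m)
{
    double t = 0.0;
    size_t n = m.rows < m.cols ? m.rows : m.cols;
    for (size_t i = 0; i < n; ++i)
        t += *mview_at(m, i, i);
    return t;
}

/* Only the owning view releases the storage. */
void mview_free(mview m)
{
    if (m.owner) free(m.data);
}
```

In this arrangement there is no separate view class to relate to the main
class. mview_trace(mview_sub(a, 1, 1, 2, 2)) and mview_trace(a) go through
exactly the same code path, and a write through the sub-block is visible
in the parent.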
There are also significant usability issues with the simple aspects
of the interfaces. The get() and set() functions have been deplored
by many users. This syntax leads to unwieldy user code to accomplish
the simplest tasks. The unwieldy nature of these interfaces often
causes the user to introduce otherwise unnecessary temporaries.
Although the compiler can remove such temporaries, the user cannot,
and they end up occupying valuable _intellectual_ real estate.
These problems stem fundamentally from the fact that the view
classes were added after the main classes, and the design of the
main classes was not modified as logic would have dictated. This
is a failure of design.
Executive Summary: I was there when this stuff was born,
and I don't even understand it. I have to stare at the header
files whenever I try to do anything with vectors.
** Basic Vector Matrix Operations [ blas/ ]
The existence of a C interface standard for BLAS functionality
greatly simplified the GSL approach to BLAS. In GSL, all BLAS
function usage adheres to the cblas interface standard, meaning
that any cblas conformant binary interface is acceptable; any
library which exports a cblas binary interface can be linked
to a GSL application. For example, the ATLAS library provides
a full cblas conformant implementation and can be linked to
any GSL application.
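
The link-time choice can be sketched as follows. Client code is compiled
against the cblas prototype alone; which implementation runs is decided by
what appears on the link line. The helper squared_norm is hypothetical,
and the definition of cblas_ddot at the bottom is only a stand-in so that
the sketch links on its own. In a real build that definition is absent.

```c
/* Prototype as given by the cblas interface; in a real build it comes
   from the provider's header rather than being written out here. */
double cblas_ddot(const int N, const double *X, const int incX,
                  const double *Y, const int incY);

/* Client code: compiled once, against the interface only. */
double squared_norm(const double *v, int n)
{
    return cblas_ddot(n, v, 1, v, 1);
}

/* Stand-in provider so that this sketch is self-contained. In practice
   the symbol is resolved by whichever cblas library is linked.
   Assumes positive increments. */
double cblas_ddot(const int N, const double *X, const int incX,
                  const double *Y, const int incY)
{
    double s = 0.0;
    for (int i = 0; i < N; ++i)
        s += X[i * incX] * Y[i * incY];
    return s;
}
```

For example, linking with "-lgsl -lgslcblas -lm" selects the bundled
implementation, while "-lgsl -lcblas -latlas -lm" selects ATLAS, with no
change to the client source.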
GSL also provides a native implementation of the cblas interfaces, based
on the reference implementation for the standard, which was a draft
standard at the time of the GSL 1.0 release. This is where GSL blas
functionality begins to fall down. The GSL cblas implementation was felt
to be a necessary evil, simply because not all installations could be
expected to have a cblas conformant library installed. Unfortunately,
many users are unaware of the underlying architecture and end up using
the native GSL implementation by default, resulting in poor performance,
which reflects badly on GSL as a whole.
This is a natural place where an inter-language approach would have been
useful. The standard fortran BLAS implementation is readily available and
of adequate performance for almost all users. For example, in the current
era, it is readily available as an rpm package. It would have been natural
at that time to use cblas wrappers over the underlying fortran BLAS
(wrappers perhaps also available as part of the draft cblas standard). For
completeness, this may have required shipping the fortran source as a
necessary part of GSL; this idea was rejected because it violated the "One
Language" design rule. As a consequence, the developers responsible for the
actual implementation (mainly myself!) were forced to transcribe the cblas
reference implementation into GSL, resulting in about 8000 lines of needless
code, violating every reasonable notion of software reuse.
In a case like cblas, where an interface standard exists, the link-time choice
method currently implemented in GSL seems like the correct solution.
When faced with the choice of using a decent external implementation in a
language other than C and re-implementing (badly) in C, "One Language"
religious issues should not take precedence in the decision process.
* sublibs that use blas
- eigen/
- linalg/
- multifit/
- multimin/
** Linear Algebra [ linalg/ ]
The de facto standard for linear algebra functionality is LAPACK. LAPACK
exists as (and is essentially defined by) a fortran implementation which
is commonly available, on essentially all platforms. Like blas, I can
just 'yum install' it.
Partly because of the lack of a C standard interface, and partly because
of "One Language" religious beliefs, this de facto standard implementation
was not available for the design of GSL. Rather, GSL relies on a native
implementation of some (smallish) subset of lapack functionality.
The resulting lack of performance has driven many users away from
GSL. The often heard advice on the GSL discussion list is that
"serious users" should "use LAPACK instead". But which users
of GSL consider themselves to be "non-serious"? The notion is
absurd. It is not users who lack seriousness, but the
GSL implementation.
Linear algebra is a foundational tool in numerical computing.
Users need it; other modules in GSL need it. We need a
serious solution for this mess.
* sublibs that use linear algebra
- eigen/
- interpolation/
- multifit/
- multiroots/
- ode-initval/
** Random Number Generation
The GSL random number generators represent one of the few foundational
aspects of the library which can be considered successful. RNGs naturally
lend themselves to a kind of shallow object-oriented design which is
very natural in C. The design allows for easy extension, and a user
who understands one of the rng types essentially understands them all.
I believe this success follows from the relative simplicity of the
problem domain, as far as it bears on the interface design.
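
The "shallow object-oriented design" can be sketched in a few lines: a
type descriptor holding function pointers, an instance holding state, and
a uniform client interface over both. The names below (rng_type, rng,
rng_make, rng_get, rng_uniform) are hypothetical and the two generators
are deliberately simple; this shows the pattern, not GSL's actual
declarations.

```c
#include <stdint.h>

/* Hypothetical names; a sketch of the pattern only. */
typedef struct {
    const char *name;
    uint32_t max;                               /* largest value next() returns */
    void (*seed)(uint32_t *state, uint32_t s);
    uint32_t (*next)(uint32_t *state);
} rng_type;

typedef struct {
    const rng_type *type;
    uint32_t state;
} rng;

/* Generator 1: a 32-bit linear congruential generator. */
static void lcg_seed(uint32_t *st, uint32_t s) { *st = s; }
static uint32_t lcg_next(uint32_t *st)
{
    *st = *st * 1664525u + 1013904223u;
    return *st;
}
const rng_type rng_lcg = { "lcg", UINT32_MAX, lcg_seed, lcg_next };

/* Generator 2: a 32-bit xorshift; its state must be nonzero. */
static void xs_seed(uint32_t *st, uint32_t s) { *st = s ? s : 1u; }
static uint32_t xs_next(uint32_t *st)
{
    uint32_t x = *st;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *st = x;
    return x;
}
const rng_type rng_xorshift = { "xorshift", UINT32_MAX, xs_seed, xs_next };

/* The client interface: identical for every generator type. */
rng rng_make(const rng_type *t, uint32_t s)
{
    rng r = { t, 0 };
    t->seed(&r.state, s);
    return r;
}

uint32_t rng_get(rng *r)
{
    return r->type->next(&r->state);
}

/* Uniform deviate in [0, 1). */
double rng_uniform(rng *r)
{
    return rng_get(r) / ((double)r->type->max + 1.0);
}
```

Adding a generator means adding one more rng_type instance; client code
written against rng_get and rng_uniform does not change. That is the
sense in which understanding one rng type amounts to understanding them
all.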
** Special Functions
The special function sublibrary in GSL suffers mainly from a lack
of consistency and coherence in the implementations. This stems
from the decision made early in the project that coverage was
very important; if users had to leave GSL (and use that _other_ library)
to evaluate functions, they would likely leave GSL aside entirely.
Correctness was not sacrificed in principle, but in practice the
implementations span the spectrum, from ironclad to somewhat-fishy.
As a kind of apology for this state of affairs, the functions attempt
to estimate errors, using heuristic and sometimes ill-defined methods.
This turns out to be a poor apology, since it tends to gum-up the
works for the whole sublibrary, eating performance and occupying
a large piece of intellectual real estate.
The error estimation code must, at the least, be factored out. More
to the point, it should likely be discarded in the main line of
development. Other notions of error control should be investigated.
Similar to the notion of sublibrary dependence discussed elsewhere,
the special functions should be hierarchically organized. The
dependencies should be made clear, and foundational levels of
ironclad functions should be made explicit. Such ironclad
functions should occupy the same place in the user's mind as
platform-native implementations of sin(x); they should be
beyond question for daily use.
Other functions, which suffer from implementation problems
or are simply too complicated to guarantee the same level
of correctness, should be explicitly identified, as part
of the design (not just the documentation!).
** FFT
The GSL native fft implementations suffer in the same way that
the GSL native linear algebra implementations suffer. Better
solutions are available, from almost all points of view, and
the GSL implementations therefore detract from the GSL
cachet as a whole. "Serious users should use FFTW" is not
a good tagline for the sublibrary; see the discussion of
linear algebra above.