public inbox for gsl-discuss@sourceware.org
From: Gerard Jungman <jungman@lanl.gov>
To: GSL Discuss Mailing List <gsl-discuss@sourceware.org>
Subject: Re: GSL 2.0 roadmap (one man's view)
Date: Thu, 27 Aug 2009 23:13:00 -0000
Message-ID: <1251414939.23092.82.camel@manticore.lanl.gov>
In-Reply-To: <1251414774.23092.80.camel@manticore.lanl.gov>

[-- Attachment #2: gsl-2.0-outline-2009.08.27.txt --]
[-- Type: text/plain, Size: 15443 bytes --]

Requirements (a very incomplete list)
-------------------------------------

* A controlled shift in the C interfaces. Too much interface change
  may scare off 1.x users. But there must be change...

* Better integration with existing toolsets.
  - vector / matrix models
  - linear algebra (!)

* Better overall organization, expression of module dependencies, etc.

* GSL should be a technology leader. That is part of the point of the
  GPL licensing. If GSL were just another of many essentially equivalent
  libraries, then it would make more sense for it to be LGPL or even BSD.
  To justify GPL licensing, which coerces people into our open-source
  world, we better have something unique and excellent.



Taking Stock of 1.x
===================


** Overall Design

 * Split into "independent sublibraries".

   The stated design goal of independent sublibraries was abandoned
   in all but name quite early in the project. Several obvious problems
   with a naive interpretation of "independent" occur:

    - Some sublibraries depend on other "foundational" sublibraries;
      such dependencies are natural and desirable. But with no effective
      explicit way to express these dependencies, the overall organization
      degenerates to a monolith.

    - The packaging environment does not _easily_ support independence.
      This is related to the lack of an effective expression of
      dependencies among sublibraries, as noted above. It is also
      related to practical problems in supporting a hierarchical
      build with the autotools setup. Witness the apparent need to
      create the gsl/ directory of links to header files as the
      first step in the build hackery.

   Consequences: Various practical problems, such as the organization
                 of header files, the grotesque link lines (a real problem
                 at one point), and general difficulty in comprehension of
                 the structure of the project.

   Requirement: A natural, effective, and explicit way to express
                dependencies, which can be exported to the build
                and packaging systems.
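
   One way to read this requirement is as a small, machine-readable table
   of direct dependencies from which a build order can be derived. The
   following is a minimal sketch in C, not something proposed in this
   outline: the module struct, the build_order function, and the exact
   edges in the table are illustrative assumptions (only the sublibrary
   names come from GSL).

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPS 4

/* Hypothetical dependency record: a sublibrary and its direct dependencies. */
struct module {
  const char *name;
  const char *deps[MAX_DEPS]; /* NULL-terminated list */
};

/* Illustrative edges only; the real dependency graph may differ. */
static const struct module modules[] = {
  { "eigen",  { "linalg", "blas", NULL } },
  { "linalg", { "blas", "matrix", NULL } },
  { "blas",   { "matrix", "vector", NULL } },
  { "matrix", { "vector", NULL } },
  { "vector", { "block", NULL } },
  { "block",  { NULL } },
};
enum { N_MODULES = sizeof modules / sizeof modules[0] };
enum { UNSEEN, ACTIVE, DONE };

static int find_module(const char *name)
{
  for (int i = 0; i < N_MODULES; i++)
    if (strcmp(modules[i].name, name) == 0)
      return i;
  return -1;
}

/* Depth-first visit: emit every dependency before the module itself.
   Returns -1 on a cycle or an unknown dependency name. */
static int visit(int i, int state[], int order[], int *count)
{
  if (state[i] == DONE) return 0;
  if (state[i] == ACTIVE) return -1;
  state[i] = ACTIVE;
  for (int k = 0; k < MAX_DEPS && modules[i].deps[k] != NULL; k++) {
    int j = find_module(modules[i].deps[k]);
    if (j < 0 || visit(j, state, order, count) != 0) return -1;
  }
  state[i] = DONE;
  assert(*count < N_MODULES);
  order[(*count)++] = i;
  return 0;
}

/* Fill order[] with module indices, dependencies first. */
int build_order(int order[])
{
  int state[N_MODULES] = { UNSEEN };
  int count = 0;
  for (int i = 0; i < N_MODULES; i++)
    if (visit(i, state, order, &count) != 0) return -1;
  return 0;
}
```

   Reading the result backwards gives dependents first, which is the order
   a traditional static link line expects.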

 * One Language Only

   The benefits of a one-language only design are described in the
   design document. They are fairly obvious and depend on the fact
   that C is the "universal" system language.

   Nevertheless, there are large costs associated with this choice.
   Amongst these are the following.

    - Important "legacy" tools which are implemented in other languages
      are unavailable to GSL. This creates a major deficiency in areas
      such as linear algebra. Some holes were partially plugged with
      native GSL implementations, but this can never be an acceptable
      solution, either from a software design standpoint or from an
      end-user standpoint. From a software-design standpoint, GSL fails
      to gain from the maturity and the ongoing development in these
      external tools. From an end-user standpoint, the lack of performance
      has caused many users to abandon these aspects of GSL. Further
      discussion of these points occurs in specialized sections below,
      as appropriate.

    - Many new developments in methodology are occurring outside C, notably
      in C++. As long as such developments were essentially experimental and
      unfocused, they could be ignored; one could argue that this was indeed
      the case a decade ago, when GSL was initiated. But this can no longer
      be argued. The performance of these new tools exceeds (often by a
      large factor) the performance of the GSL tools; some are
      well-established, readily available, and of clear utility.
      Again, many end-users have abandoned the corresponding areas of GSL.

    - When a user abandons some core area of GSL for other implementations,
      they tend to abandon other aspects as well, since GSL interfaces are not
      always friendly to other data models. In many cases it would be
      impossible to make it so, due to insuperable inter-language barriers.
      GSL should not continue to ignore these developments. At the very least
      it should allow for the existence of these external developments, by
      paying very close attention to inter-language issues.

    These issues are related to the design goal of "naturalness", stated in
    the design document. Quote:
      "If there is something which is unnatural in C and has to be simulated
       then we avoid using it."
    It may be coming to pass that some/many useful methods and designs for a
    numerical library are now "unnatural in C" by construction, since the performant
    and well-designed tools are no longer being created in C. For example, it
    may be the case that problem domains such as linear algebra are no longer
    "natural in C". Whither GSL in such a world?


 * Error Handling

   GSL implements a very rudimentary type of error-handling, which is
   quite appropriate for small projects. The "register handler, fail-by-default"
   model works for a small library with a "low stack depth" usage model. By this,
   I mean that library functions are called by clients, but not by other library
   functions; in such a case, it is easy for the client to interpret the errors.
   But GSL does not conform to this "low stack depth" usage model, and the
   fail-by-default model can be confusing for clients, when the failure
   occurs at depths that they cannot control.

   Although it may not seem like a pressing issue, this monolithic
   error-handling design seems to be another symptom of the lack of
   explicit models for interdependence in GSL. It should be revisited.
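
   The difficulty with depth can be made concrete with a minimal sketch of
   a "register handler, fail-by-default" scheme. All names here
   (lib_set_error_handler, LIB_ERROR, lib_reciprocal, lib_harmonic_mean)
   are hypothetical stand-ins written for this illustration; they mimic
   the general shape of the model but are not the GSL API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { LIB_SUCCESS = 0, LIB_EZERODIV = 1 };

typedef void lib_error_handler(const char *reason, const char *file,
                               int line, int code);

/* Fail-by-default: report and abort unless the client registers a handler. */
static void lib_default_handler(const char *reason, const char *file,
                                int line, int code)
{
  fprintf(stderr, "%s:%d: error %d: %s\n", file, line, code, reason);
  abort();
}

static lib_error_handler *lib_handler = lib_default_handler;

/* Register a new handler (NULL restores the default); return the old one. */
lib_error_handler *lib_set_error_handler(lib_error_handler *h)
{
  lib_error_handler *old = lib_handler;
  lib_handler = (h != NULL) ? h : lib_default_handler;
  return old;
}

/* A handler that only records the code, so status values can propagate. */
int lib_last_error = LIB_SUCCESS;
void lib_recording_handler(const char *reason, const char *file,
                           int line, int code)
{
  (void) reason; (void) file; (void) line;
  lib_last_error = code;
}

#define LIB_ERROR(reason, code) \
  do { lib_handler((reason), __FILE__, __LINE__, (code)); return (code); } while (0)

/* Low-level routine: the only place where the error is detected. */
int lib_reciprocal(double x, double *result)
{
  assert(result != NULL);
  if (x == 0.0)
    LIB_ERROR("division by zero", LIB_EZERODIV);
  *result = 1.0 / x;
  return LIB_SUCCESS;
}

/* Higher-level routine built on lib_reciprocal: 2 / (1/a + 1/b). */
int lib_harmonic_mean(double a, double b, double *result)
{
  double ra, rb, r;
  int status;
  if ((status = lib_reciprocal(a, &ra)) != LIB_SUCCESS) return status;
  if ((status = lib_reciprocal(b, &rb)) != LIB_SUCCESS) return status;
  if ((status = lib_reciprocal(ra + rb, &r)) != LIB_SUCCESS) return status;
  *result = 2.0 * r;
  return LIB_SUCCESS;
}
```

   A client calling lib_harmonic_mean(2.0, -2.0, &m) passes two nonzero
   arguments, yet the report (or the abort, under the default handler)
   comes from lib_reciprocal, two levels below the call the client made.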



** Build System

   The autotools build system can be an impediment to overall design.
   As discussed above, the GSL build architecture, based on a typical
   autotools single-project setup, could not be coerced into supporting
   notions of dependence/independence for sublibraries. Rather, these
   dependencies exist in spite of the build system.



** Complex Numbers

   GSL implements complex numbers. This functionality may be superfluous at
   this time, depending on the status of C99 features in the compilers of
   interest.

   Beyond this observation, there are some thorny issues with complex numbers,
   when one considers inter-language issues. As argued in several places here,
   GSL cannot afford to ignore these issues. Users are often turning to basic
   functionality implemented in other languages (linear algebra!), and
   inter-operability at the level of data interchange is an absolute
   necessity. An array of complex values computed in GSL should be
   appropriate for passage through an inter-language interface.

   The two languages of most concern are fortran and c++. In fortran, complex
   numbers are a language feature, and in c++ they are a library feature.
   In both cases, the representation is, in principle, implementation-
   dependent. These problems have been addressed by other projects;
   effective solutions exist for the interface to fortran. It seems
   likely that the c-c++ interface problem is solvable, especially
   within the context of a consistent build with a single compiler.
   Mixing compilers may be more difficult, but should be solvable
   in principle.
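
   On the C side, at least, the data-interchange question has a definite
   answer: C99 specifies that each complex type has the same
   representation and alignment as a two-element array of the
   corresponding real type, real part first. The sketch below relies only
   on that guarantee. The pair_complex type and both function names are
   hypothetical, chosen for this illustration; the struct is simply an
   interleaved (re, im) pair.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interchange type: an interleaved (re, im) pair. */
typedef struct { double dat[2]; } pair_complex;

/* Compile-time check that the struct adds no padding. */
typedef char pair_layout_check[
  (sizeof(pair_complex) == sizeof(double complex)) ? 1 : -1];

/* Copy n interleaved pairs into an array of C99 complex values. */
void pairs_to_c99(const pair_complex *src, double complex *dst, size_t n)
{
  assert(n == 0 || (src != NULL && dst != NULL));
  memcpy(dst, src, n * sizeof *dst);
}

/* Sum n C99 complex values and hand the result back as a pair. */
pair_complex sum_as_pair(const double complex *z, size_t n)
{
  double complex acc = 0.0;
  pair_complex out;
  for (size_t i = 0; i < n; i++)
    acc += z[i];
  memcpy(out.dat, &acc, sizeof out.dat); /* real part lands in dat[0] */
  return out;
}
```

   The same array of pairs is what one would hope to pass across a
   fortran or c++ boundary; whether that is safe there depends on the
   guarantees of the other language and compiler, which this sketch does
   not address.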



** Vector and Matrix Data Structures [ vector/, matrix/, block/ ]

  The basic features of the GSL vector and matrix data structures
  are reasonable, as an implementation of dense storage for basic
  data types.

  The important notion of slicing is (partially) implemented in GSL
  in terms of the "view" concept. One can construct submatrices as
  views of given matrices, change the stride of vector data by
  creating vector views, etc. But there are clear flaws in the
  design. The design does not express the obvious idea that
  a "view" is itself a "thing", simply because the view classes
  do not have an inheritance relationship to the main classes.

  In fact, there is a problem with the functoriality of the design,
  because there is an obvious logical sense in which a matrix is
  itself a view type, using a view notion corresponding to the
  default strides, but the design seems to "point the other way".
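
  As an illustration of a design that "points the other way", here is a
  minimal sketch in which the strided view is the primary type and an
  owning vector is just a view plus the storage it owns. The names
  vec_view, vec, vec_subview, and view_sum are hypothetical and do not
  correspond to the GSL types; error handling is reduced to assertions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The view is the basic "thing": a strided window onto doubles. */
typedef struct {
  double *data;
  size_t size;
  size_t stride;
} vec_view;

/* An owning vector is a view with default stride, plus its storage. */
typedef struct {
  vec_view view;
  double *block; /* owned; released by vec_free */
} vec;

vec *vec_alloc(size_t n)
{
  vec *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->block = calloc(n ? n : 1, sizeof *v->block);
  if (v->block == NULL) { free(v); return NULL; }
  v->view.data = v->block;
  v->view.size = n;
  v->view.stride = 1;
  return v;
}

void vec_free(vec *v)
{
  if (v != NULL) { free(v->block); free(v); }
}

/* A view of a view is again a view: n elements, every step-th one. */
vec_view vec_subview(const vec_view *v, size_t offset, size_t step, size_t n)
{
  assert(n == 0 || offset + (n - 1) * step < v->size);
  vec_view s = { v->data + offset * v->stride, n, v->stride * step };
  return s;
}

/* Algorithms are written once, against the view type. */
double view_sum(const vec_view *v)
{
  double sum = 0.0;
  for (size_t i = 0; i < v->size; i++)
    sum += v->data[i * v->stride];
  return sum;
}
```

  With this layout, view_sum(&v->view) and view_sum(&sub) use the same
  code path, whether the argument came from an allocation or a slice.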

  There are also significant usability issues with the simple aspects
  of the interfaces. The get() and set() functions have been deplored
  by many users. This syntax leads to unwieldy user code to accomplish
  the simplest tasks. The unwieldy nature of these interfaces often
  causes the user to introduce otherwise unnecessary temporaries.
  Although the compiler can remove such temporaries, the user cannot,
  and they end up occupying valuable _intellectual_ real estate.
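
  The cost of accessor-only syntax shows up even in a rank-1 update,
  A(i,j) += alpha * x(i) * y(j). The sketch below writes it twice: once
  with get/set style functions and once with an indexing macro that
  yields an lvalue. The mat type, mat_get, mat_set, and MAT are
  hypothetical stand-ins written for this comparison, not the GSL
  interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dense row-major matrix; tda is the row stride. */
typedef struct {
  double *data;
  size_t rows, cols, tda;
} mat;

/* Accessor style (bounds-checked here with assert). */
static double mat_get(const mat *m, size_t i, size_t j)
{
  assert(i < m->rows && j < m->cols);
  return m->data[i * m->tda + j];
}

static void mat_set(mat *m, size_t i, size_t j, double x)
{
  assert(i < m->rows && j < m->cols);
  m->data[i * m->tda + j] = x;
}

/* Indexing style: expands to an lvalue, with no bounds check. */
#define MAT(m, i, j) ((m)->data[(i) * (m)->tda + (j)])

/* Rank-1 update through accessors: each element is read, then written. */
void rank1_accessors(mat *a, double alpha, const double *x, const double *y)
{
  for (size_t i = 0; i < a->rows; i++)
    for (size_t j = 0; j < a->cols; j++)
      mat_set(a, i, j, mat_get(a, i, j) + alpha * x[i] * y[j]);
}

/* The same update with an lvalue expression. */
void rank1_indexed(mat *a, double alpha, const double *x, const double *y)
{
  for (size_t i = 0; i < a->rows; i++)
    for (size_t j = 0; j < a->cols; j++)
      MAT(a, i, j) += alpha * x[i] * y[j];
}
```

  Both functions compute the same result; the difference is only in how
  much of the line is spent on mechanics rather than on the formula.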

  These problems stem fundamentally from the fact that the view
  classes were added after the main classes, and the design of the
  main classes was not modified as logic would have dictated. This
  is a failure of design.

  Executive Summary: I was there when this stuff was born,
  and I don't even understand it. I have to stare at the header
  files whenever I try to do anything with vectors.



** Basic Vector Matrix Operations [ blas/ ]

   The existence of a C interface standard for BLAS functionality
   greatly simplified the GSL approach to BLAS. In GSL, all BLAS
   function usage adheres to the cblas interface standard, meaning
   that any cblas conformant binary interface is acceptable; any
   library which exports a cblas binary interface can be linked
   to a GSL application. For example, the ATLAS library provides
   a full cblas conformant implementation and can be linked to
   any GSL application.

   GSL also provides a native implementation of the cblas interfaces, based
   on the reference implementation for the standard, which was a draft
   standard at the time of GSL 1.0 release. This is where GSL blas
   functionality begins to fall down. The GSL cblas implementation was felt
   to be a necessary evil, simply because not all installations could be
   expected to have a cblas conformant library installed. Unfortunately,
   many users are unaware of the underlying architecture and end up using
   the native GSL implementation by default, resulting in poor performance,
   which reflects badly on GSL as a whole.

   This is a natural place where an inter-language approach would have been
   useful. The standard fortran BLAS implementation is readily available and
   of adequate performance for almost all users. For example, in the current
   era, it is readily available as an rpm package. It would have been natural
   at that time to use cblas wrappers over the underlying fortran BLAS
   (wrappers perhaps also available as part of the draft cblas standard). For
   completeness, this may have required shipping the fortran source as a
   necessary part of GSL; this idea was rejected because it violated the "One
   Language" design rule. As a consequence, the developers responsible for the
   actual implementation (mainly myself!) were forced to transcribe the cblas
   reference implementation into GSL, resulting in about 8000 lines of needless
   code, violating every reasonable notion of software reuse.

   In a case like cblas, where an interface standard exists, the link-time choice
   method currently implemented in GSL seems like the correct solution. 
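
   The link-time choice can be sketched in a few lines. Client code is
   compiled against the standard cblas prototype only; whichever library
   exports that symbol at link time supplies the implementation. In the
   sketch below the prototype of cblas_ddot is the standard one, while
   squared_norm is a hypothetical client function and the naive
   definition at the bottom is a stand-in, included only so that the
   example is self-contained. In a real build that definition would be
   omitted and come from the chosen cblas library.

```c
#include <assert.h>

/* Standard cblas prototype; normally provided by a cblas header. */
double cblas_ddot(const int N, const double *X, const int incX,
                  const double *Y, const int incY);

/* Hypothetical client: depends on the interface, not on an implementation. */
double squared_norm(const double *v, int n, int stride)
{
  assert(v != 0 && n >= 0);
  return cblas_ddot(n, v, stride, v, stride);
}

/* Stand-in for the library selected at link time (naive, unoptimized).
   Negative increments start from the far end of the array, following
   the usual BLAS convention. */
double cblas_ddot(const int N, const double *X, const int incX,
                  const double *Y, const int incY)
{
  double r = 0.0;
  int ix = (incX > 0) ? 0 : (N - 1) * (-incX);
  int iy = (incY > 0) ? 0 : (N - 1) * (-incY);
  for (int i = 0; i < N; i++) {
    r += X[ix] * Y[iy];
    ix += incX;
    iy += incY;
  }
  return r;
}
```

   Nothing in squared_norm changes when a tuned implementation replaces
   the stand-in; only the link line does.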

   When faced with the choice of using a decent external implementation in a 
   language other than C and re-implementing (badly) in C, "One Language"
   religious issues should not take precedence in the decision process.


 * sublibs that use blas
   - eigen/
   - linalg/
   - multifit/
   - multimin/



** Linear Algebra [ linalg/ ]

   The de facto standard for linear algebra functionality is LAPACK. LAPACK
   exists as (and is essentially defined by) a fortran implementation which
   is commonly available, on essentially all platforms. Like blas, I can
   just 'yum install' it.

   Partly because of the lack of a C standard interface, and partly because
   of "One Language" religious beliefs, this de facto standard implementation
   was not available for the design of GSL. Rather, GSL relies on a native
   implementation of some (smallish) subset of lapack functionality.

   The resulting lack of performance has driven many users away from
   GSL. The often heard advice on the GSL discussion list is that
   "serious users" should "use LAPACK instead". But which users
   of GSL consider themselves to be "non-serious"? The notion is
   absurd. It is not the users who lack seriousness, but the
   GSL implementation.

   Linear algebra is a foundational tool in numerical computing.
   Users need it; other modules in GSL need it. We need a
   serious solution for this mess.

 * sublibs that use linear algebra
   - eigen/
   - interpolation/
   - multifit/
   - multiroots/
   - ode-initval/



** Random Number Generation

   The GSL random number generators represent one of the few foundational
   aspects of the library which can be considered successful. RNGs naturally
   lend themselves to a kind of shallow object-oriented design which is
   very natural in C. The design allows for easy extension, and a user
   who understands one of the rng types essentially understands them all.

   I believe this success follows from the relative simplicity of the
   problem domain, as far as it bears on the interface design.
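
   The kind of shallow object-oriented design meant here can be sketched
   as a table of function pointers plus an opaque state block. The names
   below (rng_type, rng, rng_alloc, rng_next, rng_uniform) are
   hypothetical and simplified for illustration; they are not the GSL
   declarations. The two generators are a textbook 32-bit linear
   congruential generator and Marsaglia's 32-bit xorshift.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One record per generator type: the "class". */
typedef struct {
  const char *name;
  size_t state_size;
  void (*seed)(void *state, uint32_t s);
  uint32_t (*next)(void *state);
} rng_type;

/* An instance: a type pointer plus privately owned state. */
typedef struct {
  const rng_type *type;
  void *state;
} rng;

static void lcg_seed(void *state, uint32_t s) { *(uint32_t *) state = s; }
static uint32_t lcg_next(void *state)
{
  uint32_t *x = state;
  *x = 1664525u * *x + 1013904223u; /* arithmetic is modulo 2^32 */
  return *x;
}

static void xs_seed(void *state, uint32_t s)
{
  *(uint32_t *) state = (s != 0) ? s : 1u; /* state must be nonzero */
}
static uint32_t xs_next(void *state)
{
  uint32_t *x = state;
  *x ^= *x << 13;
  *x ^= *x >> 17;
  *x ^= *x << 5;
  return *x;
}

const rng_type rng_lcg      = { "lcg",        sizeof(uint32_t), lcg_seed, lcg_next };
const rng_type rng_xorshift = { "xorshift32", sizeof(uint32_t), xs_seed,  xs_next  };

rng *rng_alloc(const rng_type *t, uint32_t seed)
{
  rng *r = malloc(sizeof *r);
  if (r == NULL) return NULL;
  r->state = malloc(t->state_size);
  if (r->state == NULL) { free(r); return NULL; }
  r->type = t;
  t->seed(r->state, seed);
  return r;
}

void rng_free(rng *r)
{
  if (r != NULL) { free(r->state); free(r); }
}

/* Client-facing calls are identical for every generator type. */
uint32_t rng_next(rng *r)
{
  assert(r != NULL);
  return r->type->next(r->state);
}

double rng_uniform(rng *r) /* uniform in [0, 1) */
{
  return rng_next(r) / 4294967296.0;
}
```

   Adding a generator means adding one more rng_type record; no client
   code changes, which is the extensibility property described above.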



** Special Functions

   The special function sublibrary in GSL suffers mainly from a lack
   of consistency and coherence in the implementations. This stems
   from the decision made early in the project that coverage was
   very important; if users had to leave GSL (and use that _other_ library)
   to evaluate functions, they would likely leave GSL aside entirely.
   Correctness was not sacrificed in principle, but in practice the
   implementations span the spectrum, from ironclad to somewhat-fishy.

   As a kind of apology for this state of affairs, the functions attempt
   to estimate errors, using heuristic and sometimes ill-defined methods.
   This turns out to be a poor apology, since it tends to gum-up the
   works for the whole sublibrary, eating performance and occupying
   a large piece of intellectual real estate.

   The error estimation code must, at the least, be factored out. More
   to the point, it should likely be discarded in the main line of
   development. Other notions of error control should be investigated.
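
   As a minimal sketch of what "factored out" could mean, the example
   below keeps the evaluation of a function free of any error
   bookkeeping and puts the estimate in a separate, optional routine.
   The function chosen, (exp(x) - 1)/x on |x| <= 0.5, the names
   sf_exprel and sf_exprel_err, and the roundoff heuristic are all
   assumptions made for this illustration; none of it is taken from the
   GSL sources.

```c
#include <assert.h>
#include <float.h>

#define SERIES_TERMS 20

/* k-th Taylor term of (exp(x) - 1)/x, namely x^k / (k+1)!. */
static double series_term(double x, int k)
{
  double term = 1.0;
  for (int j = 0; j < k; j++)
    term *= x / (j + 2);
  return term;
}

/* Core evaluation: value only, no error bookkeeping in the hot path. */
double sf_exprel(double x)
{
  assert(x >= -0.5 && x <= 0.5);
  double sum = 0.0, term = 1.0;
  for (int k = 0; k < SERIES_TERMS; k++) {
    sum += term;
    term *= x / (k + 2);
  }
  return sum;
}

/* Optional, separate estimate for callers who want one.
   Truncation: for |x| <= 0.5 successive terms shrink by more than a
   factor of two, so the discarded tail is below twice the first
   omitted term. Roundoff: a crude heuristic of one epsilon per term. */
double sf_exprel_err(double x, double val)
{
  double tail = series_term(x, SERIES_TERMS);
  if (tail < 0.0) tail = -tail;
  if (val < 0.0) val = -val;
  return 2.0 * tail + SERIES_TERMS * DBL_EPSILON * val;
}
```

   Callers that only need the value pay nothing for the estimate, and
   the heuristic can be replaced or removed without touching the
   evaluation code.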

   Similar to the notion of sublibrary dependence discussed elsewhere,
   the special functions should be hierarchically organized. The
   dependencies should be made clear, and foundational levels of
   ironclad functions should be made explicit. Such ironclad
   functions should occupy the same place in the user's mind as
   platform-native implementations of sin(x); they should be
   beyond question for daily use.

   Other functions, which suffer from implementation problems
   or are simply too complicated to guarantee the same level
   of correctness should be explicitly identified, as part
   of the design (not just the documentation!).


** FFT

   The GSL native fft implementations suffer in the same way that
   the GSL native linear algebra implementations suffer. Better
   solutions are available, from almost all points of view, and
   the GSL implementations therefore detract from the GSL
   cachet as a whole. "Serious users should use FFTW" is not
   a good tagline for the sublibrary; see the discussion of
   linear algebra above.

