Requirements (a very incomplete list)
-------------------------------------

  * A controlled shift in the C interfaces. Too much interface
    change may scare off 1.x users. But there must be change...

  * Better integration with existing toolsets.
      - vector / matrix models
      - linear algebra (!)

  * Better overall organization, expression of module
    dependencies, etc.

  * GSL should be a technology leader. That is part of the point
    of the GPL licensing. If GSL were just another of many
    essentially equivalent libraries, then it would make more
    sense for it to be LGPL or even BSD. To justify GPL licensing,
    which coerces people into our open-source world, we better
    have something unique and excellent.


Taking Stock of 1.x
===================

** Overall Design

  * Split into "independent sublibraries".

    The stated design goal of independent sublibraries was
    abandoned in all but name quite early in the project.
    Several obvious problems with a naive interpretation
    of "independent" occur:

      - Some sublibraries depend on other "foundational"
        sublibraries; such dependencies are natural and
        desirable. But with no effective explicit way to
        express these dependencies, the overall organization
        degenerates to a monolith.

      - The packaging environment does not _easily_ support
        independence. This is related to the lack of an
        effective expression of dependencies among sublibraries,
        as noted above. It is also related to practical problems
        in supporting a hierarchical build with the autotools
        setup. Witness the apparent need to create the gsl/
        directory of links to header files as the first step
        in the build hackery.

    Consequences: Various practical problems, such as the
    organization of header files, the grotesque link lines
    (a real problem at one point), and general difficulty
    in comprehension of the structure of the project.

    Requirement: A natural, effective, and explicit way to
    express dependencies, which can be exported to the build
    and packaging systems.

  * One Language Only

    The benefits of a one-language-only design are described in
    the design document. They are fairly obvious and depend on
    the fact that C is the "universal" system language.
    Nevertheless, there are large costs associated with this
    choice. Among these are the following.

      - Important "legacy" tools which are implemented in other
        languages are unavailable to GSL. This creates a major
        deficiency in areas such as linear algebra. Some holes
        were partially plugged with native GSL implementations,
        but this can never be an acceptable solution, either
        from a software-design standpoint or from an end-user
        standpoint. From a software-design standpoint, GSL fails
        to gain from the maturity and the ongoing development in
        these external tools. From an end-user standpoint, the
        lack of performance has caused many users to abandon
        these aspects of GSL. Further discussion of these points
        occurs in specialized sections below, as appropriate.

      - Many new developments in methodology are occurring
        outside C, notably in C++. As long as such developments
        were essentially experimental and unfocused, they could
        be ignored; one could argue that this was indeed the
        case a decade ago, when GSL was initiated. But this can
        no longer be argued. The performance of these new tools
        exceeds (often by a large factor) the performance of the
        GSL tools; some are well-established, readily available,
        and of clear utility. Again, many end-users have
        abandoned the corresponding areas of GSL.

      - When a user abandons some core area of GSL for other
        implementations, they tend to abandon other aspects as
        well, since GSL interfaces are not always friendly to
        other data models. In many cases it would be impossible
        to make them so, due to insuperable inter-language
        barriers.

    GSL should not continue to ignore these developments.
    At the very least it should allow for the existence of these
    external developments, by paying very close attention to
    inter-language issues.

    These issues are related to the design goal of "naturalness",
    stated in the design document. Quote: "If there is something
    which is unnatural in C and has to be simulated then we avoid
    using it." It may be coming to pass that some/many useful
    methods and designs for a numerical library are now
    "unnatural in C" by construction, since the performant and
    well-designed tools are no longer being created in C. For
    example, it may be the case that problem domains such as
    linear algebra are no longer "natural in C". Whither GSL in
    such a world?

  * Error Handling

    GSL implements a very rudimentary type of error-handling,
    which is quite appropriate for small projects. The "register
    handler, fail-by-default" model works for a small library
    with a "low stack depth" usage model. By this, I mean that
    library functions are called by clients, but not by other
    library functions; in such a case, it is easy for the client
    to interpret the errors. But GSL does not conform to this
    "low stack depth" usage model, and the fail-by-default model
    can be confusing for clients when the failure occurs at
    depths that they cannot control.

    Although it may not seem like a pressing issue, this
    monolithic error-handling design seems to be another symptom
    of the lack of explicit models for interdependence in GSL.
    It should be revisited.


** Build System

  The autotools build system can be an impediment to overall
  design. As discussed above, the GSL build architecture, based
  on a typical autotools single-project setup, could not be
  coerced into supporting notions of dependence/independence for
  sublibraries. Rather, these dependencies exist in spite of the
  build system.


** Complex Numbers

  GSL implements complex numbers. This functionality may be
  superfluous at this time, depending on the status of C99
  features in the compilers of interest.
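
  For reference, the C99 facility in question is <complex.h>. C99
  requires each complex type to have the same representation and
  alignment as an array of two elements of the corresponding real
  type, real part first. The sketch below relies only on that
  guarantee; the function names are hypothetical and none of this
  is GSL code.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GSL function): sum an array of
   C99 complex values using the built-in complex arithmetic. */
double complex sketch_complex_sum(const double complex *z, size_t n)
{
    double complex s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += z[i];
    return s;
}

/* Hypothetical helper: copy one complex value into a
   (real, imaginary) pair of doubles. This depends on the C99 rule
   that double complex is laid out like double[2]. */
void sketch_complex_to_pair(double complex z, double out[2])
{
    assert(sizeof(double complex) == 2 * sizeof(double));
    memcpy(out, &z, sizeof z);
}
```

  An array of such values is therefore a run of interleaved
  (real, imaginary) doubles, which is the kind of plain layout
  one would want for data interchange.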

  Beyond this observation, there are some thorny issues with
  complex numbers, when one considers inter-language issues. As
  argued in several places here, GSL cannot afford to ignore
  these issues. Users are often turning to basic functionality
  implemented in other languages (linear algebra!), and
  inter-operability at the level of data interchange is an
  absolute necessity. An array of complex values computed in GSL
  should be appropriate for passage through an inter-language
  interface. The two languages of most concern are Fortran and
  C++. In Fortran, complex numbers are a language feature, and in
  C++ they are a library feature. In both cases, the
  representation is, in principle, implementation-dependent.

  These problems have been addressed by other projects; effective
  solutions exist for the interface to Fortran. It seems likely
  that the C-C++ interface problem is solvable, especially within
  the context of a consistent build with a single compiler.
  Mixing compilers may be more difficult, but should be solvable
  in principle.


** Vector and Matrix Data Structures  [ vector/, matrix/, block/ ]

  The basic features of the GSL vector and matrix data structures
  are reasonable, as an implementation of dense storage for basic
  data types. The important notion of slicing is (partially)
  implemented in GSL in terms of the "view" concept. One can
  construct submatrices as views of given matrices, change the
  stride of vector data by creating vector views, etc.

  But there are clear flaws in the design. The design does not
  express the obvious idea that a "view" is itself a "thing",
  simply because the view classes do not have an inheritance
  relationship to the main classes. In fact, there is a problem
  with the functoriality of the design, because there is an
  obvious logical sense in which a matrix is itself a view type,
  using a view notion corresponding to the default strides, but
  the design seems to "point the other way".
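
  To illustrate the sense in which a matrix can itself be a view,
  here is a minimal sketch, assuming row-major dense storage. The
  type and function names are hypothetical and this is not the
  GSL design; the field name tda is borrowed from gsl_matrix. A
  whole matrix is simply the view whose row stride equals its
  column count, and taking a submatrix yields the same type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one description serves for both a whole
   matrix and any rectangular piece of it. Storage is not owned. */
typedef struct {
    double *data;  /* first element of this view */
    size_t rows;
    size_t cols;
    size_t tda;    /* elements between the starts of consecutive rows */
} sketch_mview;

/* A whole matrix is the view with the default stride: tda == cols. */
sketch_mview sketch_mview_whole(double *data, size_t rows, size_t cols)
{
    sketch_mview v = { data, rows, cols, cols };
    return v;
}

/* A submatrix is again a sketch_mview; only the origin and the
   extents change, the row stride is inherited. */
sketch_mview sketch_mview_sub(sketch_mview m, size_t r0, size_t c0,
                              size_t rows, size_t cols)
{
    assert(r0 + rows <= m.rows && c0 + cols <= m.cols);
    sketch_mview v = { m.data + r0 * m.tda + c0, rows, cols, m.tda };
    return v;
}

/* Element access returns a pointer, so the same call serves for
   both reading and writing. */
double *sketch_mview_at(sketch_mview m, size_t i, size_t j)
{
    assert(i < m.rows && j < m.cols);
    return m.data + i * m.tda + j;
}
```

  In this arrangement any routine written against the one type
  accepts whole matrices and submatrices alike, without a
  separate view class.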

  There are also significant usability issues with the simple
  aspects of the interfaces. The get() and set() functions have
  been deplored by many users. This syntax leads to unwieldy user
  code to accomplish the simplest tasks. The unwieldy nature of
  these interfaces often causes the user to introduce otherwise
  unnecessary temporaries. Although the compiler can remove such
  temporaries, the user cannot, and they end up occupying
  valuable _intellectual_ real estate.

  These problems stem fundamentally from the fact that the view
  classes were added after the main classes, and the design of
  the main classes was not modified as logic would have dictated.
  This is a failure of design.

  Executive Summary: I was there when this stuff was born, and I
  don't even understand it. I have to stare at the header files
  whenever I try to do anything with vectors.


** Basic Vector Matrix Operations  [ blas/ ]

  The existence of a C interface standard for BLAS functionality
  greatly simplified the GSL approach to BLAS. In GSL, all BLAS
  function usage adheres to the cblas interface standard, meaning
  that any cblas-conformant binary interface is acceptable; any
  library which exports a cblas binary interface can be linked to
  a GSL application. For example, the ATLAS library provides a
  full cblas-conformant implementation and can be linked to any
  GSL application.

  GSL also provides a native implementation of the cblas
  interfaces, based on the reference implementation for the
  standard, which was a draft standard at the time of the GSL 1.0
  release. This is where GSL BLAS functionality begins to fall
  down. The GSL cblas implementation was felt to be a necessary
  evil, simply because not all installations could be expected to
  have a cblas-conformant library installed. Unfortunately, many
  users are unaware of the underlying architecture and end up
  using the native GSL implementation by default, resulting in
  poor performance, which reflects badly on GSL as a whole.
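
  To make "cblas binary interface" concrete, the sketch below is
  a naive function with the same parameter list as the standard
  cblas_ddot routine. The name sketch_ddot is hypothetical,
  chosen so that it cannot clash with a real cblas library; it is
  an unoptimized illustration, not the GSL or reference source.

```c
#include <assert.h>

/* Hypothetical stand-in with the parameter list of cblas_ddot:
   dot product of two strided vectors of N elements each. */
double sketch_ddot(const int N, const double *X, const int incX,
                   const double *Y, const int incY)
{
    assert(N <= 0 || (X && Y));

    /* BLAS convention: with a negative increment the vector is
       traversed backwards, starting from its last logical element. */
    int ix = incX > 0 ? 0 : (N - 1) * (-incX);
    int iy = incY > 0 ? 0 : (N - 1) * (-incY);

    double r = 0.0;
    for (int i = 0; i < N; i++) {
        r += X[ix] * Y[iy];
        ix += incX;
        iy += incY;
    }
    return r;
}
```

  Since callers depend only on the symbol name and the
  prototype, an application written against cblas_ddot does not
  care which conformant library supplies it at link time.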

  This is a natural place where an inter-language approach would
  have been useful. The standard Fortran BLAS implementation is
  readily available and of adequate performance for almost all
  users. For example, in the current era, it is readily available
  as an RPM package. It would have been natural at that time to
  use cblas wrappers over the underlying Fortran BLAS (wrappers
  perhaps also available as part of the draft cblas standard).
  For completeness, this may have required shipping the Fortran
  source as a necessary part of GSL; this idea was rejected
  because it violated the "One Language" design rule. As a
  consequence, the developers responsible for the actual
  implementation (mainly myself!) were forced to transcribe the
  cblas reference implementation into GSL, resulting in about
  8000 lines of needless code, violating every reasonable notion
  of software reuse.

  In a case like cblas, where an interface standard exists, the
  link-time choice method currently implemented in GSL seems like
  the correct solution. When faced with the choice between using
  a decent external implementation in a language other than C and
  re-implementing (badly) in C, "One Language" religious issues
  should not take precedence in the decision process.

  * sublibs that use BLAS
      - eigen/
      - linalg/
      - multifit/
      - multimin/


** Linear Algebra  [ linalg/ ]

  The de facto standard for linear algebra functionality is
  LAPACK. LAPACK exists as (and is essentially defined by) a
  Fortran implementation which is commonly available on
  essentially all platforms. As with BLAS, I can just
  'yum install' it. Partly because of the lack of a C standard
  interface, and partly because of "One Language" religious
  beliefs, this de facto standard implementation was not
  available for the design of GSL. Rather, GSL relies on a native
  implementation of some (smallish) subset of LAPACK
  functionality. The resulting lack of performance has driven
  many users away from GSL.

  The often-heard advice on the GSL discussion list is that
  "serious users" should "use LAPACK instead". But which users of
  GSL consider themselves to be "non-serious"? The notion is
  absurd. It is not the users who lack seriousness, but the GSL
  implementation.

  Linear algebra is a foundational tool in numerical computing.
  Users need it; other modules in GSL need it. We need a serious
  solution for this mess.

  * sublibs that use linear algebra
      - eigen/
      - interpolation/
      - multifit/
      - multiroots/
      - ode-initval/


** Random Number Generation

  The GSL random number generators represent one of the few
  foundational aspects of the library which can be considered
  successful. RNGs naturally lend themselves to a kind of shallow
  object-oriented design which is very natural in C. The design
  allows for easy extension, and a user who understands one of
  the RNG types essentially understands them all. I believe this
  success follows from the relative simplicity of the problem
  domain, as far as it bears on the interface design.


** Special Functions

  The special function sublibrary in GSL suffers mainly from a
  lack of consistency and coherence in the implementations. This
  stems from the decision made early in the project that coverage
  was very important; if users had to leave GSL (and use that
  _other_ library) to evaluate functions, they would likely leave
  GSL aside entirely. Correctness was not sacrificed in
  principle, but in practice the implementations span the
  spectrum, from ironclad to somewhat-fishy.

  As a kind of apology for this state of affairs, the functions
  attempt to estimate errors, using heuristic and sometimes
  ill-defined methods. This turns out to be a poor apology, since
  it tends to gum up the works for the whole sublibrary, eating
  performance and occupying a large piece of intellectual real
  estate. The error estimation code must, at the least, be
  factored out. More to the point, it should likely be discarded
  in the main line of development.
  Other notions of error control should be investigated.

  Similar to the notion of sublibrary dependence discussed
  elsewhere, the special functions should be hierarchically
  organized. The dependencies should be made clear, and
  foundational levels of ironclad functions should be made
  explicit. Such ironclad functions should occupy the same place
  in the user's mind as platform-native implementations of
  sin(x); they should be beyond question for daily use. Other
  functions, which suffer from implementation problems or are
  simply too complicated to guarantee the same level of
  correctness, should be explicitly identified, as part of the
  design (not just the documentation!).


** FFT

  The GSL native FFT implementations suffer in the same way that
  the GSL native linear algebra implementations suffer. Better
  solutions are available, from almost all points of view, and
  the GSL implementations therefore detract from the GSL cachet
  as a whole. "Serious users should use FFTW" is not a good
  tagline for the sublibrary; see the discussion of linear
  algebra above.