From: Christos Siopis
Date: Tue, 22 Oct 2002 16:55:00 -0000
To: gsl-discuss@sources.redhat.com
Subject: Re: About coordinated efforts on scientific software.

On Mon, 21 Oct 2002, Manoj Warrier wrote:

> I guess (hope rather) that GSL will eventually cover the numerical library
> part of point (1). For plotting and graphics we again have a similar
> situation as in the "mathematic packages" ... Check out
> (http://scilinux.sf.net/graphvis.html) for a list of free packages.

I think the problem is not so much with lack of libraries as it is with
lack of an "integrated" environment where one can start with raw data,
pass them through various mathematical transformations, and finally plot
some result, all from inside the same "package"/environment that
encourages trial-and-error, what-if experiments, and rapid prototyping.

The first thought that one might have for achieving this would be to
somehow wrap a number of relevant libraries and use them from inside a
scripting language like Python.
I can see at least three kinds of problems with this:

- First, most of the existing libraries are too low-level for direct use
  from an interactive scripting environment. Things like memory
  allocation (needed e.g. by GSL) or opening a window for plotting
  (needed e.g. by PGPLOT) are *show-stoppers* in an interactive
  environment. Some heroic people are going through the pain of actually
  creating usable interfaces, such as the PyGSL folks. This is fine,
  except for two things: how do the wrapper functions interoperate with
  functions from other wrapped libraries (see next item below), and how
  do we ensure we do not enter into a versioning hell, where the wrapper
  uses some version A of the library, but the library has now moved on
  to version B? Add some RPM versioning issues if you use RedHat's
  stuff, multiply all this by the number of libraries which you want to
  wrap, and enjoy the mess...

- Second, there is the issue of consistency of the user interface. For
  instance, a NumPy (numeric python) user is used to the ufuncs,
  "universal" functions with return type that depends on the input type.
  So if a NumPy user wanted to compute the mean of an array, he or she
  would expect that a function call like mean(arrayx) would return a
  long int or a float, depending on whether arrayx is an array of longs
  or of floats. But doing this through GSL/PyGSL, the user would have to
  use pygsl.statistics.mean for a float or pygsl.statistics.long.mean
  for a long int, i.e. the user is asked to think in terms of C, a
  statically typed language. This is both annoying and prone to
  hard-to-find errors. A related issue is the overlap between wrappers
  of different libraries (e.g., NumPy already has a couple of
  mean/average functions from other libraries!). And there is also the
  issue of performance, as NumPy objects are converted back and forth to
  different formats (some wrappers do a better job at this than others).

- Third, it's the question of "putting this all together".
  Wrappers are good for wrapping a small number of small libraries. As
  you add more and more, there's all sorts of issues related to the
  distribution of the "final" package, the quality and homogeneity of
  the documentation, and so on.

If there were no other solution at hand, maybe this would all be
acceptable. But with commercial packages offering a "one-stop" solution
(despite a number of other disadvantages) I think the open-source
science community has to do better than that. SciPy ( www.scipy.org ) is
a package that tries to solve some of these problems, but I think it is
a little too early to tell how good the outcome will be, and I cannot
help wondering how many more times the open source numerical community
will have to code and debug e.g. an FFT or a statistical package, and
whether this is the best use of our resources...

Christos
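
A minimal sketch of the interface-consistency point in the second item above: one generic mean() front end that picks a typed routine from the element type of its argument, so the caller does not have to choose between per-type entry points. Everything here is an assumption for illustration. mean_double and mean_long are hypothetical stand-ins in the spirit of pygsl.statistics.mean and pygsl.statistics.long.mean, not the real PyGSL functions, and the standard-library array type stands in for a NumPy array. The integer result for longs only mirrors the behaviour described in the post; it is not a claim about what GSL or NumPy return.

```python
from array import array


# Hypothetical per-type routines, mimicking C-style typed entry points.
def mean_double(data):
    if data.typecode != "d":
        raise TypeError("mean_double expects an array of C doubles")
    return sum(data) / len(data)


def mean_long(data):
    if data.typecode != "l":
        raise TypeError("mean_long expects an array of C longs")
    # Assumption for illustration: integer division, so that the
    # return type follows the input type as the post describes.
    return sum(data) // len(data)


# Dispatch table: element typecode -> typed routine.
_MEAN_BY_TYPECODE = {"d": mean_double, "l": mean_long}


def mean(data):
    """Generic front end: choose the typed routine from the element type."""
    try:
        typed_mean = _MEAN_BY_TYPECODE[data.typecode]
    except KeyError:
        raise TypeError("no mean routine for typecode %r" % data.typecode)
    return typed_mean(data)


if __name__ == "__main__":
    print(mean(array("d", [1.0, 2.0, 4.0])))  # dispatches to mean_double
    print(mean(array("l", [1, 2, 6])))        # dispatches to mean_long
```

With a front end like this the caller writes mean(arrayx) in both cases. Calling a typed routine directly with the wrong element type raises a TypeError here, whereas in a thin wrapper over C the same mistake could be one of the hard-to-find errors mentioned above.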