From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 19144 invoked by alias); 7 Jan 2010 03:29:21 -0000 Received: (qmail 19130 invoked by uid 22791); 7 Jan 2010 03:29:19 -0000 X-SWARE-Spam-Status: No, hits=-3.1 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW X-Spam-Check-By: sourceware.org Received: from mail.phy.duke.edu (HELO mail.phy.duke.edu) (152.3.182.2) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 07 Jan 2010 03:29:13 +0000 Received: from localhost (localhost [127.0.0.1]) by mail.phy.duke.edu (Postfix) with ESMTP id 01493780B4; Wed, 6 Jan 2010 22:29:05 -0500 (EST) Received: from mail.phy.duke.edu ([127.0.0.1]) by localhost (mail.phy.duke.edu [127.0.0.1]) (amavisd-new, port 10026) with LMTP id wRG46gbJUHbH; Wed, 6 Jan 2010 22:29:04 -0500 (EST) Received: from lilith.rgb.private.net (cpe-069-134-079-008.nc.res.rr.com [69.134.79.8]) by mail.phy.duke.edu (Postfix) with ESMTP id B4322780B2; Wed, 6 Jan 2010 22:29:04 -0500 (EST) Date: Thu, 07 Jan 2010 03:29:00 -0000 From: "Robert G. Brown" To: Gerard Jungman cc: GSL Discuss Mailing List Subject: Re: gsl container designs In-Reply-To: <1262829020.27244.361.camel@manticore.lanl.gov> Message-ID: References: <1259110486.3028.69.camel@manticore.lanl.gov> <4B4477B8.50305@iki.fi> <1262829020.27244.361.camel@manticore.lanl.gov> User-Agent: Alpine 2.00 (LFD 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Mailing-List: contact gsl-discuss-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gsl-discuss-owner@sourceware.org X-SW-Source: 2010-q1/txt/msg00006.txt.bz2 On Wed, 6 Jan 2010, Gerard Jungman wrote: > In fact, I don't even see a need to sacrifice efficiency. I think I recall from my benchmarking days that -- depending on compiler -- there is a small dereferencing penalty for packed matrices (vectors packed into dereferencing **..* pointers) compared to doing the offset arithmetic via brute force inline or via a macro. I ran two different implementations of stream (one with direct pointer traversal of the vectors, one with vector offsets repacked into **matrices to do the tests. The size of this penalty did decrease as gcc advanced; I haven't run the benchmark recently and don't know how large it currently is. It was never so large that it stopped me from using repacked pointers for code clarity. > this goal is directly achievable at essentially optimal efficiency. > My attitude about C is that it is all about lining up the bytes. > A good C design is just some syntactic layer on top of a > rudimentary layer, where-in you are careful to line up all > the bytes. After that, optimal efficiency is pretty much > automatic. C is very friendly, once you are careful to > line everything up. I hope you see my small but non-vacuous > technical meaning. Agreed. And I can see no reason that modern compilers may not have eliminated most of the dereferencing penalty, especially if one aligns things carefully. However, some of my beowulfish friends who work(ed) for compiler companies argued that one of Fortran's longstanding advantages in numerical code efficiency is simply because of the fact that matrices in Fortran are written in stone and so compiler writers can optimize the hell out of them. C pointers, OTOH -- well, one can make triangular arrays, rectangular arrays, one can start the indices at 0 or 1 or -100 or 47 as you like and run to wherever you like. How can anyone pre-optimize a compiler to handle all of that? Truthfully, I can't see why they CAN'T, but I'm not a compiler writer and empirically there has been a small penalty between C in general and fortran and C with direct addressing and C with dereferenced addressing as well. IIRC there is one more layer of address resolution in indirect dereferencing, which is the fundamental origin of the latter. >> What I want is extremely simple: >> >> * A way to construct/allocate a matrix of arbitrary dimension (out to some >> (un)reasonable max, e.g. 10 > > Check. > > >> but I want to be able to use it >> just like m[i][j][k]...[r][s] when all is said and done. > > This is the only part which I wonder about. I thought about this > before and came to the conclusion that it is not easy to get right. > Probably possible, but very tricky. I've written code to do so many times -- Numerical Recipes provides the basic idea in its matrix-packing routines. Simply allocate e.g. double **..*m,*v; Then malloc v long enough to hold the actual data. Then malloc m to hold the first index, malloc vectors long enough to hold the second index and assign them to these indices in a first-index loop (recurse until the next to last index) and finally pack the appropriately offset addresses from the actual vector v into the next to last index so that the last index actually traverses v. In this scheme the last index is always the linear/fast index in the actual data vector, the other indices are all there to permit the compiler to handle the dereferencing. To free the matrix, one has to work backwards -- free each successive index one malloc'd from the right back to the left (you need the leftmost indices to know the addresses to free at each level), and finally free m and v themselves. All very tedious and mechanical, but all fully automatable and debuggable as well. The biggest problem with NR is that each type has an individual allocation/free command, and it only goes to 2 dimensions. To make a single routine pair handle more dimensions and arbitrary types is more work and requires void typing and casting (and/or switches to handle the unique aspects of individual tyees internally). > The most annoying aspect of this is that these little problems > are trivial in C++. But maybe just beyond the capability of C. How could that be? People write operating systems in C. I hardly think that allocating sensible matrices is beyond it. I'm not trying to be argumentative or flame, by the way. Could you be more explicit about why packing pointers to permit the dereferencing of an arbitrary matrix is technically difficult (as opposed to merely tedious) in C and would be easier in C++ (which I thought discouraged the use of pointers altogether)? >> * The specific dimension bounds need to be user specified. If I want to >> make a 6 dimensional triple triangular array to hold e.g. 3j symbols, I >> don't want to have to allocate a rectangular array most of which is >> empty wasted space. > > See the earlier comment by Tuomo Keskitalo, about banded/packed storage. > Not something I have thought about yet, but probably not hard. I'm not sure about "banded", but packed is (as noted) simple. It is just a matter of picking the lengths of the successive ***, **, * length vectors one mallocs to fill the toplevel e.g. **** pointer, plus the use of appropriate offsets. In particular, NR contains examples of how to do e.g.: **dmatrix(l1,u1,l2,u2) for arbitrary lower/upper indices in the two slots (as well as the associated free) and it is straightforward to generalize the basic algorithm to higher dimensions and allow for arbitrary data types. I don't think twice about allocating a vector of structs of more or less arbitrary length (including structs that contain vectors or matrices themselves) and dereferencing it via a packed, cast or typed, matrix. This is one of the ADVANTAGES of C -- one can just do what one wants to do in the most natural way via pointers, mallocs and suitable typedefs and/or structs. Without a safety net, admittedly, and you're responsible for both code integrity and efficiency with little compiler help, but OTOH you can quite easily visualize what the underlying assembler is going to (approximately) look like, so one CAN tweak it in reasonable ways for efficiency without resorting to actual assembler. >> * Access to the underlying actual data vector (non-opaque, in other >> words) so it can be passed to e.g. ODE solvers and the like that can >> only be sensibly written in a general way for a vector. > > Check. > > >> My problem in this regard with the GSL is that so far, the matrix and >> vector constructs are pretty much useless -- they are far away from the >> standard way one usually does this sort of thing in C using pointers or >> just allocating a vector or matrix. > > I find them useless as well, for exactly these reasons. > > >> One of the ADVANTAGES of C is that it permits one to work pointer magic >> to improve the readability (and sometimes execution efficiency) of code. >> Opaque types and set/get calls are dark evil in both regards. > > Right. The layer which defines the "types" should be essentially > syntactic, just there to make it easier to organize/understand > your code. The underlying implementation should be a brutally > simple pile of bytes, and it should be visible. Amen, my brother. C is a THIN veneer of upper-level language sensibilities on top of raw assembler, and its ability to work "close to the metal" is its power. This power should be preserved as much as possible in the GSL, in the process of providing a human readable interface. I actually wrote all of this, in a more or less GSL--like form, several years ago. Some of the code was ugly, but it worked charmlike, at the cost of requiring a cast of a void type to tell the compiler how to dereference the outermost (actual) vector index. Internally, all the dereferencing pointer arithmetic is more or less the same for a void (universal) type, but the pointer into the actual data vector has to be advanced with the correct stride, and the compiler has to know what that stride should be. I can probably dig this code out and post it here (or post a link to it) if you want to look at it. I think I went up to 9 or 10 dimensions (way more than I needed) and did a lot of things via brute force to make the purely mechanical recursion obvious; I planned an eventual rewrite into a more compact form that I never got around to. rgb > > > -- > G. Jungman > > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu