On Wed, 6 Jan 2010, Rhys Ulerich wrote:

>> I recall from my benchmarking days that -- depending on compiler --
>> there is a small dereferencing penalty for packed matrices (vectors
>> packed into dereferencing **..* pointers) compared to doing the offset
>> arithmetic via brute force inline or via a macro.
>> ......
>> I haven't
>> run the benchmark recently and don't know how large it currently is.  It
>> was never so large that it stopped me from using repacked pointers for
>> code clarity..
>
> Mostly unscientific, but worth tossing into the mix:
>
> Using Intel 10.1 compilers on a fairly recent AMD chip, 100,000 iterations
> of doing the nested pointers approach is neck-and-neck with index arithmetic
> on a 10x10 double matrix.  For the 100x100 case it takes 1.3 times longer
> to iterate using the nested pointers.  Work in the inner loop "compute
> kernel" is *= against a constant scalar.  Optimization flags on -O3.  I've
> seen similar behavior on recent GNU compilers.

That sounds partly like a cache effect -- 10x10 almost certainly stays
in L1, 100x100 won't fit.

My own experience is similar, although I don't recall the multiplier
being as large as 1.3 (but then, I was doing stream and stream-like
tests with very large vectors, which means that one spends more time in
a vector streaming mode and minimizes cache-thrashing when turning
corners).

And my memory could be faulty -- I'm an old guy, after all, early
Alzheimers...;-)

    rgb

>
> I'm happy to provide the test code if anyone's interested.
>
> - Rhys
>

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu
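
For readers following the thread, here is a minimal C sketch of the two access patterns being compared. It is not Rhys's test code, which was not posted; the names pack_rows, scale_nested, scale_indexed and IDX are hypothetical, and the kernel is the "*= against a constant scalar" described above. For scale, a 10x10 double matrix is 800 bytes and a 100x100 one is 80,000 bytes, which is larger than the 32 to 64 KB L1 data caches common on chips of that era.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: build a row-pointer table over one contiguous
   block, so rows[i][j] and flat[i * n + j] name the same double.
   The caller frees the returned table; flat is owned elsewhere. */
double **pack_rows(double *flat, size_t n)
{
    assert(flat != NULL && n > 0);
    double **rows = malloc(n * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        rows[i] = flat + i * n;
    return rows;
}

/* Nested-pointer traversal: rows[i] must be loaded before row i
   can be indexed, unless the compiler hoists it out of the j loop. */
void scale_nested(double **rows, size_t n, double s)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            rows[i][j] *= s;
}

/* Offset arithmetic via a macro over the same storage. */
#define IDX(i, j, n) ((i) * (n) + (j))

void scale_indexed(double *flat, size_t n, double s)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            flat[IDX(i, j, n)] *= s;
}
```

Timing each function over many iterations for n = 10 and n = 100 would approximate the comparison described above. Because both views share one allocation, any difference comes from the addressing pattern rather than the memory layout. Results will vary with compiler, flags, and cache size.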