* RE: gsl_multifit
@ 2001-12-19 13:20 Mikael Adlers
0 siblings, 0 replies; 11+ messages in thread
From: Mikael Adlers @ 2001-12-19 13:20 UTC (permalink / raw)
To: 'Kai Trukenmueller', gsl-discuss
Hi again,
I used Matlab to compute the solution to a least squares problem
with the same sizes as you have and got the following results:
> system with Athlon 1GHz, 640MB running Linux-2.4.10
> Order: (300,200) (600,400) (900,600) (1200,1000) (1500,1200)
> Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
> gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
System: Athlon 700MHz, 256MB, running Win2000
Matlab5.3: 0.1500s 1.342s 4.417s 15.873s 28.362s
(with the LAPACK patch)
If speed is what you want, I would use LAPACK directly
(see www.netlib.org/clapack/; use the LAPACK routine DGELS) with an
optimized BLAS (see www.netlib.org/atlas). There has been a *lot*
of effort put into the LAPACK and BLAS routines; they are highly
optimized to use the cache well (block methods, etc.).
It is very easy to write, for example, a matrix multiplication routine;
it's really hard to make it efficient. Look at this page to see the
speedup they obtained in Matlab when they switched from LINPACK
(an older linear algebra package) to LAPACK with an optimized BLAS.
http://www.mathworks.com/company/newsletter/clevescorner/winter2000.cleve.shtml
Sincerely,
/Mikael Adlers
------------------------------------------------------------------
Mikael Adlers, Ph.D. email: mikael@mathcore.com
MathCore AB phone: +4613 32 85 07
Wallenbergsgata 4 fax: 21 27 01
SE-583 35 Linköping, Sweden http://www.mathcore.com
> -----Original Message-----
> From: Kai Trukenmueller [ mailto:trukenm@ag2.mechanik.tu-darmstadt.de ]
> Sent: den 18 oktober 2001 18:21
> To: gsl-discuss@sources.redhat.com
> Subject: Re: gsl_multifit
>
>
> Hi,
>
> > You could try computing the solution using the following
> > functions instead and see if you get any speed improvement.
> > gsl_linalg_QR_decomp
> > gsl_linalg_QR_lssolve
> Thanks for the tip. It's much faster now, but still slower
> than octave.
>
> Order: (300,200) (600,400) (900,600) (1200,1000)
> (1500,1200)
> Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
> gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
> --
> :wq `Kai Trukenmueller'
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
@ 2001-12-19 13:20 ` Brian Gough
0 siblings, 0 replies; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Kai Trukenmueller; +Cc: gsl-discuss
Kai Trukenmueller writes:
> I benchmarked the programs on a system with Athlon 1GHz, 640MB running Linux-2.4.10, with the following results:
>
> Dimensions (120,80) (300,200) (600,400) (900,600) (1200,800)
> Octave-Script 0.060s 0.410s 3.390s 12.260s 39.790s
> Compiled gsl 0.110s 2.240s 31.310s 2m7.590s seg.fault
>
> The GSL routine not only turns out to be much slower, it also seems
> to be unstable for higher orders (the results do not converge -> inf).
> Maybe something is wrong in my code. For low orders (~<500) both
> results are equivalent.
Regarding the segfault, your program contains some fixed length arrays
which are accessed outside the array bounds -- if you allocate them
dynamically that problem should go away.
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Brian Gough
@ 2001-12-19 13:20 ` Dirk Eddelbuettel
0 siblings, 0 replies; 11+ messages in thread
From: Dirk Eddelbuettel @ 2001-12-19 13:20 UTC (permalink / raw)
To: Brian Gough; +Cc: Henry Sobotka, gsl-discuss
On Fri, Oct 19, 2001 at 11:12:19AM +0100, Brian Gough wrote:
> The only other optimization I have up my sleeve is to introduce more
> use of BLAS functions. There are quite a few places where the code
> uses a for-loop instead of calling the corresponding BLAS routine.
That would be great, as ATLAS can transparently replace the BLAS. Debian does
that for GNU R and GNU Octave, and the speed gain can be on the order of 8
or 9 times [ for suitable operations and matrix sizes, of course ].
Dirk
--
Three out of two people have difficulties with fractions.
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Henry Sobotka
@ 2001-12-19 13:20 ` Brian Gough
2001-12-19 13:20 ` gsl_multifit Dirk Eddelbuettel
0 siblings, 1 reply; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Henry Sobotka; +Cc: gsl-discuss
Henry Sobotka writes:
> Brian Gough wrote:
> >
> > For extra speed try recompiling the library with -DGSL_RANGE_CHECK_OFF=1.
> > Range checking is currently enabled by default, for safety, and puts an
> > overhead on every matrix/vector operation.
>
> Brian, are there any other similar macros that can be turned off at
> compiletime for performance gains?
No. That is the only macro, apart from HAVE_INLINE which is turned on
by default.
The only other optimization I have up my sleeve is to introduce more
use of BLAS functions. There are quite a few places where the code
uses a for-loop instead of calling the corresponding BLAS routine.
Brian
* gsl_multifit
@ 2001-12-19 13:20 Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Kai Trukenmueller @ 2001-12-19 13:20 UTC (permalink / raw)
To: gsl-discuss
Hi,
The problem A*x=b with A not square leads to
x = (A^T * A)^-1 * A^T * b
at least analytically.
That should be equivalent to multidimensional fitting, as in
`gsl_multifit_linear'.
I implemented an example and it works well.
Then I compared it to a simple Octave (a free Matlab clone) routine,
using
x=A\b;
The results are the same (as they should be), but it surprised me a lot
that the Octave script works _much_ faster for high dimensions than the
compiled C program. It seems that those algorithms are better.
Is that a known problem, or did I use the wrong routines?
Is there any better way of solving A x = b?
--
:wq `Kai Trukenmueller'
* Re: gsl_multifit
2001-12-19 13:20 gsl_multifit Mikael Adlers
@ 2001-12-19 13:20 ` Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Kai Trukenmueller @ 2001-12-19 13:20 UTC (permalink / raw)
To: gsl-discuss
Hi,
> You could try computing the solution using the following
> functions instead and see if you get any speed improvement.
> gsl_linalg_QR_decomp
> gsl_linalg_QR_lssolve
Thanks for the tip. It's much faster now, but still slower than Octave.
Order: (300,200) (600,400) (900,600) (1200,1000) (1500,1200)
Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
--
:wq `Kai Trukenmueller'
* RE: gsl_multifit
@ 2001-12-19 13:20 Mikael Adlers
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
0 siblings, 1 reply; 11+ messages in thread
From: Mikael Adlers @ 2001-12-19 13:20 UTC (permalink / raw)
To: 'Kai Trukenmueller', gsl-discuss
Hi,
the algorithm in gsl_multifit_linear uses the singular value
decomposition (SVD). This routine gives the best stability and
accuracy when solving a least squares problem. However, in most
cases it is like shooting a fly with a cannon. The complexity
of the SVD is about 4mn^2 + (11/3)n^3, compared to (1/2)mn^2 + (1/6)n^3
using the normal equations plus a Cholesky factorization (or twice
that amount using a QR factorization instead, which has better
stability than the normal equations).
Further, gsl_multifit_linear computes a covariance matrix and
some other nifty stuff, which eats additional flops.
The algorithm in Matlab (and Octave?) uses the QR factorization
(together with one step of iterative refinement, I believe) to
compute the solution.
You could try computing the solution using the following
functions instead and see if you get any speed improvement.
gsl_linalg_QR_decomp
gsl_linalg_QR_lssolve
/Mikael Adlers
------------------------------------------------------------------
Mikael Adlers, Ph.D. email: mikael@mathcore.com
MathCore AB phone: +4613 32 85 07
Wallenbergsgata 4 fax: 21 27 01
SE-583 35 Linköping, Sweden http://www.mathcore.com
> -----Original Message-----
> From: Kai Trukenmueller [ mailto:trukenm@ag2.mechanik.tu-darmstadt.de ]
> Sent: den 16 oktober 2001 22:51
> To: gsl-discuss@sources.redhat.com
> Subject: gsl_multifit
>
>
> Hi,
>
> The problem A*x=b with A not square leads to
> x = (A^T * A)^-1 * A^T * b
> at least analytically.
> That should be equivalent to multidimensional fitting, as in
> `gsl_multifit_linear'.
> I implemented an example and it works well.
>
> Then I compared it to a simple Octave (a free Matlab clone) routine,
> using
> x=A\b;
>
> The results are the same (as they should be), but it surprised me a lot
> that the Octave script works _much_ faster for high dimensions than the
> compiled C program. It seems that those algorithms are better.
>
> Is that a known problem, or did I use the wrong routines?
> Is there any better way of solving A x = b?
> --
> :wq `Kai Trukenmueller'
> -----Original Message-----
> From: Kai Trukenmueller [ mailto:trukenm@ag2.mechanik.tu-darmstadt.de ]
> Sent: den 17 oktober 2001 01:36
> To: gsl-discuss@sources.redhat.com
> Subject: Re: gsl_multifit
>
>
> Hi,
>
> I benchmarked the programs on a system with Athlon 1GHz, 640MB
> running Linux-2.4.10, with the following results:
>
> Dimensions (120,80) (300,200) (600,400) (900,600) (1200,800)
> Octave-Script 0.060s 0.410s 3.390s 12.260s 39.790s
> Compiled gsl 0.110s 2.240s 31.310s 2m7.590s seg.fault
>
> The GSL routine not only turns out to be much slower, it also seems
> to be unstable for higher orders (the results do not converge -> inf).
> Maybe something is wrong in my code. For low orders (~<500) both
> results are equivalent.
>
> I'm not using the atlas-blas but gslcblas.
>
> Code attached;
> The Octave script `sinft.m' should be executable, and can be started
> directly from the shell (iff /usr/bin/octave exists).
> The matrix dimensions must be edited in the source code (in the C
> program, they are arguments).
>
>
> --
> :wq `Kai Trukenmueller'
>
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
@ 2001-12-19 13:20 ` Brian Gough
2001-12-19 13:20 ` gsl_multifit Henry Sobotka
0 siblings, 1 reply; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Kai Trukenmueller; +Cc: gsl-discuss
Kai Trukenmueller writes:
> Thanks for the tip. It's much faster now, but still slower than octave.
>
> Order: (300,200) (600,400) (900,600) (1200,1000) (1500,1200)
> Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
> gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
For extra speed try recompiling the library with -DGSL_RANGE_CHECK_OFF=1.
Range checking is currently enabled by default, for safety, and puts an
overhead on every matrix/vector operation.
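For example, assuming the usual autoconf build (the exact flags may
differ on a local setup):

```shell
# Rebuild the library with range checking compiled out
CFLAGS="-O2 -DGSL_RANGE_CHECK_OFF=1" ./configure
make && make install
```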
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Brian Gough
@ 2001-12-19 13:20 ` Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Kai Trukenmueller @ 2001-12-19 13:20 UTC (permalink / raw)
To: gsl-discuss
Hi,
I benchmarked the programs on a system with Athlon 1GHz, 640MB running Linux-2.4.10, with the following results:
Dimensions (120,80) (300,200) (600,400) (900,600) (1200,800)
Octave-Script 0.060s 0.410s 3.390s 12.260s 39.790s
Compiled gsl 0.110s 2.240s 31.310s 2m7.590s seg.fault
The GSL routine not only turns out to be much slower, it also seems to
be unstable for higher orders (the results do not converge -> inf).
Maybe something is wrong in my code. For low orders (~<500) both
results are equivalent.
I'm not using the ATLAS BLAS but gslcblas.
Code attached;
The Octave script `sinft.m' should be executable, and can be started
directly from the shell (iff /usr/bin/octave exists).
The matrix dimensions must be edited in the source code (in the C
program, they are arguments).
--
:wq `Kai Trukenmueller'
* Re: gsl_multifit
2001-12-19 13:20 gsl_multifit Kai Trukenmueller
@ 2001-12-19 13:20 ` Brian Gough
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
0 siblings, 1 reply; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Kai Trukenmueller; +Cc: gsl-discuss
Kai Trukenmueller writes:
> The problem A*x=b with A not square leads to x = (A^T * A)^-1 * A^T * b
> at least analytically. That should be equivalent to
> multidimensional fitting, as in `gsl_multifit_linear'. I
> implemented an example and it works well.
> Then I compared it to a simple Octave (a free Matlab clone)
> routine, using x=A\b;
> The results are the same (as they should be), but it surprised me a
> lot that the Octave script works _much_ faster for high dimensions
> than the compiled C program. It seems that those algorithms are
> better.
> Is that a known problem, or did I use the wrong routines? Is
> there any better way of solving A x = b?
That sounds like the correct routine. Can you send your program and
benchmarks? I have not compared the two myself. I would expect
Octave to be faster because it uses LAPACK -- the question is how
much. I have not done any optimization in GSL yet, so there is room
to improve things. If you are interested in profiling the code you
can compile GSL for use with gprof, as described in the GCC manual.
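A minimal profiling recipe might look like this (hypothetical file name
myfit.c; link flags assume the bundled gslcblas):

```shell
# Compile the program with profiling instrumentation
gcc -pg -O2 myfit.c -lgsl -lgslcblas -lm -o myfit
./myfit                       # writes gmon.out on exit
gprof myfit gmon.out | head   # flat profile of the hot spots
```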
regards
Brian Gough
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Brian Gough
@ 2001-12-19 13:20 ` Henry Sobotka
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Henry Sobotka @ 2001-12-19 13:20 UTC (permalink / raw)
To: Brian Gough; +Cc: gsl-discuss
Brian Gough wrote:
>
> For extra speed try recompiling the library with -DGSL_RANGE_CHECK_OFF=1.
> Range checking is currently enabled by default, for safety, and puts an
> overhead on every matrix/vector operation.
Brian, are there any other similar macros that can be turned off at
compiletime for performance gains?
h~
end of thread, other threads:[~2001-12-19 13:20 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-19 13:20 gsl_multifit Mikael Adlers
-- strict thread matches above, loose matches on Subject: below --
2001-12-19 13:20 gsl_multifit Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 gsl_multifit Mikael Adlers
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 ` gsl_multifit Henry Sobotka
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 ` gsl_multifit Dirk Eddelbuettel