* RE: gsl_multifit
@ 2001-12-19 13:20 Mikael Adlers
0 siblings, 0 replies; 11+ messages in thread
From: Mikael Adlers @ 2001-12-19 13:20 UTC (permalink / raw)
To: 'Kai Trukenmueller', gsl-discuss
Hi again,
I used Matlab to compute the solution to a least squares problem
with the same sizes as you have and got the following results:
> system with Athlon 1GHz, 640MB running Linux-2.4.10
> Order: (300,200) (600,400) (900,600) (1200,1000) (1500,1200)
> Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
> gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
System: Athlon 700MHz, 256MB, running Win2000
Matlab5.3: 0.1500s 1.342s 4.417s 15.873s 28.362s
(with the LAPACK patch)
If speed is what you want, I would use LAPACK directly
(see www.netlib.org/clapack/; use the LAPACK routine DGELS) with an
optimized BLAS (see www.netlib.org/atlas). There has been a *lot*
of effort put into the LAPACK and BLAS routines; they are highly
optimized to use the cache well (block methods, etc.).
It is very easy to write, for example, a matrix multiplication routine;
it's really hard to make it efficient. Look at this page to see the
speedup they obtained in Matlab when they switched from LINPACK
(an older linear algebra package) to LAPACK with an optimized BLAS.
http://www.mathworks.com/company/newsletter/clevescorner/winter2000.cleve.shtml
Sincerely,
/Mikael Adlers
------------------------------------------------------------------
Mikael Adlers, Ph.D. email: mikael@mathcore.com
MathCore AB phone: +4613 32 85 07
Wallenbergsgata 4 fax: 21 27 01
SE-583 35 Linköping, Sweden http://www.mathcore.com
> -----Original Message-----
> From: Kai Trukenmueller [ mailto:trukenm@ag2.mechanik.tu-darmstadt.de ]
> Sent: den 18 oktober 2001 18:21
> To: gsl-discuss@sources.redhat.com
> Subject: Re: gsl_multifit
>
>
> Hi,
>
> > You could try computing the solution using the following
> > functions instead and see if you get any speed improvement.
> > gsl_linalg_QR_decomp
> > gsl_linalg_QR_lssolve
> Thanks for the tip. It's much faster now, but still slower
> than octave.
>
> Order: (300,200) (600,400) (900,600) (1200,1000)
> (1500,1200)
> Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
> gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
> --
> :wq `Kai Trukenmueller'
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
@ 2001-12-19 13:20 ` Brian Gough
0 siblings, 0 replies; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Kai Trukenmueller; +Cc: gsl-discuss
Kai Trukenmueller writes:
> I benchmarked the programs on a system with Athlon 1GHz, 640MB running Linux-2.4.10, with the following results:
>
> Dimensions (120,80) (300,200) (600,400) (900,600) (1200,800)
> Octave-Script 0.060s 0.410s 3.390s 12.260s 39.790s
> Compiled gsl 0.110s 2.240s 31.310s 2m7.590s seg.fault
>
> The GSL routine not only turns out to be much slower, it also seems
> to be unstable for higher orders (the results do not converge -> inf).
> Maybe something is wrong in my code. For low orders (~<500) both
> results are equivalent.
Regarding the segfault, your program contains some fixed length arrays
which are accessed outside the array bounds -- if you allocate them
dynamically that problem should go away.
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Brian Gough
@ 2001-12-19 13:20 ` Dirk Eddelbuettel
0 siblings, 0 replies; 11+ messages in thread
From: Dirk Eddelbuettel @ 2001-12-19 13:20 UTC (permalink / raw)
To: Brian Gough; +Cc: Henry Sobotka, gsl-discuss
On Fri, Oct 19, 2001 at 11:12:19AM +0100, Brian Gough wrote:
> The only other optimization I have up my sleeve is to introduce more
> use of BLAS functions. There are quite a few places where the code
> uses a for-loop instead of calling the corresponding BLAS routine.
That would be great, as ATLAS can transparently replace the BLAS. Debian does
that for GNU R and GNU Octave, and the speed gain can be on the order of 8
or 9 times [ for suitable operations and matrix sizes, of course ].
Dirk
--
Three out of two people have difficulties with fractions.
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Henry Sobotka
@ 2001-12-19 13:20 ` Brian Gough
2001-12-19 13:20 ` gsl_multifit Dirk Eddelbuettel
0 siblings, 1 reply; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Henry Sobotka; +Cc: gsl-discuss
Henry Sobotka writes:
> Brian Gough wrote:
> >
> > For extra speed try recompiling the library with -DGSL_RANGE_CHECK_OFF=1.
> > Range checking is currently enabled by default, for safety, and puts an
> > overhead on every matrix/vector operation.
>
> Brian, are there any other similar macros that can be turned off at
> compiletime for performance gains?
No. That is the only macro, apart from HAVE_INLINE which is turned on
by default.
The only other optimization I have up my sleeve is to introduce more
use of BLAS functions. There are quite a few places where the code
uses a for-loop instead of calling the corresponding BLAS routine.
Brian
* gsl_multifit
@ 2001-12-19 13:20 Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Kai Trukenmueller @ 2001-12-19 13:20 UTC (permalink / raw)
To: gsl-discuss
Hi,
The problem A*x=b with A not square leads to
x = (A^T * A)^-1 * A^T * b
at least analytically.
That should be equivalent to multidimensional fitting, as in
`gsl_multifit_linear'.
I implemented an example and it works well.
Then I compared it to a simple Octave (a free Matlab clone) routine,
using
x=A\b;
The results are the same (as they should be), but it surprised me a lot
that the Octave script works _much_ faster for high dimensions than the
compiled C program. It seems that those algorithms are better.
Is that a known problem, or did I use the wrong routines?
Is there any better way of solving A x = b?
--
:wq `Kai Trukenmueller'
* Re: gsl_multifit
2001-12-19 13:20 gsl_multifit Mikael Adlers
@ 2001-12-19 13:20 ` Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Kai Trukenmueller @ 2001-12-19 13:20 UTC (permalink / raw)
To: gsl-discuss
Hi,
> You could try computing the solution using the following
> functions instead and see if you get any speed improvement.
> gsl_linalg_QR_decomp
> gsl_linalg_QR_lssolve
Thanks for the tip. It's much faster now, but still slower than Octave.
Order: (300,200) (600,400) (900,600) (1200,1000) (1500,1200)
Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
--
:wq `Kai Trukenmueller'
* RE: gsl_multifit
@ 2001-12-19 13:20 Mikael Adlers
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
0 siblings, 1 reply; 11+ messages in thread
From: Mikael Adlers @ 2001-12-19 13:20 UTC (permalink / raw)
To: 'Kai Trukenmueller', gsl-discuss
Hi,
the algorithm in gsl_multifit_linear uses the singular value
decomposition (SVD). This routine gives the best stability and
accuracy when solving a least squares problem. However, in most
cases it is like shooting a fly with a cannon. The complexity
of the SVD is about 4mn^2 + (11/3)n^3, compared to (1/2)mn^2 + (1/6)n^3
using the normal equations plus a Cholesky factorization (or twice
that amount using a QR factorization instead, which has better
stability than the normal equations).
Further, gsl_multifit_linear computes a covariance matrix and
some other nifty stuff, which eats additional flops.
The algorithm in Matlab (and Octave?) uses the QR factorization
(together with one step of iterative refinement, I believe) to
compute the solution.
You could try computing the solution using the following
functions instead and see if you get any speed improvement.
gsl_linalg_QR_decomp
gsl_linalg_QR_lssolve
/Mikael Adlers
------------------------------------------------------------------
Mikael Adlers, Ph.D. email: mikael@mathcore.com
MathCore AB phone: +4613 32 85 07
Wallenbergsgata 4 fax: 21 27 01
SE-583 35 Linköping, Sweden http://www.mathcore.com
> -----Original Message-----
> From: Kai Trukenmueller [ mailto:trukenm@ag2.mechanik.tu-darmstadt.de ]
> Sent: den 16 oktober 2001 22:51
> To: gsl-discuss@sources.redhat.com
> Subject: gsl_multifit
>
>
> Hi,
>
> The problem A*x=b with A not square leads to
> x = (A^T * A)^-1 * A^T * b
> at least analytically.
> That should be equivalent to multidimensional fitting, as in
> `gsl_multifit_linear'.
> I implemented an example and it works well.
>
> Then I compared it to a simple Octave (a free Matlab clone) routine,
> using
> x=A\b;
>
> The results are the same (as they should be), but it surprised me a lot
> that the Octave script works _much_ faster for high dimensions than the
> compiled C program. It seems that those algorithms are better.
>
> Is that a known problem, or did I use the wrong routines?
> Is there any better way of solving A x = b?
> --
> :wq `Kai Trukenmueller'
> -----Original Message-----
> From: Kai Trukenmueller [ mailto:trukenm@ag2.mechanik.tu-darmstadt.de ]
> Sent: den 17 oktober 2001 01:36
> To: gsl-discuss@sources.redhat.com
> Subject: Re: gsl_multifit
>
>
> Hi,
>
> I benchmarked the programs on a system with Athlon 1GHz, 640MB
> running Linux-2.4.10, with the following results:
>
> Dimensions (120,80) (300,200) (600,400) (900,600) (1200,800)
> Octave-Script 0.060s 0.410s 3.390s 12.260s 39.790s
> Compiled gsl 0.110s 2.240s 31.310s 2m7.590s seg.fault
>
> The GSL routine not only turns out to be much slower, it also seems
> to be unstable for higher orders (the results do not converge -> inf).
> Maybe something is wrong in my code. For low orders (~<500) both
> results are equivalent.
>
> I'm not using the atlas-blas but gslcblas.
>
> Code attached;
> The Octave script `sinft.m' should be executable, and can be started
> directly from the shell (iff /usr/bin/octave exists).
> The matrix dimensions must be edited in the source code (in the C
> program, they are arguments).
>
>
> --
> :wq `Kai Trukenmueller'
>
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
@ 2001-12-19 13:20 ` Brian Gough
2001-12-19 13:20 ` gsl_multifit Henry Sobotka
0 siblings, 1 reply; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Kai Trukenmueller; +Cc: gsl-discuss
Kai Trukenmueller writes:
> Thanks for the tip. It's much faster now, but still slower than octave.
>
> Order: (300,200) (600,400) (900,600) (1200,1000) (1500,1200)
> Octave: 0.340s 3.320s 11.580s 1m9.680s 2m1.090s
> gsl(QR): 0.390s 4.560s 22.690s 1m38.640s 5m1.580s
For extra speed try recompiling the library with -DGSL_RANGE_CHECK_OFF=1.
Range checking is currently enabled by default, for safety, and puts an
overhead on every matrix/vector operation.
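For example, assuming the usual autoconf build (the exact flags may
differ on a local setup):

```shell
# Rebuild the library with range checking compiled out
CFLAGS="-O2 -DGSL_RANGE_CHECK_OFF=1" ./configure
make && make install
```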
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Brian Gough
@ 2001-12-19 13:20 ` Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Kai Trukenmueller @ 2001-12-19 13:20 UTC (permalink / raw)
To: gsl-discuss
Hi,
I benchmarked the programs on a system with Athlon 1GHz, 640MB running Linux-2.4.10, with the following results:
Dimensions (120,80) (300,200) (600,400) (900,600) (1200,800)
Octave-Script 0.060s 0.410s 3.390s 12.260s 39.790s
Compiled gsl 0.110s 2.240s 31.310s 2m7.590s seg.fault
The GSL routine not only turns out to be much slower, it also seems to
be unstable for higher orders (the results do not converge -> inf).
Maybe something is wrong in my code. For low orders (~<500) both
results are equivalent.
I'm not using the ATLAS BLAS but gslcblas.
Code attached;
The Octave script `sinft.m' should be executable, and can be started
directly from the shell (iff /usr/bin/octave exists).
The matrix dimensions must be edited in the source code (in the C
program, they are arguments).
--
:wq `Kai Trukenmueller'
* Re: gsl_multifit
2001-12-19 13:20 gsl_multifit Kai Trukenmueller
@ 2001-12-19 13:20 ` Brian Gough
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
0 siblings, 1 reply; 11+ messages in thread
From: Brian Gough @ 2001-12-19 13:20 UTC (permalink / raw)
To: Kai Trukenmueller; +Cc: gsl-discuss
Kai Trukenmueller writes:
> The problem A*x=b with A not square leads to x = (A^T * A)^-1 * A^T * b
> at least analytically. That should be equivalent to
> multidimensional fitting, as in `gsl_multifit_linear'. I
> implemented an example and it works well.
> Then I compared it to a simple Octave (a free Matlab clone)
> routine, using x=A\b;
> The results are the same (as they should be), but it surprised me a
> lot that the Octave script works _much_ faster for high dimensions
> than the compiled C program. It seems that those algorithms are
> better.
> Is that a known problem, or did I use the wrong routines? Is
> there any better way of solving A x = b?
That sounds like the correct routine. Can you send your program and
benchmarks? I have not compared the two myself. I would expect
Octave to be faster because it uses LAPACK -- the question is how
much. I have not done any optimization in GSL yet, so there is room
to improve things. If you are interested in profiling the code you
can compile GSL for use with gprof, as described in the GCC manual.
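A minimal profiling recipe might look like this (hypothetical file name
myfit.c; link flags assume the bundled gslcblas):

```shell
# Compile the program with profiling instrumentation
gcc -pg -O2 myfit.c -lgsl -lgslcblas -lm -o myfit
./myfit                       # writes gmon.out on exit
gprof myfit gmon.out | head   # flat profile of the hot spots
```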
regards
Brian Gough
* Re: gsl_multifit
2001-12-19 13:20 ` gsl_multifit Brian Gough
@ 2001-12-19 13:20 ` Henry Sobotka
2001-12-19 13:20 ` gsl_multifit Brian Gough
0 siblings, 1 reply; 11+ messages in thread
From: Henry Sobotka @ 2001-12-19 13:20 UTC (permalink / raw)
To: Brian Gough; +Cc: gsl-discuss
Brian Gough wrote:
>
> For extra speed try recompiling the library with -DGSL_RANGE_CHECK_OFF=1.
> Range checking is currently enabled by default, for safety, and puts an
> overhead on every matrix/vector operation.
Brian, are there any other similar macros that can be turned off at
compiletime for performance gains?
h~
end of thread, other threads:[~2001-12-19 13:20 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-19 13:20 gsl_multifit Mikael Adlers
-- strict thread matches above, loose matches on Subject: below --
2001-12-19 13:20 gsl_multifit Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 gsl_multifit Mikael Adlers
2001-12-19 13:20 ` gsl_multifit Kai Trukenmueller
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 ` gsl_multifit Henry Sobotka
2001-12-19 13:20 ` gsl_multifit Brian Gough
2001-12-19 13:20 ` gsl_multifit Dirk Eddelbuettel