public inbox for gsl-discuss@sourceware.org
* Speed Issues
  2001-12-19 13:20 Speed Issues David Ronis
@ 2001-12-10 13:34 ` David Ronis
  2001-12-18 10:03 ` David Ronis
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: David Ronis @ 2001-12-10 13:34 UTC (permalink / raw)
  To: gsl-discuss


I've compiled gsl-1.0 on an i686-Linux-gnu box and on a dual-Athlon box,
each with its own local build of the ATLAS BLAS routines.  In the one
application we've tried, I notice about a 30% slowdown (on either box)
compared to the same code compiled with IMSL routines.  All the
libraries and code were compiled with gcc-2.95.3, and my application
had GSL_RANGE_CHECK_OFF and HAVE_INLINE defined (the speed difference
is about the same even when they aren't).

Specifically, the application has to solve for the roots of about 600
coupled nonlinear equations, and we've been using the following code:

  /* Ntarget, struct rparams, rosenbrock_f, and elapsed_time
     are defined elsewhere. */
  const gsl_multiroot_fsolver_type *T;
  gsl_multiroot_fsolver *sss;
  int status;
  size_t i, iter = 0;
  double x_init[3*Ntarget];
  clock_t start;

  const size_t nnn = 3*Ntarget;
  struct rparams ppp = {1.0, 10.0};
  gsl_multiroot_function f = {&rosenbrock_f, nnn, &ppp};

  for (i = 0; i < nnn; i++)
    x_init[i] = 0.0;
  gsl_vector *x = gsl_vector_alloc (nnn);

  for (i = 0; i < nnn; i++)
    gsl_vector_set (x, i, x_init[i]);

  T = gsl_multiroot_fsolver_hybrids;
  sss = gsl_multiroot_fsolver_alloc (T, nnn);

  start = clock();

  gsl_multiroot_fsolver_set (sss, &f, x);

  do
    {
      iter++;
      status = gsl_multiroot_fsolver_iterate (sss);
      if (status)   /* check if solver is stuck */
        break;
      status = gsl_multiroot_test_delta (sss->dx, sss->x, 0.0, 1.0e-6);
    }
  while (status == GSL_CONTINUE && iter < 1000);

  elapsed_time += (double)(clock() - start)/CLOCKS_PER_SEC;

I compile with the following flags:

  -O3 -march=i686 -ffast-math -funroll-loops -fomit-frame-pointer
  -fforce-mem -fforce-addr -malign-jumps=3 -malign-loops=3
  -malign-functions=3 -mpreferred-stack-boundary=3

and link with the atlas blas routines.

I've also tried the IMSL routine ZSPOW (written in Fortran, from an
early version of the IMSL library).  As I mentioned at the outset, the
GSL version is about 30% slower, although the two give identical
roots.


Any suggestions?  I've played around with eliminating some of the
additional indirection associated with having general code for
arbitrary strides (e.g., by manipulating the data members of the
gsl_vector directly, assuming stride = 1), but this only speeds things
up slightly.


David

P.S. It doesn't seem to be in the documentation, but is there any
convention for the initial stride of a gsl_vector?  When can I assume
that it's 1 and will remain so?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Speed Issues
  2001-12-19 13:20 ` Brian Gough
@ 2001-12-11  7:39   ` Brian Gough
  2001-12-18 13:41   ` Brian Gough
  2001-12-19 13:20   ` Brian Gough
  2 siblings, 0 replies; 15+ messages in thread
From: Brian Gough @ 2001-12-11  7:39 UTC (permalink / raw)
  To: ronis; +Cc: gsl-discuss

David Ronis writes:
 > I've also tried the IMSL routine ZSPOW (written in fortran from an
 > early version of the IMSL library).  As I mentioned at the outset, the
 > gsl version is about 30% slower, although the two give identical
 > roots.

Hi,

Before I look into this: are you referring to the speed per function
evaluation (or iteration) being 30% slower, not the total runtime, and
is ZSPOW the same Powell scaled-hybrid algorithm?

regards

-- 
Brian Gough

 > P.S., it doesn't seem to be in the documentation, but is there any
 > convention as to what the initial stride of a gsl_vector is?  When can
 > I assume that it's 1 and will remain so?

Yes, it's assumed to be 1 if you use gsl_vector_alloc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Speed Issues
  2001-12-19 13:20   ` Brian Gough
@ 2001-12-11 12:22     ` Brian Gough
  2001-12-18 13:42     ` Brian Gough
  2001-12-19 13:20     ` David Ronis
  2 siblings, 0 replies; 15+ messages in thread
From: Brian Gough @ 2001-12-11 12:22 UTC (permalink / raw)
  To: ronis, gsl-discuss

Brian Gough writes:
 > Before I look into this you're refering to the the speed per function
 > evaluation (or iteration) being 30% slower, not total runtime, and
 > ZSPOW is the same Powell scaled-hybrid algorithm?
   ^^^^^
or the other IMSL algorithm you used.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Speed Issues
  2001-12-19 13:20     ` David Ronis
@ 2001-12-12  7:13       ` David Ronis
  2001-12-18 13:59       ` David Ronis
  2001-12-19 13:20       ` Brian Gough
  2 siblings, 0 replies; 15+ messages in thread
From: David Ronis @ 2001-12-12  7:13 UTC (permalink / raw)
  To: Brian Gough; +Cc: ronis, gsl-discuss

Hi Brian,

The clock calls, as far as I know, measure CPU time, not total runtime.

Brian Gough writes:
 > Brian Gough writes:
 >  > Before I look into this you're refering to the the speed per function
 >  > evaluation (or iteration) being 30% slower, not total runtime, and
 >  > ZSPOW is the same Powell scaled-hybrid algorithm?
 >    ^^^^^
 > or the other IMSL algorithm you used.

ZSPOW is the IMSL algorithm (non-blas, BTW).

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Speed Issues
  2001-12-19 13:20       ` Brian Gough
@ 2001-12-13 15:08         ` Brian Gough
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Gough @ 2001-12-13 15:08 UTC (permalink / raw)
  To: ronis; +Cc: gsl-discuss

David Ronis writes:
 > Hi Brian,
 > 
 > The clock calls as far as I know, measure CPU time, not total runtime.
 > 

A comparison of the two routines would need to consider:
- choice of algorithm
- precision of the result
- convergence criteria
- number of iterations / function evaluations

If the 30% difference is a problem, you could try some of the other
algorithms, or use gprof to profile the library and look for hotspots.

Brian

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2001-12-19 21:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-19 13:20 Speed Issues David Ronis
2001-12-10 13:34 ` David Ronis
2001-12-18 10:03 ` David Ronis
2001-12-19 13:20 ` Brian Gough
2001-12-11  7:39   ` Brian Gough
2001-12-18 13:41   ` Brian Gough
2001-12-19 13:20   ` Brian Gough
2001-12-11 12:22     ` Brian Gough
2001-12-18 13:42     ` Brian Gough
2001-12-19 13:20     ` David Ronis
2001-12-12  7:13       ` David Ronis
2001-12-18 13:59       ` David Ronis
2001-12-19 13:20       ` Brian Gough
2001-12-13 15:08         ` Brian Gough
2001-12-19 13:20 ` David Ronis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).