Help: working with large gsl

public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed

* Help: working with large gsl_vector
@ 2004-12-11  7:44 Toan T Nguyen
  2004-12-13 18:41 ` Linas Vepstas
  2004-12-13 18:51 ` Fabrice Rossi
  0 siblings, 2 replies; 7+ messages in thread
From: Toan T Nguyen @ 2004-12-11  7:44 UTC (permalink / raw)
  To: gsl-discuss

Hi, 

I'm using the multidimensional minimization procedure in GSL and have problem 
with large vectors. The dimensionality of my vectors is 300000 which means my 
index variable is of long integer type. gsl_vector, however, works only with 
size_t type only. I couldnot even call gsl_vector_alloc(300000) .  It says:

gsl: init_source.c:29: ERROR: vector length n must be positive integer
Default GSL error handler invoked.

Can somebody here give an advice? I'm quite new to GSL and I search the 
mailing list archives but could not find an answer?

For clarity, my system is an Athlon64 using SuSE x86_64, gcc 3.3.3

Thanks a lot in advance,

Toan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Help: working with large gsl_vector
  2004-12-11  7:44 Help: working with large gsl_vector Toan T Nguyen
@ 2004-12-13 18:41 ` Linas Vepstas
  2004-12-13 18:51 ` Fabrice Rossi
  1 sibling, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2004-12-13 18:41 UTC (permalink / raw)
  To: Toan T Nguyen; +Cc: gsl-discuss

On Fri, Dec 10, 2004 at 11:51:19PM -0800, Toan T Nguyen was heard to remark:
> 
> Hi, 
> 
> I'm using the multidimensional minimization procedure in GSL and have problem 
> with large vectors. The dimensionality of my vectors is 300000 which means my 
> index variable is of long integer type. gsl_vector, however, works only with 
> size_t type only. I couldnot even call gsl_vector_alloc(300000) .  It says:

double-check.  size_t is normally a 32-bit int when compiling for a
32-bit libc, and 64 when compiling for a 64-bit libc.  You can do 
this with a
printf ("size is=%d\n", sizeof (size_t));

which will print the number of bytes.

> gsl: init_source.c:29: ERROR: vector length n must be positive integer
> Default GSL error handler invoked.

which is not to say there isn't a gsl bug, but you probably should do 
more due diligence before offering the diagnosis.

--linas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Help: working with large gsl_vector
  2004-12-11  7:44 Help: working with large gsl_vector Toan T Nguyen
  2004-12-13 18:41 ` Linas Vepstas
@ 2004-12-13 18:51 ` Fabrice Rossi
  2004-12-14 23:03   ` Toan T Nguyen
  1 sibling, 1 reply; 7+ messages in thread
From: Fabrice Rossi @ 2004-12-13 18:51 UTC (permalink / raw)
  To: gsl-discuss; +Cc: Toan T Nguyen

Toan T Nguyen wrote:
> I'm using the multidimensional minimization procedure in GSL and have problem 
> with large vectors. The dimensionality of my vectors is 300000 which means my 
> index variable is of long integer type. 

Hum. I authored the initial version of the multidimensional minimization 
functions which have been improved by others, but I don't think they 
have been improved in such a way that optimization in dimension 300 000 
is possible. Even for a quadratic function and using a conjugate 
gradient algorithm, it will take 3e5 iterations to the algorithm to 
complete. As each iteration implies a linear search for minimum that 
will use at least around 10 evaluations of your function, your a looking 
for 3e6 function evaluation. Unless you have a very special problem, I 
would guess that a function evaluation take at least 3e5 operations to 
complete, which implies 9e11 operations (a.k.a. 2^39 operations). That's 
a lot. And we are talking about a quadratic form, remember ?

Perhaps you could tell us a little bit more about your application ?

Fabrice

PS : Support Vector Machine algorithms have proven that it is indeed 
possible to do quadratic optimization under constraints in high 
dimensional space (like 10000 dimensions), but they use _very_ complex 
methods.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Help: working with large gsl_vector
  2004-12-13 18:51 ` Fabrice Rossi
@ 2004-12-14 23:03   ` Toan T Nguyen
  2004-12-15 11:12     ` Fabrice Rossi
  0 siblings, 1 reply; 7+ messages in thread
From: Toan T Nguyen @ 2004-12-14 23:03 UTC (permalink / raw)
  To: Fabrice Rossi; +Cc: gsl-discuss, help-gsl


Hi, thank you for the analysis. Apparently, my analytical calculation is able 
to guess the minimum reasonably good, so I need only 20000 iterations to find 
the minimum. 

However, the strange thing is if I take the final minimized vector and run it 
through the conjugate gradient routine again, the energy (my function to 
minimize) keeps going down for a few more 1000s iterations (albeit it change 
only the 4th significant digit). Do you know what's wrong?

I follow the example in the GSL document and set my success criteria as

status = gsl_multimin_test_gradient (s->gradient, 1e-20);

The initial setup of the vector is

T = gsl_multimin_fdfminimizer_conjugate_pr;
s = gsl_multimin_fdfminimizer_alloc (T, 3*nparts);
gsl_multimin_fdfminimizer_set (s, &energy, x, 0.001, 1e-12);

For clarity, I used dimensionless units so ALL the nearest neighbor distances 
and the coupling constants are of the order of unity. nparts is the number of 
particles which is about 100000.

Could you give some advises? Thanks a lot in advance.

Toan



On Monday 13 December 2004 10:50, Fabrice Rossi wrote:
> Toan T Nguyen wrote:
> > I'm using the multidimensional minimization procedure in GSL and have
> > problem with large vectors. The dimensionality of my vectors is 300000
> > which means my index variable is of long integer type.
>
> Hum. I authored the initial version of the multidimensional minimization
> functions which have been improved by others, but I don't think they
> have been improved in such a way that optimization in dimension 300 000
> is possible. Even for a quadratic function and using a conjugate
> gradient algorithm, it will take 3e5 iterations to the algorithm to
> complete. As each iteration implies a linear search for minimum that
> will use at least around 10 evaluations of your function, your a looking
> for 3e6 function evaluation. Unless you have a very special problem, I
> would guess that a function evaluation take at least 3e5 operations to
> complete, which implies 9e11 operations (a.k.a. 2^39 operations). That's
> a lot. And we are talking about a quadratic form, remember ?
>
> Perhaps you could tell us a little bit more about your application ?
>
> Fabrice
>
> PS : Support Vector Machine algorithms have proven that it is indeed
> possible to do quadratic optimization under constraints in high
> dimensional space (like 10000 dimensions), but they use _very_ complex
> methods.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Help: working with large gsl_vector
  2004-12-14 23:03   ` Toan T Nguyen
@ 2004-12-15 11:12     ` Fabrice Rossi
  2004-12-15 12:11       ` Robert G. Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Fabrice Rossi @ 2004-12-15 11:12 UTC (permalink / raw)
  To: gsl-discuss

Toan T Nguyen wrote:
> Hi, thank you for the analysis. Apparently, my analytical calculation is able 
> to guess the minimum reasonably good, so I need only 20000 iterations to find 
> the minimum. 

Nice!

> However, the strange thing is if I take the final minimized vector and run it 
> through the conjugate gradient routine again, the energy (my function to 
> minimize) keeps going down for a few more 1000s iterations (albeit it change 
> only the 4th significant digit). Do you know what's wrong?

It might be related to the restarting of the algorithm. A conjugate 
gradient calculates is descent direction thanks to the gradient at the 
current point and to the previously used direction. The first descent 
direction is (minus) the gradient alone. When you restart the algorithm, 
it will use the gradient rather than the descent direction it should 
have used.

> I follow the example in the GSL document and set my success criteria as
> 
> status = gsl_multimin_test_gradient (s->gradient, 1e-20);

1e-20 is very very small, maybe to close to machine precision to give 
reliable results. But I'm far from being an expert in this domain. What 
look strange is that if the algorithm stops because of this criteria, I 
don't know why it can continue again after. It would mean that the norm 
of the gradient is going up after having reached 1e-20 once. This might 
be related to numerical errors.

> The initial setup of the vector is
> 
> T = gsl_multimin_fdfminimizer_conjugate_pr;
> s = gsl_multimin_fdfminimizer_alloc (T, 3*nparts);
> gsl_multimin_fdfminimizer_set (s, &energy, x, 0.001, 1e-12);

1e-12 is again quite small.

> For clarity, I used dimensionless units so ALL the nearest neighbor distances 
> and the coupling constants are of the order of unity. nparts is the number of 
> particles which is about 100000.
> 
> Could you give some advises? Thanks a lot in advance.
> 
> Toan

I think that you are basically puting the code under some pressure, 
because of the size of your problem and of the very low limits you are 
using. To my knowledge, it has not been tested under such constraints 
and it's therefore difficult to give sensible advice.

Fabrice

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Help: working with large gsl_vector
  2004-12-15 11:12     ` Fabrice Rossi
@ 2004-12-15 12:11       ` Robert G. Brown
  2004-12-15 17:31         ` Linas Vepstas
  0 siblings, 1 reply; 7+ messages in thread
From: Robert G. Brown @ 2004-12-15 12:11 UTC (permalink / raw)
  To: Fabrice Rossi; +Cc: gsl-discuss

On Wed, 15 Dec 2004, Fabrice Rossi wrote:

> Toan T Nguyen wrote:
> > Hi, thank you for the analysis. Apparently, my analytical calculation is able 
> > to guess the minimum reasonably good, so I need only 20000 iterations to find 
> > the minimum. 
> 
> Nice!
> 
> > However, the strange thing is if I take the final minimized vector and run it 
> > through the conjugate gradient routine again, the energy (my function to 
> > minimize) keeps going down for a few more 1000s iterations (albeit it change 
> > only the 4th significant digit). Do you know what's wrong?
> 
> It might be related to the restarting of the algorithm. A conjugate 
> gradient calculates is descent direction thanks to the gradient at the 
> current point and to the previously used direction. The first descent 
> direction is (minus) the gradient alone. When you restart the algorithm, 
> it will use the gradient rather than the descent direction it should 
> have used.

A second possibility is that if you save the final minimized vector at
some precision and reload it, that you are beginning a gradient decent
from a new point in the neighborhood of the old point.  This is, in
fact, how deterministic chaos was discovered in a manner of speaking.

If the problem is one with high complexity, this may be enough to push
it to a different local minimum in the particular well where the first
local minimum is found.

Note that I say local minimum.  Unless you have a really sound
theoretical basis for assuming a single, smooth global minimum in any
problem with high dimensionality, you should presume as a default that
the topology of the minimization surface is neither monotonic nor
smooth, and that multiple local minima exist.  In fact, local minima may
even be "dense" (or nearly so) on the underlying space so that nearly
ANY minimum you find by merely going downhill is a local minimum, and
may not even be a "good" minimum.

Think rugged landscape.  Gradient decent methods basically go downhill
from whereever they are started.  From many/most starting points, this
will take you down to the nearest footprint in the sand, to a nearby
hole in the ground, to the bottom of a pond.  It is relatively unlikely
to take you to the bottom of the deepest ocean trench.  And if you move
the starting point over just a little bit, you're as likely to go down
into a DIFFERENT footprint in the sand as back into the same one -- it's
still a local minimum, just a different one.

There are various easy ways to test for this sort of behavior.  The
easiest one is to just run the code lots of times with completely random
starting data and see if you go down to the same minimum every time.
Also look at the distribution of minima that you get -- are they all
about the same quality or is there one that is occassionally much better
than the others.

If the latter is the case, you'll have to resort to much more powerful
and complex methods to get a "good" minimum on your space.  Most likely
genetic algorithms and/or simulated annealing.

   rgb

> 
> > I follow the example in the GSL document and set my success criteria as
> > 
> > status = gsl_multimin_test_gradient (s->gradient, 1e-20);
> 
> 1e-20 is very very small, maybe to close to machine precision to give 
> reliable results. But I'm far from being an expert in this domain. What 
> look strange is that if the algorithm stops because of this criteria, I 
> don't know why it can continue again after. It would mean that the norm 
> of the gradient is going up after having reached 1e-20 once. This might 
> be related to numerical errors.
> 
> > The initial setup of the vector is
> > 
> > T = gsl_multimin_fdfminimizer_conjugate_pr;
> > s = gsl_multimin_fdfminimizer_alloc (T, 3*nparts);
> > gsl_multimin_fdfminimizer_set (s, &energy, x, 0.001, 1e-12);
> 
> 1e-12 is again quite small.
> 
> > For clarity, I used dimensionless units so ALL the nearest neighbor distances 
> > and the coupling constants are of the order of unity. nparts is the number of 
> > particles which is about 100000.
> > 
> > Could you give some advises? Thanks a lot in advance.
> > 
> > Toan
> 
> I think that you are basically puting the code under some pressure, 
> because of the size of your problem and of the very low limits you are 
> using. To my knowledge, it has not been tested under such constraints 
> and it's therefore difficult to give sensible advice.
> 
> Fabrice
> 
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Help: working with large gsl_vector
  2004-12-15 12:11       ` Robert G. Brown
@ 2004-12-15 17:31         ` Linas Vepstas
  0 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2004-12-15 17:31 UTC (permalink / raw)
  To: Robert G. Brown; +Cc: Fabrice Rossi, gsl-discuss

On Wed, Dec 15, 2004 at 08:07:38AM -0500, Robert G. Brown was heard to remark:
> 
> Note that I say local minimum.  Unless you have a really sound
> theoretical basis for assuming a single, smooth global minimum in any
> problem with high dimensionality, you should presume as a default that
> the topology of the minimization surface is neither monotonic nor
> smooth, and that multiple local minima exist.  In fact, local minima may
> even be "dense" (or nearly so) on the underlying space so that nearly
> ANY minimum you find by merely going downhill is a local minimum, and
> may not even be a "good" minimum.

i.e the bottom may be fractal. With that many free variables, 
it wouldn't be a surprise.

> There are various easy ways to test for this sort of behavior.  The
> easiest one is to just run the code lots of times with completely random
> starting data and see if you go down to the same minimum every time.
> Also look at the distribution of minima that you get -- are they all
> about the same quality or is there one that is occassionally much better
> than the others.
> 
> If the latter is the case, you'll have to resort to much more powerful
> and complex methods to get a "good" minimum on your space.  Most likely
> genetic algorithms and/or simulated annealing.

And if the bottom of the thing is fractal, asking for the minimum might
be "the wrong question".  

Just because the problem seems to involve analtyic relations doesn't
mean its not fractal.  Weierstrass in 1872 considered the sum

    Sum_n=0^\infty  a^n sin (\pi x b^n) 

which is insanely fractal.  Running the sum from n=0 to "merely" n=2000 
and looking for the minimum would lead to numerical insanity.

--linas

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-12-15 17:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-11  7:44 Help: working with large gsl_vector Toan T Nguyen
2004-12-13 18:41 ` Linas Vepstas
2004-12-13 18:51 ` Fabrice Rossi
2004-12-14 23:03   ` Toan T Nguyen
2004-12-15 11:12     ` Fabrice Rossi
2004-12-15 12:11       ` Robert G. Brown
2004-12-15 17:31         ` Linas Vepstas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).