public inbox for gsl-discuss@sourceware.org
* RE: High-dimensional Minimization without analytical derivatives
@ 2004-09-01 15:30 Anatoliy Belaygorod
  2004-09-02  9:26 ` Manoj Warrier
  2004-09-03 15:15 ` Brian Gough
  0 siblings, 2 replies; 13+ messages in thread
From: Anatoliy Belaygorod @ 2004-09-01 15:30 UTC (permalink / raw)
  To: Jason Stover; +Cc: gsl-discuss

Thank you for replying. 
It is funny that you are telling me this about Bayesian methods, though. The fact is that I am as hard-core a Bayesian as you can find :). In fact, my advisor is one of the leading people in the Bayesian field.
The maximization routine I am writing is needed precisely to implement the M-H algorithm. Its simplest form is random-walk MH; however, to calibrate the proposal density I need to find the MLE and evaluate the negative inverse Hessian at the MLE, which gives the variance of my proposal density.
Another algorithm I will be writing involves something called MH tailoring (developed by my advisor), which also requires a maximization routine within each iteration for each MH block. In that case, though, the dimensionality of the maximization problem will be smaller, depending on the blocking.
So, my question still stands:
What is the most efficient way (I am talking *computational speed*, not necessarily precision) to maximize a high-dimensional function (anywhere between 4 and 20 dimensions) using GSL *without* specifying analytic derivatives? Am I better off using the Nelder-Mead simplex algorithm, which requires no gradient at all, or should I write a "my_df" function that loops over the coordinates and evaluates the gradient numerically one direction at a time (an N-dimensional gradient is just a collection of N one-dimensional derivatives, so I should be able to loop from 1 to N and use gsl_diff_central) and pass it to gsl_multimin_fdfminimizer_vector_bfgs?
My co-author uses Gauss, and he tells me that Gauss has a driver that uses BFGS while computing numerical derivatives automatically in the background. It seems I should be able to replicate that driver with GSL by doing what I just described, but is there anything I am missing? I would hate to see the Gauss driver run faster than my C code because I missed some important point or trick. Clearly, if I provided analytic derivatives, BFGS would be faster than the Nelder-Mead simplex. But if BFGS has access only to numerical derivatives... which one is better in most cases?
I think this question must come up for many people trying to solve a high-dimensional min/max problem, because even when analytic derivatives are available in theory, it is simply not feasible to code them all up in, say, a 100- or 1000-dimensional problem. So one either has to loop through numerical derivatives or live without the gradient altogether.
But which one is better?
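
To make that concrete, here is a minimal sketch of the "loop over
gsl_diff_central" idea, assuming the GSL 1.4 gsl_diff interface; my_f()
stands in for the real likelihood, and the helper names (diff_state,
f_along_coordinate) are just illustrative, not part of GSL:

#include <gsl/gsl_math.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_diff.h>
#include <gsl/gsl_multimin.h>

/* the real (negative log-)likelihood goes here */
double my_f (const gsl_vector *x, void *params);

/* state for differentiating along one coordinate */
typedef struct {
  gsl_vector *x;   /* working copy of the current point */
  size_t i;        /* coordinate being varied */
  void *params;    /* parameters forwarded to my_f */
} diff_state;

static double
f_along_coordinate (double xi, void *p)
{
  diff_state *s = (diff_state *) p;
  double saved = gsl_vector_get (s->x, s->i);
  double fx;
  gsl_vector_set (s->x, s->i, xi);
  fx = my_f (s->x, s->params);
  gsl_vector_set (s->x, s->i, saved);   /* restore the point */
  return fx;
}

/* gradient callback for the fdf minimizers: one gsl_diff_central
   call per coordinate */
void
my_df (const gsl_vector *x, void *params, gsl_vector *g)
{
  size_t n = x->size, i;
  diff_state s;
  gsl_function F;
  double deriv, abserr;

  s.x = gsl_vector_alloc (n);
  gsl_vector_memcpy (s.x, x);
  s.params = params;

  F.function = &f_along_coordinate;
  F.params = &s;

  for (i = 0; i < n; i++)
    {
      s.i = i;
      gsl_diff_central (&F, gsl_vector_get (x, i), &deriv, &abserr);
      gsl_vector_set (g, i, deriv);
    }

  gsl_vector_free (s.x);
}

/* combined callback required by gsl_multimin_fdfminimizer_vector_bfgs */
void
my_fdf (const gsl_vector *x, void *params, double *f, gsl_vector *g)
{
  *f = my_f (x, params);
  my_df (x, params, g);
}

(Note that gsl_diff_central evaluates the function several times per
coordinate, so each gradient costs on the order of a few times N
function calls; whether that beats the derivative-free simplex is
exactly the question above.)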

________________________________

From: Jason Stover [mailto:jstover@sdf.lonestar.org]
Sent: Wed 9/1/2004 9:42 AM
To: Anatoliy Belaygorod
Cc: gsl-discuss@sources.redhat.com
Subject: Re: High-dimensional Minimization without analytical derivatives





Maximizing a likelihood function with 17 parameters and
80 observations is likely to get you odd answers, e.g., many
of your estimates will be on the boundary of the parameter space.

Another possible approach is a Bayesian one: define some reasonable
prior distributions for your parameters, then run a simulation to find
the regions where your posterior density function is highest. A common
simulation algorithm here is the Metropolis-Hastings algorithm, about
which there is a lot of literature. One popular book is "Markov Chain
Monte Carlo in Practice" by W. R. Gilks, S. Richardson,
D. J. Spiegelhalter. Coding such a simulation isn't difficult.

With MCMC simulations, you can also specify the dimension of the
parameter vector as a parameter itself. Peter Green wrote a paper
about this:

Reversible jump Markov chain Monte Carlo computation and Bayesian
model determination, Biometrika, 82, 711-732 (1995)

-Jason

On Mon, Aug 30, 2004 at 05:11:11PM -0500, Anatoliy Belaygorod wrote:
> Hello,
> I need to find a (local) maximum of my likelihood function with 80 data points over the 17-dimensional parameter space. I want to use gsl_multimin_fdfminimizer_vector_bfgs (or some other gradient-based algorithm), but I would really hate to specify 17 (or maybe many more if we change the model) analytic derivatives.
> Can you please tell me whether I have better options? Can I use the one-dimensional numerical derivative gsl_diff_central instead of analytic derivatives when I write the "my_df" function for BFGS? How would this approach (if it is feasible at all) compare to the Nelder-Mead simplex algorithm provided in my version of GSL (1.4)? Is there a better option that couples numerical gradient evaluation with the BFGS method?
> 
> I really appreciate any help or advice you can give me.
> Sincerely,
> Anatoliy
> 
>

--
jstover@sdf.lonestar.org
SDF Public Access UNIX System - http://sdf.lonestar.org


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: High-dimensional Minimization without analytical derivatives
  2004-09-01 15:30 High-dimensional Minimization without analytical derivatives Anatoliy Belaygorod
@ 2004-09-02  9:26 ` Manoj Warrier
  2004-09-03 15:15 ` Brian Gough
  1 sibling, 0 replies; 13+ messages in thread
From: Manoj Warrier @ 2004-09-02  9:26 UTC (permalink / raw)
  To: gsl-discuss

On Wed, 2004-09-01 at 17:17, Anatoliy Belaygorod wrote:
> Thank you for replying. 
> It is funny that you are telling me this about Bayesian methods, though. The fact is that I am as hard-core a Bayesian as you can find :). In fact, my advisor is one of the leading people in the Bayesian field.
Ha ha. Just laughing because you said it is funny. Perhaps next time
when you ask questions, you should send a CV of yourself, and of your
advisor... oops, almost forgot, also CVs of your co-workers. Then people
won't tickle you with answers in the field you are all "hard-core" or
"big" in.

I agree with Jason's statement which was:
"Maximizing a likelihood function with 17 parameters and
 80 observations is likely to get you odd answers, e.g., many
 of your estimates will be on the boundary of the parameter space."

> The maximization routine I am writing is needed precisely to implement the M-H algorithm. Its simplest form is random-walk MH; however, to calibrate the proposal density I need to find the MLE and evaluate the negative inverse Hessian at the MLE, which gives the variance of my proposal density.
> Another algorithm I will be writing involves something called MH tailoring (developed by my advisor), which also requires a maximization routine within each iteration for each MH block. In that case, though, the dimensionality of the maximization problem will be smaller, depending on the blocking.
> So, my question still stands:
> What is the most efficient way (I am talking *computational speed*, not necessarily precision) to maximize a high-dimensional function (anywhere between 4 and 20 dimensions) using GSL *without* specifying analytic derivatives? Am I better off using the Nelder-Mead simplex algorithm, which requires no gradient at all, or should I write a "my_df" function that loops over the coordinates and evaluates the gradient numerically one direction at a time (an N-dimensional gradient is just a collection of N one-dimensional derivatives, so I should be able to loop from 1 to N and use gsl_diff_central) and pass it to gsl_multimin_fdfminimizer_vector_bfgs?
> My co-author uses Gauss, and he tells me that Gauss has a driver that uses BFGS while computing numerical derivatives automatically in the background. It seems I should be able to replicate that driver with GSL by doing what I just described, but is there anything I am missing? I would hate to see the Gauss driver run faster than my C code because I missed some important point or trick. Clearly, if I provided analytic derivatives, BFGS would be faster than the Nelder-Mead simplex. But if BFGS has access only to numerical derivatives... which one is better in most cases?
> I think this question must come up for many people trying to solve a high-dimensional min/max problem, because even when analytic derivatives are available in theory, it is simply not feasible to code them all up in, say, a 100- or 1000-dimensional problem. So one either has to loop through numerical derivatives or live without the gradient altogether.
> But which one is better?
> 
> ________________________________
> 
> From: Jason Stover [mailto:jstover@sdf.lonestar.org]
> Sent: Wed 9/1/2004 9:42 AM
> To: Anatoliy Belaygorod
> Cc: gsl-discuss@sources.redhat.com
> Subject: Re: High-dimensional Minimization without analytical derivatives
> 
> 
> 
> 
> 
> Maximizing a likelihood function with 17 parameters and
> 80 observations is likely to get you odd answers, e.g., many
> of your estimates will be on the boundary of the parameter space.
> 
> Another possible approach is a Bayesian one: define some reasonable
> prior distributions for your parameters, then run a simulation to find
> the regions where your posterior density function is highest. A common
> simulation algorithm here is the Metropolis-Hastings algorithm, about
> which there is a lot of literature. One popular book is "Markov Chain
> Monte Carlo in Practice" by W. R. Gilks, S. Richardson,
> D. J. Spiegelhalter. Coding such a simulation isn't difficult.
> 
> With MCMC simulations, you can also specify the dimension of the
> parameter vector as a parameter itself. Peter Green wrote a paper
> about this:
> 
> Reversible jump Markov chain Monte Carlo computation and Bayesian
> model determination, Biometrika, 82, 711-732 (1995)
> 
> -Jason
> 
> On Mon, Aug 30, 2004 at 05:11:11PM -0500, Anatoliy Belaygorod wrote:
> > Hello,
> > I need to find a (local) maximum of my likelihood function with 80 data points over the 17-dimensional parameter space. I want to use gsl_multimin_fdfminimizer_vector_bfgs (or some other gradient-based algorithm), but I would really hate to specify 17 (or maybe many more if we change the model) analytic derivatives.
> > Can you please tell me whether I have better options? Can I use the one-dimensional numerical derivative gsl_diff_central instead of analytic derivatives when I write the "my_df" function for BFGS? How would this approach (if it is feasible at all) compare to the Nelder-Mead simplex algorithm provided in my version of GSL (1.4)? Is there a better option that couples numerical gradient evaluation with the BFGS method?
> > 
> > I really appreciate any help or advice you can give me.
> > Sincerely,
> > Anatoliy
> > 
> >
> 
> --
> jstover@sdf.lonestar.org
> SDF Public Access UNIX System - http://sdf.lonestar.org
> 
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: High-dimensional Minimization without analytical derivatives
  2004-09-01 15:30 High-dimensional Minimization without analytical derivatives Anatoliy Belaygorod
  2004-09-02  9:26 ` Manoj Warrier
@ 2004-09-03 15:15 ` Brian Gough
  1 sibling, 0 replies; 13+ messages in thread
From: Brian Gough @ 2004-09-03 15:15 UTC (permalink / raw)
  To: Anatoliy Belaygorod; +Cc: gsl-discuss

Anatoliy Belaygorod writes:
 > But which one is better?

The computational effort always depends on the details of the function
and the suitability of the starting point(s), as well as the algorithm.

In the absence of any theoretical guidance, one has to try numerical
derivatives and the simplex method, and see which is faster.

-- 
Brian Gough

Network Theory Ltd,
Commercial support for GSL --- http://www.network-theory.co.uk/gsl/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: High-dimensional Minimization without analytical derivatives
  2004-09-06  8:18 Anatoliy Belaygorod
@ 2004-09-08 19:08 ` Linas Vepstas
  0 siblings, 0 replies; 13+ messages in thread
From: Linas Vepstas @ 2004-09-08 19:08 UTC (permalink / raw)
  To: Anatoliy Belaygorod; +Cc: Joakim Hove, gsl-discuss

On Sat, Sep 04, 2004 at 09:48:01AM -0500, Anatoliy Belaygorod was heard to remark:
> JDL is right: the main characteristic I cared about was whether or
> not Simulated Annealing is robust enough to jump from one 'valley' to
> the next in search of the GLOBAL min, unlike gradient-based methods


Simulated annealing does this by going 'uphill' every now and then,
accepting an uphill move that worsens the objective by delta with
probability exp(-delta/(kT)).  By going uphill, it can hop out of a
local minimum.  It's called "annealing" because you slowly lower the
temperature T during the run of the algorithm.  The idea is that by the
end of the run you are less likely to hop out of the good minimum
you've found and back into a bad one.  In real life, "annealing" means
"cooling slowly"; it allows the atomic dislocations in metal/glass to
work themselves out, slowly, making the metal/glass less brittle.
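
In code, the acceptance rule for a single step might look like the
following sketch (delta is the increase in the objective, T the current
temperature and k a Boltzmann-like scale factor; GSL's gsl_siman_solve()
packages the whole annealing loop, so this is only to show the "go
uphill occasionally" rule):

#include <math.h>
#include <gsl/gsl_rng.h>

/* Metropolis-style acceptance test for one annealing step */
int
accept_step (double delta, double k, double T, gsl_rng *r)
{
  if (delta <= 0.0)
    return 1;                               /* downhill: always accept */
  /* uphill: accept with probability exp(-delta/(kT)) */
  return gsl_rng_uniform (r) < exp (-delta / (k * T));
}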

--linas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: High-dimensional Minimization without analytical derivatives
@ 2004-09-06  8:18 Anatoliy Belaygorod
  2004-09-08 19:08 ` Linas Vepstas
  0 siblings, 1 reply; 13+ messages in thread
From: Anatoliy Belaygorod @ 2004-09-06  8:18 UTC (permalink / raw)
  To: Joakim Hove; +Cc: gsl-discuss

> as far as I am aware
> the simplex method is restricted to *linear* problems

Well, that's what I thought too at first, but in the example provided in
the "Multidimensional Minimization" chapter the target function is
quadratic in the variables over which they minimize. Also, I have
heard reports of people successfully using the _nmsimplex method for
some other highly nonlinear problems.
Oops, it looks like JDL already answered my question a minute ago.
JDL is right: the main characteristic I cared about was whether or
not Simulated Annealing is robust enough to jump from one 'valley' to
the next in search of the GLOBAL min, unlike gradient-based methods,
which are more likely to get stuck in a local min.
Thank you.
Anatoliy


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: High-dimensional Minimization without analytical derivatives
  2004-09-05 21:07     ` James Theiler
@ 2004-09-05 21:29       ` Robert G. Brown
  0 siblings, 0 replies; 13+ messages in thread
From: Robert G. Brown @ 2004-09-05 21:29 UTC (permalink / raw)
  To: James Theiler; +Cc: John Lamb, gsl-discuss

On Sun, 5 Sep 2004, James Theiler wrote:

> On Sat, 4 Sep 2004, John Lamb wrote:
> 
> ] Calling simulated annealing SA is also potentially problematic because 
> ] there is a Stochastic Approximation algorithm due to Kiefer and 
> ] Wolfowitz that is commonly known as SA. It can be used on nonstochastic 
> ] problems by adding noise and has guaranteed convergence to a global 
> ] optimum, albeit the convergence is often too slow for SA to be practical.
> 
> Just a minor point, but SA (stochastic approximation) does not 
> guarantee convergence to a global minimum.  There are theorems 
> about convergence of SA (simulated annealing) to a global minimum, 
> but that convergence is arguably too slow to be practical.
> 
> My own opinion[*] is that the only practical way to guarantee
> convergence to the global minimum is to make sure you are minimizing a
> convex function -- that may not be saying a lot, since for a convex
> function, there is only one local minimum and it is the global
> minimum. And it certainly doesn't help if the function you need to
> minimize doesn't happen to be convex.  Sorry if this is getting afield
> of discussions about how to use GSL.

Or to put it more bluntly, if you could guarantee convergence to a
global minimum for systems of high dimensionality, I think there are a
few serious physics problems waiting for you and a lifetime position at
the Santa Fe Institute to boot.

This is a >>hard<< problem, and Simulated Annealing and Genetic
Algorithms are among the very few approaches that do something
approaching a good job, in that they'll often give a "good" min/max,
though not necessarily or generally the "best" one.

They also need a gradient methodology of one sort or another as a
finishing step, as they are better suited to finding the right (or a
"good") valley than to descending efficiently to the bottom of the
valley they are in.

> jt
> 
> [*] whenever you combine "practical" and "guarantee" into a sentence, 
> I think you are inevitably in the realm of opinion.

I practically guarantee that this is a true statement ;-)

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: High-dimensional Minimization without analytical derivatives
  2004-09-04 11:10   ` John Lamb
@ 2004-09-05 21:07     ` James Theiler
  2004-09-05 21:29       ` Robert G. Brown
  0 siblings, 1 reply; 13+ messages in thread
From: James Theiler @ 2004-09-05 21:07 UTC (permalink / raw)
  To: John Lamb; +Cc: gsl-discuss

On Sat, 4 Sep 2004, John Lamb wrote:

] Calling simulated annealing SA is also potentially problematic because 
] there is a Stochastic Approximation algorithm due to Kiefer and 
] Wolfowitz that is commonly known as SA. It can be used on nonstochastic 
] problems by adding noise and has guaranteed convergence to a global 
] optimum, albeit the convergence is often too slow for SA to be practical.

Just a minor point, but SA (stochastic approximation) does not 
guarantee convergence to a global minimum.  There are theorems 
about convergence of SA (simulated annealing) to a global minimum, 
but that convergence is arguably too slow to be practical.

My own opinion[*] is that the only practical way to guarantee
convergence to the global minimum is to make sure you are minimizing a
convex function -- that may not be saying a lot, since for a convex
function, there is only one local minimum and it is the global
minimum. And it certainly doesn't help if the function you need to
minimize doesn't happen to be convex.  Sorry if this is getting afield
of discussions about how to use GSL.

jt

[*] whenever you combine "practical" and "guarantee" into a sentence, 
I think you are inevitably in the realm of opinion.

-- 
James Theiler            Space and Remote Sensing Sciences
MS-B244, ISR-2, LANL     Los Alamos National Laboratory
Los Alamos, NM 87545     http://nis-www.lanl.gov/~jt


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: High-dimensional Minimization without analytical derivatives
  2004-09-04  9:30 ` Joakim Hove
  2004-09-04 11:10   ` John Lamb
@ 2004-09-04 17:50   ` Ivo Alxneit
  1 sibling, 0 replies; 13+ messages in thread
From: Ivo Alxneit @ 2004-09-04 17:50 UTC (permalink / raw)
  To: Joakim Hove; +Cc: gsl-discuss

On Saturday 04 September 2004 10:33 am, Joakim Hove wrote:
> "Anatoliy Belaygorod" <belaygorod@wustl.edu> writes:
> > My understanding is that 'in general' in high-dimensional cases with
> > rough surface, the Simulated Annealing (SA) method is better tuned
> > for finding a GLOBAL maximum than Gradient-based methods, because
> > the latter are better tuned for 'zeroing in on' local maxima.  In
> > that regard, is Simplex Method closer to SA, or Gradient-based
> > methods?
>
> Well, excuse me if I am completely off base, but as far as I am aware
> the simplex method is restricted to *linear* problems - where it is
> 'guaranteed' to find the optimal solution. Gradient-based methods and
> SA can be used for more general (nonlinear) problems, but cannot
> really be compared with the simplex method.
>
> JH

I think you are referring to the simplex algorithm of linear
programming. The algorithm of Nelder and Mead is something quite
different (I am, however, not sure whether the shared name is really a
coincidence). Nelder-Mead does work for finding minima of a
multidimensional (non-linear) surface.

With regard to the original question: neither Nelder-Mead simplex nor a
gradient-based method will ever guarantee that the minimum found (IF
one is found) is a global one. You can easily see that both may not
even find the nearest minimum: just construct a surface that has the
same function values (and maybe gradients) at the points evaluated, and
then insert a local minimum somewhere between those points; both
algorithms will simply skip over it if the same starting point is used
again. You can become "quite" sure by trying a lot of different
starting points and checking whether other minima turn up, but this is
expensive. The only method I know of that is reasonably reliable at
finding a global minimum is simulated annealing, which is really
expensive. And even with simulated annealing it can be tricky (or time
consuming) to set up a cooling strategy that reliably produces the
global minimum.
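
As a rough illustration of the multi-start idea, one could wrap GSL's
Nelder-Mead minimizer like this (a sketch only: my_f(), random_start()
and N_DIM are placeholders for the real problem, and the step size,
tolerance and iteration cap are arbitrary choices):

#include <gsl/gsl_math.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_multimin.h>

#define N_DIM 17                         /* placeholder dimension */

double my_f (const gsl_vector *x, void *params);   /* user-supplied */
void random_start (gsl_vector *x, gsl_rng *r);     /* user-supplied */

double
multistart_nmsimplex (int n_starts, gsl_vector *best_x, gsl_rng *r)
{
  gsl_multimin_function func = { &my_f, N_DIM, NULL };
  gsl_vector *x = gsl_vector_alloc (N_DIM);
  gsl_vector *step = gsl_vector_alloc (N_DIM);
  double best = GSL_POSINF;
  int k;

  for (k = 0; k < n_starts; k++)
    {
      gsl_multimin_fminimizer *s =
        gsl_multimin_fminimizer_alloc (gsl_multimin_fminimizer_nmsimplex,
                                       N_DIM);
      int status, iter = 0;

      random_start (x, r);               /* new starting point */
      gsl_vector_set_all (step, 0.1);    /* initial simplex size */
      gsl_multimin_fminimizer_set (s, &func, x, step);

      do
        {
          iter++;
          status = gsl_multimin_fminimizer_iterate (s);
          if (status)
            break;
          status = gsl_multimin_test_size (gsl_multimin_fminimizer_size (s),
                                           1e-4);
        }
      while (status == GSL_CONTINUE && iter < 1000);

      if (s->fval < best)                /* keep the best valley found */
        {
          best = s->fval;
          gsl_vector_memcpy (best_x, s->x);
        }
      gsl_multimin_fminimizer_free (s);
    }

  gsl_vector_free (x);
  gsl_vector_free (step);
  return best;
}
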
-- 
Dr. Ivo Alxneit
Laboratory for Solar Technology   phone: +41 56 310 4092
CH-5232 Villigen                    fax: +41 56 310 2624
Paul Scherrer Institute          http://solar.web.psi.ch
Switzerland                        gnupg key: 0x515E30C7

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: High-dimensional Minimization without analytical derivatives
  2004-09-04  9:30 ` Joakim Hove
@ 2004-09-04 11:10   ` John Lamb
  2004-09-05 21:07     ` James Theiler
  2004-09-04 17:50   ` Ivo Alxneit
  1 sibling, 1 reply; 13+ messages in thread
From: John Lamb @ 2004-09-04 11:10 UTC (permalink / raw)
  To: gsl-discuss

>>the latter are better tuned for 'zeroing in on' local maxima.  In
>>that regard, is Simplex Method closer to SA, or Gradient-based
>>methods?
> 
> 
> Well, excuse me if I am completely off base, but as far as I am aware
> the simplex method is restricted to *linear* problems - where it is
> 'guaranteed' to find the optimal solution. Gradient-based methods and

I think you are off base here, though understandably so. There are two 
optimisation techniques known as the 'simplex method'. I think you have 
in mind the better-known one, used for linear programming. The one used 
to find minima of nonlinear functions is due to Nelder and Mead and is 
sometimes called the 'simplex method' because it modifies a simplex 
(the generalisation of a triangle to many dimensions) while searching 
for a solution.

I think of Nelder-Mead as closer to simulated annealing inasmuch as it 
won't necessarily move to the nearest local minimum and doesn't use 
gradients, analytic or estimated.

Calling simulated annealing SA is also potentially problematic because 
there is a Stochastic Approximation algorithm due to Kiefer and 
Wolfowitz that is commonly known as SA. It can be used on nonstochastic 
problems by adding noise and has guaranteed convergence to a global 
optimum, albeit the convergence is often too slow for SA to be practical.
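
For what it's worth, one iteration of the Kiefer-Wolfowitz scheme can
be sketched roughly as follows (noisy_f() is a placeholder for the
objective, which may be noise-corrupted; the gain sequences a_n and c_n
are one common textbook choice rather than a recommendation; and the
extra noise injection mentioned above is omitted):

#include <math.h>
#include <gsl/gsl_vector.h>

double noisy_f (const gsl_vector *x);    /* user-supplied objective */

/* one Kiefer-Wolfowitz update: finite-difference gradient estimate
   followed by a decreasing-step descent move */
void
kw_step (gsl_vector *x, unsigned long n, double a, double c)
{
  double a_n = a / (double) n;                   /* step size */
  double c_n = c / pow ((double) n, 1.0 / 3.0);  /* difference width */
  gsl_vector *grad = gsl_vector_alloc (x->size);
  size_t i;

  for (i = 0; i < x->size; i++)
    {
      double xi = gsl_vector_get (x, i);
      double fp, fm;
      gsl_vector_set (x, i, xi + c_n);
      fp = noisy_f (x);
      gsl_vector_set (x, i, xi - c_n);
      fm = noisy_f (x);
      gsl_vector_set (x, i, xi);                 /* restore coordinate */
      gsl_vector_set (grad, i, (fp - fm) / (2.0 * c_n));
    }

  gsl_vector_scale (grad, a_n);                  /* x <- x - a_n * grad */
  gsl_vector_sub (x, grad);
  gsl_vector_free (grad);
}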

-- 
JDL

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: High-dimensional Minimization without analytical derivatives
  2004-09-03 16:45 Anatoliy Belaygorod
@ 2004-09-04  9:30 ` Joakim Hove
  2004-09-04 11:10   ` John Lamb
  2004-09-04 17:50   ` Ivo Alxneit
  0 siblings, 2 replies; 13+ messages in thread
From: Joakim Hove @ 2004-09-04  9:30 UTC (permalink / raw)
  To: Anatoliy Belaygorod; +Cc: gsl-discuss

"Anatoliy Belaygorod" <belaygorod@wustl.edu> writes:

> My understanding is that 'in general' in high-dimensional cases with
> rough surface, the Simulated Annealing (SA) method is better tuned
> for finding a GLOBAL maximum than Gradient-based methods, because
> the latter are better tuned for 'zeroing in on' local maxima.  In
> that regard, is Simplex Method closer to SA, or Gradient-based
> methods?

Well, excuse me if I am completely off base, but as far as I am aware
the simplex method is restricted to *linear* problems - where it is
'guaranteed' to find the optimal solution. Gradient-based methods and
SA can be used for more general (nonlinear) problems, but cannot
really be compared with the simplex method.

JH



-- 
Joakim Hove
hove AT ift uib no
+47 (55 5)8 27 90
http://www.ift.uib.no/~hove/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: High-dimensional Minimization without analytical derivatives
@ 2004-09-03 16:45 Anatoliy Belaygorod
  2004-09-04  9:30 ` Joakim Hove
  0 siblings, 1 reply; 13+ messages in thread
From: Anatoliy Belaygorod @ 2004-09-03 16:45 UTC (permalink / raw)
  To: Brian Gough; +Cc: gsl-discuss

My understanding is that 'in general', in high-dimensional cases with a rough surface, the Simulated Annealing (SA) method is better suited to finding a GLOBAL maximum than gradient-based methods, because the latter are better tuned for 'zeroing in on' local maxima.
In that regard, is the Simplex Method closer to SA or to gradient-based methods?
 

________________________________

From: Brian Gough [mailto:bjg@network-theory.co.uk]
Sent: Fri 9/3/2004 10:14 AM
To: Anatoliy Belaygorod
Cc: gsl-discuss@sources.redhat.com
Subject: RE: High-dimensional Minimization without analytical derivatives



Anatoliy Belaygorod writes:
 > But which one is better?

The computational effort always depends on the details of the function
and the suitability of the starting point(s), as well as the algorithm.

In the absence of any theoretical guidance, one has to try numerical
derivatives and the simplex method, and see which is faster.

--
Brian Gough

Network Theory Ltd,
Commercial support for GSL --- http://www.network-theory.co.uk/gsl/


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: High-dimensional Minimization without analytical derivatives
  2004-08-31  8:30 Anatoliy Belaygorod
@ 2004-09-01 15:30 ` Jason Stover
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Stover @ 2004-09-01 15:30 UTC (permalink / raw)
  To: Anatoliy Belaygorod; +Cc: gsl-discuss



Maximizing a likelihood function with 17 parameters and
80 observations is likely to get you odd answers, e.g., many
of your estimates will be on the boundary of the parameter space.

Another possible approach is a Bayesian one: define some reasonable
prior distributions for your parameters, then run a simulation to find
the regions where your posterior density function is highest. A common
simulation algorithm here is the Metropolis-Hastings algorithm, about
which there is a lot of literature. One popular book is "Markov Chain
Monte Carlo in Practice" by W. R. Gilks, S. Richardson,
D. J. Spiegelhalter. Coding such a simulation isn't difficult.
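
A bare-bones random-walk Metropolis-Hastings loop, just to show the
shape of such a simulation (log_posterior() is a placeholder for your
model's log posterior density, and the fixed proposal scale is not a
tuning recommendation):

#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_vector.h>

double log_posterior (const gsl_vector *theta);   /* user-supplied */

void
rw_metropolis (gsl_vector *theta, size_t n_iter, double scale, gsl_rng *r)
{
  size_t n = theta->size, t, j;
  gsl_vector *prop = gsl_vector_alloc (n);
  double logp = log_posterior (theta);
  double logp_prop;

  for (t = 0; t < n_iter; t++)
    {
      /* propose theta' = theta + N(0, scale^2) in each coordinate */
      for (j = 0; j < n; j++)
        gsl_vector_set (prop, j,
                        gsl_vector_get (theta, j) + gsl_ran_gaussian (r, scale));

      /* accept with probability min(1, p(theta')/p(theta)) */
      logp_prop = log_posterior (prop);
      if (log (gsl_rng_uniform_pos (r)) < logp_prop - logp)
        {
          gsl_vector_memcpy (theta, prop);
          logp = logp_prop;
        }

      /* record theta here as the t-th draw from the chain */
    }

  gsl_vector_free (prop);
}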

With MCMC simulations, you can also specify the dimension of the
parameter vector as a parameter itself. Peter Green wrote a paper
about this:

Reversible jump Markov chain Monte Carlo computation and Bayesian
model determination, Biometrika, 82, 711-732 (1995)

-Jason

On Mon, Aug 30, 2004 at 05:11:11PM -0500, Anatoliy Belaygorod wrote:
> Hello,
> I need to find a (local) maximum of my likelihood function with 80 data points over the 17-dimensional parameter space. I want to use gsl_multimin_fdfminimizer_vector_bfgs (or some other gradient-based algorithm), but I would really hate to specify 17 (or maybe many more if we change the model) analytic derivatives.
> Can you please tell me whether I have better options? Can I use the one-dimensional numerical derivative gsl_diff_central instead of analytic derivatives when I write the "my_df" function for BFGS? How would this approach (if it is feasible at all) compare to the Nelder-Mead simplex algorithm provided in my version of GSL (1.4)? Is there a better option that couples numerical gradient evaluation with the BFGS method?
>  
> I really appreciate any help or advice you can give me.
> Sincerely,
> Anatoliy
>  
> 

-- 
jstover@sdf.lonestar.org
SDF Public Access UNIX System - http://sdf.lonestar.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* High-dimensional Minimization without analytical derivatives
@ 2004-08-31  8:30 Anatoliy Belaygorod
  2004-09-01 15:30 ` Jason Stover
  0 siblings, 1 reply; 13+ messages in thread
From: Anatoliy Belaygorod @ 2004-08-31  8:30 UTC (permalink / raw)
  To: gsl-discuss

Hello,
I need to find a (local) maximum of my likelihood function with 80 data points over the 17-dimensional parameter space. I want to use gsl_multimin_fdfminimizer_vector_bfgs (or some other gradient-based algorithm), but I would really hate to specify 17 (or maybe many more if we change the model) analytic derivatives.
Can you please tell me whether I have better options? Can I use the one-dimensional numerical derivative gsl_diff_central instead of analytic derivatives when I write the "my_df" function for BFGS? How would this approach (if it is feasible at all) compare to the Nelder-Mead simplex algorithm provided in my version of GSL (1.4)? Is there a better option that couples numerical gradient evaluation with the BFGS method?
 
I really appreciate any help or advice you can give me.
Sincerely,
Anatoliy
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-09-08 19:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-01 15:30 High-dimensional Minimization without analytical derivatives Anatoliy Belaygorod
2004-09-02  9:26 ` Manoj Warrier
2004-09-03 15:15 ` Brian Gough
  -- strict thread matches above, loose matches on Subject: below --
2004-09-06  8:18 Anatoliy Belaygorod
2004-09-08 19:08 ` Linas Vepstas
2004-09-03 16:45 Anatoliy Belaygorod
2004-09-04  9:30 ` Joakim Hove
2004-09-04 11:10   ` John Lamb
2004-09-05 21:07     ` James Theiler
2004-09-05 21:29       ` Robert G. Brown
2004-09-04 17:50   ` Ivo Alxneit
2004-08-31  8:30 Anatoliy Belaygorod
2004-09-01 15:30 ` Jason Stover
