Hello everybody,

I was wondering whether someone could comment on the accuracy of GSL's gsl_cdf_binomial_P() implementation for large n (a few thousand), for different values of p, when the resulting CDF value lies in the tails (small, less than 0.05, or large, above 0.95).

Thank you very much,
ZF
An interesting (but "homework-like" ~;) question - and fun to answer too.

Anyway, I'd probably compare GSL results with those from other sources.

I had easy access to gsl_cdf_binomial_P (v 1.14), R pbinom(k,n,p), binomCDF (Excel 2007) and dcdflib (Fortran - Brown, Lovato & Russell; U. Texas; November, 1997).

For a sample size of n=1000, a trial probability of p=0.01 and a number of successes of s=1 thru 40, the CDF values from dcdflib and from the pbinom() function in the R 2.13.0 stats package (http://cran.r-project.org/) show no difference. Taking pbinom as the reference over these 40 tests, the mean absolute deviation (MAD) is 2.319E-15 for gsl_cdf_binomial_P and 3.296E-15 for binomCDF.

My "comment"? It looks as if we all have to decide when to STOP accumulating small terms, and some stop earlier than others. While I always test functions in Excel against other sources before release in a report, anything showing a MAD below 4E-15 sure beats using my slide rule (which didn't have an incomplete beta function anyway ~;).

Well Howell

On 6/2/2011 12:49 AM, Z F wrote:
> Hello everybody,
>
> I was wondering if someone could comment on the accuracy of gsl_cdf_binomial_P() function gsl implementation for large n (n is about a few thousand).
> for different values of p and when the result of cdf is in the tails ( small less then 0.05 and large -- above 0.95)
>
> Thank you very much
>
> ZF
Dear Well Howell,

--- On Sun, 6/5/11, Well Howell <whowell@superlink.net> wrote:

> An interesting (but "homework-like" ~;) question - and fun to answer too.
>
> Anyway, I'd probably compare GSL results with those from other sources.
>
> I had easy access to gsl_cdf_binomial_P (v 1.14), R pbinom(k,n,p), binomCDF
> (Excel 2007) and dcdflib (Fortran - Brown, Lovato & Russell; U. Texas;
> November, 1997).
>
> For a sample size of n=1000, a trial probability of p=0.01 and number of
> successes of s=1 thru 40, the CDF values from dcdflib and the R 2.13.0
> stats package pbinom() function (http://cran.r-project.org/) show no
> difference.

Thank you very much for your reply. It seems I was not clear in my question. I am not looking for a comparison with other libraries, but rather for information about the approximations used to obtain the CDF values. My concern is that a Gaussian approximation is used for large samples, which would make values in the tails of the distribution error-prone.

If someone could provide any information on the subject, or point me in the "right direction", I would highly appreciate it.

Thanks again,
ZF
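To make the concern about the tails concrete, here is a minimal Python sketch (standard library only) that compares a direct summation of the binomial probabilities with the Gaussian approximation plus continuity correction. It says nothing about what GSL does internally; the function names binom_cdf_exact and binom_cdf_normal are made up for this illustration, and n=1000, p=0.01 are simply the values used earlier in the thread.

```python
import math


def binom_cdf_exact(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, by summing the pmf terms.

    Each term is built in log space so that large n does not underflow.
    """
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p = math.log(p)
    log_q = math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    terms = []
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        terms.append(math.exp(log_pmf))
    return math.fsum(terms)


def binom_cdf_normal(k, n, p):
    """Gaussian approximation to P(X <= k) with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    z = (k + 0.5 - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


if __name__ == "__main__":
    n, p = 1000, 0.01
    for k in (1, 3, 5, 10, 20, 25):
        exact = binom_cdf_exact(k, n, p)
        approx = binom_cdf_normal(k, n, p)
        print(f"k={k:3d}  exact={exact:.6e}  normal={approx:.6e}  "
              f"ratio={approx / exact:.3f}")
```

For these parameters the two columns differ by a large factor in the lower tail, which is the kind of error a Gaussian shortcut would introduce if a library used one.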
I do see a testing function (beta_series) that only tries sample sizes smaller than n=512, but I can't easily find any use of a Gaussian approximation in the source code for the 1.14 version of GSL.

I don't expect some of the other sources I tested against to use the Gaussian approximation either, so my finding that all 4 methods agree within about ten times the IEEE eps value of 2.2204E-16 would be proof enough for me to NOT fully read the beta_inc.c source code.

Funny history - I was first asked if I was using the Gaussian approximation to the binomial in the mid-60s, and was able to answer that I was using the exact binomial ~;)
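The kind of check described above (mean absolute deviation between two evaluations, read in units of eps) can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call GSL, R, Excel or dcdflib. Instead it compares a double-precision recurrence against a reference computed with exact rational arithmetic. The names binom_cdf_reference, binom_cdf_float and mean_abs_dev are hypothetical, and n=1000, p=0.01, s=1 thru 40 are the values from the thread.

```python
import math
import sys
from fractions import Fraction


def binom_cdf_reference(kmax, n, p):
    """CDF values P(X <= k), k = 0..kmax, summed in exact rational arithmetic.

    p is taken as the exact value of the given float, so the only rounding
    happens in the final conversion of each sum to float.
    """
    pf = Fraction(p)
    qf = 1 - pf
    total = Fraction(0)
    out = []
    for i in range(kmax + 1):
        total += math.comb(n, i) * pf**i * qf**(n - i)
        out.append(float(total))
    return out


def binom_cdf_float(kmax, n, p):
    """Same CDF values in ordinary double precision, using the pmf recurrence
    pmf(i+1) = pmf(i) * (n - i) / (i + 1) * p / (1 - p)."""
    q = 1.0 - p
    term = math.exp(n * math.log1p(-p))  # pmf(0) = (1 - p)**n
    out = [term]
    for i in range(kmax):
        term *= (n - i) * p / ((i + 1) * q)
        out.append(out[-1] + term)
    return out


def mean_abs_dev(a, b):
    """Mean absolute deviation between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    n, p, kmax = 1000, 0.01, 40
    ref = binom_cdf_reference(kmax, n, p)
    flt = binom_cdf_float(kmax, n, p)
    mad = mean_abs_dev(ref[1:], flt[1:])  # s = 1 thru 40
    eps = sys.float_info.epsilon          # 2.2204E-16 for IEEE doubles
    print(f"MAD = {mad:.3e}  ({mad / eps:.1f} eps)")
```

A MAD of a few eps between independent exact-summation methods is what accumulated rounding alone would produce; a Gaussian shortcut would show up as a deviation many orders of magnitude larger.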
On 6/5/2011 10:19 PM, Z F wrote:
> Thank you very much for your reply.
> It seems I was not clear with my question. I am not looking for a
> comparison with other libraries, but rather for information regarding
> the approximations used to obtain the values of CDF. What I am afraid of
> is that a Gaussian approximation is used for a large sample, rendering
> values in the tails of the distribution error-prone.
>
> If someone could provide any info on the subject or maybe point in the "right direction", I would highly appreciate it.