public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed
* Flat (uniform) random distribution and some development/contribution queries
@ 2010-05-10 11:15 Joseph Wakeling
  2010-05-10 14:23 ` Joseph Wakeling
  2010-05-11 14:42 ` Brian Gough
  0 siblings, 2 replies; 5+ messages in thread
From: Joseph Wakeling @ 2010-05-10 11:15 UTC (permalink / raw)
  To: gsl-discuss

[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]

Hello all,

While browsing the source code I noticed that the flat/uniform
distribution provided in GSL's randist.h provided only a 'half-open'
interval [a,b) akin to what gsl_rng_uniform() provides for [0,1) and not
also an 'open' interval (a,b) akin to what gsl_rng_uniform_pos()
provides for (0,1).

So I wrote a couple of extra functions, which are attached as a git
patch in case they are useful/welcome.  (Apologies if instead they are
stupid and I'm missing something; this is a very off-the-cuff
contribution, but it seemed easy enough to deliver and trivial enough to
not mind if it's rejected:-)

More general development question, which is the main reason for joining
this list.  A project I'm working on needs to employ random sampling on
several occasions: to be precise, the case of selecting n unique records
from the set {1, ..., N}, as described in these articles:
http://doi.acm.org/10.1145/358105.893
http://doi.acm.org/10.1145/23002.23003
http://doi.acm.org/10.1145/79505.356313

The functionality seems generic and useful enough that I was surprised
not to find a library available.  Anyway, it seems like it would be easy
enough to implement as part of GSL.  Is there interest in having this?
If so, I'll map out a brief API description and/or sample code.

Best wishes,

    -- Joe

[-- Attachment #2: 0001-Flat-uniform-distribution-over-open-interval.patch.txt --]
[-- Type: text/plain, Size: 2380 bytes --]

From 09e5cdc0dd3991e69982bce6872edf201f25e7ed Mon Sep 17 00:00:00 2001
From: Joseph Rushton Wakeling <joe@webdrake.net>
Date: Mon, 10 May 2010 09:13:10 +0200
Subject: [PATCH] Flat (uniform) distribution over open interval.

* Two new functions that provide random variates and the
  pdf of a flat (uniform) distribution over the open
  interval (a, b).

* Corrected a potentially misleading comment in the
  function for a uniform distribution on the half-closed
  interval [a, b).
---
 randist/flat.c        |   33 ++++++++++++++++++++++++++++++++-
 randist/gsl_randist.h |    2 ++
 2 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/randist/flat.c b/randist/flat.c
index 996366e..0a72a85 100644
--- a/randist/flat.c
+++ b/randist/flat.c
@@ -34,7 +34,7 @@ gsl_ran_flat (const gsl_rng * r, const double a, const double b)
 {
   double u = gsl_rng_uniform (r);
 
-  /* A uniform distribution over [a,b] */
+  /* A uniform distribution over the half-open interval [a,b) */
 
   return a * (1 - u) + b * u;
 }
@@ -51,3 +51,34 @@ gsl_ran_flat_pdf (double x, const double a, const double b)
       return 0;
     }
 }
+
+
+/* This is the uniform distribution in the range (a, b)
+
+   p(x) dx = 1/(b-a) dx   if  a < x < b
+   .....   = 0            otherwise
+
+ */
+
+double
+gsl_ran_flat_open (const gsl_rng *r, const double a, const double b)
+{
+  double u = gsl_rng_uniform_pos (r);
+
+  /* A uniform distribution over the open interval (a, b) */
+
+  return a * (1 - u) + b * u;
+}
+
+double
+gsl_ran_flat_open_pdf (double x, const double a, const double b)
+{
+  if (x < b && x > a)
+    {
+      return 1 / (b - a);
+    }
+  else
+    {
+      return 0;
+    }
+}
diff --git a/randist/gsl_randist.h b/randist/gsl_randist.h
index 6f4b0e3..7218c76 100644
--- a/randist/gsl_randist.h
+++ b/randist/gsl_randist.h
@@ -68,6 +68,8 @@ double gsl_ran_fdist_pdf (const double x, const double nu1, const double nu2);
 
 double gsl_ran_flat (const gsl_rng * r, const double a, const double b);
 double gsl_ran_flat_pdf (double x, const double a, const double b);
+double gsl_ran_flat_open (const gsl_rng * r, const double a, const double b);
+double gsl_ran_flat_open_pdf (double x, const double a, const double b);
 
 double gsl_ran_gamma (const gsl_rng * r, const double a, const double b);
 double gsl_ran_gamma_int (const gsl_rng * r, const unsigned int a);
-- 
1.7.0.4


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Flat (uniform) random distribution and some development/contribution queries
  2010-05-10 11:15 Flat (uniform) random distribution and some development/contribution queries Joseph Wakeling
@ 2010-05-10 14:23 ` Joseph Wakeling
  2010-05-11 14:42   ` Brian Gough
  2010-05-11 14:42 ` Brian Gough
  1 sibling, 1 reply; 5+ messages in thread
From: Joseph Wakeling @ 2010-05-10 14:23 UTC (permalink / raw)
  To: gsl-discuss

Further to previous message ...

> The functionality seems generic and useful enough that I was surprised
> not to find a library available.  Anyway, it seems like it would be easy
> enough to implement as part of GSL.  Is there interest in having this?
> If so, I'll map out a brief API description and/or sample code.

... yes, I _have_ read the part of the GSL website which says,

> To maintain stability, any new functionality is encouraged as packages,
> built on top of GSL and maintained independently by their authors, as
> in other free software projects. The design of GSL permits extensions
> to be used alongside the existing library easily by simple linking.

What I'd like to know is if there are any recommended practices for
making my work "friendly" to the GSL in the sense that it can be readily
incorporated back if/when it gets the necessary level of approval.
Questions would include,

   (i) Is it OK to have functions named along the lines of,

            gsl_mylib_func()

       ... as per typical GSL fashion, so that the API will remain
       stable if/when the code is incorporated into GSL?

  (ii) Are there any recommended practices related to VCS that will
       make it easier to incorporate a semi-independent project into
       GSL and preserve version history?

 (iii) What's the appropriate etiquette to follow regarding discussion,
       of the project status?  How much discussion can/should I make on
       this and other GSL lists, for example?  I would like the work to
       be developed in as close collaboration with the wider GSL
       community as possible.

Thanks & best wishes,

    -- Joe

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Flat (uniform) random distribution and some development/contribution queries
  2010-05-10 11:15 Flat (uniform) random distribution and some development/contribution queries Joseph Wakeling
  2010-05-10 14:23 ` Joseph Wakeling
@ 2010-05-11 14:42 ` Brian Gough
  2010-05-11 15:35   ` Joseph Wakeling
  1 sibling, 1 reply; 5+ messages in thread
From: Brian Gough @ 2010-05-11 14:42 UTC (permalink / raw)
  To: Joseph Wakeling; +Cc: gsl-discuss

At Mon, 10 May 2010 13:14:53 +0200,
Joseph Wakeling wrote:
> While browsing the source code I noticed that the flat/uniform
> distribution provided in GSL's randist.h provided only a 'half-open'
> interval [a,b) akin to what gsl_rng_uniform() provides for [0,1) and not
> also an 'open' interval (a,b) akin to what gsl_rng_uniform_pos()
> provides for (0,1).
> 
> So I wrote a couple of extra functions, which are attached as a git
> patch in case they are useful/welcome.  (Apologies if instead they are
> stupid and I'm missing something; this is a very off-the-cuff
> contribution, but it seemed easy enough to deliver and trivial enough to
> not mind if it's rejected:-)

Thanks, I will correct the comment.  I don't think I will include the
additional functions in this case as they are so similar to the
existing ones.

> More general development question, which is the main reason for joining
> this list.  A project I'm working on needs to employ random sampling on
> several occasions: to be precise, the case of selecting n unique records
> from the set {1, ..., N}, as described in these articles:
> http://doi.acm.org/10.1145/358105.893
> http://doi.acm.org/10.1145/23002.23003
> http://doi.acm.org/10.1145/79505.356313

I think this is what gsl_ran_choose does.  The implementation isn't as
efficient as those described but it gives the same result.

-- 
Brian Gough

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Flat (uniform) random distribution and some development/contribution queries
  2010-05-10 14:23 ` Joseph Wakeling
@ 2010-05-11 14:42   ` Brian Gough
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Gough @ 2010-05-11 14:42 UTC (permalink / raw)
  To: Joseph Wakeling; +Cc: gsl-discuss

At Mon, 10 May 2010 16:23:17 +0200,
Joseph Wakeling wrote:
> What I'd like to know is if there are any recommended practices for
> making my work "friendly" to the GSL in the sense that it can be readily
> incorporated back if/when it gets the necessary level of approval.

Since I'm limited in the time I can spend on GSL these days the main
thing is for any code I deal with to be well tested and debugged in
actual use before submission.  If it is in specific areas like linear
algebra maybe other people like Patrick Alken can help.  If there are
multiple changes needed after something is submitted it becomes
problematic, particularly if they necessitate changes to the API.

>    (i) Is it OK to have functions named along the lines of,
> 
>             gsl_mylib_func()
> 
>        ... as per typical GSL fashion, so that the API will remain
>        stable if/when the code is incorporated into GSL?

I'd recommend using the prefix mylib_ to avoid confusion with existing
gsl functions.

>   (ii) Are there any recommended practices related to VCS that will
>        make it easier to incorporate a semi-independent project into
>        GSL and preserve version history?

Currently
bzr branch http://bzr.savannah.gnu.org/r/gsl/trunk

>  (iii) What's the appropriate etiquette to follow regarding discussion,
>        of the project status?  How much discussion can/should I make on
>        this and other GSL lists, for example?  I would like the work to
>        be developed in as close collaboration with the wider GSL
>        community as possible.

You can use this list, it is the best place I think.

-- 
Brian Gough

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Flat (uniform) random distribution and some development/contribution queries
  2010-05-11 14:42 ` Brian Gough
@ 2010-05-11 15:35   ` Joseph Wakeling
  0 siblings, 0 replies; 5+ messages in thread
From: Joseph Wakeling @ 2010-05-11 15:35 UTC (permalink / raw)
  To: gsl-discuss

On 05/11/2010 03:59 PM, Brian Gough wrote:
> I think this is what gsl_ran_choose does.  The implementation isn't as
> efficient as those described but it gives the same result.

Ahh, yes.  The interest I have is in doing something a bit different --
sampling which doesn't require the data to be presented in a predefined
set as gsl_ran_choose does, so it's space- as well as speed-efficient.

The particular motivation in my case was having to make an efficient
selection of a subset of links from a fully-connected bipartite graph.
That means choosing from N*M potential links, where N and M could both
be very large.

> Since I'm limited in the time I can spend on GSL these days the main
> thing is for any code I deal with to be well tested and debugged in
> actual use before submission.  If it is in specific areas like linear
> algebra maybe other people like Patrick Alken can help.  If there are
> multiple changes needed after something is submitted it becomes
> problematic, particularly if they necessitate changes to the API.

Completely understand.  What I will aim for is to develop this code
independently but with a design that will make it easy to incorporate
into GSL proper if/when it has received sufficient review.

If that never happens ... well, I'll have had fun working on it, and it
will be useful for my own work in any case.

> Currently
> bzr branch http://bzr.savannah.gnu.org/r/gsl/trunk

... yea, but I don't want to branch the whole of GSL :-)

What I'll do is, the moment I have something working-ish with even the
stupidest demo, I'll put it up in a new Savannah project and make a
little announcement here and on the help-gsl list just so people know
what is happening.  (Incidentally, part of the demo could be an
alternative version of gsl_ran_choose, for purposes of comparison...).

From the GSL point of view I guess the main useful feedback would be on
design factors to make it fit well with GSL proper.  On the testing side
I'll try and build some bridges with the sampling algorithms community
-- I think there are some packages contributed to R, so maybe the people
responsible would give some review or contribution.  I guess the
existing tests for gsl_ran_choose would also apply.

Best wishes,

    -- Joe

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-05-11 15:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-10 11:15 Flat (uniform) random distribution and some development/contribution queries Joseph Wakeling
2010-05-10 14:23 ` Joseph Wakeling
2010-05-11 14:42   ` Brian Gough
2010-05-11 14:42 ` Brian Gough
2010-05-11 15:35   ` Joseph Wakeling

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).