public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Optimisation of std::binary_search of the <algorithm> header
@ 2017-05-29  4:39 jay pokarna
  2017-05-29  6:55 ` Tim Song
  0 siblings, 1 reply; 7+ messages in thread
From: jay pokarna @ 2017-05-29  4:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: libstdc++

Hey,

I am Jay.

I have written code for an optimised version of the binary_search
algorithm of the algorithm header file of the standard template
library.

I have implemented it for the integer data type, but it can be
implemented for any other data type without any changes in the
algorithm as such.

You are requested to give me feedback on my implementation.


#include<iostream>
#include<stdlib.h>
#include<math.h>
using namespace std;

bool binary_search(int arr[], int size, int ele, bool *fun_ptr);
bool binary_search(int arr[] , int size , int ele);

/*
This version of the binary_search finds an element ele in the array
arr of length size.
*/

/*
Assumptions:
1) Elements are sorted in increasing order in the array
2) The parameter size is the number of elements in the array not the
index of the last element,
i.e. the search space is arr[0,size)
3) Though , I have implemented this version with the int data type,
it is also compatible with other data types like strings , or user
defined data types.
In those cases , only the comparison operation will change without any
changes in the
algorithm.
*/

/*
  If the specified element is in the array, then the function returns
1 else returns 0.
*/

bool binary_search(int arr[] , int size , int ele)
{

/*
The element cannot be in the array if it is larger than the largest
element or smaller than the smallest element
*/

  if( ele < arr[0]||arr[size-1] < ele)
  {
    return bool(0);
  }

  /*
After this statement , the element is definitely not outside the array,
but it may or may not be in the array.
  */
  int t=sqrt(size);
  int start =0,end=t*t;

  /*
The algorithm uses the technique of square root decomposition.
It breaks down the problem (the array of length size)
to smaller problems each of length sqrt(size).
It does a kind of binary_search on the first element
of each subarray to find the subproblem to which the element belongs.
  */

  /*
The if part of the if else loop below finds which subproblem the element ele
belongs to.

The last subproblem won't be of length sqrt(size) if size is not a
perfect square.
So , as to handle this corner case , we need the else part of the loop.
  */
  if(ele < arr[end])
  {
    while(end-start>t)
    {
      int mid = (start+end)/2;

      mid -= (mid%t);

      if(arr[mid]==ele)

      {
        return bool(1);
      }
      else if(arr[mid] < ele)
      {
        start = mid;
      }
      else
      {
        end = mid;
      }
    }
  }
  else
  {
    start = end;
    end = size;
  }

  /*
After this if else loop , [start , end] will have a portion of the
array of maximum length sqrt(size)
which may contain the element.
  */

  /*
The loop below does a normal binary_search in the range [start,end]
  */
  while(end>=start)
  {
    int mid = start + ((end-start)/2);

if(ele ==arr[mid])
{
return bool(1);
}

    else if(arr[mid] < ele)
    {
      start = mid+1;
    }
    else
    {
      end = mid-1;
    }
  }
  return bool(0);
}

// Same Code as above just with a comparator specified

bool binary_search(int arr[] , int size , int ele, bool (*fun_ptr)(int ,int))
{

/*
The element cannot be in the array if it is larger than the largest
element or smaller than the smallest element
*/

  if(fun_ptr(ele,arr[0])||fun_ptr(arr[size-1],ele))
  {
    return bool(0);
  }

  /*
After this statement , the element is definitely not outside the array,
but it may or may not be in the array.
  */
  int t=sqrt(size);
  int start =0,end=t*t;

  /*
The algorithm uses the technique of square root decomposition.
It breaks down the problem (the array of length size)
to smaller problems each of length sqrt(size).
It does a kind of binary_search on the first element
of each subarray to find the subproblem to which the element belongs.
  */

  /*
The if part of the if else loop below finds which subproblem the element ele
belongs to.

The last subproblem won't be of length sqrt(size) if size is not a
perfect square.
So , as to handle this corner case , we need the else part of the loop.
  */
  if(fun_ptr(ele,arr[end]))
  {
    while(end-start>t)
    {
      int mid = (start+end)/2;

      mid -= (mid%t);

      if(!fun_ptr(arr[mid],ele)&&!fun_ptr(ele,arr[mid]))
       /*
This is equivalent to checking the condition
ele == arr[mid]
       */
      {
        return bool(1);
      }
      else if(fun_ptr(arr[mid],ele))
      {
        start = mid;
      }
      else
      {
        end = mid;
      }
    }
  }
  else
  {
    start = end;
    end = size;
  }

  /*
After this if else construct , [start , end] will have a portion of
the array of maximum length sqrt(size)
which may contain the element
  */

  /*
The loop below does a normal binary_search in the range [start,end]
  */
  while(end>=start)
  {
    int mid = start + ((end-start)/2);

if(!fun_ptr(arr[mid],ele)&&!fun_ptr(ele,arr[mid]))
       /*
This is equivalent to checking the condition
ele == arr[mid]
       */
{
return bool(1);
}

    else if(fun_ptr(arr[mid],ele))
    {
      start = mid+1;
    }
    else
    {
      end = mid-1;
    }
  }
  return bool(0);
}



Regards,
Jay Pokarna

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimisation of std::binary_search of the <algorithm> header
  2017-05-29  4:39 Optimisation of std::binary_search of the <algorithm> header jay pokarna
@ 2017-05-29  6:55 ` Tim Song
       [not found]   ` <CAK61LuC-Wk2wxyVO5_FQgiHEA3d8Tk5yCLpZKLq4EMht_cu=vw@mail.gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Song @ 2017-05-29  6:55 UTC (permalink / raw)
  To: jay pokarna; +Cc: gcc-patches, libstdc++

This is not a patch. Nor is it an implementation of
std::binary_search, which is a function template that accepts
arbitrary forward iterators and optionally arbitrary comparator
objects, rather than just pointers.

If you want to find someone to review your code, consider something
like https://codereview.stackexchange.com/.

While I'm not a maintainer, I strongly suspect that for something like
this to have a chance of getting into libstdc++, it needs to, at the
very minimum, implement the standard signature and be accompanied by
sufficient performance benchmarks showing that it is better performing
than the existing implementation.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimisation of std::binary_search of the <algorithm> header
       [not found]     ` <CAPQZVxuY-Motx8YH6hRDn13NrZsPF4buhVSq_J1jAes6kkaXZQ@mail.gmail.com>
@ 2017-05-29  8:14       ` jay pokarna
       [not found]         ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com>
  2017-05-30 22:26         ` Mike Stump
  0 siblings, 2 replies; 7+ messages in thread
From: jay pokarna @ 2017-05-29  8:14 UTC (permalink / raw)
  To: Tim Song; +Cc: libstdc++, gcc-patches

Respected Sir,
                          I am sorry , for the use of wrong language
in the previous mail. I wanted to convey that c++ has generalised the
algorithm on various data structures , which is not required due to
low performance.

Could you give me the contact of the standard committee?

Regards,
Jay Pokarna

On Mon, May 29, 2017 at 1:13 PM, Tim Song <t.canens.cpp@gmail.com> wrote:
> I'm not sure if you forgot to CC the lists or intended to direct the
> email to me alone.
>
> On Mon, May 29, 2017 at 2:41 AM, jay pokarna <jay.pokarna10@gmail.com> wrote:
>> I know that cpp wants to generalise its methods so that they can be
>> used with various data structures. But the cost of generalisation is
>> that we have to compromise a lot on performance.
>
> That's neither here nor there. binary_search's performance as applied
> to forward iterators has nothing to do with its performance as to
> random access iterators.
>
>> I would like to recommend cpp to allow the use of binary_search only
>> on data structures that use random access models.
>
> C++ has an international standard and GCC/libstdc++ is an
> implementation of that standard. Proposal for changes to the standard
> should be directed to the standards committee, not GCC's mailing
> lists.
>
>> The technique that I have used is square root decomposition . I think
>> that it will be better than the one that is implemented.
>
> And here's the problem: you *think* it will be better. Just thinking
> is not enough. You need to *prove* it with benchmarks that show that
> your technique is in fact faster than the current one.
>
> If there is in fact substantial improvement, *then* it's the time to
> consider generalization: when is this technique faster than the stock
> one? Always? Only random access? Only pointers? Only built-in types?
> Again, this needs to be shown with appropriate benchmark numbers.
>
> Then, finally, you can write a patch proposing that libstdc++'s
> binary_search be modified to use this technique in situations when
> it's shown to be faster.
>
> For a recent example of how something like this should look like, see
> https://gcc.gnu.org/ml/libstdc++/2016-12/msg00051.html and its
> LLVM/libc++ counterpart, https://reviews.llvm.org/D27068. Note the
> copious benchmark numbers showing that the proposed change was indeed
> (much) better than the previous.



-- 
Regards,
Jay Pokarna
CS Sophomore
Wordpress | Linkedin
Birla Institute of Technology and Science, Pilani
Pilani Campus
Rajasthan - 333031.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimisation of std::binary_search of the <algorithm> header
       [not found]         ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com>
@ 2017-05-29 11:11           ` jay pokarna
  0 siblings, 0 replies; 7+ messages in thread
From: jay pokarna @ 2017-05-29 11:11 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

Respected Sir,
                         Could you give me the contact of the standard
committee which handles changes to the c++ standard.

Regards,
Jay Pokarna

On Mon, May 29, 2017 at 2:17 PM, Tim Shen <timshen@google.com> wrote:
> On Mon, May 29, 2017 at 1:05 AM, jay pokarna <jay.pokarna10@gmail.com> wrote:
>>>> The technique that I have used is square root decomposition . I think
>>>> that it will be better than the one that is implemented.
>>>
>>> And here's the problem: you *think* it will be better. Just thinking
>>> is not enough. You need to *prove* it with benchmarks that show that
>>> your technique is in fact faster than the current one.
>
> Agreed.
>
> Jay, specifically, your algorithm has the roughly the same running time:
>
>   T(n) = log (sqrt(n) + 1) + log sqrt(n)
>          > 2 log (n ^ 0.5)
>          = 2 * 0.5 * log n
>          = log n
>
> It's unclear to me whether it's better than the normal binary search
> or not. Detailed and representative benchmarks may convince more
> people.
>
>
> --
> Regards,
> Tim Shen



-- 
Regards,
Jay Pokarna
CS Sophomore
Wordpress | Linkedin
Birla Institute of Technology and Science, Pilani
Pilani Campus
Rajasthan - 333031.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimisation of std::binary_search of the <algorithm> header
  2017-05-29  8:14       ` jay pokarna
       [not found]         ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com>
@ 2017-05-30 22:26         ` Mike Stump
  2017-05-31  7:34           ` jay pokarna
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Stump @ 2017-05-30 22:26 UTC (permalink / raw)
  To: jay pokarna; +Cc: Tim Song, libstdc++, gcc-patches

On May 29, 2017, at 1:05 AM, jay pokarna <jay.pokarna10@gmail.com> wrote:
> 
> Could you give me the contact of the standard committee?

https://isocpp.org/std/the-committee

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimisation of std::binary_search of the <algorithm> header
  2017-05-30 22:26         ` Mike Stump
@ 2017-05-31  7:34           ` jay pokarna
  2017-05-31 19:38             ` Mike Stump
  0 siblings, 1 reply; 7+ messages in thread
From: jay pokarna @ 2017-05-31  7:34 UTC (permalink / raw)
  To: Mike Stump; +Cc: Tim Song, Tim Shen, libstdc++, gcc-patches

Hey,
        Could you tell the way as to how can I measure the time taken
by my algorithm and compare it with the inbuilt functions ?
My algorithm is similar to std::binary_search in working.

Also , could you recommend some data that could be helpful to help the
comparison between the function and the std::binary_search?

Thanks,
Jay Pokarna

On Wed, May 31, 2017 at 3:50 AM, Mike Stump <mikestump@comcast.net> wrote:
> On May 29, 2017, at 1:05 AM, jay pokarna <jay.pokarna10@gmail.com> wrote:
>>
>> Could you give me the contact of the standard committee?
>
> https://isocpp.org/std/the-committee
>



-- 
Regards,
Jay Pokarna
CS Sophomore
Wordpress | Linkedin
Birla Institute of Technology and Science, Pilani
Pilani Campus
Rajasthan - 333031.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimisation of std::binary_search of the <algorithm> header
  2017-05-31  7:34           ` jay pokarna
@ 2017-05-31 19:38             ` Mike Stump
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Stump @ 2017-05-31 19:38 UTC (permalink / raw)
  To: jay pokarna; +Cc: Tim Song, Tim Shen, libstdc++, gcc-patches

On May 31, 2017, at 12:33 AM, jay pokarna <jay.pokarna10@gmail.com> wrote:
> 
>        Could you tell the way as to how can I measure the time taken
> by my algorithm and compare it with the inbuilt functions ?

No, that's beyond our charter.  We review patches for gcc.  I'd recommend google, it has answers to most questions.  I think this one might be covered.  [ quick check ] https://gist.github.com/nfarring/1624742, yeah, it's covered.

> Also , could you recommend some data that could be helpful to help the
> comparison between the function and the std::binary_search?

I think google can find some data for you, just search; try google("data sets for searching").

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-05-31 19:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-29  4:39 Optimisation of std::binary_search of the <algorithm> header jay pokarna
2017-05-29  6:55 ` Tim Song
     [not found]   ` <CAK61LuC-Wk2wxyVO5_FQgiHEA3d8Tk5yCLpZKLq4EMht_cu=vw@mail.gmail.com>
     [not found]     ` <CAPQZVxuY-Motx8YH6hRDn13NrZsPF4buhVSq_J1jAes6kkaXZQ@mail.gmail.com>
2017-05-29  8:14       ` jay pokarna
     [not found]         ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com>
2017-05-29 11:11           ` jay pokarna
2017-05-30 22:26         ` Mike Stump
2017-05-31  7:34           ` jay pokarna
2017-05-31 19:38             ` Mike Stump

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).