*Optimisation of std::binary_search of the <algorithm> header@ 2017-05-29 4:39 jay pokarna2017-05-29 6:55 ` Tim Song 0 siblings, 1 reply; 7+ messages in thread From: jay pokarna @ 2017-05-29 4:39 UTC (permalink / raw) To: gcc-patches;+Cc:libstdc++ Hey, I am Jay. I have written code for an optimised version of the binary_search algorithm of the algorithm header file of the standard template library. I have implemented it for the integer data type, but it can be implemented for any other data type without any changes in the algorithm as such. You are requested to give me feedback on my implementation. #include<iostream> #include<stdlib.h> #include<math.h> using namespace std; bool binary_search(int arr[], int size, int ele, bool *fun_ptr); bool binary_search(int arr[] , int size , int ele); /* This version of the binary_search finds an element ele in the array arr of length size. */ /* Assumptions: 1) Elements are sorted in increasing order in the array 2) The parameter size is the number of elements in the array not the index of the last element, i.e. the search space is arr[0,size) 3) Though , I have implemented this version with the int data type, it is also compatible with other data types like strings , or user defined data types. In those cases , only the comparison operation will change without any changes in the algorithm. */ /* If the specified element is in the array, then the function returns 1 else returns 0. */ bool binary_search(int arr[] , int size , int ele) { /* The element cannot be in the array if it is larger than the largest element or smaller than the smallest element */ if( ele < arr[0]||arr[size-1] < ele) { return bool(0); } /* After this statement , the element is definitely not outside the array, but it may or may not be in the array. */ int t=sqrt(size); int start =0,end=t*t; /* The algorithm uses the technique of square root decomposition. It breaks down the problem (the array of length size) to smaller problems each of length sqrt(size). It does a kind of binary_search on the first element of each subarray to find the subproblem to which the element belongs. */ /* The if part of the if else loop below finds which subproblem the element ele belongs to. The last subproblem won't be of length sqrt(size) if size is not a perfect square. So , as to handle this corner case , we need the else part of the loop. */ if(ele < arr[end]) { while(end-start>t) { int mid = (start+end)/2; mid -= (mid%t); if(arr[mid]==ele) { return bool(1); } else if(arr[mid] < ele) { start = mid; } else { end = mid; } } } else { start = end; end = size; } /* After this if else loop , [start , end] will have a portion of the array of maximum length sqrt(size) which may contain the element. */ /* The loop below does a normal binary_search in the range [start,end] */ while(end>=start) { int mid = start + ((end-start)/2); if(ele ==arr[mid]) { return bool(1); } else if(arr[mid] < ele) { start = mid+1; } else { end = mid-1; } } return bool(0); } // Same Code as above just with a comparator specified bool binary_search(int arr[] , int size , int ele, bool (*fun_ptr)(int ,int)) { /* The element cannot be in the array if it is larger than the largest element or smaller than the smallest element */ if(fun_ptr(ele,arr[0])||fun_ptr(arr[size-1],ele)) { return bool(0); } /* After this statement , the element is definitely not outside the array, but it may or may not be in the array. */ int t=sqrt(size); int start =0,end=t*t; /* The algorithm uses the technique of square root decomposition. It breaks down the problem (the array of length size) to smaller problems each of length sqrt(size). It does a kind of binary_search on the first element of each subarray to find the subproblem to which the element belongs. */ /* The if part of the if else loop below finds which subproblem the element ele belongs to. The last subproblem won't be of length sqrt(size) if size is not a perfect square. So , as to handle this corner case , we need the else part of the loop. */ if(fun_ptr(ele,arr[end])) { while(end-start>t) { int mid = (start+end)/2; mid -= (mid%t); if(!fun_ptr(arr[mid],ele)&&!fun_ptr(ele,arr[mid])) /* This is equivalent to checking the condition ele == arr[mid] */ { return bool(1); } else if(fun_ptr(arr[mid],ele)) { start = mid; } else { end = mid; } } } else { start = end; end = size; } /* After this if else construct , [start , end] will have a portion of the array of maximum length sqrt(size) which may contain the element */ /* The loop below does a normal binary_search in the range [start,end] */ while(end>=start) { int mid = start + ((end-start)/2); if(!fun_ptr(arr[mid],ele)&&!fun_ptr(ele,arr[mid])) /* This is equivalent to checking the condition ele == arr[mid] */ { return bool(1); } else if(fun_ptr(arr[mid],ele)) { start = mid+1; } else { end = mid-1; } } return bool(0); } Regards, Jay Pokarna ^ permalink raw reply [flat|nested] 7+ messages in thread

*Re: Optimisation of std::binary_search of the <algorithm> header2017-05-29 4:39 Optimisation of std::binary_search of the <algorithm> header jay pokarna@ 2017-05-29 6:55 ` Tim Song[not found] ` <CAK61LuC-Wk2wxyVO5_FQgiHEA3d8Tk5yCLpZKLq4EMht_cu=vw@mail.gmail.com> 0 siblings, 1 reply; 7+ messages in thread From: Tim Song @ 2017-05-29 6:55 UTC (permalink / raw) To: jay pokarna;+Cc:gcc-patches, libstdc++ This is not a patch. Nor is it an implementation of std::binary_search, which is a function template that accepts arbitrary forward iterators and optionally arbitrary comparator objects, rather than just pointers. If you want to find someone to review your code, consider something like https://codereview.stackexchange.com/. While I'm not a maintainer, I strongly suspect that for something like this to have a chance of getting into libstdc++, it needs to, at the very minimum, implement the standard signature and be accompanied by sufficient performance benchmarks showing that it is better performing than the existing implementation. ^ permalink raw reply [flat|nested] 7+ messages in thread

*[not found] ` <CAPQZVxuY-Motx8YH6hRDn13NrZsPF4buhVSq_J1jAes6kkaXZQ@mail.gmail.com>Re: Optimisation of std::binary_search of the <algorithm> header@ 2017-05-29 8:14 ` jay pokarna[not found] ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com> 2017-05-30 22:26 ` Mike Stump 0 siblings, 2 replies; 7+ messages in thread From: jay pokarna @ 2017-05-29 8:14 UTC (permalink / raw) To: Tim Song;+Cc:libstdc++, gcc-patches Respected Sir, I am sorry , for the use of wrong language in the previous mail. I wanted to convey that c++ has generalised the algorithm on various data structures , which is not required due to low performance. Could you give me the contact of the standard committee? Regards, Jay Pokarna On Mon, May 29, 2017 at 1:13 PM, Tim Song <t.canens.cpp@gmail.com> wrote: > I'm not sure if you forgot to CC the lists or intended to direct the > email to me alone. > > On Mon, May 29, 2017 at 2:41 AM, jay pokarna <jay.pokarna10@gmail.com> wrote: >> I know that cpp wants to generalise its methods so that they can be >> used with various data structures. But the cost of generalisation is >> that we have to compromise a lot on performance. > > That's neither here nor there. binary_search's performance as applied > to forward iterators has nothing to do with its performance as to > random access iterators. > >> I would like to recommend cpp to allow the use of binary_search only >> on data structures that use random access models. > > C++ has an international standard and GCC/libstdc++ is an > implementation of that standard. Proposal for changes to the standard > should be directed to the standards committee, not GCC's mailing > lists. > >> The technique that I have used is square root decomposition . I think >> that it will be better than the one that is implemented. > > And here's the problem: you *think* it will be better. Just thinking > is not enough. You need to *prove* it with benchmarks that show that > your technique is in fact faster than the current one. > > If there is in fact substantial improvement, *then* it's the time to > consider generalization: when is this technique faster than the stock > one? Always? Only random access? Only pointers? Only built-in types? > Again, this needs to be shown with appropriate benchmark numbers. > > Then, finally, you can write a patch proposing that libstdc++'s > binary_search be modified to use this technique in situations when > it's shown to be faster. > > For a recent example of how something like this should look like, see > https://gcc.gnu.org/ml/libstdc++/2016-12/msg00051.html and its > LLVM/libc++ counterpart, https://reviews.llvm.org/D27068. Note the > copious benchmark numbers showing that the proposed change was indeed > (much) better than the previous. -- Regards, Jay Pokarna CS Sophomore Wordpress | Linkedin Birla Institute of Technology and Science, Pilani Pilani Campus Rajasthan - 333031. ^ permalink raw reply [flat|nested] 7+ messages in thread

*Re: Optimisation of std::binary_search of the <algorithm> header[not found] ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com>@ 2017-05-29 11:11 ` jay pokarna0 siblings, 0 replies; 7+ messages in thread From: jay pokarna @ 2017-05-29 11:11 UTC (permalink / raw) To: Tim Shen;+Cc:libstdc++, gcc-patches Respected Sir, Could you give me the contact of the standard committee which handles changes to the c++ standard. Regards, Jay Pokarna On Mon, May 29, 2017 at 2:17 PM, Tim Shen <timshen@google.com> wrote: > On Mon, May 29, 2017 at 1:05 AM, jay pokarna <jay.pokarna10@gmail.com> wrote: >>>> The technique that I have used is square root decomposition . I think >>>> that it will be better than the one that is implemented. >>> >>> And here's the problem: you *think* it will be better. Just thinking >>> is not enough. You need to *prove* it with benchmarks that show that >>> your technique is in fact faster than the current one. > > Agreed. > > Jay, specifically, your algorithm has the roughly the same running time: > > T(n) = log (sqrt(n) + 1) + log sqrt(n) > > 2 log (n ^ 0.5) > = 2 * 0.5 * log n > = log n > > It's unclear to me whether it's better than the normal binary search > or not. Detailed and representative benchmarks may convince more > people. > > > -- > Regards, > Tim Shen -- Regards, Jay Pokarna CS Sophomore Wordpress | Linkedin Birla Institute of Technology and Science, Pilani Pilani Campus Rajasthan - 333031. ^ permalink raw reply [flat|nested] 7+ messages in thread

*Re: Optimisation of std::binary_search of the <algorithm> header2017-05-29 8:14 ` jay pokarna [not found] ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com>@ 2017-05-30 22:26 ` Mike Stump2017-05-31 7:34 ` jay pokarna 1 sibling, 1 reply; 7+ messages in thread From: Mike Stump @ 2017-05-30 22:26 UTC (permalink / raw) To: jay pokarna;+Cc:Tim Song, libstdc++, gcc-patches On May 29, 2017, at 1:05 AM, jay pokarna <jay.pokarna10@gmail.com> wrote: > > Could you give me the contact of the standard committee? https://isocpp.org/std/the-committee ^ permalink raw reply [flat|nested] 7+ messages in thread

*Re: Optimisation of std::binary_search of the <algorithm> header2017-05-30 22:26 ` Mike Stump@ 2017-05-31 7:34 ` jay pokarna2017-05-31 19:38 ` Mike Stump 0 siblings, 1 reply; 7+ messages in thread From: jay pokarna @ 2017-05-31 7:34 UTC (permalink / raw) To: Mike Stump;+Cc:Tim Song, Tim Shen, libstdc++, gcc-patches Hey, Could you tell the way as to how can I measure the time taken by my algorithm and compare it with the inbuilt functions ? My algorithm is similar to std::binary_search in working. Also , could you recommend some data that could be helpful to help the comparison between the function and the std::binary_search? Thanks, Jay Pokarna On Wed, May 31, 2017 at 3:50 AM, Mike Stump <mikestump@comcast.net> wrote: > On May 29, 2017, at 1:05 AM, jay pokarna <jay.pokarna10@gmail.com> wrote: >> >> Could you give me the contact of the standard committee? > > https://isocpp.org/std/the-committee > -- Regards, Jay Pokarna CS Sophomore Wordpress | Linkedin Birla Institute of Technology and Science, Pilani Pilani Campus Rajasthan - 333031. ^ permalink raw reply [flat|nested] 7+ messages in thread

*Re: Optimisation of std::binary_search of the <algorithm> header2017-05-31 7:34 ` jay pokarna@ 2017-05-31 19:38 ` Mike Stump0 siblings, 0 replies; 7+ messages in thread From: Mike Stump @ 2017-05-31 19:38 UTC (permalink / raw) To: jay pokarna;+Cc:Tim Song, Tim Shen, libstdc++, gcc-patches On May 31, 2017, at 12:33 AM, jay pokarna <jay.pokarna10@gmail.com> wrote: > > Could you tell the way as to how can I measure the time taken > by my algorithm and compare it with the inbuilt functions ? No, that's beyond our charter. We review patches for gcc. I'd recommend google, it has answers to most questions. I think this one might be covered. [ quick check ] https://gist.github.com/nfarring/1624742, yeah, it's covered. > Also , could you recommend some data that could be helpful to help the > comparison between the function and the std::binary_search? I think google can find some data for you, just search; try google("data sets for searching"). ^ permalink raw reply [flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-05-31 19:11 UTC | newest]Thread overview:7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-05-29 4:39 Optimisation of std::binary_search of the <algorithm> header jay pokarna 2017-05-29 6:55 ` Tim Song [not found] ` <CAK61LuC-Wk2wxyVO5_FQgiHEA3d8Tk5yCLpZKLq4EMht_cu=vw@mail.gmail.com> [not found] ` <CAPQZVxuY-Motx8YH6hRDn13NrZsPF4buhVSq_J1jAes6kkaXZQ@mail.gmail.com> 2017-05-29 8:14 ` jay pokarna [not found] ` <CAG4ZjNk-4oTnFAPrFDTZVs+JxZTYF=J91jDhGPJgUVZ_9kN6FA@mail.gmail.com> 2017-05-29 11:11 ` jay pokarna 2017-05-30 22:26 ` Mike Stump 2017-05-31 7:34 ` jay pokarna 2017-05-31 19:38 ` Mike Stump

This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).