Benchmark shows that adding an extra check to determine whether two
iterators are in one cache line requires noticeable CPU time. These checks
outweigh the benefits of searching through one cache line.


Given these facts, I 'd rather stick to the previously proposed version. Do
I have to do change the code somehow so that it could be merged to
libstdc++ ?

сб, 25 мар. 2023 г. в 14:16, Александр Шитов <alex.shitov1237@gmail.com>:

> Benchmark shows that adding an extra check to determine whether two
> iterators are in one cache line requires noticeable CPU time. These checks
> outweigh the benefits of searching through one cache line.
>
>
> Given these facts, I 'd rather stick to the previously proposed version.
> Do I have to do change the code somehow so that it could be merged to
> libstdc++ ?
>
> Пт, 10 марта 2023 г. в 14:27, Jonathan Wakely <jwakely.gcc@gmail.com>:
>
>> On Thu, 9 Mar 2023 at 19:58, Александр Шитов via Libstdc++
>> <libstdc++@gcc.gnu.org> wrote:
>> >
>> > I want to propose an improvement to std::__lower_bound for arithmetic
>> types
>> > with the standard comparators.
>> >
>> >
>> > The main idea is to use linear search on a small number of elements to
>> aid
>> > the branch predictor and CPU caches, but only when it is not observable
>> by
>> > the user. In other words, if a standard comparator (std::less,
>> > std::greater) is used for arithmetic types.
>> >
>> >
>> > In benchmarks I achieved twice the increase in speed for small
>> vectors(16
>> > elements) and increase for 10-20% in large vectors(1'000, 100'000
>> elements).
>> >
>> >
>> > Code: https://gist.github.com/ATGsan/8a1fdec92371d5778a65b01321c43604
>> >
>> > PR: https://github.com/ATGsan/gcc/pull/1
>>
>> This is an interesting idea, thanks.
>>
>> You limit the linear search to a single cacheline, but you don't
>> ensure that the range to search doesn't cross two cachelines, right?
>> Maybe it doesn't matter in practice, but I wonder if limiting the
>> linear search to a single cacheline would be even better.
>>
>