On Wed, 29 Mar 2023, 07:26 Александр Шитов, <alex.shitov1237@gmail.com>
wrote:

> Benchmark shows that adding an extra check to determine whether two
> iterators are in one cache line requires noticeable CPU time. These checks
> outweigh the benefits of searching through one cache line.
>

OK, thanks for checking.



> Given these facts, I 'd rather stick to the previously proposed version.
> Do I have to do change the code somehow so that it could be merged to
> libstdc++ ?
>

It will have to wait until after the GCC 13 release, so I'll review it
properly in a few weeks.

It looks like we can probably get it merged though, thanks for the
contribution.




> сб, 25 мар. 2023 г. в 14:16, Александр Шитов <alex.shitov1237@gmail.com>:
>
>> Benchmark shows that adding an extra check to determine whether two
>> iterators are in one cache line requires noticeable CPU time. These checks
>> outweigh the benefits of searching through one cache line.
>>
>>
>> Given these facts, I 'd rather stick to the previously proposed version.
>> Do I have to do change the code somehow so that it could be merged to
>> libstdc++ ?
>>
>> Пт, 10 марта 2023 г. в 14:27, Jonathan Wakely <jwakely.gcc@gmail.com>:
>>
>>> On Thu, 9 Mar 2023 at 19:58, Александр Шитов via Libstdc++
>>> <libstdc++@gcc.gnu.org> wrote:
>>> >
>>> > I want to propose an improvement to std::__lower_bound for arithmetic
>>> types
>>> > with the standard comparators.
>>> >
>>> >
>>> > The main idea is to use linear search on a small number of elements to
>>> aid
>>> > the branch predictor and CPU caches, but only when it is not
>>> observable by
>>> > the user. In other words, if a standard comparator (std::less,
>>> > std::greater) is used for arithmetic types.
>>> >
>>> >
>>> > In benchmarks I achieved twice the increase in speed for small
>>> vectors(16
>>> > elements) and increase for 10-20% in large vectors(1'000, 100'000
>>> elements).
>>> >
>>> >
>>> > Code: https://gist.github.com/ATGsan/8a1fdec92371d5778a65b01321c43604
>>> >
>>> > PR: https://github.com/ATGsan/gcc/pull/1
>>>
>>> This is an interesting idea, thanks.
>>>
>>> You limit the linear search to a single cacheline, but you don't
>>> ensure that the range to search doesn't cross two cachelines, right?
>>> Maybe it doesn't matter in practice, but I wonder if limiting the
>>> linear search to a single cacheline would be even better.
>>>
>>