On Wed, 29 Mar 2023, 07:26 Александр Шитов, wrote: > Benchmark shows that adding an extra check to determine whether two > iterators are in one cache line requires noticeable CPU time. These checks > outweigh the benefits of searching through one cache line. > OK, thanks for checking. > Given these facts, I 'd rather stick to the previously proposed version. > Do I have to do change the code somehow so that it could be merged to > libstdc++ ? > It will have to wait until after the GCC 13 release, so I'll review it properly in a few weeks. It looks like we can probably get it merged though, thanks for the contribution. > сб, 25 мар. 2023 г. в 14:16, Александр Шитов : > >> Benchmark shows that adding an extra check to determine whether two >> iterators are in one cache line requires noticeable CPU time. These checks >> outweigh the benefits of searching through one cache line. >> >> >> Given these facts, I 'd rather stick to the previously proposed version. >> Do I have to do change the code somehow so that it could be merged to >> libstdc++ ? >> >> Пт, 10 марта 2023 г. в 14:27, Jonathan Wakely : >> >>> On Thu, 9 Mar 2023 at 19:58, Александр Шитов via Libstdc++ >>> wrote: >>> > >>> > I want to propose an improvement to std::__lower_bound for arithmetic >>> types >>> > with the standard comparators. >>> > >>> > >>> > The main idea is to use linear search on a small number of elements to >>> aid >>> > the branch predictor and CPU caches, but only when it is not >>> observable by >>> > the user. In other words, if a standard comparator (std::less, >>> > std::greater) is used for arithmetic types. >>> > >>> > >>> > In benchmarks I achieved twice the increase in speed for small >>> vectors(16 >>> > elements) and increase for 10-20% in large vectors(1'000, 100'000 >>> elements). >>> > >>> > >>> > Code: https://gist.github.com/ATGsan/8a1fdec92371d5778a65b01321c43604 >>> > >>> > PR: https://github.com/ATGsan/gcc/pull/1 >>> >>> This is an interesting idea, thanks. >>> >>> You limit the linear search to a single cacheline, but you don't >>> ensure that the range to search doesn't cross two cachelines, right? >>> Maybe it doesn't matter in practice, but I wonder if limiting the >>> linear search to a single cacheline would be even better. >>> >>