From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=I2pl=7V=gmail.com=alex.shitov1237@sourceware.org>
Received: from mail-qv1-xf33.google.com (mail-qv1-xf33.google.com [IPv6:2607:f8b0:4864:20::f33])
	by sourceware.org (Postfix) with ESMTPS id 23B1B3858C50
	for <libstdc++@gcc.gnu.org>; Wed, 29 Mar 2023 06:26:31 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 23B1B3858C50
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-qv1-xf33.google.com with SMTP id cu4so10974120qvb.3
        for <libstdc++@gcc.gnu.org>; Tue, 28 Mar 2023 23:26:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112; t=1680071190;
        h=to:subject:message-id:date:from:in-reply-to:references:mime-version
         :from:to:cc:subject:date:message-id:reply-to;
        bh=nXCV0HocXADUj7riq5wwiu7JX0n1ylL7pkYqTbr++v4=;
        b=KAIORkzb7ajaGA1K8oSF24HJ/po++Top6//u9tfeHOHstVzpb482doqoR6/z6hszJU
         RJ/RFiHaDlqS2bEE8baoFfiG3XO2DybJE+Llo2lIW4RfuUVfPoLza4VSh/Vc/pnL/jFd
         JxeCof40L/hwVWbcFVL9wj4fAbs5qKPoILHSV0kmZT0SIjlmn2xQrWE7HjqcF5vh78Ax
         aXqFfY8ENadRA+j97APP11aqgTg8B5RMccFlbMbXjEnNnsREYc4FEsA6FLJ/YdUvOv6a
         THHe1CemiXesPj1MCKc99Fvkj6f8piZKYlkT0USHcilfcv50ufqNaTsl71OCgzGiYGy8
         ToSg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112; t=1680071190;
        h=to:subject:message-id:date:from:in-reply-to:references:mime-version
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=nXCV0HocXADUj7riq5wwiu7JX0n1ylL7pkYqTbr++v4=;
        b=4IglT1T4TYUd2hJu1YDtK3BsxoDWZ6qmTSXn2sADfhubGYwAyLduVTpsRBiGvgRK8J
         9APoFQrGoQVlnPq8G5Qk8aXrKcv39p6PYMki1vx6fu757osuNyfRKUl5DyK2HvzWm6wm
         rLuvbLBOd9KuWewt53zTu8rzWUaI5KaZGak7heCh6X2egwCd1DDRsdlG21h2xzc3pwWd
         K6FQjtmWPV6xHKbANIern+3NvUpdlqjirKj+7mgnjnsRmhzBzwA3fXkONoA1F/AcfrcQ
         +avQhPSX8zeAQ24vIZjCI3N9BAPyle4Ktxje5cWkRhK/LlcBvy7L3KnxAzMxapo83JDb
         nGvA==
X-Gm-Message-State: AAQBX9diyaPN8kYvTm5j9cTVxv+WUlFt1LuHQum3F9izVMTt3oxX4BKu
	F+2B28rblv/E54bgfAsejXxn7oQlFUkXTxCSJIIQQA+24YogVqca
X-Google-Smtp-Source: AKy350ZxmHPqrpFIg89uCaHq8KhgjMUm4sAOqCPcBGDHsiuJ6dya/4j3ZeBpQPcwmfQDmfn/qKkHwNhtnv76UURh6PA=
X-Received: by 2002:a05:6214:514:b0:5ac:b3fa:e6bd with SMTP id
 px20-20020a056214051400b005acb3fae6bdmr319033qvb.2.1680071190092; Tue, 28 Mar
 2023 23:26:30 -0700 (PDT)
MIME-Version: 1.0
References: <CAP=JgFqDzDTkuBD7qOaL95Fum+vP=EP6mt5PZN98u10D5yX8rg@mail.gmail.com>
 <CAH6eHdRF1HDW_42yXDSEPYRscMcgGRogVgHqtxK8rPZ_BbG=jw@mail.gmail.com> <CAP=JgFpBesxfVvcnWqE2ZmGt0s-H+4L6F1bBP2NbJiAnziCE7g@mail.gmail.com>
In-Reply-To: <CAP=JgFpBesxfVvcnWqE2ZmGt0s-H+4L6F1bBP2NbJiAnziCE7g@mail.gmail.com>
From: =?UTF-8?B?0JDQu9C10LrRgdCw0L3QtNGAINCo0LjRgtC+0LI=?= <alex.shitov1237@gmail.com>
Date: Wed, 29 Mar 2023 10:26:18 +0400
Message-ID: <CAP=JgFoFYK8B7XNsL-2HxZET-K=kzsVy6xd6hMPUt4wkphc14Q@mail.gmail.com>
Subject: Re: __lower_bound improvement for arithmetical types
To: Jonathan Wakely <jwakely.gcc@gmail.com>, libstdc++@gcc.gnu.org
Content-Type: multipart/alternative; boundary="0000000000007b9f7d05f8040eb9"
X-Spam-Status: No, score=0.6 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libstdc++.gcc.gnu.org>

--0000000000007b9f7d05f8040eb9
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Benchmark shows that adding an extra check to determine whether two
iterators are in one cache line requires noticeable CPU time. These checks
outweigh the benefits of searching through one cache line.


Given these facts, I 'd rather stick to the previously proposed version. Do
I have to do change the code somehow so that it could be merged to
libstdc++ ?

=D1=81=D0=B1, 25 =D0=BC=D0=B0=D1=80. 2023=E2=80=AF=D0=B3. =D0=B2 14:16, =D0=
=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=D0=B4=D1=80 =D0=A8=D0=B8=D1=82=D0=
=BE=D0=B2 <alex.shitov1237@gmail.com>:

> Benchmark shows that adding an extra check to determine whether two
> iterators are in one cache line requires noticeable CPU time. These checks
> outweigh the benefits of searching through one cache line.
>
>
> Given these facts, I 'd rather stick to the previously proposed version.
> Do I have to do change the code somehow so that it could be merged to
> libstdc++ ?
>
> =D0=9F=D1=82, 10 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2023 =D0=B3. =D0=B2 14:27=
, Jonathan Wakely <jwakely.gcc@gmail.com>:
>
>> On Thu, 9 Mar 2023 at 19:58, =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=
=D0=B4=D1=80 =D0=A8=D0=B8=D1=82=D0=BE=D0=B2 via Libstdc++
>> <libstdc++@gcc.gnu.org> wrote:
>> >
>> > I want to propose an improvement to std::__lower_bound for arithmetic
>> types
>> > with the standard comparators.
>> >
>> >
>> > The main idea is to use linear search on a small number of elements to
>> aid
>> > the branch predictor and CPU caches, but only when it is not observable
>> by
>> > the user. In other words, if a standard comparator (std::less,
>> > std::greater) is used for arithmetic types.
>> >
>> >
>> > In benchmarks I achieved twice the increase in speed for small
>> vectors(16
>> > elements) and increase for 10-20% in large vectors(1'000, 100'000
>> elements).
>> >
>> >
>> > Code: https://gist.github.com/ATGsan/8a1fdec92371d5778a65b01321c43604
>> >
>> > PR: https://github.com/ATGsan/gcc/pull/1
>>
>> This is an interesting idea, thanks.
>>
>> You limit the linear search to a single cacheline, but you don't
>> ensure that the range to search doesn't cross two cachelines, right?
>> Maybe it doesn't matter in practice, but I wonder if limiting the
>> linear search to a single cacheline would be even better.
>>
>

--0000000000007b9f7d05f8040eb9--