From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=o/GC=7V=gmail.com=jwakely.gcc@sourceware.org>
Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b])
	by sourceware.org (Postfix) with ESMTPS id 1E9463858C50
	for <libstdc++@gcc.gnu.org>; Wed, 29 Mar 2023 08:00:38 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1E9463858C50
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-ed1-x52b.google.com with SMTP id r11so59791341edd.5
        for <libstdc++@gcc.gnu.org>; Wed, 29 Mar 2023 01:00:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112; t=1680076837;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=cUT2/4YXN9lujkvhofmyGr48jXwx+4pUdOYTReZaVwg=;
        b=VqZBJ3Zd1n9MI4wusq7Y1La9Z1dwqN3IMhhcurubmnvJYkwsawwrytd+03/7YNLOmE
         RO2LYPwFtE9Rtci06Mrc/Wcyqu7jQm/gJg/0H7MYIBH4AY9Gd+o/FNr94ON8R5wIkDkV
         f8DBU2mAlvXTv22VyqL29cBjMPNifdES25ubOB6ye3o6tvii0MZ3v8Keg06cXPLKvRbo
         k1NjybK2DEExiV6lFDZB5vBnxVcsWmx4U3aOabFHt2clMj39Zi1N6dkvnFBI7+ViIZYX
         kpsC9SX72OrHjDPhLxpymJY9S19gncY4OrVepEb5tyi6s/nztsFI3z4CmjQeOHo1cIrH
         Vzfw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112; t=1680076837;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=cUT2/4YXN9lujkvhofmyGr48jXwx+4pUdOYTReZaVwg=;
        b=K9XJbrPg98R6+Ijd124Efto5O1mlwxghy165f8LCAkknHUEr+O0rzAiYM4DSZNSPo7
         BZq74wKfhTD25+fFNUjg78XbhD4j8ntJMsPBvZZNVT1eLZKj2Yv/cU79HjRSX3PFXLBM
         MsSP3ELMgXQaRRrnkK88CneG7Of7bnqM+EuF7tXmuaQws+O5jGiOFlhSvtQ8qZPxVSCp
         3GIScjvcjiE1yPIqwG5l5NY8sQPtg240AsKot9oGSLxP2RlqlfXVl0b2PCcjQpcuZOgB
         2jGzT7cKI4N0mhJY1xaXtyltPwW/J/aedBy4eVB7rfdNZH31Nyzcz7dWPszjo4ohhsXT
         tH4A==
X-Gm-Message-State: AAQBX9enOKVPzfxtVIprlK+r4qKkYq9aln2sn9zTV4ob/+jTzIl5ldtp
	6hqYDi2P+MoJALUdN3sC8smH1+f1i9h3iJA1hFk=
X-Google-Smtp-Source: AKy350aCKyshMie4RJ95Eqs6mA4B3rAlBwWIKpwhfVjOJt4YC1O0LquUHko5cHTKHtXDkJRIdkLKXWH4rPsrAJRI5Vo=
X-Received: by 2002:a17:907:9623:b0:93e:aac:bb8d with SMTP id
 gb35-20020a170907962300b0093e0aacbb8dmr9270205ejc.13.1680076836718; Wed, 29
 Mar 2023 01:00:36 -0700 (PDT)
MIME-Version: 1.0
References: <CAP=JgFqDzDTkuBD7qOaL95Fum+vP=EP6mt5PZN98u10D5yX8rg@mail.gmail.com>
 <CAH6eHdRF1HDW_42yXDSEPYRscMcgGRogVgHqtxK8rPZ_BbG=jw@mail.gmail.com>
 <CAP=JgFpBesxfVvcnWqE2ZmGt0s-H+4L6F1bBP2NbJiAnziCE7g@mail.gmail.com> <CAP=JgFoFYK8B7XNsL-2HxZET-K=kzsVy6xd6hMPUt4wkphc14Q@mail.gmail.com>
In-Reply-To: <CAP=JgFoFYK8B7XNsL-2HxZET-K=kzsVy6xd6hMPUt4wkphc14Q@mail.gmail.com>
From: Jonathan Wakely <jwakely.gcc@gmail.com>
Date: Wed, 29 Mar 2023 09:00:24 +0100
Message-ID: <CAH6eHdTfEwKCx_UADSyaKY0FvAb8X=0vPOCzY8KrSyWSgQEZxQ@mail.gmail.com>
Subject: Re: __lower_bound improvement for arithmetical types
To: =?UTF-8?B?0JDQu9C10LrRgdCw0L3QtNGAINCo0LjRgtC+0LI=?= <alex.shitov1237@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>
Content-Type: multipart/alternative; boundary="0000000000000c4e6505f8055f38"
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libstdc++.gcc.gnu.org>

--0000000000000c4e6505f8055f38
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, 29 Mar 2023, 07:26 =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=D0=B4=
=D1=80 =D0=A8=D0=B8=D1=82=D0=BE=D0=B2, <alex.shitov1237@gmail.com>
wrote:

> Benchmark shows that adding an extra check to determine whether two
> iterators are in one cache line requires noticeable CPU time. These checks
> outweigh the benefits of searching through one cache line.
>

OK, thanks for checking.


> Given these facts, I 'd rather stick to the previously proposed version.
> Do I have to do change the code somehow so that it could be merged to
> libstdc++ ?
>

It will have to wait until after the GCC 13 release, so I'll review it
properly in a few weeks.

It looks like we can probably get it merged though, thanks for the
contribution.


> =D1=81=D0=B1, 25 =D0=BC=D0=B0=D1=80. 2023=E2=80=AF=D0=B3. =D0=B2 14:16, =
=D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=D0=B4=D1=80 =D0=A8=D0=B8=D1=82=
=D0=BE=D0=B2 <alex.shitov1237@gmail.com>:
>
>> Benchmark shows that adding an extra check to determine whether two
>> iterators are in one cache line requires noticeable CPU time. These chec=
ks
>> outweigh the benefits of searching through one cache line.
>>
>>
>> Given these facts, I 'd rather stick to the previously proposed version.
>> Do I have to do change the code somehow so that it could be merged to
>> libstdc++ ?
>>
>> =D0=9F=D1=82, 10 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2023 =D0=B3. =D0=B2 14:2=
7, Jonathan Wakely <jwakely.gcc@gmail.com>:
>>
>>> On Thu, 9 Mar 2023 at 19:58, =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=
=D0=B4=D1=80 =D0=A8=D0=B8=D1=82=D0=BE=D0=B2 via Libstdc++
>>> <libstdc++@gcc.gnu.org> wrote:
>>> >
>>> > I want to propose an improvement to std::__lower_bound for arithmetic
>>> types
>>> > with the standard comparators.
>>> >
>>> >
>>> > The main idea is to use linear search on a small number of elements to
>>> aid
>>> > the branch predictor and CPU caches, but only when it is not
>>> observable by
>>> > the user. In other words, if a standard comparator (std::less,
>>> > std::greater) is used for arithmetic types.
>>> >
>>> >
>>> > In benchmarks I achieved twice the increase in speed for small
>>> vectors(16
>>> > elements) and increase for 10-20% in large vectors(1'000, 100'000
>>> elements).
>>> >
>>> >
>>> > Code: https://gist.github.com/ATGsan/8a1fdec92371d5778a65b01321c43604
>>> >
>>> > PR: https://github.com/ATGsan/gcc/pull/1
>>>
>>> This is an interesting idea, thanks.
>>>
>>> You limit the linear search to a single cacheline, but you don't
>>> ensure that the range to search doesn't cross two cachelines, right?
>>> Maybe it doesn't matter in practice, but I wonder if limiting the
>>> linear search to a single cacheline would be even better.
>>>
>>

--0000000000000c4e6505f8055f38--