From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) by sourceware.org (Postfix) with ESMTPS id 1E9463858C50 for ; Wed, 29 Mar 2023 08:00:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1E9463858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ed1-x52b.google.com with SMTP id r11so59791341edd.5 for ; Wed, 29 Mar 2023 01:00:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680076837; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=cUT2/4YXN9lujkvhofmyGr48jXwx+4pUdOYTReZaVwg=; b=VqZBJ3Zd1n9MI4wusq7Y1La9Z1dwqN3IMhhcurubmnvJYkwsawwrytd+03/7YNLOmE RO2LYPwFtE9Rtci06Mrc/Wcyqu7jQm/gJg/0H7MYIBH4AY9Gd+o/FNr94ON8R5wIkDkV f8DBU2mAlvXTv22VyqL29cBjMPNifdES25ubOB6ye3o6tvii0MZ3v8Keg06cXPLKvRbo k1NjybK2DEExiV6lFDZB5vBnxVcsWmx4U3aOabFHt2clMj39Zi1N6dkvnFBI7+ViIZYX kpsC9SX72OrHjDPhLxpymJY9S19gncY4OrVepEb5tyi6s/nztsFI3z4CmjQeOHo1cIrH Vzfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680076837; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=cUT2/4YXN9lujkvhofmyGr48jXwx+4pUdOYTReZaVwg=; b=K9XJbrPg98R6+Ijd124Efto5O1mlwxghy165f8LCAkknHUEr+O0rzAiYM4DSZNSPo7 BZq74wKfhTD25+fFNUjg78XbhD4j8ntJMsPBvZZNVT1eLZKj2Yv/cU79HjRSX3PFXLBM MsSP3ELMgXQaRRrnkK88CneG7Of7bnqM+EuF7tXmuaQws+O5jGiOFlhSvtQ8qZPxVSCp 3GIScjvcjiE1yPIqwG5l5NY8sQPtg240AsKot9oGSLxP2RlqlfXVl0b2PCcjQpcuZOgB 2jGzT7cKI4N0mhJY1xaXtyltPwW/J/aedBy4eVB7rfdNZH31Nyzcz7dWPszjo4ohhsXT tH4A== X-Gm-Message-State: AAQBX9enOKVPzfxtVIprlK+r4qKkYq9aln2sn9zTV4ob/+jTzIl5ldtp 6hqYDi2P+MoJALUdN3sC8smH1+f1i9h3iJA1hFk= X-Google-Smtp-Source: AKy350aCKyshMie4RJ95Eqs6mA4B3rAlBwWIKpwhfVjOJt4YC1O0LquUHko5cHTKHtXDkJRIdkLKXWH4rPsrAJRI5Vo= X-Received: by 2002:a17:907:9623:b0:93e:aac:bb8d with SMTP id gb35-20020a170907962300b0093e0aacbb8dmr9270205ejc.13.1680076836718; Wed, 29 Mar 2023 01:00:36 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Jonathan Wakely Date: Wed, 29 Mar 2023 09:00:24 +0100 Message-ID: Subject: Re: __lower_bound improvement for arithmetical types To: =?UTF-8?B?0JDQu9C10LrRgdCw0L3QtNGAINCo0LjRgtC+0LI=?= Cc: "libstdc++" Content-Type: multipart/alternative; boundary="0000000000000c4e6505f8055f38" X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000000c4e6505f8055f38 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 29 Mar 2023, 07:26 =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=D0=B4= =D1=80 =D0=A8=D0=B8=D1=82=D0=BE=D0=B2, wrote: > Benchmark shows that adding an extra check to determine whether two > iterators are in one cache line requires noticeable CPU time. These checks > outweigh the benefits of searching through one cache line. > OK, thanks for checking. > Given these facts, I 'd rather stick to the previously proposed version. > Do I have to do change the code somehow so that it could be merged to > libstdc++ ? > It will have to wait until after the GCC 13 release, so I'll review it properly in a few weeks. It looks like we can probably get it merged though, thanks for the contribution. > =D1=81=D0=B1, 25 =D0=BC=D0=B0=D1=80. 2023=E2=80=AF=D0=B3. =D0=B2 14:16, = =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=D0=B4=D1=80 =D0=A8=D0=B8=D1=82= =D0=BE=D0=B2 : > >> Benchmark shows that adding an extra check to determine whether two >> iterators are in one cache line requires noticeable CPU time. These chec= ks >> outweigh the benefits of searching through one cache line. >> >> >> Given these facts, I 'd rather stick to the previously proposed version. >> Do I have to do change the code somehow so that it could be merged to >> libstdc++ ? >> >> =D0=9F=D1=82, 10 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2023 =D0=B3. =D0=B2 14:2= 7, Jonathan Wakely : >> >>> On Thu, 9 Mar 2023 at 19:58, =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD= =D0=B4=D1=80 =D0=A8=D0=B8=D1=82=D0=BE=D0=B2 via Libstdc++ >>> wrote: >>> > >>> > I want to propose an improvement to std::__lower_bound for arithmetic >>> types >>> > with the standard comparators. >>> > >>> > >>> > The main idea is to use linear search on a small number of elements to >>> aid >>> > the branch predictor and CPU caches, but only when it is not >>> observable by >>> > the user. In other words, if a standard comparator (std::less, >>> > std::greater) is used for arithmetic types. >>> > >>> > >>> > In benchmarks I achieved twice the increase in speed for small >>> vectors(16 >>> > elements) and increase for 10-20% in large vectors(1'000, 100'000 >>> elements). >>> > >>> > >>> > Code: https://gist.github.com/ATGsan/8a1fdec92371d5778a65b01321c43604 >>> > >>> > PR: https://github.com/ATGsan/gcc/pull/1 >>> >>> This is an interesting idea, thanks. >>> >>> You limit the linear search to a single cacheline, but you don't >>> ensure that the range to search doesn't cross two cachelines, right? >>> Maybe it doesn't matter in practice, but I wonder if limiting the >>> linear search to a single cacheline would be even better. >>> >> --0000000000000c4e6505f8055f38--