From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qv1-xf33.google.com (mail-qv1-xf33.google.com [IPv6:2607:f8b0:4864:20::f33]) by sourceware.org (Postfix) with ESMTPS id 23B1B3858C50 for ; Wed, 29 Mar 2023 06:26:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 23B1B3858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-qv1-xf33.google.com with SMTP id cu4so10974120qvb.3 for ; Tue, 28 Mar 2023 23:26:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680071190; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=nXCV0HocXADUj7riq5wwiu7JX0n1ylL7pkYqTbr++v4=; b=KAIORkzb7ajaGA1K8oSF24HJ/po++Top6//u9tfeHOHstVzpb482doqoR6/z6hszJU RJ/RFiHaDlqS2bEE8baoFfiG3XO2DybJE+Llo2lIW4RfuUVfPoLza4VSh/Vc/pnL/jFd JxeCof40L/hwVWbcFVL9wj4fAbs5qKPoILHSV0kmZT0SIjlmn2xQrWE7HjqcF5vh78Ax aXqFfY8ENadRA+j97APP11aqgTg8B5RMccFlbMbXjEnNnsREYc4FEsA6FLJ/YdUvOv6a THHe1CemiXesPj1MCKc99Fvkj6f8piZKYlkT0USHcilfcv50ufqNaTsl71OCgzGiYGy8 ToSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680071190; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=nXCV0HocXADUj7riq5wwiu7JX0n1ylL7pkYqTbr++v4=; b=4IglT1T4TYUd2hJu1YDtK3BsxoDWZ6qmTSXn2sADfhubGYwAyLduVTpsRBiGvgRK8J 9APoFQrGoQVlnPq8G5Qk8aXrKcv39p6PYMki1vx6fu757osuNyfRKUl5DyK2HvzWm6wm rLuvbLBOd9KuWewt53zTu8rzWUaI5KaZGak7heCh6X2egwCd1DDRsdlG21h2xzc3pwWd K6FQjtmWPV6xHKbANIern+3NvUpdlqjirKj+7mgnjnsRmhzBzwA3fXkONoA1F/AcfrcQ +avQhPSX8zeAQ24vIZjCI3N9BAPyle4Ktxje5cWkRhK/LlcBvy7L3KnxAzMxapo83JDb nGvA== X-Gm-Message-State: AAQBX9diyaPN8kYvTm5j9cTVxv+WUlFt1LuHQum3F9izVMTt3oxX4BKu F+2B28rblv/E54bgfAsejXxn7oQlFUkXTxCSJIIQQA+24YogVqca X-Google-Smtp-Source: AKy350ZxmHPqrpFIg89uCaHq8KhgjMUm4sAOqCPcBGDHsiuJ6dya/4j3ZeBpQPcwmfQDmfn/qKkHwNhtnv76UURh6PA= X-Received: by 2002:a05:6214:514:b0:5ac:b3fa:e6bd with SMTP id px20-20020a056214051400b005acb3fae6bdmr319033qvb.2.1680071190092; Tue, 28 Mar 2023 23:26:30 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?B?0JDQu9C10LrRgdCw0L3QtNGAINCo0LjRgtC+0LI=?= Date: Wed, 29 Mar 2023 10:26:18 +0400 Message-ID: Subject: Re: __lower_bound improvement for arithmetical types To: Jonathan Wakely , libstdc++@gcc.gnu.org Content-Type: multipart/alternative; boundary="0000000000007b9f7d05f8040eb9" X-Spam-Status: No, score=0.6 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000007b9f7d05f8040eb9 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Benchmark shows that adding an extra check to determine whether two iterators are in one cache line requires noticeable CPU time. These checks outweigh the benefits of searching through one cache line. Given these facts, I 'd rather stick to the previously proposed version. Do I have to do change the code somehow so that it could be merged to libstdc++ ? =D1=81=D0=B1, 25 =D0=BC=D0=B0=D1=80. 2023=E2=80=AF=D0=B3. =D0=B2 14:16, =D0= =90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD=D0=B4=D1=80 =D0=A8=D0=B8=D1=82=D0= =BE=D0=B2 : > Benchmark shows that adding an extra check to determine whether two > iterators are in one cache line requires noticeable CPU time. These checks > outweigh the benefits of searching through one cache line. > > > Given these facts, I 'd rather stick to the previously proposed version. > Do I have to do change the code somehow so that it could be merged to > libstdc++ ? > > =D0=9F=D1=82, 10 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2023 =D0=B3. =D0=B2 14:27= , Jonathan Wakely : > >> On Thu, 9 Mar 2023 at 19:58, =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=BD= =D0=B4=D1=80 =D0=A8=D0=B8=D1=82=D0=BE=D0=B2 via Libstdc++ >> wrote: >> > >> > I want to propose an improvement to std::__lower_bound for arithmetic >> types >> > with the standard comparators. >> > >> > >> > The main idea is to use linear search on a small number of elements to >> aid >> > the branch predictor and CPU caches, but only when it is not observable >> by >> > the user. In other words, if a standard comparator (std::less, >> > std::greater) is used for arithmetic types. >> > >> > >> > In benchmarks I achieved twice the increase in speed for small >> vectors(16 >> > elements) and increase for 10-20% in large vectors(1'000, 100'000 >> elements). >> > >> > >> > Code: https://gist.github.com/ATGsan/8a1fdec92371d5778a65b01321c43604 >> > >> > PR: https://github.com/ATGsan/gcc/pull/1 >> >> This is an interesting idea, thanks. >> >> You limit the linear search to a single cacheline, but you don't >> ensure that the range to search doesn't cross two cachelines, right? >> Maybe it doesn't matter in practice, but I wonder if limiting the >> linear search to a single cacheline would be even better. >> > --0000000000007b9f7d05f8040eb9--