From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) by sourceware.org (Postfix) with ESMTPS id BBA55385803F for ; Thu, 6 May 2021 18:34:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org BBA55385803F Received: by mail-pj1-x1029.google.com with SMTP id gj14so3806769pjb.5 for ; Thu, 06 May 2021 11:34:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=pidrPwaQvG3uR3xgHSfi7SkUfO+E2Z6dB0pUuOQygKY=; b=bS9wLXUMZb0RPloc4Z60Bizlqa38CeSEbL/Zk9WcC17rTJJFvyiUFcIzpieNs3wmnW VzH7NCwj+sT93Gp9cRhf8XDL/pHNCzZ9QQ8SrN13cAtoGNR57BBYvRYUTH+daI/RcO7R je7EI+11R/KrmG9rZbFe84BWYk3rfX7XtQ7VcNHpBaWvknucIIZ2FRIA8dgIs1Z+3K+9 Ss3eWbfPJ8K6s37SoT7H52/YXlKm021GLMaUACsqw8InJrMBp+hQylAgPc9tE/zQrqgn rGvSWDYLjwWEoR5qC2GOk3SXblPc9p6IecBwhIcICy3tD6V7DCtqbSE5pXpuHDwIgtgw LeLw== X-Gm-Message-State: AOAM532luoH2WeF/km5ef3AkXD9a4GULkn4VzuKbbkPnHjvuImRQRlAI M7cmTo5xehOsKO9aKVxeCX/1ZWE9qCEaWHGw1F0= X-Google-Smtp-Source: ABdhPJwPrmRo4LbVSfhYZrxQHW2dvPJepMXdrUbstuLrZXFKIq9TS/oKalctqUtMhScOh+f5+K3YGj0gARD1UU28NM8= X-Received: by 2002:a17:902:aa04:b029:ec:f779:3a2b with SMTP id be4-20020a170902aa04b02900ecf7793a2bmr5761905plb.44.1620326095933; Thu, 06 May 2021 11:34:55 -0700 (PDT) MIME-Version: 1.0 References: <20210329182520.323665-1-adhemerval.zanella@linaro.org> <87a6p9dr9n.fsf@oldenburg.str.redhat.com> <61040ff8-caac-a3d9-91cc-9b445c4e98fd@cs.ucla.edu> In-Reply-To: From: Noah Goldstein Date: Thu, 6 May 2021 14:34:44 -0400 Message-ID: Subject: Re: [PATCH 1/4] Remove architecture specific sched_cpucount optimizations To: Adhemerval Zanella Cc: Paul Eggert , Florian Weimer , crrodriguez@opensuse.org, Adhemerval Zanella via Libc-alpha Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 May 2021 18:34:58 -0000 On Thu, May 6, 2021 at 8:22 AM Adhemerval Zanella wrote: > > > > On 05/05/2021 16:52, Noah Goldstein via Libc-alpha wrote: > > On Wed, May 5, 2021 at 12:32 PM Paul Eggert wrote: > >> > >> On 5/5/21 10:28 AM, Florian Weimer via Libc-alpha wrote: > >>>> diff --git a/posix/sched_cpucount.c b/posix/sched_cpucount.c > >>>> index b0ca4ea7bc..529286e777 100644 > >>>> --- a/posix/sched_cpucount.c > >>>> +++ b/posix/sched_cpucount.c > >>>> @@ -22,31 +22,11 @@ int > >>>> __sched_cpucount (size_t setsize, const cpu_set_t *setp) > >>>> { > >>>> int s =3D 0; > >>>> + for (int i =3D 0; i < setsize / sizeof (__cpu_mask); i++) > >>>> { > >>>> + __cpu_mask si =3D setp->__bits[i]; > >>>> + /* Clear the least significant bit set. */ > >>>> + for (; si !=3D 0; si &=3D si - 1, s++); > >>>> } > >>>> - > >>>> return s; > >>>> } > >>> Why =E2=80=9Csi=E2=80=9D? It think si &=3D si - 1 clears the*most* = significant bit in > >>> si. If you agree, please update the comment. > >> > >> Better yet, define a static function 'popcount' that uses Kernighan's > >> trick and call that function. As things stand it's not obvious what th= e > >> code is doing, regardless of which bit it's clearing. The function's > >> comment should explain why it's not using __builtin_popcount. > > > > It looks to me like GCC is still having a bit of trouble with the new > > implementation. With skylake as a target it seems to be throwing > > in a cmovcc in the outer loop: > > > > https://godbolt.org/z/ocn1KWGPx > > > > It seems that come from using a signed int as the counter, I got a slight > better version using unsigned: https://godbolt.org/z/4MrMq1zKb > > It is not as better as the builtin, but do we really need that > micro-optimization in this specific implementation? This is exactly what > added the current complexity and using a open-coded popcount implementati= on > is slight better on architectures that do not provided such instruction > (where builtin_popcount will issue a call to __popcountsi2 and gcc seems > to aiming for open code such scenarios [1]). > > In any case I would go for simplicity (and the builtin requires know the > underlying type size, which requires the ugly size of check to call the > correct one). > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D92135 I agree. Definetly not opposed to the patch.