From mboxrd@z Thu Jan 1 00:00:00 1970
From: Florian Weimer
To: Adhemerval Zanella
Cc: Paul Eggert, Adhemerval Zanella via Libc-alpha, crrodriguez@opensuse.org
Subject: Re: [PATCH 1/4] Remove architecture specific sched_cpucount optimizations
References: <20210329182520.323665-1-adhemerval.zanella@linaro.org> <87a6p9dr9n.fsf@oldenburg.str.redhat.com> <61040ff8-caac-a3d9-91cc-9b445c4e98fd@cs.ucla.edu>
Date: Thu, 06 May 2021 15:43:23 +0200
In-Reply-To: (Adhemerval Zanella's message of "Thu, 6 May 2021 10:33:52 -0300")
Message-ID: <87pmy4gepw.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list

* Adhemerval Zanella:

> On 05/05/2021 15:25, Paul Eggert wrote:
>> On 5/5/21 10:28 AM, Florian Weimer via Libc-alpha wrote:
>>>> diff --git a/posix/sched_cpucount.c b/posix/sched_cpucount.c
>>>> index b0ca4ea7bc..529286e777 100644
>>>> --- a/posix/sched_cpucount.c
>>>> +++ b/posix/sched_cpucount.c
>>>> @@ -22,31 +22,11 @@ int
>>>>   __sched_cpucount (size_t setsize, const cpu_set_t *setp)
>>>>   {
>>>>     int s = 0;
>>>> +  for (int i = 0; i < setsize / sizeof (__cpu_mask); i++)
>>>>       {
>>>> +      __cpu_mask si = setp->__bits[i];
>>>> +      /* Clear the least significant bit set.  */
>>>> +      for (; si != 0; si &= si - 1, s++);
>>>>       }
>>>> -
>>>>     return s;
>>>>   }
>>> Why “si”?  I think si &= si - 1 clears the *most* significant bit in
>>> si.  If you agree, please update the comment.
>>
>> Better yet, define a static function 'popcount' that uses Kernighan's
>> trick and call that function. As things stand it's not obvious what
>> the code is doing, regardless of which bit it's clearing. The
>> function's comment should explain why it's not using
>> __builtin_popcount.
>
> What about the below:
>
> diff --git a/posix/sched_cpucount.c b/posix/sched_cpucount.c
> index b0ca4ea7bc..6cb5c4337e 100644
> --- a/posix/sched_cpucount.c
> +++ b/posix/sched_cpucount.c
> @@ -17,36 +17,21 @@
>  
>  #include
>  
> +/* Counting bits set, Brian Kernighan's way */
> +static inline unsigned int
> +countbits (__cpu_mask v)
> +{
> +  unsigned int s = 0;
> +  for (; v != 0; s++)
> +    v &= v - 1;
> +  return s;
> +}

I get that choosing the exact matching builtin for the __cpu_mask type
isn't easy, but wouldn't it be safe to use __builtin_popcountll
unconditionally?

Thanks,
Florian
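
To make the question concrete, here is a minimal sketch of the two variants side by side. It is not taken from the patch: cpu_mask_sketch is a hypothetical stand-in for __cpu_mask (assumed here to be unsigned long), countbits_loop mirrors the proposed countbits, and countbits_builtin is an illustrative name for the unconditional __builtin_popcountll variant. It assumes a GCC-compatible compiler. The reasoning is that widening an unsigned value to unsigned long long zero-extends it, so no extra set bits appear and the count should be the same.

```c
#include <limits.h>

/* Hypothetical stand-in for glibc's __cpu_mask; assumed to be an
   unsigned integer type no wider than unsigned long long.  */
typedef unsigned long cpu_mask_sketch;

/* Kernighan's loop: each v &= v - 1 clears the lowest set bit, so the
   loop runs once per set bit.  */
static inline unsigned int
countbits_loop (cpu_mask_sketch v)
{
  unsigned int s = 0;
  for (; v != 0; s++)
    v &= v - 1;
  return s;
}

/* Unconditional builtin: the argument is converted to unsigned long
   long, which zero-extends an unsigned mask and adds no set bits.  */
static inline unsigned int
countbits_builtin (cpu_mask_sketch v)
{
  _Static_assert (sizeof (cpu_mask_sketch) <= sizeof (unsigned long long),
                  "mask type must fit in unsigned long long");
  return (unsigned int) __builtin_popcountll (v);
}
```

The static assertion only documents the assumption the widening argument relies on; whether the builtin expands to a single instruction or to a library call depends on the target and compiler flags.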