From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x734.google.com (mail-qk1-x734.google.com [IPv6:2607:f8b0:4864:20::734]) by sourceware.org (Postfix) with ESMTPS id 4B6023839C66 for ; Thu, 6 May 2021 16:16:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4B6023839C66 Received: by mail-qk1-x734.google.com with SMTP id x8so5473353qkl.2 for ; Thu, 06 May 2021 09:16:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=4fRlQQKldDSaWyv8ccDYCY3YBKdsT2ebY/HTG0DiSyk=; b=uimfcwYQDfYw8bs1Ae71Q3KUWaaWNByJAIfCuPBh/IwuIEtsycEfViRhb8+y8AsPvJ qJZmAjpJ1bcnYeEMe3XyUio5Z8Z5FlQOhSJXxVIb2Nsg5/cRQiRz5kxZ0gbpCIA1fmUx a6vWyIkeUuXFEgKIMbG+gnQ0L4NffrfIhrDMr++7PlNQT4wRevgHQKutKjIDpRs3qfpM 9sDbmeA4B9SYYOA/xIwuRvvH0qpgsGxk0T4b4uqQ0u9eawF6iW3w1AeqA5+w8CsvRVp6 ZnaVASP6nvLIFdTkNOrBUvjPhvZhrLlqNT4Z59BVEWNDCuEuLOFg8pgoC3bo23FhrM9a 1Mvw== X-Gm-Message-State: AOAM532fqY3h/9yS5euCNrx/wq1zjafiJVC+9YCeTP8iFujcTHyFb6mO +a+vOvUSL6cugFJ79ykLqlibrg== X-Google-Smtp-Source: ABdhPJxhuO490EiETm/fIPbQLsznj7ti+Tt0zEETjT9spNr/VUvGAXpk1mUoota8362WN11oWzVdKg== X-Received: by 2002:a05:620a:298b:: with SMTP id r11mr4708227qkp.290.1620317802741; Thu, 06 May 2021 09:16:42 -0700 (PDT) Received: from [192.168.1.4] ([177.194.37.86]) by smtp.gmail.com with ESMTPSA id o10sm2529667qti.33.2021.05.06.09.16.41 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 06 May 2021 09:16:42 -0700 (PDT) Subject: Re: [PATCH 1/4] Remove architecture specific sched_cpucount optimizations To: Florian Weimer Cc: Paul Eggert , Adhemerval Zanella via Libc-alpha , crrodriguez@opensuse.org References: <20210329182520.323665-1-adhemerval.zanella@linaro.org> <87a6p9dr9n.fsf@oldenburg.str.redhat.com> <61040ff8-caac-a3d9-91cc-9b445c4e98fd@cs.ucla.edu> <87pmy4gepw.fsf@oldenburg.str.redhat.com> From: Adhemerval Zanella Message-ID: <9a0723f4-82ec-ae41-0304-3dc1af0bb2c0@linaro.org> Date: Thu, 6 May 2021 13:16:39 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: <87pmy4gepw.fsf@oldenburg.str.redhat.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 May 2021 16:16:45 -0000 On 06/05/2021 10:43, Florian Weimer wrote: > * Adhemerval Zanella: > >> On 05/05/2021 15:25, Paul Eggert wrote: >>> On 5/5/21 10:28 AM, Florian Weimer via Libc-alpha wrote: >>>>> diff --git a/posix/sched_cpucount.c b/posix/sched_cpucount.c >>>>> index b0ca4ea7bc..529286e777 100644 >>>>> --- a/posix/sched_cpucount.c >>>>> +++ b/posix/sched_cpucount.c >>>>> @@ -22,31 +22,11 @@ int >>>>>   __sched_cpucount (size_t setsize, const cpu_set_t *setp) >>>>>   { >>>>>     int s = 0; >>>>> +  for (int i = 0; i < setsize / sizeof (__cpu_mask); i++) >>>>>       { >>>>> +      __cpu_mask si = setp->__bits[i]; >>>>> +      /* Clear the least significant bit set.  */ >>>>> +      for (; si != 0; si &= si - 1, s++); >>>>>       } >>>>> - >>>>>     return s; >>>>>   } >>>> Why “si”?  It think si &= si - 1 clears the*most*  significant bit in >>>> si.  If you agree, please update the comment. >>> >>> Better yet, define a static function 'popcount' that uses Kernighan's trick and call that function. As things stand it's not obvious what the code is doing, regardless of which bit it's clearing. The function's comment should explain why it's not using __builtin_popcount. >> >> What about the below: >> >> diff --git a/posix/sched_cpucount.c b/posix/sched_cpucount.c >> index b0ca4ea7bc..6cb5c4337e 100644 >> --- a/posix/sched_cpucount.c >> +++ b/posix/sched_cpucount.c >> @@ -17,36 +17,21 @@ >> >> #include >> >> +/* Counting bits set, Brian Kernighan's way */ >> +static inline unsigned int >> +countbits (__cpu_mask v) >> +{ >> + unsigned int s = 0; >> + for (; v != 0; s++) >> + v &= v - 1; >> + return s; >> +} > > I get that choosing the exact matching builtin for the __cpu_mask type > isn't easy, but wouldn't it be safe to use __builtin_popcountll > unconditionally? Using a open-coded routine is slight better for architectures that do not have a popcount instruction, since __builtin_popcountll will call the libgcc routine). But I hardly think we need that amount of micro-optimization for such routine.