From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 538BE385502B for ; Wed, 23 Jun 2021 11:45:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 538BE385502B Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-384-saFmurZ6N8e1TFh38_UlZw-1; Wed, 23 Jun 2021 07:45:15 -0400 X-MC-Unique: saFmurZ6N8e1TFh38_UlZw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3E052101F7A3; Wed, 23 Jun 2021 11:45:14 +0000 (UTC) Received: from localhost (unknown [10.33.36.175]) by smtp.corp.redhat.com (Postfix) with ESMTP id BC6985D9DE; Wed, 23 Jun 2021 11:45:13 +0000 (UTC) Date: Wed, 23 Jun 2021 12:45:12 +0100 From: Jonathan Wakely To: cassio.neri@gmail.com Cc: "Koning, Paul" , libstdc++ , Gcc-patches Subject: Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. Message-ID: References: <34C4F25A-6333-4C08-BBFF-8E86A5A9B764@dell.com> MIME-Version: 1.0 In-Reply-To: X-Clacks-Overhead: GNU Terry Pratchett X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Jun 2021 11:45:20 -0000 On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote: >I've checked the generated code and the compiler doesn't figure out >the logic. I added a comment to explain. > >(Revised patch below and attached.) > >Best wishes, >Cassio. > >--- > >Simple change to std::chrono::year::is_leap. If a year is multiple of 100, >then it's divisible by 400 if and only if it's divisible by 16. The latter >allows for better code generation. > >Tested on x86_64-pc-linux-gnu. > >libstdc++-v3/ChangeLog: >libstdc++-v3/ChangeLog: > > * include/std/chrono: > >diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >index 4631a727d73..85aa0379432 100644 >--- a/libstdc++-v3/include/std/chrono >+++ b/libstdc++-v3/include/std/chrono >@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; > const bool __is_multiple_of_100 > = __multiplier * (_M_y + __offset) < __bound; >- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; >+ // Usually we test _M_y % 400 == 0 but, when it's already known that >+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. ^^ N.B. this comment should say != >+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; If y % 16 == 0 then y % 4 == 0 too. So we could write that as: return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0; This seems to perform even better over a wide range of inputs, can you confirm that result with your own tests? However, my microbenchmark also shows that the simplistic code using y%100 often performs even better than the current code calculating __is_multiple_of_100 to avoid the modulus operation. So maybe my tests are bad. My rearranged expression above is equivalent to: return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0; which can be written without branches: return _M_y % (4 << (2 * __is_multiple_of_100)) == 0; However, both Clang and GCC already remove the branch for (x ? 16 : 4) and the conditional expression produces slightly smaller code with GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither of these seems to improve compared to my first rearrangement above.