From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 0794C383B82E for ; Wed, 23 Jun 2021 17:51:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0794C383B82E Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-250-YXYVKvfANw6dksct_h0zew-1; Wed, 23 Jun 2021 13:51:20 -0400 X-MC-Unique: YXYVKvfANw6dksct_h0zew-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0CD931966320; Wed, 23 Jun 2021 17:51:19 +0000 (UTC) Received: from localhost (unknown [10.33.36.175]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8C1E719D9F; Wed, 23 Jun 2021 17:51:18 +0000 (UTC) Date: Wed, 23 Jun 2021 18:51:17 +0100 From: Jonathan Wakely To: cassio.neri@gmail.com Cc: "Koning, Paul" , libstdc++ , Gcc-patches Subject: Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. Message-ID: References: <34C4F25A-6333-4C08-BBFF-8E86A5A9B764@dell.com> MIME-Version: 1.0 In-Reply-To: X-Clacks-Overhead: GNU Terry Pratchett X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/mixed; boundary="nsZtp0pmwk0CoB6O" Content-Disposition: inline X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Jun 2021 17:51:25 -0000 --nsZtp0pmwk0CoB6O Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline On 23/06/21 14:16 +0100, Jonathan Wakely wrote: >On 23/06/21 12:45 +0100, Jonathan Wakely wrote: >>On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote: >>>I've checked the generated code and the compiler doesn't figure out >>>the logic. I added a comment to explain. >>> >>>(Revised patch below and attached.) >>> >>>Best wishes, >>>Cassio. >>> >>>--- >>> >>>Simple change to std::chrono::year::is_leap. If a year is multiple of 100, >>>then it's divisible by 400 if and only if it's divisible by 16. The latter >>>allows for better code generation. >>> >>>Tested on x86_64-pc-linux-gnu. >>> >>>libstdc++-v3/ChangeLog: >>>libstdc++-v3/ChangeLog: >>> >>> * include/std/chrono: >>> >>>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >>>index 4631a727d73..85aa0379432 100644 >>>--- a/libstdc++-v3/include/std/chrono >>>+++ b/libstdc++-v3/include/std/chrono >>>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>> constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; >>> const bool __is_multiple_of_100 >>> = __multiplier * (_M_y + __offset) < __bound; >>>- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; >>>+ // Usually we test _M_y % 400 == 0 but, when it's already known that >>>+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. >> ^^ >> N.B. this comment should say != >> >>>+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; >> >>If y % 16 == 0 then y % 4 == 0 too. So we could write that as: >> >> return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0; >> >>This seems to perform even better over a wide range of inputs, can you >>confirm that result with your own tests? >> >>However, my microbenchmark also shows that the simplistic code using >>y%100 often performs even better than the current code calculating >>__is_multiple_of_100 to avoid the modulus operation. So maybe my tests >>are bad. >> >>My rearranged expression above is equivalent to: >> >> return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0; >> >>which can be written without branches: >> >> return _M_y % (4 << (2 * __is_multiple_of_100)) == 0; >> >>However, both Clang and GCC already remove the branch for (x ? 16 : 4) >>and the conditional expression produces slightly smaller code with >>GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither >>of >>these seems to improve compared to my first rearrangement above. > >This version from Ulrich Drepper produces the smallest code of all >(and also performs well, if not the absolute fastest) in my >benchmarks: > > return (y & (__is_multiple_of_100 ? 15 : 3)) == 0; Here's what I've committed. Tested x86_64-linux and powerpc64le-linux. Pushed to trunk. --nsZtp0pmwk0CoB6O Content-Type: text/x-patch; charset=us-ascii Content-Disposition: attachment; filename="patch.txt" commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb Author: Cassio Neri Date: Wed Jun 23 15:32:16 2021 libstdc++: More efficient std::chrono::year::leap Simple change to std::chrono::year::is_leap. If a year is multiple of 100, then it's divisible by 400 if and only if it's divisible by 16. The latter allows for better code generation. The expression is then either y%16 or y%4 which are both powers of two and so it can be rearranged to use simple bitmask operations. Co-authored-by: Jonathan Wakely Co-authored-by: Ulrich Drepper libstdc++-v3/ChangeLog: * include/std/chrono (chrono::year::is_leap()): Optimize. diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 4631a727d73..863b6a27bdf 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // [1] https://github.com/cassioneri/calendar // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16 + // Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0, + // so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0 + // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0. + // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html + constexpr uint32_t __multiplier = 42949673; constexpr uint32_t __bound = 42949669; constexpr uint32_t __max_dividend = 1073741799; constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; const bool __is_multiple_of_100 = __multiplier * (_M_y + __offset) < __bound; - return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; + return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0; } explicit constexpr --nsZtp0pmwk0CoB6O--