From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 4D6923857836 for ; Wed, 23 Jun 2021 13:18:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4D6923857836 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-256-qoH6bC5hNTe9EzWvuMIgyg-1; Wed, 23 Jun 2021 09:18:48 -0400 X-MC-Unique: qoH6bC5hNTe9EzWvuMIgyg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CE14A108660C; Wed, 23 Jun 2021 13:16:32 +0000 (UTC) Received: from localhost (unknown [10.33.36.175]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6CC68604CC; Wed, 23 Jun 2021 13:16:32 +0000 (UTC) Date: Wed, 23 Jun 2021 14:16:31 +0100 From: Jonathan Wakely To: cassio.neri@gmail.com Cc: "Koning, Paul" , libstdc++ , Gcc-patches Subject: Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. Message-ID: References: <34C4F25A-6333-4C08-BBFF-8E86A5A9B764@dell.com> MIME-Version: 1.0 In-Reply-To: X-Clacks-Overhead: GNU Terry Pratchett X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Jun 2021 13:18:54 -0000 On 23/06/21 12:45 +0100, Jonathan Wakely wrote: >On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote: >>I've checked the generated code and the compiler doesn't figure out >>the logic. I added a comment to explain. >> >>(Revised patch below and attached.) >> >>Best wishes, >>Cassio. >> >>--- >> >>Simple change to std::chrono::year::is_leap. If a year is multiple of 100, >>then it's divisible by 400 if and only if it's divisible by 16. The latter >>allows for better code generation. >> >>Tested on x86_64-pc-linux-gnu. >> >>libstdc++-v3/ChangeLog: >>libstdc++-v3/ChangeLog: >> >> * include/std/chrono: >> >>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >>index 4631a727d73..85aa0379432 100644 >>--- a/libstdc++-v3/include/std/chrono >>+++ b/libstdc++-v3/include/std/chrono >>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; >> const bool __is_multiple_of_100 >> = __multiplier * (_M_y + __offset) < __bound; >>- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; >>+ // Usually we test _M_y % 400 == 0 but, when it's already known that >>+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. > ^^ > N.B. this comment should say != > >>+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; > >If y % 16 == 0 then y % 4 == 0 too. So we could write that as: > > return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0; > >This seems to perform even better over a wide range of inputs, can you >confirm that result with your own tests? > >However, my microbenchmark also shows that the simplistic code using >y%100 often performs even better than the current code calculating >__is_multiple_of_100 to avoid the modulus operation. So maybe my tests >are bad. > >My rearranged expression above is equivalent to: > > return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0; > >which can be written without branches: > > return _M_y % (4 << (2 * __is_multiple_of_100)) == 0; > >However, both Clang and GCC already remove the branch for (x ? 16 : 4) >and the conditional expression produces slightly smaller code with GCC >(see https://gcc.gnu.org/PR101179 regarding that). But neither of >these seems to improve compared to my first rearrangement above. This version from Ulrich Drepper produces the smallest code of all (and also performs well, if not the absolute fastest) in my benchmarks: return (y & (__is_multiple_of_100 ? 15 : 3)) == 0;