* [PATCH] libstdc++: More efficient std::chrono::year::leap. @ 2021-05-21 17:46 Cassio Neri 2021-05-21 18:05 ` Koning, Paul 0 siblings, 1 reply; 8+ messages in thread From: Cassio Neri @ 2021-05-21 17:46 UTC (permalink / raw) To: libstdc++, gcc-patches Simple change to std::chrono::year::is_leap. If a year is multiple of 100, then it's divisible by 400 if and only if it's divisible by 16. The latter allows for better code generation. Tested on x86_64-pc-linux-gnu. libstdc++-v3/ChangeLog: * include/std/chrono: diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 4631a727d73..6221d02c1b3 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1612,7 +1612,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; const bool __is_multiple_of_100 = __multiplier * (_M_y + __offset) < __bound; - return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; + return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; } explicit constexpr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. 2021-05-21 17:46 [PATCH] libstdc++: More efficient std::chrono::year::leap Cassio Neri @ 2021-05-21 18:05 ` Koning, Paul 2021-05-21 18:44 ` Cassio Neri 0 siblings, 1 reply; 8+ messages in thread From: Koning, Paul @ 2021-05-21 18:05 UTC (permalink / raw) To: cassio.neri, Cassio Neri via Gcc-patches; +Cc: libstdc++ > On May 21, 2021, at 1:46 PM, Cassio Neri via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Simple change to std::chrono::year::is_leap. If a year is multiple of 100, > then it's divisible by 400 if and only if it's divisible by 16. The latter > allows for better code generation. I wonder if the optimizer could be taught to do that. The change seems correct but it is very confusing; at the very least the reasoning you gave should be stated in a comment on that check. paul ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. 2021-05-21 18:05 ` Koning, Paul @ 2021-05-21 18:44 ` Cassio Neri 2021-06-23 11:45 ` Jonathan Wakely 0 siblings, 1 reply; 8+ messages in thread From: Cassio Neri @ 2021-05-21 18:44 UTC (permalink / raw) To: Koning, Paul; +Cc: Cassio Neri via Gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1806 bytes --] I've checked the generated code and the compiler doesn't figure out the logic. I added a comment to explain. (Revised patch below and attached.) Best wishes, Cassio. --- Simple change to std::chrono::year::is_leap. If a year is multiple of 100, then it's divisible by 400 if and only if it's divisible by 16. The latter allows for better code generation. Tested on x86_64-pc-linux-gnu. libstdc++-v3/ChangeLog: libstdc++-v3/ChangeLog: * include/std/chrono: diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 4631a727d73..85aa0379432 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; const bool __is_multiple_of_100 = __multiplier * (_M_y + __offset) < __bound; - return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; + // Usually we test _M_y % 400 == 0 but, when it's already known that + // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. + return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; } explicit constexpr On Fri, May 21, 2021 at 7:05 PM Koning, Paul <Paul.Koning@dell.com> wrote: > > > > > On May 21, 2021, at 1:46 PM, Cassio Neri via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > Simple change to std::chrono::year::is_leap. If a year is multiple of 100, > > then it's divisible by 400 if and only if it's divisible by 16. The latter > > allows for better code generation. > > I wonder if the optimizer could be taught to do that. > > The change seems correct but it is very confusing; at the very least the reasoning you gave should be stated in a comment on that check. > > paul > > [-- Attachment #2: leap.patch --] [-- Type: text/x-patch, Size: 1011 bytes --] Simple change to std::chrono::year::is_leap. If a year is multiple of 100, then it's divisible by 400 if and only if it's divisible by 16. The latter allows for better code generation. Tested on x86_64-pc-linux-gnu. libstdc++-v3/ChangeLog: libstdc++-v3/ChangeLog: * include/std/chrono: diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 4631a727d73..85aa0379432 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; const bool __is_multiple_of_100 = __multiplier * (_M_y + __offset) < __bound; - return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; + // Usually we test _M_y % 400 == 0 but, when it's already known that + // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. + return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; } explicit constexpr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. 2021-05-21 18:44 ` Cassio Neri @ 2021-06-23 11:45 ` Jonathan Wakely 2021-06-23 13:16 ` Jonathan Wakely 0 siblings, 1 reply; 8+ messages in thread From: Jonathan Wakely @ 2021-06-23 11:45 UTC (permalink / raw) To: cassio.neri; +Cc: Koning, Paul, libstdc++, Gcc-patches On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote: >I've checked the generated code and the compiler doesn't figure out >the logic. I added a comment to explain. > >(Revised patch below and attached.) > >Best wishes, >Cassio. > >--- > >Simple change to std::chrono::year::is_leap. If a year is multiple of 100, >then it's divisible by 400 if and only if it's divisible by 16. The latter >allows for better code generation. > >Tested on x86_64-pc-linux-gnu. > >libstdc++-v3/ChangeLog: >libstdc++-v3/ChangeLog: > > * include/std/chrono: > >diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >index 4631a727d73..85aa0379432 100644 >--- a/libstdc++-v3/include/std/chrono >+++ b/libstdc++-v3/include/std/chrono >@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; > const bool __is_multiple_of_100 > = __multiplier * (_M_y + __offset) < __bound; >- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; >+ // Usually we test _M_y % 400 == 0 but, when it's already known that >+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. ^^ N.B. this comment should say != >+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; If y % 16 == 0 then y % 4 == 0 too. So we could write that as: return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0; This seems to perform even better over a wide range of inputs, can you confirm that result with your own tests? However, my microbenchmark also shows that the simplistic code using y%100 often performs even better than the current code calculating __is_multiple_of_100 to avoid the modulus operation. So maybe my tests are bad. My rearranged expression above is equivalent to: return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0; which can be written without branches: return _M_y % (4 << (2 * __is_multiple_of_100)) == 0; However, both Clang and GCC already remove the branch for (x ? 16 : 4) and the conditional expression produces slightly smaller code with GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither of these seems to improve compared to my first rearrangement above. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. 2021-06-23 11:45 ` Jonathan Wakely @ 2021-06-23 13:16 ` Jonathan Wakely 2021-06-23 17:51 ` Jonathan Wakely 0 siblings, 1 reply; 8+ messages in thread From: Jonathan Wakely @ 2021-06-23 13:16 UTC (permalink / raw) To: cassio.neri; +Cc: Koning, Paul, libstdc++, Gcc-patches On 23/06/21 12:45 +0100, Jonathan Wakely wrote: >On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote: >>I've checked the generated code and the compiler doesn't figure out >>the logic. I added a comment to explain. >> >>(Revised patch below and attached.) >> >>Best wishes, >>Cassio. >> >>--- >> >>Simple change to std::chrono::year::is_leap. If a year is multiple of 100, >>then it's divisible by 400 if and only if it's divisible by 16. The latter >>allows for better code generation. >> >>Tested on x86_64-pc-linux-gnu. >> >>libstdc++-v3/ChangeLog: >>libstdc++-v3/ChangeLog: >> >> * include/std/chrono: >> >>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >>index 4631a727d73..85aa0379432 100644 >>--- a/libstdc++-v3/include/std/chrono >>+++ b/libstdc++-v3/include/std/chrono >>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; >> const bool __is_multiple_of_100 >> = __multiplier * (_M_y + __offset) < __bound; >>- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; >>+ // Usually we test _M_y % 400 == 0 but, when it's already known that >>+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. > ^^ > N.B. this comment should say != > >>+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; > >If y % 16 == 0 then y % 4 == 0 too. So we could write that as: > > return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0; > >This seems to perform even better over a wide range of inputs, can you >confirm that result with your own tests? > >However, my microbenchmark also shows that the simplistic code using >y%100 often performs even better than the current code calculating >__is_multiple_of_100 to avoid the modulus operation. So maybe my tests >are bad. > >My rearranged expression above is equivalent to: > > return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0; > >which can be written without branches: > > return _M_y % (4 << (2 * __is_multiple_of_100)) == 0; > >However, both Clang and GCC already remove the branch for (x ? 16 : 4) >and the conditional expression produces slightly smaller code with GCC >(see https://gcc.gnu.org/PR101179 regarding that). But neither of >these seems to improve compared to my first rearrangement above. This version from Ulrich Drepper produces the smallest code of all (and also performs well, if not the absolute fastest) in my benchmarks: return (y & (__is_multiple_of_100 ? 15 : 3)) == 0; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. 2021-06-23 13:16 ` Jonathan Wakely @ 2021-06-23 17:51 ` Jonathan Wakely 2021-06-23 17:52 ` Jonathan Wakely 0 siblings, 1 reply; 8+ messages in thread From: Jonathan Wakely @ 2021-06-23 17:51 UTC (permalink / raw) To: cassio.neri; +Cc: Koning, Paul, libstdc++, Gcc-patches [-- Attachment #1: Type: text/plain, Size: 2780 bytes --] On 23/06/21 14:16 +0100, Jonathan Wakely wrote: >On 23/06/21 12:45 +0100, Jonathan Wakely wrote: >>On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote: >>>I've checked the generated code and the compiler doesn't figure out >>>the logic. I added a comment to explain. >>> >>>(Revised patch below and attached.) >>> >>>Best wishes, >>>Cassio. >>> >>>--- >>> >>>Simple change to std::chrono::year::is_leap. If a year is multiple of 100, >>>then it's divisible by 400 if and only if it's divisible by 16. The latter >>>allows for better code generation. >>> >>>Tested on x86_64-pc-linux-gnu. >>> >>>libstdc++-v3/ChangeLog: >>>libstdc++-v3/ChangeLog: >>> >>> * include/std/chrono: >>> >>>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >>>index 4631a727d73..85aa0379432 100644 >>>--- a/libstdc++-v3/include/std/chrono >>>+++ b/libstdc++-v3/include/std/chrono >>>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>> constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; >>> const bool __is_multiple_of_100 >>> = __multiplier * (_M_y + __offset) < __bound; >>>- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; >>>+ // Usually we test _M_y % 400 == 0 but, when it's already known that >>>+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. >> ^^ >> N.B. this comment should say != >> >>>+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; >> >>If y % 16 == 0 then y % 4 == 0 too. So we could write that as: >> >> return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0; >> >>This seems to perform even better over a wide range of inputs, can you >>confirm that result with your own tests? >> >>However, my microbenchmark also shows that the simplistic code using >>y%100 often performs even better than the current code calculating >>__is_multiple_of_100 to avoid the modulus operation. So maybe my tests >>are bad. >> >>My rearranged expression above is equivalent to: >> >> return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0; >> >>which can be written without branches: >> >> return _M_y % (4 << (2 * __is_multiple_of_100)) == 0; >> >>However, both Clang and GCC already remove the branch for (x ? 16 : 4) >>and the conditional expression produces slightly smaller code with >>GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither >>of >>these seems to improve compared to my first rearrangement above. > >This version from Ulrich Drepper produces the smallest code of all >(and also performs well, if not the absolute fastest) in my >benchmarks: > > return (y & (__is_multiple_of_100 ? 15 : 3)) == 0; Here's what I've committed. Tested x86_64-linux and powerpc64le-linux. Pushed to trunk. [-- Attachment #2: patch.txt --] [-- Type: text/x-patch, Size: 1873 bytes --] commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb Author: Cassio Neri <cassio.neri@gmail.com> Date: Wed Jun 23 15:32:16 2021 libstdc++: More efficient std::chrono::year::leap Simple change to std::chrono::year::is_leap. If a year is multiple of 100, then it's divisible by 400 if and only if it's divisible by 16. The latter allows for better code generation. The expression is then either y%16 or y%4 which are both powers of two and so it can be rearranged to use simple bitmask operations. Co-authored-by: Jonathan Wakely <jwakely@redhat.com> Co-authored-by: Ulrich Drepper <drepper@redhat.com> libstdc++-v3/ChangeLog: * include/std/chrono (chrono::year::is_leap()): Optimize. diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 4631a727d73..863b6a27bdf 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // [1] https://github.com/cassioneri/calendar // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16 + // Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0, + // so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0 + // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0. + // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html + constexpr uint32_t __multiplier = 42949673; constexpr uint32_t __bound = 42949669; constexpr uint32_t __max_dividend = 1073741799; constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; const bool __is_multiple_of_100 = __multiplier * (_M_y + __offset) < __bound; - return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; + return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0; } explicit constexpr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. 2021-06-23 17:51 ` Jonathan Wakely @ 2021-06-23 17:52 ` Jonathan Wakely 2021-06-24 11:42 ` Cassio Neri 0 siblings, 1 reply; 8+ messages in thread From: Jonathan Wakely @ 2021-06-23 17:52 UTC (permalink / raw) To: cassio.neri; +Cc: Koning, Paul, libstdc++, Gcc-patches [-- Attachment #1: Type: text/plain, Size: 1562 bytes --] On 23/06/21 18:51 +0100, Jonathan Wakely wrote: >Here's what I've committed. Tested x86_64-linux and powerpc64le-linux. >Pushed to trunk. > > > >commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb >Author: Cassio Neri <cassio.neri@gmail.com> >Date: Wed Jun 23 15:32:16 2021 > > libstdc++: More efficient std::chrono::year::leap > > Simple change to std::chrono::year::is_leap. If a year is multiple of 100, > then it's divisible by 400 if and only if it's divisible by 16. The latter > allows for better code generation. > > The expression is then either y%16 or y%4 which are both powers of two > and so it can be rearranged to use simple bitmask operations. > > Co-authored-by: Jonathan Wakely <jwakely@redhat.com> > Co-authored-by: Ulrich Drepper <drepper@redhat.com> > > libstdc++-v3/ChangeLog: > > * include/std/chrono (chrono::year::is_leap()): Optimize. > >diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >index 4631a727d73..863b6a27bdf 100644 >--- a/libstdc++-v3/include/std/chrono >+++ b/libstdc++-v3/include/std/chrono >@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > // [1] https://github.com/cassioneri/calendar > // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16 > >+ // Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0, >+ // so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0 But Ulrich pointed out I got my boolean logic all muddled up in the comment. Fixed with the attached patch! [-- Attachment #2: patch.txt --] [-- Type: text/x-patch, Size: 1122 bytes --] commit 4a404f66b09d661bdc60e7e0d5d08f00c4e194db Author: Jonathan Wakely <jwakely@redhat.com> Date: Wed Jun 23 18:50:03 2021 libstdc++: Fix comment in chrono::year::is_leap() libstdc++-v3/ChangeLog: * include/std/chrono (chrono::year::is_leap()): Fix incorrect logic in comment. diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 863b6a27bdf..0b597be1944 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1606,8 +1606,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // [1] https://github.com/cassioneri/calendar // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16 - // Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0, - // so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0 + // Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0, + // so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0, // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0. // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] libstdc++: More efficient std::chrono::year::leap. 2021-06-23 17:52 ` Jonathan Wakely @ 2021-06-24 11:42 ` Cassio Neri 0 siblings, 0 replies; 8+ messages in thread From: Cassio Neri @ 2021-06-24 11:42 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Koning, Paul, libstdc++, Gcc-patches Thanks Jonathan. Here are some benchmarks (assembly in [1]): https://quick-bench.com/q/jclBXmi4QLDcRMLuuVpxTUsFmQw Unfortunately, quick-bench times out unless some implementations are commented out. You can copy the code and run it locally (needs google benchmark) to get the full picture. I really like Ulrich Drepper's implementation (version 8). GCC 11 emmits really good code and looking at previous versions this has been very consistent. This implementation also highlights very clearly that my hack to calculate __is_multiple_of_100 (as opposed to % 100 used in version 7) saves a ror instruction with remarkable performance effects. In my AMD Ryzen 7 box, his implementation doesn't seem to be the fastest. Nevertheless, compared to other fast alternatives, performance differences are small and results seem very much platform dependent. In my opinion, his implementation is the best, especially, given recent compiler changes. My reasoning follows. My original motivation was contrasting 'year % 400 == 0' with 'year % 16' as in versions 1 and 2: __is_multiple_of_100 ? year % 400 == 0 : year % 4 == 0; // version 1 __is_multiple_of_100 ? year % 16 == 0 : year % 4 == 0; // version 2 It's fair to expect that version 2 won't be slower than version 1. However, this is the case and by quite a big margin. The emitted instructions are very reasonable and, I guess, the issue is the choice of registers. I've reimplemented half of version 2 in inline asm using similar register choices as version 1 and got the performance boost that I was expecting. (By no means I'm suggesting a non portable implementation here. It was just an exercise to make my point. Also, it performs very poorly when compiled by clang.) The point here is that code emitted for sources like versions 1 and 2 (involving %) seems sensitive to very small variations and has been changing across compiler releases in the last 3 years or so. (This can be checked on godbolt.) Clang also seems very confused [2]. Even if the emitted code for version 2 and others were good today, I fear a new compiler release could regress. The stability of the code emitted for Ulrich Drepper's implementation is much more reliable, its performance is very good and its clarity is only compromised by my own hack for __is_multiple_of_100. (Recall, however, that this hack is crucial for his implementation's performance.) Best wishes, Cassio. [1] https://godbolt.org/z/TbGYEqx33 [2] https://bugs.llvm.org/show_bug.cgi?id=50334 On Wed, Jun 23, 2021 at 6:52 PM Jonathan Wakely <jwakely@redhat.com> wrote: > On 23/06/21 18:51 +0100, Jonathan Wakely wrote: > >Here's what I've committed. Tested x86_64-linux and powerpc64le-linux. > >Pushed to trunk. > > > > > > > > >commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb > >Author: Cassio Neri <cassio.neri@gmail.com> > >Date: Wed Jun 23 15:32:16 2021 > > > > libstdc++: More efficient std::chrono::year::leap > > > > Simple change to std::chrono::year::is_leap. If a year is multiple of > 100, > > then it's divisible by 400 if and only if it's divisible by 16. The > latter > > allows for better code generation. > > > > The expression is then either y%16 or y%4 which are both powers of two > > and so it can be rearranged to use simple bitmask operations. > > > > Co-authored-by: Jonathan Wakely <jwakely@redhat.com> > > Co-authored-by: Ulrich Drepper <drepper@redhat.com> > > > > libstdc++-v3/ChangeLog: > > > > * include/std/chrono (chrono::year::is_leap()): Optimize. > > > >diff --git a/libstdc++-v3/include/std/chrono > b/libstdc++-v3/include/std/chrono > >index 4631a727d73..863b6a27bdf 100644 > >--- a/libstdc++-v3/include/std/chrono > >+++ b/libstdc++-v3/include/std/chrono > >@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > // [1] https://github.com/cassioneri/calendar > > // [2] > https://accu.org/journals/overload/28/155/overload155.pdf#page=16 > > > >+ // Furthermore, if y%100 != 0, then y%400==0 is equivalent to > y%16==0, > >+ // so we can rearrange the expression to (mult_100 ? y % 4 : y % > 16)==0 > > But Ulrich pointed out I got my boolean logic all muddled up in the > comment. Fixed with the attached patch! > > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-06-24 11:42 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-21 17:46 [PATCH] libstdc++: More efficient std::chrono::year::leap Cassio Neri 2021-05-21 18:05 ` Koning, Paul 2021-05-21 18:44 ` Cassio Neri 2021-06-23 11:45 ` Jonathan Wakely 2021-06-23 13:16 ` Jonathan Wakely 2021-06-23 17:51 ` Jonathan Wakely 2021-06-23 17:52 ` Jonathan Wakely 2021-06-24 11:42 ` Cassio Neri
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).