On 23/06/21 14:16 +0100, Jonathan Wakely wrote: >On 23/06/21 12:45 +0100, Jonathan Wakely wrote: >>On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote: >>>I've checked the generated code and the compiler doesn't figure out >>>the logic. I added a comment to explain. >>> >>>(Revised patch below and attached.) >>> >>>Best wishes, >>>Cassio. >>> >>>--- >>> >>>Simple change to std::chrono::year::is_leap. If a year is multiple of 100, >>>then it's divisible by 400 if and only if it's divisible by 16. The latter >>>allows for better code generation. >>> >>>Tested on x86_64-pc-linux-gnu. >>> >>>libstdc++-v3/ChangeLog: >>>libstdc++-v3/ChangeLog: >>> >>> * include/std/chrono: >>> >>>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono >>>index 4631a727d73..85aa0379432 100644 >>>--- a/libstdc++-v3/include/std/chrono >>>+++ b/libstdc++-v3/include/std/chrono >>>@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>> constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; >>> const bool __is_multiple_of_100 >>> = __multiplier * (_M_y + __offset) < __bound; >>>- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; >>>+ // Usually we test _M_y % 400 == 0 but, when it's already known that >>>+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0. >> ^^ >> N.B. this comment should say != >> >>>+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0; >> >>If y % 16 == 0 then y % 4 == 0 too. So we could write that as: >> >> return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0; >> >>This seems to perform even better over a wide range of inputs, can you >>confirm that result with your own tests? >> >>However, my microbenchmark also shows that the simplistic code using >>y%100 often performs even better than the current code calculating >>__is_multiple_of_100 to avoid the modulus operation. So maybe my tests >>are bad. >> >>My rearranged expression above is equivalent to: >> >> return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0; >> >>which can be written without branches: >> >> return _M_y % (4 << (2 * __is_multiple_of_100)) == 0; >> >>However, both Clang and GCC already remove the branch for (x ? 16 : 4) >>and the conditional expression produces slightly smaller code with >>GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither >>of >>these seems to improve compared to my first rearrangement above. > >This version from Ulrich Drepper produces the smallest code of all >(and also performs well, if not the absolute fastest) in my >benchmarks: > > return (y & (__is_multiple_of_100 ? 15 : 3)) == 0; Here's what I've committed. Tested x86_64-linux and powerpc64le-linux. Pushed to trunk.