public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Cassio Neri <cassio.neri@gmail.com>
To: Jonathan Wakely <jwakely@redhat.com>
Cc: "Koning, Paul" <Paul.Koning@dell.com>,
	"libstdc++" <libstdc++@gcc.gnu.org>,
	Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.
Date: Thu, 24 Jun 2021 12:42:06 +0100	[thread overview]
Message-ID: <CAOfgUPhvp5kbzCMcOy4EqVN0MeX9i1qqJ4tH=L3djK_8MYJAOQ@mail.gmail.com> (raw)
In-Reply-To: <YNN05My95aTkoZw6@redhat.com>

Thanks Jonathan.

Here are some benchmarks (assembly in [1]):
https://quick-bench.com/q/jclBXmi4QLDcRMLuuVpxTUsFmQw

Unfortunately, quick-bench times out unless some implementations are
commented out. You can copy the code and run it locally (needs google
benchmark) to get the full picture.

I really like Ulrich Drepper's implementation (version 8). GCC 11 emmits
really good code and looking at previous versions this has been very
consistent. This implementation also highlights very clearly that my hack
to calculate __is_multiple_of_100 (as opposed to % 100 used in version 7)
saves a ror instruction with remarkable performance effects.

In my AMD Ryzen 7 box, his implementation doesn't seem to be the fastest.
Nevertheless, compared to other fast alternatives, performance differences
are small and results seem very much platform dependent. In my opinion, his
implementation is the best, especially, given recent compiler changes. My
reasoning follows.

My original motivation was contrasting 'year % 400 == 0' with 'year % 16'
as in versions 1 and 2:

  __is_multiple_of_100 ? year % 400 == 0 : year % 4 == 0; // version 1
  __is_multiple_of_100 ? year % 16 == 0 : year % 4 == 0; // version 2

It's fair to expect that version 2 won't be slower than version 1. However,
this is the case and by quite a big margin. The emitted instructions are
very reasonable and, I guess, the issue is the choice of registers. I've
reimplemented half of version 2 in inline asm using similar register
choices as version 1 and got the performance boost that I was expecting.
(By no means I'm suggesting a non portable implementation here. It was just
an exercise to make my point. Also, it performs very poorly when compiled
by clang.)

The point here is that code emitted for sources like versions 1 and 2
(involving %) seems sensitive to very small variations and has been
changing across compiler releases in the last 3 years or so. (This can be
checked on godbolt.) Clang also seems very confused [2]. Even if the
emitted code for version 2 and others were good today, I fear a new
compiler release could regress. The stability of the code emitted for
Ulrich Drepper's implementation is much more reliable, its performance is
very good and its clarity is only compromised by my own hack for
__is_multiple_of_100. (Recall, however, that this hack is crucial for his
implementation's performance.)

Best wishes,
Cassio.

[1] https://godbolt.org/z/TbGYEqx33
[2] https://bugs.llvm.org/show_bug.cgi?id=50334

On Wed, Jun 23, 2021 at 6:52 PM Jonathan Wakely <jwakely@redhat.com> wrote:

> On 23/06/21 18:51 +0100, Jonathan Wakely wrote:
> >Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
> >Pushed to trunk.
> >
> >
> >
>
> >commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
> >Author: Cassio Neri <cassio.neri@gmail.com>
> >Date:   Wed Jun 23 15:32:16 2021
> >
> >    libstdc++: More efficient std::chrono::year::leap
> >
> >    Simple change to std::chrono::year::is_leap. If a year is multiple of
> 100,
> >    then it's divisible by 400 if and only if it's divisible by 16. The
> latter
> >    allows for better code generation.
> >
> >    The expression is then either y%16 or y%4 which are both powers of two
> >    and so it can be rearranged to use simple bitmask operations.
> >
> >    Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
> >    Co-authored-by: Ulrich Drepper <drepper@redhat.com>
> >
> >    libstdc++-v3/ChangeLog:
> >
> >            * include/std/chrono (chrono::year::is_leap()): Optimize.
> >
> >diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> >index 4631a727d73..863b6a27bdf 100644
> >--- a/libstdc++-v3/include/std/chrono
> >+++ b/libstdc++-v3/include/std/chrono
> >@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >       // [1] https://github.com/cassioneri/calendar
> >       // [2]
> https://accu.org/journals/overload/28/155/overload155.pdf#page=16
> >
> >+      // Furthermore, if y%100 != 0, then y%400==0 is equivalent to
> y%16==0,
> >+      // so we can rearrange the expression to (mult_100 ? y % 4 : y %
> 16)==0
>
> But Ulrich pointed out I got my boolean logic all muddled up in the
> comment. Fixed with the attached patch!
>
>

      reply	other threads:[~2021-06-24 11:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-21 17:46 Cassio Neri
2021-05-21 18:05 ` Koning, Paul
2021-05-21 18:44   ` Cassio Neri
2021-06-23 11:45     ` Jonathan Wakely
2021-06-23 13:16       ` Jonathan Wakely
2021-06-23 17:51         ` Jonathan Wakely
2021-06-23 17:52           ` Jonathan Wakely
2021-06-24 11:42             ` Cassio Neri [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOfgUPhvp5kbzCMcOy4EqVN0MeX9i1qqJ4tH=L3djK_8MYJAOQ@mail.gmail.com' \
    --to=cassio.neri@gmail.com \
    --cc=Paul.Koning@dell.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jwakely@redhat.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).