From: Alexander Monakov <>
To: Paul Zimmermann <>
Subject: Re: slowdown with -std=gnu18 with respect to -std=c99
Date: Fri, 6 May 2022 12:27:39 +0300 (MSK)
Message-ID: <>
In-Reply-To: <>

On Fri, 6 May 2022, Paul Zimmermann via Gcc-help wrote:

> here are latency metrics (still on i5-4590):
>             | gcc-9 | gcc-10 | gcc-11 |
> ------------|-------|--------|--------|
>  -std=c99   | 70.8  | 70.3   | 70.2   |
>  -std=gnu18 | 59.5  | 59.5   | 59.5   |
> It thus seems the issue only appears for the reciprocal throughput.


The primary issue here is a false dependency on the vcvtss2sd instruction. In the
snippet shown in Stéphane's email, the slower variant begins with

    vcvtss2sd   -0x4(%rsp),%xmm1,%xmm1

The cvtss2sd instruction is specified to leave the upper bits of the SSE
register unmodified, so here it merges the high bits of xmm1 with the result of
the float->double conversion (in the low bits) into the new xmm1. Unless the CPU
can track dependencies separately for vector register components, it has to
delay this instruction until the previous computation that modified xmm1 has
completed (AMD Zen2 is an example of a microarchitecture that apparently can).
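The merging behaviour can be observed from C with the SSE2 intrinsic
_mm_cvtss_sd, which has the same "replace the low lane, keep the high lane"
semantics. This is only a sketch for an x86 target with SSE2; the helper name
merge_convert is made up for illustration and does not come from the thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 targets only */

/* Hypothetical helper: treat lanes[0..1] as the old register contents
   (low, high), convert f to double into the low lane, and write the
   merged register back. The high lane comes from the old value, which
   is why the instruction has to wait for it. */
void merge_convert(double lanes[2], float f)
{
    __m128d old = _mm_loadu_pd(lanes);       /* old register contents */
    __m128  src = _mm_set_ss(f);             /* float in the low lane */
    __m128d res = _mm_cvtss_sd(old, src);    /* low = (double)f, high = old high */
    _mm_storeu_pd(lanes, res);
}
```

Calling merge_convert on {1.0, 2.0} with 0.5f leaves {0.5, 2.0}: the high lane
survives, so the result depends on whatever last wrote the register.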

This limits the degree to which separate cr_log10f calls can overlap, affecting
throughput. In latency measurements, the calls are already serialized by the
dependency through xmm0, so the additional false dependency does not matter.
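To make the latency/throughput distinction concrete, here is a rough sketch
of the two kinds of measurement loop. It is not the benchmark harness used in
this thread; chain_calls, independent_calls and the stand-in function halve
are hypothetical names chosen for illustration, and timing code is omitted.

```c
#include <stddef.h>

typedef float (*unary_fn)(float);

/* Stand-in for the function under test (hypothetical). */
float halve(float x) { return x * 0.5f; }

/* Latency-style loop: each call consumes the previous result, so the
   calls are serialized by a true data dependency. */
float chain_calls(unary_fn f, float x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x = f(x);
    return x;
}

/* Throughput-style loop: inputs are independent, so an out-of-order CPU
   may overlap consecutive calls unless something else (such as a false
   register dependency) ties them together. */
void independent_calls(unary_fn f, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = f(in[i]);
}
```

Only the second loop gives the CPU room to overlap calls, which is why a
false dependency would show up there and not in the first one.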

(So fma is a "red herring": depending on compiler version and flags, register
allocation simply places the last assignment to xmm1 differently.)

If you want to experiment, you can hand-edit assembly to replace the problematic
instruction with variants that avoid the false dependency, such as

    vcvtss2sd %xmm0, %xmm0, %xmm1

or

    vpxor %xmm1, %xmm1, %xmm1
    vcvtss2sd   -0x4(%rsp),%xmm1,%xmm1
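At the C level, the closest analogue of the vpxor variant is to pass an
explicitly zeroed vector as the merge operand of _mm_cvtss_sd. This is a
sketch under the assumption of an x86 target with SSE2; the helper name
convert_zeroed is invented here, and the compiler remains free to pick
different registers or instructions, so inspect the generated assembly rather
than relying on it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 targets only */

/* Hypothetical helper: convert f to double with a freshly zeroed merge
   operand, so the high lane does not come from any earlier computation.
   Writes the resulting lanes (low, high) to out. */
void convert_zeroed(float f, double out[2])
{
    __m128d zero = _mm_setzero_pd();          /* typically a dependency-breaking xor */
    __m128  src  = _mm_set_ss(f);             /* float in the low lane */
    __m128d res  = _mm_cvtss_sd(zero, src);   /* low = (double)f, high = 0.0 */
    _mm_storeu_pd(out, res);
}
```

With f = 0.5f the output is {0.5, 0.0}: the high lane is a known zero instead
of leftover data from a previous computation.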

GCC has code to do this automatically, but for some reason it doesn't work for
your function. I have reported it to the Bugzilla:

