[Bug tree-optimization/100076] eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on CLX/Znver3

public inbox for gcc-bugs@sourceware.org

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/100076] eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on CLX/Znver3
Date: Wed, 14 Apr 2021 07:08:22 +0000
Message-ID: <bug-100076-4-w002b9txzA@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-100076-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100076

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
See also PR90579.  I wonder if there's a way to tell the CPU not to forward
a load: does emitting an lfence in between the scalar store and the vector
load fix the issue?

ISTR that the "bad" effect is not so much the delay of flushing the store
buffer to L1 and then loading from L1; isn't it rather that the CPU
speculates there's no conflicting (non-forwardable) store in the store
buffer, fetches a stale value from L1, and then has to flush and restart
the pipeline once the conflict is discovered late?
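
To make the pattern concrete, here is a minimal sketch of a scalar-store /
vector-load sequence with an optional lfence in between.  The function name
and the use_fence switch are made up for illustration, it assumes SSE2
(with a plain scalar fallback elsewhere), and whether the fence changes the
forwarding behaviour is exactly the open question above.  Note that an
optimizing compiler may forward the values itself, so a real experiment has
to make sure the stores and the load survive in the generated code.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: two 8-byte scalar stores followed by one 16-byte
   load that covers both of them.  Returns the sum of the two lanes.  */
static double store_then_wide_load(double *buf, double x, double y,
                                   int use_fence)
{
    assert(buf != NULL);
    buf[0] = x;                       /* scalar store, low half  */
    buf[1] = y;                       /* scalar store, high half */
#if defined(__SSE2__)
    if (use_fence)
        _mm_lfence();                 /* experiment only, effect unverified */
    __m128d v  = _mm_loadu_pd(buf);   /* one 16-byte load spanning both stores */
    __m128d hi = _mm_unpackhi_pd(v, v);
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));
#else
    (void)use_fence;                  /* no SSE2: plain scalar fallback */
    return buf[0] + buf[1];
#endif
}
```

Timing this in a loop with use_fence set to 0 and to 1 would be one way to
answer the question, provided the disassembly still shows the two scalar
stores and the wide load.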

Otherwise it's really hard to address these kinds of issues - for doubles
and SSE vectorization we might simply build all vector loads from scalar
loads, but that doesn't scale for larger VFs.  It might eventually be enough
to force-peel a single iteration of all loops, at the cost of code size
(and performance if there's no STLF issue).
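
As a rough illustration of the scalar-load idea for doubles and SSE, the
sketch below assembles one 16-byte vector from two 8-byte loads, so each
load has the same size as a preceding scalar store.  The function name is
hypothetical and this is hand-written intrinsics code, not what the
vectorizer emits; it only shows why the number of loads grows with the VF
(two here, four for a 32-byte vector of doubles, and so on).

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: build a two-lane vector from two scalar loads
   instead of one 16-byte load, then return the sum of both lanes.  */
static double sum_pair_from_scalar_loads(const double *buf)
{
    assert(buf != NULL);
#if defined(__SSE2__)
    __m128d v = _mm_load_sd(&buf[0]);     /* 8-byte load into the low lane  */
    v = _mm_loadh_pd(v, &buf[1]);         /* 8-byte load into the high lane */
    __m128d hi = _mm_unpackhi_pd(v, v);
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));
#else
    return buf[0] + buf[1];               /* no SSE2: plain scalar fallback */
#endif
}
```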

That said, CPU design folks should try to address this by making the
penalty smaller ;)

Can you share a runtime testcase?


Thread overview: 7+ messages
2021-04-14  2:21 [Bug tree-optimization/100076] New: eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX crazylht at gmail dot com
2021-04-14  3:16 ` [Bug tree-optimization/100076] eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on CLX/Znver3 hjl.tools at gmail dot com
2021-04-14  5:28 ` crazylht at gmail dot com
2021-04-14  7:08 ` rguenth at gcc dot gnu.org [this message]
2021-04-14  8:22 ` crazylht at gmail dot com
2021-04-15  7:35 ` rguenth at gcc dot gnu.org
2021-04-15  9:23 ` crazylht at gmail dot com
