[Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
@ 2020-03-27 23:30 jamborm at gcc dot gnu.org
  2020-03-30  5:16 ` [Bug tree-optimization/94375] " crazylht at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-03-27 23:30 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

            Bug ID: 94375
           Summary: 548.exchange2_r run time is 8-18% worse than GCC 9 at
                    -Ofast -march=native
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

When compiled with trunk revision 26b3e568a60 and options -Ofast
-march=native -mtune=native, SPEC 2017 INTrate benchmark
548.exchange2_r runs 19% slower on AMD Zen2 and 12% slower on Intel
Cascade Lake than when built with GCC 9.2.

It appears that the main culprit is the vectorizer; switching it off
recovers the performance (it is in fact even some 4% better than GCC
9 on AMD).

Side note: with --param ipa-cp-eval-threshold=1 --param
ipa-cp-unit-growth=80 one can get an exchange2 that is 25% faster still,
but that is a different issue.

This started happening in autumn 2019, though not at a single revision,
as the following table of run times relative to GCC 9.2 shows.

Revision:                  time 
-------------------------  ----
d82f38123b5 (Nov 14 2019)  117%
d9adca6e663 (Nov 5 2019)   117%
bf037872d3c (Oct 24 2019)  101%
77ef339456f (Oct 14 2019)  118%
38a734350fd (Oct 3 2019)   100%
d469a71e5a0 (Sep 23 2019)  101%
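
The table can also be read mechanically. Below is a minimal Python sketch
that uses only the numbers above to flag the sampled revisions that are
slower than GCC 9.2 and to list where the state flips. The 5% threshold
and the function names are assumptions made for illustration; they are not
part of the report.

```python
# Run times relative to GCC 9.2, copied from the table above (newest first).
RELATIVE_TIME = [
    ("d82f38123b5", "Nov 14 2019", 117),
    ("d9adca6e663", "Nov 5 2019", 117),
    ("bf037872d3c", "Oct 24 2019", 101),
    ("77ef339456f", "Oct 14 2019", 118),
    ("38a734350fd", "Oct 3 2019", 100),
    ("d469a71e5a0", "Sep 23 2019", 101),
]


def is_regressed(pct, threshold_pct=5):
    """True if the run time exceeds 100% by more than threshold_pct (assumed cutoff)."""
    return pct - 100 > threshold_pct


def regressed(rows, threshold_pct=5):
    """Return the revisions that count as regressed, in table order."""
    return [rev for rev, _date, pct in rows if is_regressed(pct, threshold_pct)]


def transitions(rows, threshold_pct=5):
    """Return (older, newer) revision pairs between which the state flips."""
    oldest_first = list(reversed(rows))
    flips = []
    for (old_rev, _, old_pct), (new_rev, _, new_pct) in zip(oldest_first, oldest_first[1:]):
        if is_regressed(old_pct, threshold_pct) != is_regressed(new_pct, threshold_pct):
            flips.append((old_rev, new_rev))
    return flips


if __name__ == "__main__":
    print(regressed(RELATIVE_TIME))
    print(transitions(RELATIVE_TIME))
```

With these six samples the state flips three times, which is consistent
with the remark that the slowdown did not appear at one point.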


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)


* [Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
  2020-03-27 23:30 [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native jamborm at gcc dot gnu.org
@ 2020-03-30  5:16 ` crazylht at gmail dot com
  2020-03-30  8:04 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2020-03-30  5:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
according to our experience.


* [Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
  2020-03-27 23:30 [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native jamborm at gcc dot gnu.org
  2020-03-30  5:16 ` [Bug tree-optimization/94375] " crazylht at gmail dot com
@ 2020-03-30  8:04 ` rguenth at gcc dot gnu.org
  2020-03-30 10:02 ` jamborm at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-03-30  8:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Do we ever hit the vectorized paths?  I guess the number of iterations isn't
bounded so the cost model has a hard time, possibly only triggering at runtime.
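
To illustrate why an unknown iteration count is awkward for a cost model,
here is a toy break-even calculation in Python. It is not GCC's actual cost
model: the function name and every cost value are hypothetical, chosen only
to show that a vector loop with a fixed setup overhead pays off only above
some trip count, which has to be tested at run time when the count is not
known at compile time. The sketch also ignores the scalar epilogue needed
when the trip count is not a multiple of the vectorization factor.

```python
import math


def min_profitable_iterations(scalar_cost, vector_cost, vf, overhead):
    """Smallest trip count n at which the vector loop is cheaper (toy model).

    scalar_cost: hypothetical cost of one scalar iteration
    vector_cost: hypothetical cost of one vector iteration covering vf elements
    vf:          vectorization factor (elements per vector iteration)
    overhead:    hypothetical one-time cost (runtime checks, setup)
    Returns None when the vector loop never wins.
    """
    per_element_saving = scalar_cost - vector_cost / vf
    if per_element_saving <= 0:
        return None
    # We need n * scalar_cost > overhead + n * vector_cost / vf,
    # i.e. n > overhead / per_element_saving.
    return math.floor(overhead / per_element_saving) + 1


if __name__ == "__main__":
    # Hypothetical numbers: scalar iteration 1.0, vector iteration 2.0,
    # four elements per vector, one-time overhead 10.0.
    print(min_profitable_iterations(1.0, 2.0, 4, 10.0))
```

Under these made-up costs, a loop that usually runs for fewer iterations
than the break-even count pays for the runtime check without benefiting
from the vector path.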


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
  2020-03-27 23:30 [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native jamborm at gcc dot gnu.org
  2020-03-30  5:16 ` [Bug tree-optimization/94375] " crazylht at gmail dot com
  2020-03-30  8:04 ` rguenth at gcc dot gnu.org
@ 2020-03-30 10:02 ` jamborm at gcc dot gnu.org
  2020-03-31  1:50 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-03-30 10:02 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #1)
> Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
> according to our experience.

I have seen this helping on one system running SLES 15.1 and with
trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020).  So,
from my perspective, perhaps it helps, perhaps it doesn't.


* [Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
  2020-03-27 23:30 [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native jamborm at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2020-03-30 10:02 ` jamborm at gcc dot gnu.org
@ 2020-03-31  1:50 ` crazylht at gmail dot com
  2020-03-31  1:51 ` crazylht at gmail dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2020-03-31  1:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Martin Jambor from comment #3)
> (In reply to Hongtao.liu from comment #1)
> > Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
> > according to our experience.
> 
> I have seen this helping on one system running SLES 15.1 and with
> trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
> Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020).  So,
> from my perspective, perhaps it helps, perhaps it doesn't.

Which GCC options did you use on openSUSE?

The default value of -mprefer-vector-width for -mtune=znver1 is 128; if that
is the tuning in effect, the option won't help.
Different processors have different tunings, which may have different default
vector widths.


* [Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
  2020-03-27 23:30 [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native jamborm at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2020-03-31  1:50 ` crazylht at gmail dot com
@ 2020-03-31  1:51 ` crazylht at gmail dot com
  2020-04-01 20:36 ` jamborm at gcc dot gnu.org
  2021-02-04 17:23 ` jamborm at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2020-03-31  1:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #4)
> (In reply to Martin Jambor from comment #3)
> > (In reply to Hongtao.liu from comment #1)
> > > Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548
> > > according to our experience.
> > 
> > I have seen this helping on one system running SLES 15.1 and with
> > trunk abe13e1847f (Feb 17 2020) but not on another running openSUSE
> > Tumbleweed and with trunk revision 26b3e568a60 (Mar 23 2020).  So,
> > from my perspective, perhaps it helps, perhaps it doesn't.
> 
> Which GCC options did you use on openSUSE?
> 
> The default value of -mprefer-vector-width for -mtune=znver1 is 128; if that
> is the tuning in effect, the option won't help.
> Different processors have different tunings, which may have different
> default vector widths.

With -march=native, it depends on the processor of your server/client.


* [Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
  2020-03-27 23:30 [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native jamborm at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2020-03-31  1:51 ` crazylht at gmail dot com
@ 2020-04-01 20:36 ` jamborm at gcc dot gnu.org
  2021-02-04 17:23 ` jamborm at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: jamborm at gcc dot gnu.org @ 2020-04-01 20:36 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> Do we ever hit the vectorized paths?

What's the best way to find out?  If I open the disassembled code in
perf report and search for ymm, some of these (groups of) instructions
have (very few) samples, but more often they don't.
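
One possible way to put a number on this is to sum the sample percentages
that land on instructions touching ymm registers. The Python sketch below
assumes the text layout of `perf annotate --stdio` (a percentage, a colon,
an address, a colon, then the instruction); that layout differs between
perf versions, so the pattern may need adjusting. The helper name and the
sample text are invented for illustration and are not taken from a real
profile of the benchmark.

```python
import re

# Assumed line shape of `perf annotate --stdio` output:
#   "   12.34 :   4004f1:   vmovdqu (%rdi),%ymm0"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*:\s*[0-9a-f]+:\s*(.*)$")


def ymm_sample_share(annotate_text):
    """Return (percent on instructions using ymm, percent on all parsed lines)."""
    ymm = total = 0.0
    for line in annotate_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header, source line, or unrecognized layout
        pct, insn = float(match.group(1)), match.group(2)
        total += pct
        if "ymm" in insn:
            ymm += pct
    return ymm, total


# Invented sample in the assumed layout (not real benchmark data).
SAMPLE = """\
 Percent |      Source code & Disassembly
    1.50 :   401000:   mov    %rdi,%rax
    0.25 :   401003:   vmovdqu (%rax),%ymm0
    0.00 :   401007:   vpaddd %ymm1,%ymm0,%ymm0
   98.25 :   40100b:   jne    401000
"""

if __name__ == "__main__":
    print(ymm_sample_share(SAMPLE))
```

A small share on ymm lines would suggest the vector paths are rarely
executed, although sampling skid can attribute samples to neighbouring
instructions.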


* [Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native
  2020-03-27 23:30 [Bug tree-optimization/94375] New: 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native jamborm at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2020-04-01 20:36 ` jamborm at gcc dot gnu.org
@ 2021-02-04 17:23 ` jamborm at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: jamborm at gcc dot gnu.org @ 2021-02-04 17:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Martin Jambor <jamborm at gcc dot gnu.org> ---
The bug as specified here has been fixed by commits 31584824665, 91153e0af9a,
67ce9099bc9, 1e7fdc02cba, 7d2cb2755a1.

We can still do better on the benchmark if we fix PR 98782.

