[Bug target/102473] New: 521.wrf_r 5% slower at -Ofast and generic x86

public inbox for gcc-bugs@sourceware.org

* [Bug target/102473] New: 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: jamborm at gcc dot gnu.org @ 2021-09-23 16:45 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

            Bug ID: 102473
           Summary: 521.wrf_r 5% slower at -Ofast and generic x86_64
                    tuning after r12-3426-g8f323c712ea76c
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: crazylht at gmail dot com
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

All three x86_64 LNT machines have detected a 4.5-5.2% performance
regression of the SPEC FPrate 2017 benchmark 521.wrf_r when compiled with
-Ofast and the default (generic) -march and -mtune.

Zen2 based machine regressed by 5%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=294.548.0
Zen1 based machine regressed by 5.2%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=35.548.0
Kabylake based machine regressed by 4.5%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=34.548.0

On an AMD zen2 based machine I have bisected the regression to commit
r12-3426-g8f323c712ea76c:

8f323c712ea76cc4506b03895e9b991e4e4b2baf is the first bad commit
commit 8f323c712ea76cc4506b03895e9b991e4e4b2baf
Author: liuhongt <hongtao.liu@intel.com>
Date:   Tue Sep 7 12:39:04 2021 +0800

    Optimize v4sf reduction.

    gcc/ChangeLog:

            PR target/101059
            * config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
            (reduc_plus_scal_v4sf): .. this, New define_expand.
            (reduc_plus_scal_v2df): .. and this, New define_expand.


I have confirmed that the commit causes a similar regression on
another Intel Skylake server.

On the Zen2 machine, this is the difference in samples collected by
perf for different symbols (before is commit 60eec23b5ed, after commit
8f323c712ea):

| Symbol                                      | sys lib | Before | After |  diff |     % |
|---------------------------------------------+---------+--------+-------+-------+-------|
| __logf_fma                                  | yes     |  68882 | 68940 |   +58 | +0.08 |
| __atanf                                     | yes     |  66664 | 66196 |  -468 | -0.70 |
| __module_advect_em_MOD_advect_scalar_pd     | no      |  62286 | 62348 |   +62 | +0.10 |
| __powf_fma                                  | yes     |  56213 | 56127 |   -86 | -0.15 |
| __module_mp_wsm5_MOD_nislfv_rain_plm        | no      |  46990 | 48340 | +1350 | +2.87 |
| __module_mp_wsm5_MOD_wsm52d                 | no      |  41031 | 40968 |   -63 | -0.15 |
| __module_small_step_em_MOD_advance_uv       | no      |  30908 | 30909 |    +1 | +0.00 |
| __module_small_step_em_MOD_advance_w        | no      |  28738 | 28600 |  -138 | -0.48 |
| __module_advect_em_MOD_advect_scalar        | no      |  28400 | 28429 |   +29 | +0.10 |
| __expf_fma                                  | yes     |  26702 | 26516 |  -186 | -0.70 |
| __module_big_step_utilities_em_MOD_phy_prep | no      |  25878 | 25816 |   -62 | -0.24 |
| psim_unstable_                              | no      |  24994 | 25106 |  +112 | +0.45 |
| __module_bl_ysu_MOD_ysu2d                   | no      |  24799 | 25251 |  +452 | +1.82 |
| psih_unstable_                              | no      |  22600 | 23139 |  +539 | +2.38 |
| __module_small_step_em_MOD_advance_mu_t     | no      |  22250 | 22232 |   -18 | -0.08 |
| __memset_avx2_unaligned_erms                | yes     |  21748 | 21613 |  -135 | -0.62 |
| _ZGVbN4vv_powf_sse4                         | yes     |  21206 | 21355 |  +149 | +0.70 |
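
For reference, the "diff" and "%" columns are consistent with diff = After - Before and % = 100 * diff / Before; for __module_mp_wsm5_MOD_nislfv_rain_plm, 48340 - 46990 = 1350 and 1350 / 46990 is about 2.87%. A minimal C sketch of that arithmetic, assuming the percentages are relative to the "Before" column (the helper names are hypothetical, not part of perf):

```c
/* Hypothetical helpers reproducing the "diff" and "%" columns of the
   perf sample table: diff = after - before, % = 100 * diff / before. */
long sample_diff(long before, long after)
{
    return after - before;
}

double sample_pct(long before, long after)
{
    /* Relative change with respect to the "Before" sample count. */
    return 100.0 * (double) (after - before) / (double) before;
}
```

For example, sample_pct(22600, 23139) gives about 2.38, matching the psih_unstable_ row.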


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: rguenth at gcc dot gnu.org @ 2021-09-24  6:52 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |12.0
   Last reconfirmed|                            |2021-09-24
            Summary|521.wrf_r 5% slower at      |[12 Regression] 521.wrf_r
                   |-Ofast and generic x86_64   |5% slower at -Ofast and
                   |tuning after                |generic x86_64 tuning after
                   |r12-3426-g8f323c712ea76c    |r12-3426-g8f323c712ea76c
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Looks like at least on Zen movs[hl]dup is on the integer domain so we'll see a
domain crossing penalty here(?).  But since this is a generic arch/tuning
regression the SSE2 code path should be what matters - on the committed
testcase I see

foo:
.LFB572:
        .cfi_startproc
        pxor    %xmm0, %xmm0
        addss   (%rdi), %xmm0
        addss   4(%rdi), %xmm0
        addss   8(%rdi), %xmm0
        addss   12(%rdi), %xmm0
        ret

where it seems that the vectorizer doesn't pick up the reduction pattern.

/home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21: note:   vect_is_simple_use: vectype vector(4) float
/home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21: missed:   reduc op not supported by target.
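
The committed testcase is not quoted in this thread. As a hypothetical stand-in, a C loop like the one below, compiled without reassociation, has to add the elements in source order, which matches the pxor plus four addss sequence above (sum = 0, then p[0] through p[3] added one at a time):

```c
/* Hypothetical stand-in for the kind of loop the testcase exercises:
   a sum of four floats.  In strict (non-reassociated) order this is
   (((0 + p[0]) + p[1]) + p[2]) + p[3], evaluated left to right.  */
float sum4(const float *p)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; i++)
        sum += p[i];
    return sum;
}
```

According to comment #2, -ffast-math is needed before the vectorizer will turn a loop like this into a v4sf reduction.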

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-24  7:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> Looks like at least on Zen movs[hl]dup is on the integer domain so we'll see
> a domain crossing penalty here(?).  But since this is a generic arch/tuning
> regression the SSE2 code path should be what matters - on the committed
> testcase I see
> 
> foo:
> .LFB572:
>         .cfi_startproc
>         pxor    %xmm0, %xmm0
>         addss   (%rdi), %xmm0
>         addss   4(%rdi), %xmm0
>         addss   8(%rdi), %xmm0
>         addss   12(%rdi), %xmm0
>         ret
> 
> where it seems that the vectorizer doesn't pick up the reduction pattern.
> 
I guess you're using -O3; -ffast-math is needed for the v4sf reduction.
https://godbolt.org/z/sjf4Pncna
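
The reason is that a vector reduction changes the order of the floating-point additions, which can change the result. The scalar C sketch below emulates the strict source order and a movhlps-style pairwise order, (a0 + a2) + (a1 + a3); that pairing is an assumption about the shape of the expansion, not the patch's actual code:

```c
/* Strict source order: ((((0 + a0) + a1) + a2) + a3).  */
float sum_in_order(const float a[4])
{
    float s = 0.0f;
    s += a[0];
    s += a[1];
    s += a[2];
    s += a[3];
    return s;
}

/* Pairwise order assumed for a movhlps-style expansion: adding the high
   half {a2, a3} onto the low half {a0, a1}, then adding the two lanes.  */
float sum_movhlps_style(const float a[4])
{
    float lane0 = a[0] + a[2];
    float lane1 = a[1] + a[3];
    return lane0 + lane1;
}
```

With a = {1e8f, 1.0f, -1e8f, 1.0f}, sum_in_order returns 1.0f (1e8f + 1.0f rounds back to 1e8f, since the float spacing near 1e8 is 8) while sum_movhlps_style returns 2.0f, so the pairwise form is only valid when reassociation is permitted.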

And the original code also has movhlps.

BTW: I can't reproduce the regression on CLX/Coffee Lake for a one-copy run.
The options are below:

 521.wrf_r: "gfortran -m64" (in FC) "gcc -m64" (in CC)
            "gfortran -m64" (in LD)
            "-fconvert=big-endian -std=legacy -fno-inline-arg-packing" (in FPORTABILITY)
            "-mtune=generic -Ofast -mfpmath=sse -fno-associative-math" (in OPTIMIZE)
            "-fno-stack-arrays" (in EXTRA_OPTIMIZE)
            "-Wl,-z,muldefs" (in EXTRA_LDFLAGS)


> /home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21: note:   vect_is_simple_use: vectype vector(4) float
> /home/rguenther/src/gcc2/gcc/testsuite/gcc.target/i386/sse2-pr101059.c:20:21: missed:   reduc op not supported by target.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: rguenth at gcc dot gnu.org @ 2021-09-24  7:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Richard Biener from comment #1)
> > Looks like at least on Zen movs[hl]dup is on the integer domain so we'll see
> > a domain crossing penalty here(?).  But since this is a generic arch/tuning
> > regression the SSE2 code path should be what matters - on the committed
> > testcase I see
> > 
> > foo:
> > .LFB572:
> >         .cfi_startproc
> >         pxor    %xmm0, %xmm0
> >         addss   (%rdi), %xmm0
> >         addss   4(%rdi), %xmm0
> >         addss   8(%rdi), %xmm0
> >         addss   12(%rdi), %xmm0
> >         ret
> > 
> > where it seems that the vectorizer doesn't pick up the reduction pattern.
> > 
> I guess you're using -O3; -ffast-math is needed for the v4sf reduction.
> https://godbolt.org/z/sjf4Pncna

I was looking at the testcase as compiled by the testsuite.

It seems that adding __attribute__((optimize("tree-vectorize"))) stops the
loop from being vectorized, because at the moment the attribute cancels -ffast-math.

IIRC when Martin commits his optimize() re-org it will no longer do that.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-24 10:43 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Richard Biener from comment #1)
> > Looks like at least on Zen movs[hl]dup is on the integer domain so we'll see
> > a domain crossing penalty here(?).  But since this is a generic arch/tuning
> > regression the SSE2 code path should be what matters - on the committed
> > testcase I see
> > 
> > foo:
> > .LFB572:
> >         .cfi_startproc
> >         pxor    %xmm0, %xmm0
> >         addss   (%rdi), %xmm0
> >         addss   4(%rdi), %xmm0
> >         addss   8(%rdi), %xmm0
> >         addss   12(%rdi), %xmm0
> >         ret
> > 
> > where it seems that the vectorizer doesn't pick up the reduction pattern.
> > 
> I guess you're using -O3; -ffast-math is needed for the v4sf reduction.
> https://godbolt.org/z/sjf4Pncna
> 
> And the original code also has movhlps.
> 
> BTW: I can't reproduce the regression on CLX/Coffee Lake for a one-copy run.
> The options are below:
> 
>  521.wrf_r: "gfortran -m64" (in FC) "gcc -m64" (in CC)
>             "gfortran -m64" (in LD)
>             "-fconvert=big-endian -std=legacy -fno-inline-arg-packing" (in FPORTABILITY)
>             "-mtune=generic -Ofast -mfpmath=sse -fno-associative-math" (in OPTIMIZE)
Reproduced after removing -fno-associative-math.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-26  7:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
Regression also exists for -march=x86-64 -msse3 -mtune=generic -Ofast.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-27  2:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---

> 
> | Symbol                                      | sys lib | Before | After |  diff |     % |
> |---------------------------------------------+---------+--------+-------+-------+-------|
> | __logf_fma                                  | yes     |  68882 | 68940 |   +58 | +0.08 |
> | __atanf                                     | yes     |  66664 | 66196 |  -468 | -0.70 |
> | __module_advect_em_MOD_advect_scalar_pd     | no      |  62286 | 62348 |   +62 | +0.10 |
> | __powf_fma                                  | yes     |  56213 | 56127 |   -86 | -0.15 |
> | __module_mp_wsm5_MOD_nislfv_rain_plm        | no      |  46990 | 48340 | +1350 | +2.87 |

Does it mean cycles?
VTune data show __module_mp_wsm5_MOD_nislfv_rain_plm has fewer instructions
retired and clockticks after my commit. And the regression comes from
libc-2.31.so, which should be the same.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-27  2:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
> retired and clockticks after my commit. And the regression comes from
> libc-2.31.so, which should be the same.

The difference in libc-2.31.so comes from front-end bandwidth (MITE), i.e. very low DSB
coverage.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-27  7:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #7)
> > retired and clockticks after my commit. And the regression comes from
> > libc-2.31.so, which should be the same.
> 
> The difference in libc-2.31.so comes from front-end bandwidth (MITE), i.e. very low DSB
> coverage.

I'm going to revert the patch.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: cvs-commit at gcc dot gnu.org @ 2021-09-27  7:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:e7b8d7020052110e5717230104e647f6235dd2c1

commit r12-3892-ge7b8d7020052110e5717230104e647f6235dd2c1
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Sep 27 14:57:02 2021 +0800

    Revert "Optimize v4sf reduction.".

    This reverts commit 8f323c712ea76cc4506b03895e9b991e4e4b2baf.

         PR target/102473
         PR target/101059

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: jamborm at gcc dot gnu.org @ 2021-09-27  8:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #10 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #6)
> Does it mean cycles?

Basically yes, AFAIK.  I ran both versions under perf record and then
processed the output of perf report -n --stdio --percent-limit=2 (so
that it is not so wide), where -n is the option that gives you
"samples".

> VTune data show __module_mp_wsm5_MOD_nislfv_rain_plm has fewer instructions
> retired and clockticks after my commit. And the regression comes from
> libc-2.31.so, which should be the same.

I tend to think that any glibc from 2.29 on is good enough to reproduce this.
For what it's worth, the system I tried this on has glibc 2.33.

My examination was very preliminary because wrf takes ages to build;
I hoped I would point people to the important bit.  I am not sure I
succeeded, though.

(In reply to Hongtao.liu from comment #8)
> 
> I'm going to revert the patch.

This is your call.  I actually do not think that compiling wrf_r for
pre-AVX2 targets is a very important use case, the regression was just
so consistent that I thought it was worth investigating (and of course
it would be great if it could be avoided).

So it depends on whether the patch has speed benefits in more common
circumstances or not.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-27  8:18 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Martin Jambor from comment #10)
> (In reply to Hongtao.liu from comment #6)
> > Does it mean cycles?
> 
> Basically yes, AFAIK.  I ran both versions under perf record and then
> processed the output of perf report -n --stdio --percent-limit=2 (so
> that it is not so wide), where -n is the option that gives you
> "samples".
> 
> > VTune data show __module_mp_wsm5_MOD_nislfv_rain_plm has fewer instructions
> > retired and clockticks after my commit. And the regression comes from
> > libc-2.31.so, which should be the same.
> 
> I tend to think that any glibc from 2.29 on is good enough to reproduce this.
> For what it's worth, the system I tried this on has glibc 2.33.
> 

I tried: the regression also exists with glibc 2.33 and 2.28, and also on ICX.

> My examination was very preliminary because wrf takes ages to build;
> I hoped I would point people to the important bit.  I am not sure I
> succeeded, though.
> 
> (In reply to Hongtao.liu from comment #8)
> > 
> > I'm going to revert the patch.
> 
> This is your call.  I actually do not think that compiling wrf_r for
> pre-AVX2 targets is a very important use case, the regression was just
> so consistent that I thought it was worth investigating (and of course
> it would be great if it could be avoided).
Yes, I'm still investigating why; it may hit some microarchitectural bound that
can't be explained without help from the hardware team.
> 
> So it depends on whether the patch has speed benefits in more common
> circumstances or not.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: hjl.tools at gmail dot com @ 2021-09-27 14:27 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> ---
Are the glibc regressions real? Please show the affected glibc assembly code
before and after.

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-28  2:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #12)
> Are the glibc regressions real? Please show the affected glibc assembly code
> before and after.

The assembly code is the same, but DSB coverage drops. Before my commit the
front-end bound of libc-2.31.so is 1.2%; after my commit it rises to 21.9%.
Using -falign-functions=64 doesn't help.

The code below is copied from one of the libc functions with a large
front-end bound:

0x5f741 0       Block 123:      
0x5f741 0       jz 0x607b0 <Block 379>  0
0x5f747 0       Block 124:      
0x5f747 0       movq  -0x78(%rbp), %rdx 0
0x5f74b 0       Block 125:      
0x5f74b 0       sub $0x1, %r14d 130500000
0x5f74f 0       cmp %rbx, %r15  
0x5f752 0       jz 0x5f813 <Block 135>  
0x5f758 0       Block 126:      
0x5f758 0       movl  -0x98(%rbp), %ecx 0
0x5f75e 0       Block 127:      
0x5f75e 0       movl  -0x8(%r15), %eax  29000000
0x5f762 0       sub $0x18, %r15 203000000
0x5f766 0       add %r12d, %eax 0
0x5f769 0       sub $0x1, %eax  58000000
0x5f76c 0       cmp %ecx, %eax  159500000
0x5f76e 0       jnle 0x5f74b <Block 125>        0
0x5f770 0       Block 128:      
0x5f770 0       movq  -0x70(%rbp), %rdi 43500000
0x5f774 0       test %rdx, %rdx 29000000
0x5f777 0       jz 0x5f700 <Block 119>

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: hjl.tools at gmail dot com @ 2021-09-28  2:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #14 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Hongtao.liu from comment #13)
> (In reply to H.J. Lu from comment #12)
> > Are the glibc regressions real? Please show the affected glibc assembly code
> > before and after.
> 
> The assembly code is the same, but DSB coverage drops. Before my commit the
> front-end bound of libc-2.31.so is 1.2%; after my commit it rises to 21.9%.
> Using -falign-functions=64 doesn't help.
> 
> The code below is copied from one of the libc functions with a large
> front-end bound:
> 

Which functions in glibc get lower DSB coverage?

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: crazylht at gmail dot com @ 2021-09-28  2:59 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #14)
> (In reply to Hongtao.liu from comment #13)
> > (In reply to H.J. Lu from comment #12)
> > > Are the glibc regressions real? Please show the affected glibc assembly code
> > > before and after.
> > 
> > The assembly code is the same, but DSB coverage drops. Before my commit the
> > front-end bound of libc-2.31.so is 1.2%; after my commit it rises to 21.9%.
> > Using -falign-functions=64 doesn't help.
> > 
> > The code below is copied from one of the libc functions with a large
> > front-end bound:
> > 
> 
> Which functions in glibc get lower DSB coverage?

About a dozen small functions are affected (there is no symbol info); the second
column is time in seconds:


 func@0x5eed4   1.77755
 func@0x58de4   1.53788
 func@0x76f20   0.963671
 func@0x5ea50   0.758953
 func@0x83750   0.349518
 func@0x60d59   0.284607
 func@0x799e4   0.164773
 func@0x59b64   0.15978
 func@0x8f070   0.1448

* [Bug target/102473] [12 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: rguenth at gcc dot gnu.org @ 2022-01-20  9:53 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|NEW                         |WAITING

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
The change causing the regression was reverted, correct?  So we can close this
bug?

* [Bug target/102473] [12/13 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: jakub at gcc dot gnu.org @ 2022-05-06  8:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |12.2

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.

* [Bug target/102473] [12/13 Regression] 521.wrf_r 5% slower at -Ofast and generic x86_64 tuning after r12-3426-g8f323c712ea76c
From: rguenth at gcc dot gnu.org @ 2022-07-26 13:27 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102473

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|WAITING                     |RESOLVED

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
Assuming so.
