* [Bug tree-optimization/98868] New: [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
@ 2021-01-28 15:26 marxin at gcc dot gnu.org
  2021-01-29  7:43 ` [Bug tree-optimization/98868] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-28 15:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

            Bug ID: 98868
           Summary: [8/9/10/11 Regression] polyhedron rnflow.f90
                    regression since r8-2555-g344be1fd47d7d64e
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

Since revision r8-2555-g344be1fd47d7d64e the benchmark is much slower:

$ gfortran rnflow.f90 -Ofast -march=znver1 && time ./a.out >/dev/null

0m7.690s -> 0m13.121s

One can see it here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=194.791.0&plot.1=188.791.0&plot.2=202.791.0&plot.3=154.791.0&plot.4=245.791.0&plot.5=171.791.0&

* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenth at gcc dot gnu.org @ 2021-01-29  7:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
------------------------------------------------------------------------------
   Target Milestone|---                           |8.5
   Last reconfirmed|                              |2021-01-29
             Status|UNCONFIRMED                   |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
           Keywords|                              |missed-optimization
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look.

* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenth at gcc dot gnu.org @ 2021-01-29  8:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
First of all, I cannot reproduce any runtime difference on a Zen2 machine.

We do get a lot better ranges after the change (that was the purpose).  For
example

-_490: [-INF, +INF]
-_491: [-INF, 2147483646]
+_490: [-INF, 512]
+_491: [-INF, 511]
...
-_63: [-INF, +INF]
-_64: [-INF, 2147483646]
+_63: [2, +INF]
+_64: [1, 2147483646]

which means we likely get more jump threading done which may result in
code layout changes.  Moving the binaries to a zen1 machine does not
reproduce the issue either.  I can't see the specified jump in the graph
either.
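
To illustrate why tighter ranges let the compiler decide more conditions
(and so thread more jumps), here is a minimal Python sketch. It is not how
GCC's VRP is implemented: the helper fold_less_equal and the comparisons
being tested are hypothetical, and only the ranges are taken from the dump
above.

```python
import math


def fold_less_equal(value_range, bound):
    """Decide `x <= bound` from the range alone.

    Returns True or False when the range settles the comparison,
    or None when both outcomes are still possible.
    """
    lo, hi = value_range
    if hi <= bound:
        return True
    if lo > bound:
        return False
    return None


# Ranges for _490 and _63 before and after the change (from the dump).
old_490 = (-math.inf, math.inf)
new_490 = (-math.inf, 512)
old_63 = (-math.inf, math.inf)
new_63 = (2, math.inf)

# Hypothetical conditions: with the old ranges nothing is known,
# with the new ranges both conditions become compile-time constants.
print(fold_less_equal(old_490, 512), fold_less_equal(new_490, 512))
print(fold_less_equal(old_63, 1), fold_less_equal(new_63, 1))
```

A condition that folds to a constant removes a branch, which is one way the
surrounding code layout can shift.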

* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: marxin at gcc dot gnu.org @ 2021-01-29  9:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
It is likely related to the alignment of a small loop:

# Overhead  Command  Shared Object         Symbol                              
# ........  .......  ....................  ....................................
#
    78.19%  a.out    a.out                 [.] matsim_
    17.00%  a.out    a.out                 [.] evlrnf_

matsim_ hot spot (with --show-total-period)

SLOW:

     8653282 :   4017cb: imul   $0x3243f6ad,%esi,%esi
             :            genuni():
             :            genuni = us231 * real (jsee)
   726254541 :   4017d1: vxorps %xmm0,%xmm0,%xmm0
             :            jsee = jsee * jmul + jadd
           0 :   4017d5: add    $0x1b0cb175,%esi
             :            genuni = us231 * real (jsee)
   105853662 :   4017db: vcvtsi2ss %esi,%xmm0,%xmm0
   273371557 :   4017df: vmulss %xmm1,%xmm0,%xmm0
             :            gentrs_():
             :            do icls = icls1, ncls
   454049783 :   4017e3: cmp    $0xffffffff,%edi
     2165881 :   4017e6: je     401970 <matsim_+0x470>
           0 :   4017ec: cmp    $0x1,%edi
     1081799 :   4017ef: jne    4017cb <matsim_+0x2cb>
     2155914 :   4017f1: mov    %r9,%rdx
     4307088 :   4017f4: mov    %r8d,%ecx
           0 :   4017f7: jmp    401811 <matsim_+0x311>
           0 :   4017f9: nopl   0x0(%rax)
  8624612913 :   401800: inc    %ecx
    42153493 :   401802: add    $0x400,%rdx
   484044717 :   401809: cmp    $0x101,%ecx
    38933067 :   40180f: je     4017cb <matsim_+0x2cb>

FAST:

    45442445 :   4017c9: imul   $0x3243f6ad,%edx,%edx
             :            genuni():
             :            genuni = us231 * real (jsee)
     1076892 :   4017cf: vxorps %xmm0,%xmm0,%xmm0
             :            jsee = jsee * jmul + jadd
     3245642 :   4017d3: add    $0x1b0cb175,%edx
             :            jsee = ibits(jsee, 0, 31)                   ! Replacement
     1083699 :   4017d9: and    $0x7fffffff,%edx
             :            genuni = us231 * real (jsee)
           0 :   4017df: vcvtsi2ss %edx,%xmm0,%xmm0
    76652291 :   4017e3: vmulss %xmm1,%xmm0,%xmm0
             :            gentrs_():
             :            do icls = icls1, ncls
   166631920 :   4017e7: cmp    $0xffffffff,%edi
     3251886 :   4017ea: je     401970 <matsim_+0x470>
           0 :   4017f0: cmp    $0x1,%edi
           0 :   4017f3: jne    4017c9 <matsim_+0x2c9>
           0 :   4017f5: mov    %r9,%rcx
           0 :   4017f8: mov    %r8d,%esi
     1083364 :   4017fb: jmp    401811 <matsim_+0x311>
           0 :   4017fd: nopl   (%rax)
  1099920836 :   401800: inc    %esi
   209587136 :   401802: add    $0x400,%rcx
   100391619 :   401809: cmp    $0x101,%esi
    69184337 :   40180f: je     4017c9 <matsim_+0x2c9>

For some reason, the hottest "inc" instruction is charged far fewer cycles in
the fast version: 1099920836 versus 8624612913, roughly 8x fewer.
The instruction takes 20% of cycles in the slow version.
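
A quick check of the two sample counts, as a small Python sketch. The
numbers are copied from the "inc" lines at 401800 in the listings above; the
variable names are made up for illustration.

```python
# Periods attributed to the "inc" at 401800 in each listing.
slow_inc = 8624612913
fast_inc = 1099920836

# How many times more period the slow binary charges to this instruction.
ratio = slow_inc / fast_inc
print(f"slow/fast period ratio for inc: {ratio:.2f}")
```

The ratio comes out a little under 8.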

* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenther at suse dot de @ 2021-01-29  9:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 29 Jan 2021, marxin at gcc dot gnu.org wrote:

> For some reason the hottest "inc" instruction has in fast version ~10x smaller
> number of cycles.
> The instruction takes 20% of cycles in the slow version.

So we likely enter the loop at 401800 and I see the SLOW version
is slightly smaller overall (but only 3 bytes).  So might be some
uop cache or branch predictor aliasing issue in the uarch.  For
some reason we do not align the 40180f backedge jump destination.
The extra "and" in the FAST case is somewhat odd, but I suppose
we avoid some overflow condition on the conversion by masking
the sign bit?  Maybe the VRP change causes us to assume overflow
doesn't happen.
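
For reference, the two instruction sequences can be modelled in a few lines
of Python. This is only a sketch: the multiplier and increment are the
immediates from the listings in comment #3, the value of us231 is an
assumption (the name suggests 2**-31, but it is not shown in this thread),
and the function names are hypothetical.

```python
JMUL = 0x3243F6AD   # multiplier: the imul immediate in the listings
JADD = 0x1B0CB175   # increment: the add immediate in the listings
US231 = 2.0 ** -31  # ASSUMED value of us231; not shown in this thread


def to_int32(x):
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def step_slow(jsee):
    """imul + add only, as in the SLOW listing: result may be negative."""
    return to_int32(jsee * JMUL + JADD)


def step_fast(jsee):
    """imul + add + and $0x7fffffff, as in the FAST listing."""
    return to_int32(jsee * JMUL + JADD) & 0x7FFFFFFF


def genuni(jsee):
    """genuni = us231 * real(jsee), as in the annotated source line."""
    return US231 * float(jsee)


print(step_slow(3), step_fast(3))
print(genuni(step_slow(3)), genuni(step_fast(3)))
```

With jsee = 3 the unmasked step wraps to a negative value, while the masked
step stays non-negative, so only the masked variant is guaranteed to feed a
non-negative integer into the conversion.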

* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: marxin at gcc dot gnu.org @ 2021-01-29  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

--- Comment #5 from Martin Liška <marxin at gcc dot gnu.org> ---
So I can manually patch the slow version by adding:

  and    $0x7fffffff,%esi

before:

vcvtsi2ss       %esi, %xmm0, %xmm0

and the speed is back ;)

* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenth at gcc dot gnu.org @ 2021-01-29  9:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, so I have polyhedron11 and remember patching:

!CRAY - The following multiply must be done with 64 bits (not 46 bits)
!       The algorithm depends on the overflow characteristics of
!       a 32 or 64 bit multiply.
      jsee_long = jsee;
      jsee_long = jsee_long * jmul + jadd
      jsee = jsee_long;

see ~gcctest/spectests/c++bench/pb11/rnflow.patch; without it, IIRC, the
code was miscompiled at some point due to undefined overflow.  That might
explain why I can't reproduce the issue.
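
A rough Python model of what the patched statements compute. The constants
are the immediates from the disassembly in comment #3; the helper names are
hypothetical, and the behaviour of the narrowing assignment (keeping the low
32 bits) is an assumption of this sketch rather than something the thread
states.

```python
JMUL = 0x3243F6AD  # multiplier, from the imul immediate in comment #3
JADD = 0x1B0CB175  # increment, from the add immediate in comment #3
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def to_int32(x):
    """Keep the low 32 bits and reinterpret them as a signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def step_widened(jsee):
    """Multiply-add in a 64-bit temporary, then narrow back to 32 bits."""
    jsee_long = jsee * JMUL + JADD
    # For any 32-bit jsee the exact result fits in 64 bits, so the
    # wide multiply-add itself cannot overflow.
    assert INT64_MIN <= jsee_long <= INT64_MAX
    # ASSUMPTION: the assignment back to jsee keeps the low 32 bits.
    return to_int32(jsee_long)


for seed in (1, 3, 2 ** 31 - 1, -(2 ** 31)):
    print(seed, step_widened(seed))
```

The point of the widening is that the multiply-add no longer overflows in
its own type, so the optimizer cannot draw conclusions from an overflow
that, in the 32-bit form, is undefined.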

* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenth at gcc dot gnu.org @ 2021-01-29  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
dup even.

*** This bug has been marked as a duplicate of bug 71231 ***
