* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
2021-01-28 15:26 [Bug tree-optimization/98868] New: [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e marxin at gcc dot gnu.org
@ 2021-01-29 7:43 ` rguenth at gcc dot gnu.org
2021-01-29 8:24 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 7:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |8.5
Last reconfirmed| |2021-01-29
Status|UNCONFIRMED |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Keywords| |missed-optimization
Ever confirmed|0 |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look.
* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenth at gcc dot gnu.org @ 2021-01-29 8:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
First of all, I cannot reproduce any runtime difference on a Zen2 machine.
We do get a lot better ranges after the change (that was the purpose). For
example
-_490: [-INF, +INF]
-_491: [-INF, 2147483646]
+_490: [-INF, 512]
+_491: [-INF, 511]
...
-_63: [-INF, +INF]
-_64: [-INF, 2147483646]
+_63: [2, +INF]
+_64: [1, 2147483646]
which means we likely get more jump threading done, which may result in
code layout changes. Moving the binaries to a Zen1 machine does not
reproduce the issue either. I can't see the specified jump in the graph
either.
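A minimal sketch of how such ranges can arise, for readers following along. It assumes (this is not stated in the thread) that _491 is computed as _490 - 1 and _64 as _63 - 1, and that signed overflow is treated as impossible, so bounds are clamped rather than wrapped. The helper name sub_const is hypothetical and is not a GCC function.

```python
# Toy interval arithmetic on 32-bit signed ranges; this is NOT GCC's VRP code.
INT_MIN = -2**31
INT_MAX = 2**31 - 1

def sub_const(rng, c):
    """Range of x - c for x in rng, assuming signed overflow cannot happen."""
    lo, hi = rng
    # Results that would need to wrap are excluded, so both bounds are
    # clamped into the representable 32-bit interval.
    return (max(lo - c, INT_MIN), min(hi - c, INT_MAX))

# Old ranges: nothing is known about _490, so _491 only loses INT_MAX.
print(sub_const((INT_MIN, INT_MAX), 1))
# New ranges: a tighter input range propagates to the result.
print(sub_const((INT_MIN, 512), 1))
print(sub_const((2, INT_MAX), 1))
```

Under these assumptions the three results match the -_491, +_491 and +_64 lines quoted above; a known lower bound such as 1 is the kind of fact that lets a compare-and-branch be resolved at compile time.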
* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: marxin at gcc dot gnu.org @ 2021-01-29 9:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
It's likely about a small loop alignment:
# Overhead Command Shared Object Symbol
# ........ ....... .................... ....................................
#
78.19% a.out a.out [.] matsim_
17.00% a.out a.out [.] evlrnf_
matsim_ hot place (with --show-total-period)
SLOW:
8653282 : 4017cb: imul $0x3243f6ad,%esi,%esi
: genuni():
: genuni = us231 * real (jsee)
726254541 : 4017d1: vxorps %xmm0,%xmm0,%xmm0
: jsee = jsee * jmul + jadd
0 : 4017d5: add $0x1b0cb175,%esi
: genuni = us231 * real (jsee)
105853662 : 4017db: vcvtsi2ss %esi,%xmm0,%xmm0
273371557 : 4017df: vmulss %xmm1,%xmm0,%xmm0
: gentrs_():
: do icls = icls1, ncls
454049783 : 4017e3: cmp $0xffffffff,%edi
2165881 : 4017e6: je 401970 <matsim_+0x470>
0 : 4017ec: cmp $0x1,%edi
1081799 : 4017ef: jne 4017cb <matsim_+0x2cb>
2155914 : 4017f1: mov %r9,%rdx
4307088 : 4017f4: mov %r8d,%ecx
0 : 4017f7: jmp 401811 <matsim_+0x311>
0 : 4017f9: nopl 0x0(%rax)
8624612913 : 401800: inc %ecx
42153493 : 401802: add $0x400,%rdx
484044717 : 401809: cmp $0x101,%ecx
38933067 : 40180f: je 4017cb <matsim_+0x2cb>
FAST:
45442445 : 4017c9: imul $0x3243f6ad,%edx,%edx
: genuni():
: genuni = us231 * real (jsee)
1076892 : 4017cf: vxorps %xmm0,%xmm0,%xmm0
: jsee = jsee * jmul + jadd
3245642 : 4017d3: add $0x1b0cb175,%edx
: jsee = ibits(jsee, 0, 31) ! Replacement
1083699 : 4017d9: and $0x7fffffff,%edx
: genuni = us231 * real (jsee)
0 : 4017df: vcvtsi2ss %edx,%xmm0,%xmm0
76652291 : 4017e3: vmulss %xmm1,%xmm0,%xmm0
: gentrs_():
: do icls = icls1, ncls
166631920 : 4017e7: cmp $0xffffffff,%edi
3251886 : 4017ea: je 401970 <matsim_+0x470>
0 : 4017f0: cmp $0x1,%edi
0 : 4017f3: jne 4017c9 <matsim_+0x2c9>
0 : 4017f5: mov %r9,%rcx
0 : 4017f8: mov %r8d,%esi
1083364 : 4017fb: jmp 401811 <matsim_+0x311>
0 : 4017fd: nopl (%rax)
1099920836 : 401800: inc %esi
209587136 : 401802: add $0x400,%rcx
100391619 : 401809: cmp $0x101,%esi
69184337 : 40180f: je 4017c9 <matsim_+0x2c9>
For some reason, the hottest instruction ("inc") is attributed roughly 8x fewer
cycles in the fast version (1099920836 vs. 8624612913).
That instruction takes 20% of the cycles in the slow version.
* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenther at suse dot de @ 2021-01-29 9:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 29 Jan 2021, marxin at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
>
> --- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
> It's likely about a small loop alignment:
>
> # Overhead Command Shared Object Symbol
> # ........ ....... .................... ....................................
> #
> 78.19% a.out a.out [.] matsim_
> 17.00% a.out a.out [.] evlrnf_
>
> matsim_ hot place (with --show-total-period)
>
> SLOW:
>
> 8653282 : 4017cb: imul $0x3243f6ad,%esi,%esi
> : genuni():
> : genuni = us231 * real (jsee)
> 726254541 : 4017d1: vxorps %xmm0,%xmm0,%xmm0
> : jsee = jsee * jmul + jadd
> 0 : 4017d5: add $0x1b0cb175,%esi
> : genuni = us231 * real (jsee)
> 105853662 : 4017db: vcvtsi2ss %esi,%xmm0,%xmm0
> 273371557 : 4017df: vmulss %xmm1,%xmm0,%xmm0
> : gentrs_():
> : do icls = icls1, ncls
> 454049783 : 4017e3: cmp $0xffffffff,%edi
> 2165881 : 4017e6: je 401970 <matsim_+0x470>
> 0 : 4017ec: cmp $0x1,%edi
> 1081799 : 4017ef: jne 4017cb <matsim_+0x2cb>
> 2155914 : 4017f1: mov %r9,%rdx
> 4307088 : 4017f4: mov %r8d,%ecx
> 0 : 4017f7: jmp 401811 <matsim_+0x311>
> 0 : 4017f9: nopl 0x0(%rax)
> 8624612913 : 401800: inc %ecx
> 42153493 : 401802: add $0x400,%rdx
> 484044717 : 401809: cmp $0x101,%ecx
> 38933067 : 40180f: je 4017cb <matsim_+0x2cb>
>
> FAST:
>
> 45442445 : 4017c9: imul $0x3243f6ad,%edx,%edx
> : genuni():
> : genuni = us231 * real (jsee)
> 1076892 : 4017cf: vxorps %xmm0,%xmm0,%xmm0
> : jsee = jsee * jmul + jadd
> 3245642 : 4017d3: add $0x1b0cb175,%edx
> : jsee = ibits(jsee, 0, 31) ! Replacement
> 1083699 : 4017d9: and $0x7fffffff,%edx
> : genuni = us231 * real (jsee)
> 0 : 4017df: vcvtsi2ss %edx,%xmm0,%xmm0
> 76652291 : 4017e3: vmulss %xmm1,%xmm0,%xmm0
> : gentrs_():
> : do icls = icls1, ncls
> 166631920 : 4017e7: cmp $0xffffffff,%edi
> 3251886 : 4017ea: je 401970 <matsim_+0x470>
> 0 : 4017f0: cmp $0x1,%edi
> 0 : 4017f3: jne 4017c9 <matsim_+0x2c9>
> 0 : 4017f5: mov %r9,%rcx
> 0 : 4017f8: mov %r8d,%esi
> 1083364 : 4017fb: jmp 401811 <matsim_+0x311>
> 0 : 4017fd: nopl (%rax)
> 1099920836 : 401800: inc %esi
> 209587136 : 401802: add $0x400,%rcx
> 100391619 : 401809: cmp $0x101,%esi
> 69184337 : 40180f: je 4017c9 <matsim_+0x2c9>
>
> For some reason the hottest "inc" instruction has in fast version ~10x smaller
> number of cycles.
> The instruction takes 20% of cycles in the slow version.
So we likely enter the loop at 401800 and I see the SLOW version
is slightly smaller overall (but only 3 bytes). So might be some
uop cache or branch predictor aliasing issue in the uarch. For
some reason we do not align the 40180f backedge jump destination.
The extra "and" in the FAST case is somewhat odd, but I suppose
we avoid some overflow condition on the conversion by masking
the sign bit? Maybe the VRP change causes us to assume overflow
doesn't happen.
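To illustrate what the mask changes, here is a small sketch of one generator step with and without the "and $0x7fffffff". The constants are taken from the imul/add immediates in the listing; the function names and the explicit emulation of 32-bit wraparound are illustrative assumptions, not the benchmark's code.

```python
# Constants from the disassembly: imul $0x3243f6ad, add $0x1b0cb175.
JMUL = 0x3243F6AD
JADD = 0x1B0CB175

def to_signed32(u):
    """Reinterpret an unsigned 32-bit value as a two's-complement integer."""
    return u - (1 << 32) if u & 0x80000000 else u

def step_wrapped(jsee):
    """jsee * jmul + jadd with 32-bit wraparound and no mask (SLOW listing)."""
    return to_signed32((jsee * JMUL + JADD) & 0xFFFFFFFF)

def step_masked(jsee):
    """Same step followed by ibits(jsee, 0, 31), i.e. and $0x7fffffff (FAST)."""
    return (jsee * JMUL + JADD) & 0x7FFFFFFF

# For seed 3 the wrapped result has its sign bit set; the masked one cannot.
print(step_wrapped(3), step_masked(3))
```

If the compiler assumes the multiply-add never overflows and knows the input is non-negative, the unmasked result would also be non-negative and the mask would look redundant. Whether that is the exact reasoning behind the missing "and" is the speculation made above, not something this sketch proves.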
* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: marxin at gcc dot gnu.org @ 2021-01-29 9:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
--- Comment #5 from Martin Liška <marxin at gcc dot gnu.org> ---
So I can manually patch the slow version by adding:
and $0x7fffffff,%esi
before:
vcvtsi2ss %esi, %xmm0, %xmm0
and the speed is back ;)
* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenth at gcc dot gnu.org @ 2021-01-29 9:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, so I have polyhedron11 and remember patching:
!CRAY - The following multiply must be done with 64 bits (not 46 bits)
! The algorithm depends on the overflow characteristics of
! a 32 or 64 bit multiply.
jsee_long = jsee;
jsee_long = jsee_long * jmul + jadd
jsee = jsee_long;
See ~gcctest/spectests/c++bench/pb11/rnflow.patch; IIRC, without it the code
was miscompiled at some point due to undefined overflow. That might explain why
I can't reproduce the issue.
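A quick check of why the widened multiply sidesteps the overflow, as a sketch only. It assumes jsee is a 32-bit signed integer and reuses the constants from the imul/add immediates in the earlier listing; the helper fits is hypothetical.

```python
# Constants from the disassembly earlier in the thread.
JMUL = 0x3243F6AD
JADD = 0x1B0CB175
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

# jsee * jmul + jadd is increasing in jsee, so the extremes occur at the
# endpoints of the 32-bit range.
lo = INT32_MIN * JMUL + JADD
hi = INT32_MAX * JMUL + JADD

print(fits(lo, 64) and fits(hi, 64))  # the 64-bit temporary never overflows
print(fits(hi, 32), fits(hi, 46))     # narrower widths are not sufficient
```

With a 64-bit temporary the multiply-add itself is always in range, so no undefined overflow occurs in that expression; how the result is narrowed back into jsee is a separate question that this sketch does not address.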
* [Bug tree-optimization/98868] [8/9/10/11 Regression] polyhedron rnflow.f90 regression since r8-2555-g344be1fd47d7d64e
From: rguenth at gcc dot gnu.org @ 2021-01-29 9:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98868
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |DUPLICATE
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
dup even.
*** This bug has been marked as a duplicate of bug 71231 ***