* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
@ 2021-04-30 6:51 ` rguenth at gcc dot gnu.org
2021-04-30 8:54 ` jakub at gcc dot gnu.org
` (16 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-30 6:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |12.0, 9.3.1
Priority|P3 |P2
Last reconfirmed| |2021-04-30
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Component|target |rtl-optimization
Target Milestone|--- |10.4
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
2021-04-30 6:51 ` [Bug rtl-optimization/100342] " rguenth at gcc dot gnu.org
@ 2021-04-30 8:54 ` jakub at gcc dot gnu.org
2021-05-04 12:50 ` ubizjak at gmail dot com
` (15 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-04-30 8:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r10-3542-g0b92cf305dcf34387a8e2564e55ca8948df3b47a
Stopped on the trunk with r12-139-g7d6bb80931b429631f63e0fd27bee95f32eb57a9
So most likely was latent before and after those.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
2021-04-30 6:51 ` [Bug rtl-optimization/100342] " rguenth at gcc dot gnu.org
2021-04-30 8:54 ` jakub at gcc dot gnu.org
@ 2021-05-04 12:50 ` ubizjak at gmail dot com
2021-05-04 13:08 ` ubizjak at gmail dot com
` (14 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2021-05-04 12:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
For some reason the *input* value at BSWAP insn is truncated to 32bits.
v256u128 v256u128_1 =
SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 < v256u128_0)) <=
0, v256u128_0);
u128_0 is an argument to foo0, passes value of 123842323652213865 (decimal).
The binary has only one BSWAP insn, so:
$ objdump -dr a.out | grep bswap
401ddd: 48 0f ca bswap %rdx
(gdb) b *0x401ddd
Breakpoint 1 at 0x401ddd: file pr100342.c, line 72.
Breakpoint 1, 0x0000000000401ddd in foo0 (u8_0=u8_0@entry=0 '\000',
v128u8_0=v128u8_0@entry=..., v512u8_0=..., u16_0=u16_0@entry=10,
v128u16_0=v128u16_0@entry=..., u32_0=u32_0@entry=4, u64_0=u64_0@entry=2,
v512u64_0=..., u128_0=<optimized out>, v256u128_0=..., v512u128_0=...)
at zzz.c:72
72 SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 <
v256u128_0)) <=
(gdb) p $rdx
$2 = 3983735913
(gdb) p/x $rdx
$3 = 0xed72fc69
We can "fix" the clobbered value manually in gdb session:
(gdb) set $rdx = 123842323652213865
(gdb) p/x $rdx
$4 = 0x1b7f9efed72fc69
(gdb) c
Continuing.
[Inferior 1 (process 20630) exited normally]
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (2 preceding siblings ...)
2021-05-04 12:50 ` ubizjak at gmail dot com
@ 2021-05-04 13:08 ` ubizjak at gmail dot com
2021-05-04 15:14 ` ubizjak at gmail dot com
` (13 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2021-05-04 13:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
The problematic insn is:
401cec: 44 89 f6 mov %r14d,%esi
This one should be 64 bit wide,
movl %r14d, %esi # 613 [c=4 l=3] *movqi_internal/2
but is actually a QImode move from r14 to esi.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (3 preceding siblings ...)
2021-05-04 13:08 ` ubizjak at gmail dot com
@ 2021-05-04 15:14 ` ubizjak at gmail dot com
2021-05-04 15:29 ` jakub at gcc dot gnu.org
` (12 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2021-05-04 15:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
The problem can be seen in _.pro_and_epilogue pass:
Starting with:
_.cmpelim
2741: r14:DI=[sp:DI+0x38]
...
368: di:DI=r14:DI
...
613: si:QI=r14:QI
...
2737: bp:DI=r14:DI
...
658: strict_low_part(ax:QI)=bp:QI
2275: dx:DI=si:DI
...
2287: dx:DI=bp:DI
710: dx:DI=bswap(dx:DI)
The pass converts:
_.pro_and_epilogue
rescanning insn with uid = 658.
insn 658: replaced reg 6 with 4
verify found no changes in insn with uid = 658.
verify found no changes in insn with uid = 2275.
deleting insn with uid = 2287.
2741: r14:DI=[sp:DI+0x38]
...
368: di:DI=r14:DI
...
613: si:QI=r14:QI
...
2737: bp:DI=r14:DI
...
658: strict_low_part(ax:QI)=si:QI
2275: dx:DI=si:DI
...
2287: ?
710: dx:DI=bswap(dx:DI)
Please note how the optimization removes insn 2287, assuming that rsi holds the
whole 64bit value ... Which is not true!
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (4 preceding siblings ...)
2021-05-04 15:14 ` ubizjak at gmail dot com
@ 2021-05-04 15:29 ` jakub at gcc dot gnu.org
2021-05-04 15:33 ` ubizjak at gmail dot com
` (11 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-04 15:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |segher at gcc dot gnu.org
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That seems to happen during shrink-wrapping:
#0 df_insn_delete (insn=0x7fffe9e3a6c0) at ../../gcc/df-scan.c:932
#1 0x0000000000978328 in delete_insn (insn=0x7fffe9e3a6c0) at
../../gcc/cfgrtl.c:178
#2 0x0000000000d5a270 in copyprop_hardreg_forward_1 (bb=<basic_block
0x7fffe9f763a8 (2)>, vd=<optimized out>) at ../../gcc/regcprop.c:825
#3 0x0000000000d5b772 in copyprop_hardreg_forward_bb_without_debug_insn
(bb=<basic_block 0x7fffe9f763a8 (2)>) at ../../gcc/regcprop.c:1211
#4 0x0000000000db616c in prepare_shrink_wrap (entry_block=<basic_block
0x7fffe9f763a8 (2)>) at ../../gcc/shrink-wrap.c:451
#5 try_shrink_wrapping (entry_edge=0x7fffffffd518,
prologue_seq=0x7fffe9e81440) at ../../gcc/shrink-wrap.c:674
#6 0x0000000000ae7ffc in thread_prologue_and_epilogue_insns () at
../../gcc/function.c:6025
#7 0x0000000000ae86c3 in rest_of_handle_thread_prologue_and_epilogue () at
../../gcc/function.c:6510
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (5 preceding siblings ...)
2021-05-04 15:29 ` jakub at gcc dot gnu.org
@ 2021-05-04 15:33 ` ubizjak at gmail dot com
2021-05-04 15:34 ` ubizjak at gmail dot com
` (10 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2021-05-04 15:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
I have traced a bit where (insn 2275) and (insn 2287) come from.
In _.ira, we have:
613: r125:QI=r2067:DI#0
...
659: zero_extract(r2080:DI,0x8,0x8)=r125:QI#0
And in _.reload, a DImode reload is inserted before zero_extract RTX:
613: si:QI=r14:QI
...
2275: dx:DI=si:DI
659: zero_extract(ax:DI,0x8,0x8)=dx:DI
However, %rsi is only initialized with QImode value in (insn 613).
This word-mode reload for zero_extract RTX interferes with register propagation
in subsequent passes, as shown in the comment #5.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (6 preceding siblings ...)
2021-05-04 15:33 ` ubizjak at gmail dot com
@ 2021-05-04 15:34 ` ubizjak at gmail dot com
2021-05-04 16:36 ` jakub at gcc dot gnu.org
` (9 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2021-05-04 15:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
FYI, this whole analysis was done with Fedora 33 system compiler:
gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC)
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (7 preceding siblings ...)
2021-05-04 15:34 ` ubizjak at gmail dot com
@ 2021-05-04 16:36 ` jakub at gcc dot gnu.org
2021-05-04 16:57 ` jakub at gcc dot gnu.org
` (8 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-04 16:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Looking at current 10 branch (previously looked at 11), I see:
(insn 2741 1965 368 2 (set (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])
(mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int 56 [0x38])) [9 %sfp+-1032 S8 A64]))
"pr100342.c":65:24 66 {*movdi_internal}
(nil))
...
(insn 613 612 1970 2 (set (reg:QI 4 si [orig:125 _98 ] [125])
(reg:QI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:15 69
{*movqi_internal}
(nil))
...
(insn 2737 1971 615 2 (set (reg:DI 6 bp [orig:2067 u128_0 ] [2067])
(reg:DI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:12 66
{*movdi_internal}
(nil))
...
(insn 2275 658 659 2 (set (reg:DI 1 dx [2225])
(reg:DI 4 si [orig:125 _98 ] [125])) "pr100342.c":68:12 66
{*movdi_internal}
(nil))
...
(insn 2287 2286 710 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
(reg:DI 6 bp [orig:2067 u128_0 ] [2067])) "pr100342.c":72:5 66
{*movdi_internal}
(nil))
(insn 710 2287 2172 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
(bswap:DI (reg:DI 1 dx [orig:131 _104 ] [131]))) "pr100342.c":72:5 872
{*bswapdi2}
(nil))
in the reload dump and I must say I don't find anything wrong on that,
what looks wrong is that regcprop thinks that the insn 2287 is a redundant move
because it thinks that at that point %rdx contains the same value as %rbp -
/* Detect noop sets and remove them before processing side effects. */
if (set && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set)))
{
unsigned int regno = REGNO (SET_SRC (set));
rtx r1 = find_oldest_value_reg (REGNO_REG_CLASS (regno),
SET_DEST (set), vd);
rtx r2 = find_oldest_value_reg (REGNO_REG_CLASS (regno),
SET_SRC (set), vd);
if (rtx_equal_p (r1 ? r1 : SET_DEST (set), r2 ? r2 : SET_SRC (set)))
{
bool last = insn == BB_END (bb);
delete_insn (insn);
r1 is %rbp and r2 is NULL, so it is compared against SET_SRC (set) of %rbp.
Something misses that only the lowpart QImode is equivalent that way.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (8 preceding siblings ...)
2021-05-04 16:36 ` jakub at gcc dot gnu.org
@ 2021-05-04 16:57 ` jakub at gcc dot gnu.org
2021-05-05 10:59 ` jakub at gcc dot gnu.org
` (7 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-04 16:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
And there is one more important insn in between 2737 and 2275, in particular
(insn 2911 2867 2853 2 (set (reg:DI 42 r14 [2223])
(const_int 72057594037927935 [0xffffffffffffff])) "pr100342.c":68:12 66
{*movdi_internal}
(nil))
On that one the old value of r14 is lost.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (9 preceding siblings ...)
2021-05-04 16:57 ` jakub at gcc dot gnu.org
@ 2021-05-05 10:59 ` jakub at gcc dot gnu.org
2021-05-05 15:12 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-05 10:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |liuhongt at gcc dot gnu.org
--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(insn 2741 1965 368 2 (set (reg:DI 42 r14 [orig:2067 u128_0 ] [2067])
(mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int 56 [0x38])) [9 %sfp+-1032 S8 A64]))
"pr100342.c":65:24 66 {*movdi_internal}
(nil))
...
(insn 613 612 1970 2 (set (reg:QI 4 si [orig:125 _98 ] [125])
(reg:QI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:15 69
{*movqi_internal}
(nil))
...
(insn 2737 1971 615 2 (set (reg:DI 6 bp [orig:2067 u128_0 ] [2067])
(reg:DI 42 r14 [orig:2067 u128_0 ] [2067])) "pr100342.c":68:12 66
{*movdi_internal}
(nil))
...
(insn 2829 2749 2855 2 (set (reg:DI 42 r14 [2222])
(const_int -71776119061217281 [0xff00ffffffffffff])) "pr100342.c":68:12
66 {*movdi_internal}
(nil))
...
(insn 2275 658 659 2 (set (reg:DI 1 dx [2225])
(reg:DI 4 si [orig:125 _98 ] [125])) "pr100342.c":68:12 66
{*movdi_internal}
(nil))
...
(insn 2910 2860 677 2 (set (reg:DI 4 si [2227])
(const_int -1095216660481 [0xffffff00ffffffff])) "pr100342.c":68:12 66
{*movdi_internal}
(nil))
...
(insn 2287 2286 710 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
(reg:DI 6 bp [orig:2067 u128_0 ] [2067])) "pr100342.c":72:5 66
{*movdi_internal}
(nil))
(insn 710 2287 2172 2 (set (reg:DI 1 dx [orig:131 _104 ] [131])
(bswap:DI (reg:DI 1 dx [orig:131 _104 ] [131]))) "pr100342.c":72:5 872
{*bswapdi2}
(nil))
In copyprop_hardreg_forward_1 at the start of that insn 2737 we have:
(gdb) p vd->e[1]
$32 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$33 = {mode = E_QImode, oldest_regno = 42, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[6]
$34 = {mode = E_SImode, oldest_regno = 6, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$35 = {mode = E_DImode, oldest_regno = 42, next_regno = 4, debug_insn_changes =
0x0}
which is I think ok, it says %sil is QImode copy of %r14, all other regs we
care about here contain their values.
At the end of that we have:
(gdb) p vd->e[1]
$36 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$37 = {mode = E_QImode, oldest_regno = 42, next_regno = 6, debug_insn_changes =
0x0}
(gdb) p vd->e[6]
$38 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$39 = {mode = E_DImode, oldest_regno = 42, next_regno = 4, debug_insn_changes =
0x0}
as well as before the insn 2829. That changes it to:
(gdb) p vd->e[1]
$51 = {mode = E_DImode, oldest_regno = 1, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$52 = {mode = E_QImode, oldest_regno = 4, next_regno = 6, debug_insn_changes =
0x0}
(gdb) p vd->e[6]
$53 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[42]
$54 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295,
debug_insn_changes = 0x0}
Same thing at the start of insn 2275. But after that we have:
(gdb) p vd->e[1]
$61 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$62 = {mode = E_QImode, oldest_regno = 4, next_regno = 6, debug_insn_changes =
0x0}
(gdb) p vd->e[6]
$63 = {mode = E_DImode, oldest_regno = 4, next_regno = 1, debug_insn_changes =
0x0}
(gdb) p vd->e[42]
$64 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295,
debug_insn_changes = 0x0}
I think this step is what looks wrong, it is true that the movq %rsi, %rdx
copied all 64-bits of %rsi, but vd->e[4].mode was just QImode, so I think
vd->e[1].mode
should be also QImode.
That remains until before insn 2910, which changes that to:
(gdb) p vd->e[1]
$84 = {mode = E_DImode, oldest_regno = 6, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[4]
$85 = {mode = E_DImode, oldest_regno = 4, next_regno = 4294967295,
debug_insn_changes = 0x0}
(gdb) p vd->e[6]
$86 = {mode = E_DImode, oldest_regno = 6, next_regno = 1, debug_insn_changes =
0x0}
(gdb) p vd->e[42]
$87 = {mode = E_DImode, oldest_regno = 42, next_regno = 4294967295,
debug_insn_changes = 0x0}
and at that point, the fact that %rdx doesn't contain the same value as %rbp -
only
%dl is guaranteed to be the same as %bpl - is lost.
Then when we look up the oldest value of %rdx on insn 2287 we are told it is
%rbp.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (10 preceding siblings ...)
2021-05-05 10:59 ` jakub at gcc dot gnu.org
@ 2021-05-05 15:12 ` jakub at gcc dot gnu.org
2021-05-06 5:22 ` crazylht at gmail dot com
` (5 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-05 15:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 50761
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50761&action=edit
gcc11-pr100342.patch
Untested fix. But what do I know about regcprop?
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (11 preceding siblings ...)
2021-05-05 15:12 ` jakub at gcc dot gnu.org
@ 2021-05-06 5:22 ` crazylht at gmail dot com
2021-05-15 8:14 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: crazylht at gmail dot com @ 2021-05-06 5:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #12)
> Created attachment 50761 [details]
> gcc11-pr100342.patch
>
> Untested fix. But what do I know about regcprop?
I just want to confirm that this patch alone also fixed pr98694.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (12 preceding siblings ...)
2021-05-06 5:22 ` crazylht at gmail dot com
@ 2021-05-15 8:14 ` cvs-commit at gcc dot gnu.org
2021-05-31 14:08 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-05-15 8:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f
commit r12-808-g425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f
Author: Jakub Jelinek <jakub@redhat.com>
Date: Sat May 15 10:12:11 2021 +0200
regcprop: Fix another cprop_hardreg bug [PR100342]
On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches
wrote:
> Ah, ok, thanks for the extra context.
>
> So AIUI the problem when recording xmm2<-di isn't just:
>
> [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>
> but also that:
>
> [B] partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode)
>
> For example, all registers in this sequence can be part of the same
chain:
>
> (set (reg:HI R1) (reg:HI R0))
> (set (reg:SI R2) (reg:SI R1)) // [A]
> (set (reg:DI R3) (reg:DI R2)) // [A]
> (set (reg:SI R4) (reg:SI R[0-3]))
> (set (reg:HI R5) (reg:HI R[0-4]))
>
> But:
>
> (set (reg:SI R1) (reg:SI R0))
> (set (reg:HI R2) (reg:HI R1))
> (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
>
> When this happens, I guess we have two choices:
>
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occured in vd->e[sr].mode instead
> (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
>
> I think we should test [B] as well as [A] though. The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a seperate âifâ or âelse ifâ,
> with its own comment.
Unfortunately, we now have a testcase that shows that testing also [B]
is a problem (unfortunately now latent on the trunk, only reproduces
on 10 and 11 branches).
The comment in the patch tries to list just the interesting instructions,
we have a 64-bit value, copy low 8 bit of those to another register,
copy full 64 bits to another register and then clobber the original
register.
Before that (set (reg:DI r14) (const_int ...)) we have a chain
DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain,
so
we have QI si, DI bp , si being the oldest_regno.
Next DI si is copied into DI dx. Only the low 8 bits of that are defined,
the rest is unspecified, but we would add DI dx into that same chain at the
end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is
DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
as redundant, because we think bp and dx are already equivalent, when in
reality that is true only for the lowpart 8 bits.
I believe the [*] marked step above is where the bug is.
The committed regcprop.c (copy_value) change (but only committed to
trunk/11, not to 10) added
else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
&& partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode))
return;
and while the first partial_subreg_p call returns true, the second one
doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
true and we'd return, but as that reg got clobbered, si became the oldest
regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
But as the testcase shows, what is the oldest_regno in the chain is
something that changes over time, so relying on it for anything is
problematic, something could have a different oldest_regno and later
on get a different oldest_regno (perhaps with different mode) because
the oldest_regno got overwritten and it can change both ways.
The following patch effectively implements your (2) above.
2021-05-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/100342
* regcprop.c (copy_value): When copying a source reg in a wider
mode than it has recorded for the value, adjust recorded
destination
mode too or punt if !REG_CAN_CHANGE_MODE_P.
* gcc.target/i386/pr100342.c: New test.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (13 preceding siblings ...)
2021-05-15 8:14 ` cvs-commit at gcc dot gnu.org
@ 2021-05-31 14:08 ` cvs-commit at gcc dot gnu.org
2021-05-31 14:56 ` [Bug rtl-optimization/100342] [10 " jakub at gcc dot gnu.org
` (2 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-05-31 14:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:81c2cd08fafcdc0ff48f6cf440bef86f98822a23
commit r11-8486-g81c2cd08fafcdc0ff48f6cf440bef86f98822a23
Author: Jakub Jelinek <jakub@redhat.com>
Date: Sat May 15 10:12:11 2021 +0200
regcprop: Fix another cprop_hardreg bug [PR100342]
On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches
wrote:
> Ah, ok, thanks for the extra context.
>
> So AIUI the problem when recording xmm2<-di isn't just:
>
> [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>
> but also that:
>
> [B] partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode)
>
> For example, all registers in this sequence can be part of the same
chain:
>
> (set (reg:HI R1) (reg:HI R0))
> (set (reg:SI R2) (reg:SI R1)) // [A]
> (set (reg:DI R3) (reg:DI R2)) // [A]
> (set (reg:SI R4) (reg:SI R[0-3]))
> (set (reg:HI R5) (reg:HI R[0-4]))
>
> But:
>
> (set (reg:SI R1) (reg:SI R0))
> (set (reg:HI R2) (reg:HI R1))
> (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
>
> When this happens, I guess we have two choices:
>
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occured in vd->e[sr].mode instead
> (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
>
> I think we should test [B] as well as [A] though. The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a seperate âifâ or âelse ifâ,
> with its own comment.
Unfortunately, we now have a testcase that shows that testing also [B]
is a problem (unfortunately now latent on the trunk, only reproduces
on 10 and 11 branches).
The comment in the patch tries to list just the interesting instructions,
we have a 64-bit value, copy low 8 bit of those to another register,
copy full 64 bits to another register and then clobber the original
register.
Before that (set (reg:DI r14) (const_int ...)) we have a chain
DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain,
so
we have QI si, DI bp , si being the oldest_regno.
Next DI si is copied into DI dx. Only the low 8 bits of that are defined,
the rest is unspecified, but we would add DI dx into that same chain at the
end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is
DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
as redundant, because we think bp and dx are already equivalent, when in
reality that is true only for the lowpart 8 bits.
I believe the [*] marked step above is where the bug is.
The committed regcprop.c (copy_value) change (but only committed to
trunk/11, not to 10) added
else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
&& partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode))
return;
and while the first partial_subreg_p call returns true, the second one
doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
true and we'd return, but as that reg got clobbered, si became the oldest
regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
But as the testcase shows, what is the oldest_regno in the chain is
something that changes over time, so relying on it for anything is
problematic, something could have a different oldest_regno and later
on get a different oldest_regno (perhaps with different mode) because
the oldest_regno got overwritten and it can change both ways.
The following patch effectively implements your (2) above.
2021-05-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/100342
* regcprop.c (copy_value): When copying a source reg in a wider
mode than it has recorded for the value, adjust recorded
destination
mode too or punt if !REG_CAN_CHANGE_MODE_P.
* gcc.target/i386/pr100342.c: New test.
(cherry picked from commit 425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f)
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (14 preceding siblings ...)
2021-05-31 14:08 ` cvs-commit at gcc dot gnu.org
@ 2021-05-31 14:56 ` jakub at gcc dot gnu.org
2022-05-10 8:18 ` cvs-commit at gcc dot gnu.org
2022-05-10 10:45 ` jakub at gcc dot gnu.org
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-31 14:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[10/11 Regression] wrong |[10 Regression] wrong code
|code with -O2 -fno-dse |with -O2 -fno-dse
|-fno-forward-propagate |-fno-forward-propagate
|-mno-sse2 |-mno-sse2
--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed also for GCC 11.2.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (15 preceding siblings ...)
2021-05-31 14:56 ` [Bug rtl-optimization/100342] [10 " jakub at gcc dot gnu.org
@ 2022-05-10 8:18 ` cvs-commit at gcc dot gnu.org
2022-05-10 10:45 ` jakub at gcc dot gnu.org
17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-05-10 8:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:90330d0e046b11817382db6545bfd8e882f21d41
commit r10-10612-g90330d0e046b11817382db6545bfd8e882f21d41
Author: Jakub Jelinek <jakub@redhat.com>
Date: Sat May 15 10:12:11 2021 +0200
regcprop: Fix another cprop_hardreg bug [PR100342]
On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches
wrote:
> Ah, ok, thanks for the extra context.
>
> So AIUI the problem when recording xmm2<-di isn't just:
>
> [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>
> but also that:
>
> [B] partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode)
>
> For example, all registers in this sequence can be part of the same
chain:
>
> (set (reg:HI R1) (reg:HI R0))
> (set (reg:SI R2) (reg:SI R1)) // [A]
> (set (reg:DI R3) (reg:DI R2)) // [A]
> (set (reg:SI R4) (reg:SI R[0-3]))
> (set (reg:HI R5) (reg:HI R[0-4]))
>
> But:
>
> (set (reg:SI R1) (reg:SI R0))
> (set (reg:HI R2) (reg:HI R1))
> (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
>
> When this happens, I guess we have two choices:
>
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occured in vd->e[sr].mode instead
> (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
>
> I think we should test [B] as well as [A] though. The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a seperate âifâ or âelse ifâ,
> with its own comment.
Unfortunately, we now have a testcase that shows that testing also [B]
is a problem (unfortunately now latent on the trunk, only reproduces
on 10 and 11 branches).
The comment in the patch tries to list just the interesting instructions,
we have a 64-bit value, copy low 8 bit of those to another register,
copy full 64 bits to another register and then clobber the original
register.
Before that (set (reg:DI r14) (const_int ...)) we have a chain
DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain,
so
we have QI si, DI bp , si being the oldest_regno.
Next DI si is copied into DI dx. Only the low 8 bits of that are defined,
the rest is unspecified, but we would add DI dx into that same chain at the
end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is
DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
as redundant, because we think bp and dx are already equivalent, when in
reality that is true only for the lowpart 8 bits.
I believe the [*] marked step above is where the bug is.
The committed regcprop.c (copy_value) change (but only committed to
trunk/11, not to 10) added
else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
&& partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode))
return;
and while the first partial_subreg_p call returns true, the second one
doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
true and we'd return, but as that reg got clobbered, si became the oldest
regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
But as the testcase shows, what is the oldest_regno in the chain is
something that changes over time, so relying on it for anything is
problematic, something could have a different oldest_regno and later
on get a different oldest_regno (perhaps with different mode) because
the oldest_regno got overwritten and it can change both ways.
The following patch effectively implements your (2) above.
2021-05-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/100342
* regcprop.c (copy_value): When copying a source reg in a wider
mode than it has recorded for the value, adjust recorded
destination
mode too or punt if !REG_CAN_CHANGE_MODE_P.
* gcc.target/i386/pr100342.c: New test.
(cherry picked from commit 425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f)
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug rtl-optimization/100342] [10 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2
2021-04-29 20:10 [Bug target/100342] New: [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2 zsojka at seznam dot cz
` (16 preceding siblings ...)
2022-05-10 8:18 ` cvs-commit at gcc dot gnu.org
@ 2022-05-10 10:45 ` jakub at gcc dot gnu.org
17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-10 10:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 10.4 too.
^ permalink raw reply [flat|nested] 19+ messages in thread