public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu
@ 2021-11-21 19:44 zsojka at seznam dot cz
2021-11-21 22:22 ` [Bug rtl-optimization/103350] " pinskia at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: zsojka at seznam dot cz @ 2021-11-21 19:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
Bug ID: 103350
Summary: wrong code with -Os -fno-tree-ter on
aarch64-unknown-linux-gnu
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zsojka at seznam dot cz
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: aarch64-unknown-linux-gnu
Created attachment 51844
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51844&action=edit
reduced testcase
Output:
$ aarch64-unknown-linux-gnu-gcc -Os -fno-tree-ter testcase.c -static
$ qemu-aarch64 -- ./a.out
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted
$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r12-5436-20211121114008-gdc915b361bb-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/12.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl
--with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu
--with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld
--with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r12-5436-20211121114008-gdc915b361bb-checking-yes-rtl-df-extra-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20211121 (experimental) (GCC)
* [Bug rtl-optimization/103350] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
@ 2021-11-21 22:22 ` pinskia at gcc dot gnu.org
2021-11-22 9:23 ` marxin at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-11-21 22:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2021-11-21
Ever confirmed|0 |1
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
* [Bug rtl-optimization/103350] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
2021-11-21 22:22 ` [Bug rtl-optimization/103350] " pinskia at gcc dot gnu.org
@ 2021-11-22 9:23 ` marxin at gcc dot gnu.org
2021-11-22 10:42 ` [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 marxin at gcc dot gnu.org
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-11-22 9:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
--- Comment #2 from Martin Liška <marxin at gcc dot gnu.org> ---
I'm going to bisect that.
* [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
2021-11-21 22:22 ` [Bug rtl-optimization/103350] " pinskia at gcc dot gnu.org
2021-11-22 9:23 ` marxin at gcc dot gnu.org
@ 2021-11-22 10:42 ` marxin at gcc dot gnu.org
2021-11-22 10:43 ` marxin at gcc dot gnu.org
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-11-22 10:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
Martin Liška <marxin at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |12.0
Summary|wrong code with -Os |[12 Regression] wrong code
|-fno-tree-ter on |with -Os -fno-tree-ter on
|aarch64-unknown-linux-gnu |aarch64-unknown-linux-gnu
| |since
| |r12-2288-g8695bf78dad1a4263
| |6775843ca832a2f4dba4da3
Priority|P3 |P1
--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
Started with r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3.
* [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
` (2 preceding siblings ...)
2021-11-22 10:42 ` [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 marxin at gcc dot gnu.org
@ 2021-11-22 10:43 ` marxin at gcc dot gnu.org
2021-12-13 8:36 ` tnfchris at gcc dot gnu.org
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-11-22 10:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
Martin Liška <marxin at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|needs-bisection |
--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
I emailed the commit author about this. Apparently, he doesn't have a Bugzilla
account.
* [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
` (3 preceding siblings ...)
2021-11-22 10:43 ` marxin at gcc dot gnu.org
@ 2021-12-13 8:36 ` tnfchris at gcc dot gnu.org
2021-12-13 9:02 ` tnfchris at gcc dot gnu.org
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2021-12-13 8:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
*** Bug 103632 has been marked as a duplicate of this bug. ***
* [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
` (4 preceding siblings ...)
2021-12-13 8:36 ` tnfchris at gcc dot gnu.org
@ 2021-12-13 9:02 ` tnfchris at gcc dot gnu.org
2021-12-15 10:26 ` cvs-commit at gcc dot gnu.org
2021-12-15 10:29 ` tnfchris at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2021-12-13 9:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tnfchris at gcc dot gnu.org
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
--- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
This bug and the report in PR103632 are caused by a bug in REE (the redundant
extension elimination pass) that makes it generate incorrect code.
It's trying to eliminate the following zero extension:
(insn 54 90 102 2 (set (reg:V4SI 33 v1 [orig:94 _5 ] [94])
(zero_extend:V4SI (reg/v:V4HI 40 v8 [orig:112 v64u16_0D.3917 ] [112])))
"jon-inc.c":21:30 4106 {zero_extendv4hiv4si2}
(nil))
by folding it into the definition of `v8`:
(insn 2 5 104 2 (set (reg/v:V4HI 40 v8 [orig:112 v64u16_0D.3917 ] [112])
(reg:V4HI 32 v0 [156])) "jon-inc.c":15:1 1160 {*aarch64_simd_movv4hi}
(nil))
which is fine, except that `v8` is also used by the extracts, e.g.:
(insn 11 10 12 2 (set (reg:SI 1 x1 [orig:103 _17 ] [103])
(zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8 [orig:112
v64u16_0D.3917 ] [112])
(parallel [
(const_int 3 [0x3])
])))) 2480 {*aarch64_get_lane_zero_extendsiv4hi}
(nil))
REE replaces insn 2 by folding insn 54 into it, which places the extension at
the definition site of insn 2, i.e. before insn 11.
Trying to eliminate extension:
(insn 54 90 102 2 (set (reg:V4SI 33 v1 [orig:94 _5 ] [94])
(zero_extend:V4SI (reg/v:V4HI 40 v8 [orig:112 v64u16_0D.3917 ] [112])))
"jon-inc.c":21:30 4106 {zero_extendv4hiv4si2}
(nil))
Tentatively merged extension with definition (copy needed):
(insn 2 5 104 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg:V4HI 32 v0 [156]))) "jon-inc.c":15:1 -1
(nil))
to produce
(insn 2 5 110 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg:V4HI 32 v0 [156]))) "jon-inc.c":15:1 4106
{zero_extendv4hiv4si2}
(nil))
(insn 110 2 104 2 (set (reg:V4SI 40 v8)
(reg:V4SI 33 v1)) "jon-inc.c":15:1 -1
(nil))
The new insn 2, which uses v0 directly, is correct, but the insn 110 it creates
is wrong: `v8` should still be V4HI.
Alternatively, REE also needs to eliminate the zero extension from the
extracts, so instead of
(insn 11 10 12 2 (set (reg:SI 1 x1 [orig:103 _17 ] [103])
(zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8 [orig:112
v64u16_0D.3917 ] [112])
(parallel [
(const_int 3 [0x3])
])))) 2480 {*aarch64_get_lane_zero_extendsiv4hi}
(nil))
it should be
(insn 11 10 12 2 (set (reg:SI 1 x1 [orig:103 _17 ] [103])
(vec_select:SI (reg/v:V4SI 40 v8 [orig:112 v64u16_0D.3917 ] [112])
(parallel [
(const_int 3 [0x3])
]))) 2480 {*aarch64_get_lane_zero_extendsiv4hi}
(nil))
Without doing so, the lane indices are effectively remapped by the extension,
so we extract the wrong elements.
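The mismatch described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the `vreg` type and the helpers `make_v4hi`, `zero_extend_v4hi`, `get_lane_v4hi` and `get_lane_v4si` are hypothetical names invented for illustration, and they model the register as plain bytes in memory rather than anything GCC or AArch64 actually does internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a 128-bit vector register: just 16 raw bytes.
   This type is an assumption of the sketch, not a GCC data structure.  */
typedef struct { unsigned char bytes[16]; } vreg;

/* Model of the original insn 2: a V4HI value sits in the low 64 bits.  */
vreg make_v4hi(const uint16_t lanes[4])
{
    vreg r;
    memset(&r, 0, sizeof r);
    memcpy(r.bytes, lanes, 4 * sizeof(uint16_t));
    return r;
}

/* Model of zero_extend:V4SI: every 16-bit lane widens to 32 bits.  */
vreg zero_extend_v4hi(vreg src)
{
    uint16_t in[4];
    uint32_t out[4];
    vreg r;
    memcpy(in, src.bytes, sizeof in);
    for (int i = 0; i < 4; i++)
        out[i] = in[i];
    memcpy(r.bytes, out, sizeof out);
    return r;
}

/* Model of insn 11 as written: zero_extend:SI (vec_select:HI reg [lane]).  */
uint32_t get_lane_v4hi(vreg r, int lane)
{
    uint16_t lanes[4];
    assert(lane >= 0 && lane < 4);
    memcpy(lanes, r.bytes, sizeof lanes);
    return lanes[lane];
}

/* Model of the adjusted insn 11: vec_select:SI on a V4SI register.  */
uint32_t get_lane_v4si(vreg r, int lane)
{
    uint32_t lanes[4];
    assert(lane >= 0 && lane < 4);
    memcpy(lanes, r.bytes, sizeof lanes);
    return lanes[lane];
}
```

With lanes {0x1111, 0x2222, 0x3333, 0x4444}, `get_lane_v4hi` on the unextended register returns 0x4444 for lane 3, while the same V4HI read on the result of `zero_extend_v4hi` does not. Reading lane 3 with `get_lane_v4si` on the extended register returns 0x4444 again, which corresponds to the adjusted form of insn 11.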
At any optimization level other than -Os, REE seems to bail out, so this
doesn't trigger:
Trying to eliminate extension:
(insn 54 90 101 2 (set (reg:V4SI 32 v0 [orig:94 _5 ] [94])
(zero_extend:V4SI (reg/v:V4HI 40 v8 [orig:112 v64u16_0D.3917 ] [112])))
"jon-inc.c":21:30 4106 {zero_extendv4hiv4si2}
(nil))
Elimination opportunities = 2 realized = 0
This happens purely due to the ordering of instructions. REE doesn't check
uses of `v8` because it assumes that with a zero-extended value, you still have
access to the lower bits by using the bottom part of the register.
This is true for scalars but not for vectors. This would have been fine as well
if REE had eliminated the zero_extend on insn 11 and the rest, but it doesn't do
so since REE can only handle cases where the SRC value is REG_P.
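The scalar/vector difference can be shown in a few lines of C. This is a minimal sketch, not code from GCC: the function names `scalar_lowpart_preserved` and `vector_lowpart_preserved` are made up for illustration, and "lowpart" is modelled simply as the first bytes of the widened value in host memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Scalar case: truncating a zero-extended value gives back the original,
   so a later use of the narrow value can read the low bits of the wide one.  */
bool scalar_lowpart_preserved(uint16_t x)
{
    uint32_t wide = x;            /* models zero_extend:SI  */
    return (uint16_t) wide == x;  /* models taking the HI lowpart  */
}

/* Vector case: after widening each lane, the first 8 bytes of the result
   no longer hold the four original 16-bit lanes, because every element
   now takes 4 bytes instead of 2.  */
bool vector_lowpart_preserved(const uint16_t in[4])
{
    uint32_t wide[4];
    uint16_t low[4];
    for (int i = 0; i < 4; i++)
        wide[i] = in[i];            /* models zero_extend:V4SI  */
    memcpy(low, wide, sizeof low);  /* reinterpret the low 64 bits as V4HI  */
    return memcmp(low, in, sizeof low) == 0;
}
```

The scalar check holds for every 16-bit input. The vector check fails for an input such as {1, 2, 3, 4}; it only happens to pass for degenerate inputs like the all-zero vector.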
REE does try to check the uses in add_removable_extension:
/* For vector mode extensions, ensure that all uses of the
   XEXP (src, 0) register are in insn or debug insns, as unlike
   integral extensions lowpart subreg of the sign/zero extended
   register are not equal to the original register, so we have
   to change all uses or none and the current code isn't able
   to change them all at once in one transaction.  */
However, this code doesn't trigger for the example because REE doesn't check the
uses if the defining instruction doesn't feed into another extension, which is
bogus. For vectors it should always check all usages.
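The rule "for vectors, always check all usages" can be sketched as a standalone predicate. This is a hypothetical model only: `struct reg_use` and `all_uses_in_extension` are invented for illustration and are not GCC's df structures or the actual code in ree.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record of one use of a register.
   This is not GCC's df_ref.  */
struct reg_use {
    int insn_uid;        /* UID of the insn containing the use  */
    bool is_debug_insn;  /* debug insns do not affect generated code  */
};

/* Returns true only if every use of the extension's source register is
   either in the extension insn itself or in a debug insn.  For a vector
   extension, any other use means the register must keep its narrow mode,
   so the extension must not be merged into the definition.  */
bool all_uses_in_extension(const struct reg_use *uses, size_t n_uses,
                           int ext_insn_uid)
{
    for (size_t i = 0; i < n_uses; i++) {
        if (uses[i].is_debug_insn)
            continue;
        if (uses[i].insn_uid != ext_insn_uid)
            return false;
    }
    return true;
}
```

In the example from this report, `v8` is used by the extension (insn 54) and by the lane extract (insn 11), so a check of this shape would reject the merge regardless of which definition REE found first.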
r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 simply exposed this, as it
now lowers VEC_SELECT 0 into the RTL canonical form subreg 0, which causes REE
to run more often.
Mine.
* [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
` (5 preceding siblings ...)
2021-12-13 9:02 ` tnfchris at gcc dot gnu.org
@ 2021-12-15 10:26 ` cvs-commit at gcc dot gnu.org
2021-12-15 10:29 ` tnfchris at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-12-15 10:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:
https://gcc.gnu.org/g:d5c965374cd688b0a8ad0334c85c971c1e9c3f44
commit r12-5996-gd5c965374cd688b0a8ad0334c85c971c1e9c3f44
Author: Tamar Christina <tamar.christina@arm.com>
Date: Wed Dec 15 10:26:10 2021 +0000
middle-end: REE should always check all vector usages, even if it finds a defining def. [PR103350]
This and the report in PR103632 are caused by a bug in REE where it generates
incorrect code.
It's trying to eliminate the following zero extension
(insn 54 90 102 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg/v:V4HI 40 v8)))
(nil))
by folding it in the definition of `v8`:
(insn 2 5 104 2 (set (reg/v:V4HI 40 v8)
(reg:V4HI 32 v0 [156]))
(nil))
which is fine, except that `v8` is also used by the extracts, e.g.:
(insn 11 10 12 2 (set (reg:SI 1 x1)
(zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
(parallel [
(const_int 3)
]))))
(nil))
REE replaces insn 2 by folding insn 54 and placing it at the definition site of
insn 2, so before insn 11.
Trying to eliminate extension:
(insn 54 90 102 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg/v:V4HI 40 v8)))
(nil))
Tentatively merged extension with definition (copy needed):
(insn 2 5 104 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg:V4HI 32 v0)))
(nil))
to produce
(insn 2 5 110 2 (set (reg:V4SI 33 v1)
(zero_extend:V4SI (reg:V4HI 32 v0)))
(nil))
(insn 110 2 104 2 (set (reg:V4SI 40 v8)
(reg:V4SI 33 v1))
(nil))
The new insn 2 using v0 directly is correct, but the insn 110 it creates is
wrong: `v8` should still be V4HI.
Alternatively, it also needs to eliminate the zero extension from the extracts,
so instead of
(insn 11 10 12 2 (set (reg:SI 1 x1)
(zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
(parallel [
(const_int 3)
]))))
(nil))
it should be
(insn 11 10 12 2 (set (reg:SI 1 x1)
(vec_select:SI (reg/v:V4SI 40 v8)
(parallel [
(const_int 3)
])))
(nil))
Without doing so, the indices have been remapped in the extension and so we
extract the wrong elements.
At any other optimization level but -Os, REE seems to abort so this doesn't
trigger:
Trying to eliminate extension:
(insn 54 90 101 2 (set (reg:V4SI 32 v0)
(zero_extend:V4SI (reg/v:V4HI 40 v8)))
(nil))
Elimination opportunities = 2 realized = 0
This happens purely due to the ordering of instructions. REE doesn't check
uses of `v8` because it assumes that with a zero-extended value, you still have
access to the lower bits by using the bottom part of the register.
This is true for scalar but not for vector. This would have been fine as well
if REE had eliminated the zero_extend on insn 11 and the rest, but it doesn't do
so since REE can only handle cases where the SRC value is REG_P.
It does try to do this in add_removable_extension:
/* For vector mode extensions, ensure that all uses of the
   XEXP (src, 0) register are in insn or debug insns, as unlike
   integral extensions lowpart subreg of the sign/zero extended
   register are not equal to the original register, so we have
   to change all uses or none and the current code isn't able
   to change them all at once in one transaction.  */
However, this code doesn't trigger for the example because REE doesn't check the
uses if the defining instruction doesn't feed into another extension, which is
bogus. For vectors it should always check all usages.
r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 simply exposed this as it
now lowers VEC_SELECT 0 into the RTL canonical form subreg 0 which causes REE
to run more often.
gcc/ChangeLog:
PR rtl-optimization/103350
* ree.c (add_removable_extension): Don't stop at first definition but
inspect all.
gcc/testsuite/ChangeLog:
PR rtl-optimization/103350
* gcc.target/aarch64/pr103350-1.c: New test.
* gcc.target/aarch64/pr103350-2.c: New test.
* [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3
2021-11-21 19:44 [Bug rtl-optimization/103350] New: wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu zsojka at seznam dot cz
` (6 preceding siblings ...)
2021-12-15 10:26 ` cvs-commit at gcc dot gnu.org
@ 2021-12-15 10:29 ` tnfchris at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2021-12-15 10:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Fixed on master