From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 9BC223858416; Wed, 15 Dec 2021 10:26:40 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9BC223858416 From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug rtl-optimization/103350] [12 Regression] wrong code with -Os -fno-tree-ter on aarch64-unknown-linux-gnu since r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 Date: Wed, 15 Dec 2021 10:26:40 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: rtl-optimization X-Bugzilla-Version: 12.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: tnfchris at gcc dot gnu.org X-Bugzilla-Target-Milestone: 12.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Dec 2021 10:26:40 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D103350 --- Comment #7 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:d5c965374cd688b0a8ad0334c85c971c1e9c3f44 commit r12-5996-gd5c965374cd688b0a8ad0334c85c971c1e9c3f44 Author: Tamar Christina Date: Wed Dec 15 10:26:10 2021 +0000 middle-end: REE should always check all vector usages, even if it finds= a defining def. [PR103350] This and the report in PR103632 are caused by a bug in REE where it generates incorrect code. It's trying to eliminate the following zero extension (insn 54 90 102 2 (set (reg:V4SI 33 v1) (zero_extend:V4SI (reg/v:V4HI 40 v8))) (nil)) by folding it in the definition of `v8`: (insn 2 5 104 2 (set (reg/v:V4HI 40 v8) (reg:V4HI 32 v0 [156])) (nil)) which is fine, except that `v8` is also used by the extracts, e.g.: (insn 11 10 12 2 (set (reg:SI 1 x1) (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8) (parallel [ (const_int 3) ])))) (nil)) REE replaces insn 2 by folding insn 54 and placing it at the definition site of insn 2, so before insn 11. Trying to eliminate extension: (insn 54 90 102 2 (set (reg:V4SI 33 v1) (zero_extend:V4SI (reg/v:V4HI 40 v8))) (nil)) Tentatively merged extension with definition (copy needed): (insn 2 5 104 2 (set (reg:V4SI 33 v1) (zero_extend:V4SI (reg:V4HI 32 v0))) (nil)) to produce (insn 2 5 110 2 (set (reg:V4SI 33 v1) (zero_extend:V4SI (reg:V4HI 32 v0))) (nil)) (insn 110 2 104 2 (set (reg:V4SI 40 v8) (reg:V4SI 33 v1)) (nil)) The new insn 2 using v0 directly is correct, but the insn 110 it create= s is wrong, `v8` should still be V4HI. or it also needs to eliminate the zero extension from the extracts, so instead of (insn 11 10 12 2 (set (reg:SI 1 x1) (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8) (parallel [ (const_int 3) ])))) (nil)) it should be (insn 11 10 12 2 (set (reg:SI 1 x1) (vec_select:SI (reg/v:V4SI 40 v8) (parallel [ (const_int 3) ]))) (nil)) without doing so the indices have been remapped in the extension and so= we extract the wrong elements At any other optimization level but -Os ree seems to abort so this does= n't trigger: Trying to eliminate extension: (insn 54 90 101 2 (set (reg:V4SI 32 v0) (zero_extend:V4SI (reg/v:V4HI 40 v8))) (nil)) Elimination opportunities =3D 2 realized =3D 0 purely due to the ordering of instructions. REE doesn't check uses of `= v8` because it assumes that with a zero extended value, you still have acce= ss to the lower bits by using the the bottom part of the register. This is true for scalar but not for vector. This would have been fine = as well if REE had eliminated the zero_extend on insn 11 and the rest but it doesn't do so since REE can only handle cases where the SRC value are REG_P. It does try to do this in add_removable_extension: 1160 /* For vector mode extensions, ensure that all uses of the 1161 XEXP (src, 0) register are in insn or debug insns, as unl= ike 1162 integral extensions lowpart subreg of the sign/zero exten= ded 1163 register are not equal to the original register, so we ha= ve 1164 to change all uses or none and the current code isn't able 1165 to change them all at once in one transaction. */ However this code doesn't trigger for the example because REE doesn't c= heck the uses if the defining instruction doesn't feed into another extension.. Which is bogus. For vectors it should always check all usages. r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 simply exposed this = as it now lowers VEC_SELECT 0 into the RTL canonical form subreg 0 which causes R= EE to run more often. gcc/ChangeLog: PR rtl-optimization/103350 * ree.c (add_removable_extension): Don't stop at first definiti= on but inspect all. gcc/testsuite/ChangeLog: PR rtl-optimization/103350 * gcc.target/aarch64/pr103350-1.c: New test. * gcc.target/aarch64/pr103350-2.c: New test.=