From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104049] [12 Regression] vec_select to subreg lowering causes superfluous moves
Date: Fri, 25 Mar 2022 14:00:18 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Target-Milestone: 12.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104049

--- Comment #10 from Jakub Jelinek ---
Anyway, a difference at *.combine time is:
(insn 19 18 22 2 (set (reg:SI 103 [ _35 ])
        (vec_select:SI (reg:V4SI 121)
            (parallel [
                    (const_int 0 [0])
                ]))) 2489 {aarch64_get_lanev4si}
     (expr_list:REG_DEAD (reg:V4SI 121)
        (nil)))
(note 22 19 23 2 NOTE_INSN_DELETED)
(insn 23 22 24 2 (set (reg:SI 125)
        (lshiftrt:SI (reg:SI 103 [ _35 ])
            (const_int 16 [0x10]))) "pr104049.c":16:54 694 {*aarch64_lshr_sisd_or_int_si3}
     (nil))
(insn 24 23 25 2 (set (reg:SI 126)
        (plus:SI (zero_extend:SI (subreg:HI (reg:SI 103 [ _35 ]) 0))
            (reg:SI 125))) "pr104049.c":16:33 220 {*add_zero_extendhi_si}
     (expr_list:REG_DEAD (reg:SI 103 [ _35 ])
        (expr_list:REG_DEAD (reg:SI 125)
            (nil))))
vs.
(insn 21 20 22 2 (set (reg:SI 120)
        (and:SI (subreg:SI (reg:V4SI 117) 0)
            (const_int 65535 [0xffff]))) "pr104049.c":16:33 500 {andsi3}
     (nil))
(insn 22 21 23 2 (set (reg:SI 121)
        (plus:SI (lshiftrt:SI (subreg:SI (reg:V4SI 117) 0)
                (const_int 16 [0x10]))
            (reg:SI 120))) "pr104049.c":16:33 211 {*add_lsr_si}
     (expr_list:REG_DEAD (reg:V4SI 117)
        (expr_list:REG_DEAD (reg:SI 120)
            (nil))))
Apart from the (reg:SI 103 [ _35 ]) vs. (subreg:SI (reg:V4SI 117) 0)
difference, from the RA POV I don't see much of a difference between the two:
both can be register allocated in a way that pseudo 103 or
(subreg:SI (reg:V4SI 117) 0) is used just once. In the first sequence by
choosing e.g. x0 for 103 and 126 and x1 for 125; in the second by using x1 for
120, and x0 for 121 and for reloading of the (subreg:SI (reg:V4SI 117) 0) in
all the spots. So trying to prefer one or the other at combine time doesn't
look useful (the reason why the first one isn't used is likely that
(subreg:HI (reg:V4SI 117) 0) is rejected).
But IRA instead decides to put both 120 and 121 into x0, and LRA chooses x0
for reloading of the subreg in the first insn and x1 for the second insn.
Could we fix this up in postreload or later?
(insn 35 18 21 2 (set (reg:SI 0 x0 [125])
        (reg:SI 32 v0 [117])) "pr104049.c":16:33 52 {*movsi_aarch64}
     (nil))
(insn 21 35 36 2 (set (reg:SI 0 x0 [120])
        (and:SI (reg:SI 0 x0 [125])
            (const_int 65535 [0xffff]))) "pr104049.c":16:33 500 {andsi3}
     (nil))
(insn 36 21 22 2 (set (reg:SI 1 x1 [126])
        (reg:SI 32 v0 [117])) "pr104049.c":16:33 52 {*movsi_aarch64}
     (nil))
(insn 22 36 28 2 (set (reg:SI 0 x0 [121])
        (plus:SI (lshiftrt:SI (reg:SI 1 x1 [126])
                (const_int 16 [0x10]))
            (reg:SI 0 x0 [120]))) "pr104049.c":16:33 211 {*add_lsr_si}
     (nil))
transformation into:
(insn 35 18 21 2 (set (reg:SI 0 x0 [125])
        (reg:SI 32 v0 [117])) "pr104049.c":16:33 52 {*movsi_aarch64}
     (nil))
(insn 21 35 22 2 (set (reg:SI 1 x1 [120])
        (and:SI (reg:SI 0 x0 [125])
            (const_int 65535 [0xffff]))) "pr104049.c":16:33 500 {andsi3}
     (nil))
(insn 22 21 28 2 (set (reg:SI 0 x0 [121])
        (plus:SI (lshiftrt:SI (reg:SI 0 x0 [126])
                (const_int 16 [0x10]))
            (reg:SI 0 x1 [120]))) "pr104049.c":16:33 211 {*add_lsr_si}
     (nil))
?
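
As a sanity check on the rewrite asked about above, here is a minimal sketch
(a toy Python model, not GCC code) that interprets both hard-register
sequences and compares the value left in x0. The tuple encoding, the run()
helper and the op names "mov", "and" and "add_lsr" are made up for
illustration. The model assumes x1 is dead after insn 22, since the two
sequences leave different values in x1.

```python
# Toy model of the two post-reload sequences; register names follow the dumps.
# Everything here (tuple encoding, op names) is hypothetical, for illustration.
MASK32 = 0xFFFFFFFF


def run(seq, v0):
    """Execute (dest, op, operands) tuples on a tiny register file, return x0."""
    regs = {"v0": v0 & MASK32}
    for dest, op, args in seq:
        if op == "mov":
            regs[dest] = regs[args[0]]
        elif op == "and":
            regs[dest] = regs[args[0]] & args[1]
        elif op == "add_lsr":
            # dest = (a >> shift) + b, modelled on the *add_lsr_si pattern
            a, shift, b = args
            regs[dest] = ((regs[a] >> shift) + regs[b]) & MASK32
        else:
            raise ValueError("unknown op: " + op)
    return regs["x0"]


# What IRA/LRA currently produce: two copies out of v0.
current = [
    ("x0", "mov", ("v0",)),               # insn 35
    ("x0", "and", ("x0", 0xFFFF)),        # insn 21
    ("x1", "mov", ("v0",)),               # insn 36 (the superfluous move)
    ("x0", "add_lsr", ("x1", 16, "x0")),  # insn 22
]

# The suggested result: insn 21 writes x1, insn 22 reuses x0 from insn 35.
proposed = [
    ("x0", "mov", ("v0",)),               # insn 35
    ("x1", "and", ("x0", 0xFFFF)),        # insn 21, destination renamed to x1
    ("x0", "add_lsr", ("x0", 16, "x1")),  # insn 22
]


def equivalent(samples):
    """True if both sequences leave the same value in x0 for every sample."""
    return all(run(current, v) == run(proposed, v) for v in samples)
```

Calling equivalent() on a handful of 32-bit inputs only shows that the final
x0 matches; whether x1 is really free at that point is exactly what a
postreload fixup would have to prove.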