From mboxrd@z Thu Jan 1 00:00:00 1970
From: "vries at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96401] [nvptx] Take advantage of subword ld/st/cvt
Date: Fri, 31 Jul 2020 13:49:34 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401

--- Comment #1 from Tom de Vries ---
(In reply to Tom de Vries from comment #0)
> In other words, we may emit instead:
> ...
>         .reg.u32 %r22;
>         ld.u32  %r22, [%frame];
>         st.u16  [%frame+4], %r22;
> ...

So, why don't we?

Using -dP we see the respective insns:
...
//(insn 5 2 6 2
//  (set (reg:SI 22 [ v$0_1 ])
//       (mem/v/c:SI (reg/f:DI 2 %frame) [1 v+0 S4 A128]))
//  "test.c":7:6 6 {*movsi_insn}
//  (nil))
        ld.u32  %r22, [%frame]; // 5    [c=4]  *movsi_insn/1
//(insn 6 5 9 2
//  (set (reg:HI 24 [ v$0_1 ])
//       (subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
//  "test.c":7:6 5 {*movhi_insn}
//  (expr_list:REG_DEAD (reg:SI 22 [ v$0_1 ])
//     (nil)))
        cvt.u16.u32     %r24, %r22;     // 6    [c=12]  *movhi_insn/0
//(insn 9 6 12 2
//  (set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
//                            (const_int 4 [0x4])) [2 v2+0 S2 A32])
//       (reg:HI 24 [ v$0_1 ]))
//  "test.c":7:6 5 {*movhi_insn}
//  (expr_list:REG_DEAD (reg:HI 24 [ v$0_1 ])
//     (nil)))
        st.u16  [%frame+4], %r24;       // 9    [c=4]  *movhi_insn/2
...

I went to investigate why combine doesn't combine insns 6 and 9, that is, why
doesn't it generate:
...
//(insn 9 6 12 2
//  (set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
//                            (const_int 4 [0x4])) [2 v2+0 S2 A32])
//       (subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
//  "test.c":7:6 5 {*movhi_insn}
//  (expr_list:REG_DEAD (reg:HI 22 [ v$0_1 ])
//     (nil)))
...

Part of the required changes is to make the movhi_insn store alternative work
for subreg source operand:
...
@@ -229,8 +234,8 @@
 (define_insn "*mov<mode>_insn"
   [(set (match_operand:QHSDIM 0 "nonimmediate_operand" "=R,R,m")
-       (match_operand:QHSDIM 1 "general_operand" "Ri,m,R"))]
-  "!MEM_P (operands[0]) || REG_P (operands[1])"
+       (match_operand:QHSDIM 1 "general_operand" "Ri,m,Q"))]
+  "!MEM_P (operands[0]) || REG_P (operands[1]) || SUBREG_P (operands[1])"
 {
   if (which_alternative == 1)
     return "%.\\tld%A1%u1\\t%0, %1;";
...
which required me to define:
...
+(define_constraint "Q"
+  "A pseudo register or subreg."
+  (ior (match_code "reg")
+       (match_code "subreg")))
+
...
[ Note that this constraint is an oddity, like the R constraint: it's not a
register constraint. ]

After debugging I found that I needed this as well:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d2f321fcbcc..2234edad53b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6444,7 +6444,7 @@ nvptx_data_alignment (const_tree type, unsigned int basic_align)
 static bool
 nvptx_modes_tieable_p (machine_mode, machine_mode)
 {
-  return false;
+  return true;
 }

 /* Implement TARGET_HARD_REGNO_NREGS.  */
...
due to this bit in combine.c:subst():
...
              /* In general, don't install a subreg involving two
                 modes not tieable.  It can worsen register
                 allocation, and can even make invalid reload
                 insns, since the reg inside may need to be copied
                 from in the outside mode, and that may be invalid
                 if it is an fp reg copied in integer mode.
...

Using these changes, I get the desired:
...
        .reg.u32 %r22;
        ld.u32  %r22, [%frame];
        st.u16  [%frame+4], %r22;
...
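
For illustration only, here is a minimal host-side C sketch of why dropping the cvt.u16.u32 should not change the stored value. It is not the test.c from this PR and not PTX; the function names (copy_with_cvt, copy_subword_store, stores_agree, self_check) are hypothetical, and it assumes that st.u16 with a 32-bit source register stores just the low 16 bits, which is what the desired output relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Models the current output: ld.u32, cvt.u16.u32, st.u16.  */
void
copy_with_cvt (const volatile uint32_t *src, volatile uint16_t *dst)
{
  uint32_t r22 = *src;            /* ld.u32      %r22, [%frame];    */
  uint16_t r24 = (uint16_t) r22;  /* cvt.u16.u32 %r24, %r22;        */
  *dst = r24;                     /* st.u16      [%frame+4], %r24;  */
}

/* Models the desired output: ld.u32, then st.u16 straight from the
   32-bit register.  ASSUMPTION: the store writes only the low 16 bits
   of the wider source register.  */
void
copy_subword_store (const volatile uint32_t *src, volatile uint16_t *dst)
{
  uint32_t r22 = *src;                /* ld.u32 %r22, [%frame];    */
  *dst = (uint16_t) (r22 & 0xffffu);  /* st.u16 [%frame+4], %r22;  */
}

/* Returns 1 if both sequences store the same 16-bit value, and that
   value is the low half of VALUE.  */
int
stores_agree (uint32_t value)
{
  volatile uint32_t v = value;
  volatile uint16_t a = 0, b = 0;

  copy_with_cvt (&v, &a);
  copy_subword_store (&v, &b);
  return a == b && a == (uint16_t) (value & 0xffffu);
}

/* Checks a few representative values, including ones with high bits set.  */
void
self_check (void)
{
  assert (stores_agree (0u));
  assert (stores_agree (0x0000ffffu));
  assert (stores_agree (0x12345678u));
  assert (stores_agree (0xffff0000u));
}
```

The subreg:HI at byte offset 0 of the SImode register in insn 6 corresponds to the low half modelled here, so under the stated assumption the two sequences are interchangeable.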