From: "ubizjak at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
Date: Fri, 26 Nov 2021 16:00:38 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: FIXED
X-Bugzilla-Priority: P3
X-Bugzilla-Target-Milestone: 12.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #15 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #14)
> (In reply to Uroš Bizjak from comment #13)
> > (In reply to Hongtao.liu from comment #12)
> > > >
> > > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > > enough for both VPINSRW insns.
> > >
> > > With new alternative in your attached match(vpblendw one), RA could reuse
> > > zero register, w/o that, xmm0/xmm1 need to be explicitly clear for the upper
> > > bits.
> > >         vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> >
> > True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> > redundant initializations.
>
> On the other hand, if not use expand_vector_set (which treats zero register
> as both input and output), but emit_insn(gen_sse4_1_pinsrph(...)) with a new
> pseudo register as dest, the redundant initialization could be optimized off
> by fwprop1.
>
>         pextrw  $0, %xmm1, %eax
>         pextrw  $0, %xmm0, %edx
>         vpxor   %xmm1, %xmm1, %xmm1
>         vpinsrw $0, %edx, %xmm1, %xmm0
>         vpinsrw $0, %eax, %xmm1, %xmm1
>         vcvtph2ps       %xmm1, %xmm1
>         vcvtph2ps       %xmm0, %xmm0
>         vaddss  %xmm1, %xmm0, %xmm0
>         vinsertps       $0xe, %xmm0, %xmm0, %xmm0
>         vcvtps2ph       $4, %xmm0, %xmm0

Then we will lose the optimization in expand_vector_set:

    case E_V8HFmode:
      if (TARGET_AVX2)
        {
          mmode = SImode;
          gen_blendm = gen_sse4_1_pblendph;
          blendm_const = true;
        }
      else
        use_vec_merge = true;
      break;

Maybe we should simply copy "target" to a new pseudo here:

    do_vec_merge:
      tmp = gen_rtx_VEC_DUPLICATE (mode, val);
      tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
                               GEN_INT (HOST_WIDE_INT_1U << elt));
      emit_insn (gen_rtx_SET (target, tmp));

OTOH, if recycling "target" inhibits fwprop, we should perhaps copy "target"
to a new pseudo at the beginning of expand_vector_set?
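
For anyone following along without the RTL semantics at hand, here is a rough
plain-C model of what the do_vec_merge sequence computes. It is only a sketch
and not GCC code: the names NELTS, vec_duplicate, vec_merge and vec_set_model
are made up for illustration, and lanes are modelled as raw 16-bit patterns
rather than real HFmode values. The one deliberate difference from the snippet
above is that the result goes to a separate "dest" array instead of
overwriting "target", which mimics the "copy target to a new pseudo" idea.

```c
#include <stddef.h>
#include <stdint.h>

#define NELTS 8 /* V8HF has eight 16-bit lanes (illustrative constant) */

/* Model of (vec_duplicate val): broadcast val into every lane. */
static void vec_duplicate(uint16_t out[NELTS], uint16_t val)
{
    for (size_t i = 0; i < NELTS; i++)
        out[i] = val;
}

/* Model of (vec_merge a b mask): lane i is taken from a when bit i of
   mask is set, otherwise from b. */
static void vec_merge(uint16_t out[NELTS], const uint16_t a[NELTS],
                      const uint16_t b[NELTS], unsigned mask)
{
    for (size_t i = 0; i < NELTS; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}

/* Hypothetical stand-in for the vector-set expansion: insert val at lane
   elt of target, writing the result to dest so that target is only read. */
static void vec_set_model(uint16_t dest[NELTS], const uint16_t target[NELTS],
                          uint16_t val, unsigned elt)
{
    uint16_t dup[NELTS];

    vec_duplicate(dup, val);
    vec_merge(dest, dup, target, 1u << elt);
}
```

With this shape a single zeroed "target" can feed two independent inserts,
which corresponds to the single vpxor shared by both vpinsrw insns in the
assembly quoted above. Whether fwprop actually behaves that way for the real
expander is the open question; the model only shows the data flow.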