From mboxrd@z Thu Jan 1 00:00:00 1970
From: "crazylht at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/97147] GCC uses vhaddpd which is bad for latency
Date: Tue, 17 Aug 2021 07:17:38 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97147

--- Comment #4 from Hongtao.liu ---
(In reply to Richard Biener from comment #3)
> (In reply to Hongtao.liu from comment #2)
> > Disable (define_insn "*sse3_haddv2df3_low" and (define_insn
> > "*sse3_hsubv2df3_low" seems to be ok.
> > But for foo1.
> >
> > v2df foo1 (v2df x, v2df y)
> > {
> >   v2df a;
> >   a[0] = x[0] + x[1];
> >   a[1] = y[0] + y[1];
> >   return a;
> > }
> >
> > it's
> >
> >         vhaddpd %xmm1, %xmm0, %xmm0
> >         ret
> >
> > vs
> >
> >         movapd   xmm2, xmm0
> >         unpckhpd xmm2, xmm2
> >         addsd    xmm0, xmm2
> >         movapd   xmm2, xmm1
> >         unpckhpd xmm1, xmm1
> >         addsd    xmm1, xmm2
> >         unpcklpd xmm0, xmm1
> >         ret
> >
> > and note w/o vhaddpd, codegen can be optimized to
> >
> >         movapd   xmm2, xmm0
> >         unpcklpd xmm2, xmm1
> >         unpckhpd xmm0, xmm1
> >         addpd    xmm0, xmm2
> >         ret
> >
> > Guess maybe it's better done in gimple level?
>
> On GIMPLE we see the testcase basically unchanged from what the source does:
>
>   _1 = BIT_FIELD_REF ;
>   _2 = BIT_FIELD_REF ;
>   _3 = _1 + _2;
>   a_9 = BIT_INSERT_EXPR ;
>   _4 = BIT_FIELD_REF ;
>   _5 = BIT_FIELD_REF ;
>   _6 = _4 + _5;
>   a_11 = BIT_INSERT_EXPR ;
>   return a_11;
>
> vectorization fails in SLP discovery because we essentially see two lanes
> operating on different vectors and we don't implement a way to shuffle
> them together.
>
> I think the full hadd define_insns are OK to keep, they really have special
> arrangements (esp. the SFmode variants).  But the reductions to scalar
> (*_low) seem unnecessary and penalizing (maybe we can guard use of those
> with a -mtune-ctl?).
>

Yes, I'm adding a tune to enable v2df vector reduction, disabled by default
for all processors.

> I also see we're missing patterns for h{add,sub}ps (not sure if we can manage
> to get combine to synthesize it).

You mean (define_insn "sse3_hv4sf3"?
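
For reference, here is a rough C sketch of the shuffle-based form of foo1 from
comment #2 (the unpcklpd/unpckhpd + addpd sequence). It only illustrates the
data movement at source level; it is not what GCC emits, and it is not a
proposed patch. The names foo1_shuffle and foo1_selfcheck are made up for this
example, and the instruction names in the comments only indicate which lane
arrangement each step corresponds to.

```c
#include <assert.h>

/* 2 x double vector type via the GCC/Clang vector extension.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* foo1 as in the testcase: one horizontal add per input vector.  */
v2df
foo1 (v2df x, v2df y)
{
  v2df a;
  a[0] = x[0] + x[1];
  a[1] = y[0] + y[1];
  return a;
}

/* Hypothetical hand-written equivalent: gather the low lanes and the
   high lanes of both inputs, then do a single vertical add.  */
v2df
foo1_shuffle (v2df x, v2df y)
{
  v2df lo = { x[0], y[0] };	/* lane layout of unpcklpd */
  v2df hi = { x[1], y[1] };	/* lane layout of unpckhpd */
  return lo + hi;		/* one packed add, as with addpd */
}

/* Hypothetical self-check: both forms agree on a simple input.  */
void
foo1_selfcheck (void)
{
  v2df x = { 1.0, 2.0 };
  v2df y = { 3.0, 4.0 };
  v2df r1 = foo1 (x, y);
  v2df r2 = foo1_shuffle (x, y);
  assert (r1[0] == r2[0] && r1[1] == r2[1]);
}
```

Both functions compute { x[0] + x[1], y[0] + y[1] }; the second form just makes
the two-shuffles-plus-one-add structure explicit, which is the arrangement SLP
discovery currently does not find on its own.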