From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id A0FE13838023; Tue, 17 Aug 2021 07:17:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A0FE13838023
From: "crazylht at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/97147] GCC uses vhaddpd which is bad for latency
Date: Tue, 17 Aug 2021 07:17:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: crazylht at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-97147-4-MVlwBCgoOU@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-97147-4@http.gcc.gnu.org/bugzilla/>
References: <bug-97147-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Aug 2021 07:17:38 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97147
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> (In reply to Hongtao.liu from comment #2)
> > Disable (define_insn "*sse3_haddv2df3_low" and (define_insn
> > "*sse3_hsubv2df3_low" seems to be ok.
> > But for foo1.
> >
> > v2df foo1 (v2df x, v2df y)
> > {
> >   v2df a;
> >   a[0] = x[0] + x[1];
> >   a[1] = y[0] + y[1];
> >   return a;
> > }
> >
> > it's
> >
> >   vhaddpd %xmm1, %xmm0, %xmm0
> >   ret
> >
> > vs
> >
> >         movapd  xmm2, xmm0
> >         unpckhpd        xmm2, xmm2
> >         addsd   xmm0, xmm2
> >         movapd  xmm2, xmm1
> >         unpckhpd        xmm1, xmm1
> >         addsd   xmm1, xmm2
> >         unpcklpd        xmm0, xmm1
> >         ret
> >
> > and note w/o vhaddpd, codegen can be optimized to
> >=20
> >         movapd  xmm2, xmm0
> >         unpcklpd        xmm2, xmm1
> >         unpckhpd        xmm0, xmm1
> >         addpd   xmm0, xmm2
> >         ret
> >
> > Guess maybe it's better done in gimple level?
>
> On GIMPLE we see the testcase basically unchanged from what the source does:
>
>   _1 = BIT_FIELD_REF <x_7(D), 64, 0>;
>   _2 = BIT_FIELD_REF <x_7(D), 64, 64>;
>   _3 = _1 + _2;
>   a_9 = BIT_INSERT_EXPR <a_8(D), _3, 0>;
>   _4 = BIT_FIELD_REF <y_10(D), 64, 0>;
>   _5 = BIT_FIELD_REF <y_10(D), 64, 64>;
>   _6 = _4 + _5;
>   a_11 = BIT_INSERT_EXPR <a_9, _6, 64>;
>   return a_11;
>
> vectorization fails in SLP discovery because we essentially see two lanes
> operating on different vectors and we don't implement a way to shuffle
> them together.
>=20
> I think the full hadd define_insns are OK to keep, they really have special
> arrangements (esp. the SFmode variants).  But the reductions to scalar
> (*_low) seem unnecessary and penalizing (maybe we can guard use of those
> with a -mtune-ctl?).
>

Yes, I'm adding a tune to enable v2df vector reduction; it will be disabled by
default for all processors.

> I also see we're missing patterns for h{add,sub}ps (not sure if we can manage
> to get combine to synthesize it).

You mean (define_insn "sse3_h<insn>v4sf3"?
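
For reference, a minimal sketch of the source-level equivalent of the
unpcklpd/unpckhpd/addpd sequence quoted above, written with GCC vector
extensions. The name foo1_shuffle is hypothetical and only for illustration.
The comments map each statement to the instruction it corresponds to in that
sequence, and no claim is made here about what any given GCC version actually
emits for it.

```c
/* Sketch only: foo1_shuffle is a hypothetical name used for illustration.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* The testcase from comment #2: one horizontal add per input vector.  */
v2df
foo1 (v2df x, v2df y)
{
  v2df a;
  a[0] = x[0] + x[1];
  a[1] = y[0] + y[1];
  return a;
}

/* Same result expressed as two lane shuffles plus one vertical add.  */
v2df
foo1_shuffle (v2df x, v2df y)
{
  v2df lo = { x[0], y[0] };	/* low lanes of x and y, cf. unpcklpd  */
  v2df hi = { x[1], y[1] };	/* high lanes of x and y, cf. unpckhpd  */
  return lo + hi;		/* lane-wise add, cf. addpd  */
}
```

Both functions compute { x[0] + x[1], y[0] + y[1] }; the second form makes the
"shuffle the two vectors together, then add" shape explicit, which is what SLP
discovery currently cannot derive on its own.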