From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 04ED23858C78; Mon, 31 Jul 2023 07:32:05 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 04ED23858C78
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1690788726;
	bh=biNf5yIsNKIE9s5W1DmQLJOADZWKyflsQYsweYoVrGg=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=MjLpIcyzU/cP72EwuPwoNA4chgm/5ZhMMTkgpZMeqJepSFDcQi341/w5J3NlPYgWN
	 rf786AmpKc04XxHFHf7HOHkYzCsaTp7B6iO3D/h/T74Fx4c8lK2Qq7YZ/z0IYe0qlb
	 zDECMQLBf5ey837DqMShgrAo0446u8qeSCy7siKQ=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/81904] FMA and addsub instructions
Date: Mon, 31 Jul 2023 07:32:05 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 8.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-81904-4-iiDhYPI5ha@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-81904-4@http.gcc.gnu.org/bugzilla/>
References: <bug-81904-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Richard Biener from comment #1)
> > Hmm, I think the issue is we see
> >
> > f (__m128d x, __m128d y, __m128d z)
> > {
> >   vector(2) double _4;
> >   vector(2) double _6;
> >
> >   <bb 2> [100.00%]:
> >   _4 = x_2(D) * y_3(D);
> >   _6 = __builtin_ia32_addsubpd (_4, z_5(D)); [tail call]
> We can fold the builtin into .VEC_ADDSUB, and optimize MUL + VEC_ADDSUB ->
> VEC_FMADDSUB in match.pd?

I think MUL + .VEC_ADDSUB can be handled in the FMA pass.  For my example
above we get the following early on (before FMA recognition):

  _4 = x_2(D) * y_3(D);
  tem2_7 = _4 + z_6(D);
  tem3_8 = _4 - z_6(D);
  _9 = VEC_PERM_EXPR <tem2_7, tem3_8, { 0, 3 }>;

We could recognize that as .VEC_ADDSUB.  I think we want to avoid doing
this too early.  I'm not sure whether doing it within the FMA pass itself
will work, since we key FMAs on the mult but would need to key the addsub
on the VEC_PERM (we are walking stmts from BB start to end).  Looking
at the code, it seems changing the walking order should work.
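
To make the shape of that sequence concrete, here is a minimal scalar sketch
in plain C of what the four statements compute lane by lane.  The type v2df
and the helpers model_vec_perm and model_add_sub_perm are hypothetical names
for illustration only; they are not GCC internals.  The only assumption about
VEC_PERM_EXPR is its selector convention: values 0..1 pick lanes of the first
operand, values 2..3 pick lanes of the second.

```c
#include <stddef.h>

/* Hypothetical two-lane type standing in for "vector(2) double". */
typedef struct { double lane[2]; } v2df;

/* Scalar model of VEC_PERM_EXPR <a, b, sel>: selector values 0..1 pick
   lanes of a, selector values 2..3 pick lanes of b.  */
static v2df model_vec_perm(v2df a, v2df b, const unsigned sel[2])
{
    v2df r;
    for (size_t i = 0; i < 2; i++)
        r.lane[i] = sel[i] < 2 ? a.lane[sel[i]] : b.lane[sel[i] - 2];
    return r;
}

/* The sequence from the dump, evaluated lane by lane. */
static v2df model_add_sub_perm(v2df x, v2df y, v2df z)
{
    static const unsigned sel[2] = { 0, 3 };
    v2df mul, tem2, tem3;
    for (size_t i = 0; i < 2; i++) {
        mul.lane[i]  = x.lane[i] * y.lane[i];    /* _4     = x * y  */
        tem2.lane[i] = mul.lane[i] + z.lane[i];  /* tem2_7 = _4 + z */
        tem3.lane[i] = mul.lane[i] - z.lane[i];  /* tem3_8 = _4 - z */
    }
    return model_vec_perm(tem2, tem3, sel);      /* _9 */
}
```

With the selector { 0, 3 } as written, lane 0 of the result takes the sum and
lane 1 takes the difference.  The sketch also shows why the walk order
matters: the multiply is the first statement of the chain and the VEC_PERM
the last, so a forward walk keyed on the multiply reaches the add and sub
before it has seen the permute that ties them together.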

Note that matching

  tem2_7 = _4 + z_6(D);
  tem3_8 = _4 - z_6(D);
  _9 = VEC_PERM_EXPR <tem2_7, tem3_8, { 0, 3 }>;

to .VEC_ADDSUB possibly loses exceptions (the vectorizer now directly
creates .VEC_ADDSUB when possible).
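
One way to read the exceptions remark: the unfused form computes every lane
of both the add and the sub, including the two lanes the permute throws away,
and a discarded lane can still overflow.  The sketch below models that in
plain C under stated assumptions: the names vec2, overflowed,
unfused_overflows and fused_overflows are hypothetical, inputs are assumed
finite, and "overflow" is detected by the result rounding to infinity (which,
under IEEE 754, is when the overflow flag is raised) rather than by reading
the floating-point environment.  Whether a real .VEC_ADDSUB expansion
computes the discarded lanes is target-specific and not modeled here.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical two-lane type standing in for "vector(2) double". */
typedef struct { double lane[2]; } vec2;

/* True if a result computed from finite inputs rounded to +/-infinity. */
static bool overflowed(double v)
{
    return v > DBL_MAX || v < -DBL_MAX;
}

/* Unfused form: all lanes of a + b and a - b are computed before the
   { 0, 3 } permute keeps sum lane 0 and difference lane 1.  Reports
   overflow in any computed lane, including the discarded ones.  */
static bool unfused_overflows(vec2 a, vec2 b)
{
    bool any = false;
    for (int i = 0; i < 2; i++) {
        /* volatile keeps the compiler from folding the operations away */
        volatile double sum  = a.lane[i] + b.lane[i];
        volatile double diff = a.lane[i] - b.lane[i];
        any = any || overflowed(sum) || overflowed(diff);
    }
    return any;
}

/* Alternating form: only the lanes that reach the result are computed. */
static bool fused_overflows(vec2 a, vec2 b)
{
    volatile double lane0 = a.lane[0] + b.lane[0];
    volatile double lane1 = a.lane[1] - b.lane[1];
    return overflowed(lane0) || overflowed(lane1);
}
```

For a = { DBL_MAX, 1.0 } and b = { -DBL_MAX, 1.0 } both kept lanes are exactly
zero, but the discarded difference in lane 0 overflows, so only the unfused
form reports an overflow.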