From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/81904] FMA and addsub instructions
Date: Mon, 31 Jul 2023 07:32:05 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904

--- Comment #6 from Richard Biener ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Richard Biener from comment #1)
> > Hmm, I think the issue is we see
> >
> > f (__m128d x, __m128d y, __m128d z)
> > {
> >   vector(2) double _4;
> >   vector(2) double _6;
> >
> >   [100.00%]:
> >   _4 = x_2(D) * y_3(D);
> >   _6 = __builtin_ia32_addsubpd (_4, z_5(D)); [tail call]
> We can fold the builtin into .VEC_ADDSUB, and optimize MUL + VEC_ADDSUB ->
> VEC_FMADDSUB in match.pd?

I think MUL + .VEC_ADDSUB can be handled in the FMA pass.  For my example
above, early on (before FMA recognition) we get

  _4 = x_2(D) * y_3(D);
  tem2_7 = _4 + z_6(D);
  tem3_8 = _4 - z_6(D);
  _9 = VEC_PERM_EXPR ;

which we could recognize as .VEC_ADDSUB.  I think we want to avoid doing this
too early.  I'm not sure whether doing it within the FMA pass itself will
work, since we key FMAs on the mult but would need to key the addsub on the
VEC_PERM (we are walking stmts from BB start to end).  Looking at the code,
it seems changing the walking order should work.

Note that matching

  tem2_7 = _4 + z_6(D);
  tem3_8 = _4 - z_6(D);
  _9 = VEC_PERM_EXPR ;

to .VEC_ADDSUB possibly loses exceptions (the vectorizer now directly
creates .VEC_ADDSUB when possible).
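
To make the pattern concrete, here is a minimal scalar sketch in plain C of the
mul/add/sub/permute sequence next to the single addsub it would be recognized
as.  It is only a model, not GCC code: the type v2df and all helper names are
made up for illustration, and the permute selector { 2, 1 } is an assumption
chosen to match the addsubpd lane convention (lane 0 subtracts, lane 1 adds),
since the selector is not shown in the dump above.

```c
#include <assert.h>

/* Hypothetical two-lane vector, standing in for "vector(2) double". */
typedef struct { double lane[2]; } v2df;

/* Lane-wise multiply, modelling  _4 = x * y. */
v2df v2_mul(v2df a, v2df b)
{
    v2df r = {{ a.lane[0] * b.lane[0], a.lane[1] * b.lane[1] }};
    return r;
}

/* Lane-wise add and subtract, modelling tem2 and tem3. */
v2df v2_add(v2df a, v2df b)
{
    v2df r = {{ a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] }};
    return r;
}

v2df v2_sub(v2df a, v2df b)
{
    v2df r = {{ a.lane[0] - b.lane[0], a.lane[1] - b.lane[1] }};
    return r;
}

/* Two-input permute: selector values 0..1 pick from a, 2..3 pick from b. */
v2df v2_perm(v2df a, v2df b, const unsigned sel[2])
{
    double both[4] = { a.lane[0], a.lane[1], b.lane[0], b.lane[1] };
    v2df r = {{ both[sel[0] & 3], both[sel[1] & 3] }};
    return r;
}

/* addsubpd lane convention: lane 0 is a - b, lane 1 is a + b. */
v2df v2_addsub(v2df a, v2df b)
{
    v2df r = {{ a.lane[0] - b.lane[0], a.lane[1] + b.lane[1] }};
    return r;
}

/* The unfused sequence: every lane is both added and subtracted,
   then half of the results are discarded by the permute. */
v2df unfused(v2df x, v2df y, v2df z)
{
    const unsigned sel[2] = { 2, 1 };   /* assumed selector, see lead-in */
    v2df prod = v2_mul(x, y);
    v2df tem2 = v2_add(prod, z);
    v2df tem3 = v2_sub(prod, z);
    return v2_perm(tem2, tem3, sel);
}

/* What recognition would produce: one addsub on the product. */
v2df recognized(v2df x, v2df y, v2df z)
{
    return v2_addsub(v2_mul(x, y), z);
}

int same_lanes(v2df a, v2df b)
{
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}

/* Small self-check with exactly representable inputs. */
void check_example(void)
{
    v2df x = {{ 1.5, 2.0 }}, y = {{ 2.0, 4.0 }}, z = {{ 0.5, 1.0 }};
    assert(same_lanes(unfused(x, y, z), recognized(x, y, z)));
}
```

Both forms produce the same lane values, but the unfused form evaluates the add
and the sub in every lane, including the lanes the permute throws away.  That
is presumably where the exception remark comes from: the single addsub
performs only one operation per lane, so any floating-point exception raised
solely by a discarded lane would no longer occur.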