From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 5EC933858409; Mon, 28 Nov 2022 07:21:49 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 5EC933858409
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1669620109;
	bh=q5wl0AqzW1ZVTBsMzROtR5OJmjLRsYkGUw48Y7dbP04=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=eW7zaKroZhXK8Db6GnQtrf+T0AwqXU48wB+G2wzy+yWSU84TTiHt4xSrtd+penN8B
	 r6sAwYEuIw7xKt/VWgj9+EchZmPSk4TWGjKVJ0aVpTo34s0flTFV1a6uERM/R1wP01
	 PhBDvKT4VuHwnq5z/T/YWAMO0Sh/OsuwDtjggsgY=
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/97832] AoSoA complex caxpy-like loops:
 AVX2+FMA -Ofast 7 times slower than -O3
Date: Mon, 28 Nov 2022 07:21:48 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: FIXED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-97832-4-j1JESHkRZU@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-97832-4@http.gcc.gnu.org/bugzilla/>
References: <bug-97832-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #25 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 28 Nov 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
>
> --- Comment #24 from Hongtao.liu <crazylht at gmail dot com> ---
>   _233 = {f_im_36, f_re_35, f_re_35, f_re_35};
>   _217 = {f_re_35, f_im_36, f_im_36, f_im_36};
> ...
>   vect_x_re_55.15_227 = VEC_PERM_EXPR <vect_x_im_61.14_228, vect_x_im_61.13_230,
> { 0, 5, 6, 7 }>;
>   vect_x_re_55.23_211 = VEC_PERM_EXPR <vect_x_im_61.13_230,
> vect_x_im_61.14_228, { 0, 5, 6, 7 }>;
> ...
>   vect_y_re_69.17_224 = .FNMA (vect_x_re_55.15_227, _233, vect_y_re_63.9_237);
>   vect_y_re_69.25_208 = .FNMA (vect_x_re_55.23_211, _217, vect_y_re_69.17_224);
>
> is equal to
>
>   _233 = {f_im_36, f_im_36, f_im_36, f_im_36};
>   _217 = {f_re_35, f_re_35, f_re_35, f_re_35};
> ...
>   vect_y_re_69.17_224 = .FNMA (vect_x_im_61.14_228, _233, vect_y_re_63.9_237);
>   vect_y_re_69.25_208 = .FNMA (vect_x_im_61.13_230, _217, vect_y_re_69.17_224);
>
> A simplification in match.pd?

I guess that's possible, but the SLP vectorizer has a permute optimization
phase (and SLP discovery itself); it would be nice to see why the former
doesn't elide the permutes here.
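The equivalence claimed in comment #24 can be checked lane by lane with a small model of the two GIMPLE sequences. This is a sketch under stated assumptions, not GCC code: the helper names (vec_perm, fnma, original, simplified) are hypothetical, a vector is a plain list of 4 doubles (as with AVX2 double vectors), and .FNMA is modeled unfused, which gives exact results for the small integer-valued inputs used below.

```python
# Hypothetical lane-wise model of the two GIMPLE sequences quoted in comment #24.
# SSA names are shortened; a "vector" is a list of 4 floats.

def vec_perm(a, b, sel):
    """VEC_PERM_EXPR <a, b, sel>: lane i takes element sel[i] of a ++ b."""
    both = list(a) + list(b)
    return [both[i] for i in sel]

def fnma(a, b, c):
    """.FNMA (a, b, c) = -(a * b) + c per lane (unfused in this model)."""
    return [ci - ai * bi for ai, bi, ci in zip(a, b, c)]

def original(f_re, f_im, x_im_14, x_im_13, y):
    """Permuted form: mixed splats (_233, _217) plus two VEC_PERM_EXPRs."""
    v233 = [f_im, f_re, f_re, f_re]
    v217 = [f_re, f_im, f_im, f_im]
    x_re_15 = vec_perm(x_im_14, x_im_13, [0, 5, 6, 7])
    x_re_23 = vec_perm(x_im_13, x_im_14, [0, 5, 6, 7])
    y17 = fnma(x_re_15, v233, y)
    return fnma(x_re_23, v217, y17)

def simplified(f_re, f_im, x_im_14, x_im_13, y):
    """Permute-free form: uniform splats, inputs used directly."""
    v233 = [f_im] * 4
    v217 = [f_re] * 4
    y17 = fnma(x_im_14, v233, y)
    return fnma(x_im_13, v217, y17)

args = (3.0, 5.0, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0],
        [100.0, 200.0, 300.0, 400.0])
print(original(*args))    # [65.0, 130.0, 195.0, 260.0]
print(simplified(*args))  # [65.0, 130.0, 195.0, 260.0]
```

Every lane of both forms computes y - x_im_14 * f_im - x_im_13 * f_re, but in the permuted form lanes 1 to 3 subtract the f_re term first, while the simplified form subtracts the f_im term first in all lanes. With real fused multiply-adds and arbitrary inputs, the rewrite is therefore a reassociation that can change the last bits of the result; -Ofast permits it (via -fassociative-math), but it is not value-preserving under strict IEEE semantics.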