From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 431E43858404; Thu, 13 Oct 2022 11:06:00 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 431E43858404
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1665659160;
	bh=xjhk+oIX1V+IGso+rh2eBpSoAA4tE5A1O+9nisRofLk=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=fzMV4CWqxik0bcD4vYDfv6RiZ9+oWjDgjyGyWSHuCSlEcIZxFpXXFPS8X3ZF3Sb+O
	 YjQbRd0ad0yrUlYize45UtRvA7JpZL6BBP+t4j/n2Ae9V7yiVAzO1IN63D0UAmGRic
	 QQ2buv1RcOxnewwyJGsGNL3yKnCQn1/WfYv4jXZ4=
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d  causes
 verification failure in spec2006
Date: Thu, 13 Oct 2022 11:05:59 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: linkw at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-107160-4-aUJs3ZFcUX@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-107160-4@http.gcc.gnu.org/bugzilla/>
References: <bug-107160-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160
--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 13 Oct 2022, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160
>
> --- Comment #9 from Kewen Lin <linkw at gcc dot gnu.org> ---
> >
> > The above doesn't look wrong (but may miss the rest of the IL).  On
> > x86_64 this looks like
> >
> >   <bb 4> [local count: 105119324]:
> >   # sum0_41 = PHI <sum0_28(3)>
> >   # sum1_39 = PHI <sum1_29(3)>
> >   # sum2_37 = PHI <sum2_30(3)>
> >   # sum3_35 = PHI <sum3_31(3)>
> >   # vect_sum3_31.11_59 = PHI <vect_sum3_31.11_60(3)>
> >   _58 = BIT_FIELD_REF <vect_sum3_31.11_59, 32, 0>;
> >   _57 = BIT_FIELD_REF <vect_sum3_31.11_59, 32, 32>;
> >   _56 = BIT_FIELD_REF <vect_sum3_31.11_59, 32, 64>;
> >   _55 = BIT_FIELD_REF <vect_sum3_31.11_59, 32, 96>;
> >   _74 = _58 + _57;
> >   _76 = _56 + _74;
> >   _78 = _55 + _76;
> >
> >   <bb 5> [local count: 118111600]:
> >   # prephitmp_79 = PHI <_78(4), 0.0(2)>
> >   return prephitmp_79;
> >
>
> Yeah, it looks expected without unrolling.
>
> > when unrolling is applied, thus with a larger VF, you should ideally
> > see the vectors accumulated.
> >
> > Btw, I've fixed a SLP reduction issue two days ago in
> > r13-3226-gee467644c53ee2
> > though that looks unrelated?
>
> Thanks for the information, I'll double check it.
>
> >
> > When I force a larger VF on x86 by adding an int store in the loop I see
> >
> >   <bb 11> [local count: 94607391]:
> >   # sum0_48 = PHI <sum0_29(3)>
> >   # sum1_36 = PHI <sum1_30(3)>
> >   # sum2_35 = PHI <sum2_31(3)>
> >   # sum3_24 = PHI <sum3_32(3)>
> >   # vect_sum3_32.16_110 = PHI <vect_sum3_32.16_106(3)>
> >   # vect_sum3_32.16_111 = PHI <vect_sum3_32.16_107(3)>
> >   # vect_sum3_32.16_112 = PHI <vect_sum3_32.16_108(3)>
> >   # vect_sum3_32.16_113 = PHI <vect_sum3_32.16_109(3)>
> >   _114 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 0>;
> >   _115 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 32>;
> >   _116 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 64>;
> >   _117 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 96>;
> >   _118 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 0>;
> >   _119 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 32>;
> >   _120 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 64>;
> >   _121 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 96>;
> >   _122 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 0>;
> >   _123 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 32>;
> >   _124 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 64>;
> >   _125 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 96>;
> >   _126 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 0>;
> >   _127 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 32>;
> >   _128 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 64>;
> >   _129 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 96>;
> >   _130 = _114 + _118;
> >   _131 = _115 + _119;
> >   _132 = _116 + _120;
> >   _133 = _117 + _121;
> >   _134 = _130 + _122;
> >   _135 = _131 + _123;
> >   _136 = _132 + _124;
> >   _137 = _133 + _125;
> >   _138 = _134 + _126;
> >
> > see how the lanes from the different vectors are accumulated?  (yeah,
> > we should simply add the vectors!)
>
> Yes, it's the same as what I saw on ppc64le, but the closely following dce6
> removes the three vect_sum3_32 (in your dump, they are
> vect_sum3_32.16_10{7,8,9}) as the subsequent joints don't actually use the
> separated accumulated lane values (_138 -> sum0 ...) but only use
> vect_sum3_32.16_110.

I do - the epilog is even vectorized and it works fine at runtime.
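
To make the lane accumulation in the larger-VF dump concrete, here is a
minimal C sketch.  It is not GCC code: plain arrays stand in for the
four 128-bit vector accumulators, and the names vec4f, reduce_by_lanes,
reduce_by_vectors and self_check are made up for illustration.  It
shows the two epilogue shapes discussed above, extracting every lane
and adding lane i across the accumulators versus adding the vectors
first and extracting once, and checks that they agree.

```c
#include <assert.h>

#define LANES 4   /* four 32-bit float lanes per 128-bit vector */
#define NVECS 4   /* four vector accumulators, as with the larger VF */

/* Hypothetical stand-in for a 4 x float vector register. */
typedef struct { float lane[LANES]; } vec4f;

/* Epilogue shape from the dump: extract each lane (BIT_FIELD_REF),
   then add lane i of every accumulator with scalar additions. */
static void reduce_by_lanes(const vec4f acc[NVECS], float sum[LANES])
{
  for (int i = 0; i < LANES; i++) {
    float s = acc[0].lane[i];
    for (int v = 1; v < NVECS; v++)
      s += acc[v].lane[i];
    sum[i] = s;
  }
}

/* Element-wise vector addition, modelling a single vector add. */
static vec4f vec_add(vec4f a, vec4f b)
{
  vec4f r;
  for (int i = 0; i < LANES; i++)
    r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

/* Alternative shape: add the accumulators as vectors first, then
   extract the lanes of the single remaining vector. */
static void reduce_by_vectors(const vec4f acc[NVECS], float sum[LANES])
{
  vec4f total = acc[0];
  for (int v = 1; v < NVECS; v++)
    total = vec_add(total, acc[v]);
  for (int i = 0; i < LANES; i++)
    sum[i] = total.lane[i];
}

/* Returns 1 if both shapes give the same per-lane sums on sample data. */
static int self_check(void)
{
  vec4f acc[NVECS];
  float a[LANES], b[LANES];

  for (int v = 0; v < NVECS; v++)
    for (int i = 0; i < LANES; i++)
      acc[v].lane[i] = (float)(10 * v + i + 1);  /* small exact values */

  reduce_by_lanes(acc, a);
  reduce_by_vectors(acc, b);

  for (int i = 0; i < LANES; i++)
    if (a[i] != b[i])
      return 0;
  assert(a[0] == 64.0f);  /* 1 + 11 + 21 + 31 */
  return 1;
}
```

Both functions add lane i of the accumulators in the same order, so the
per-lane results are identical even in floating point.  Each lane
corresponds to one of sum0 .. sum3, which is why all four accumulators
have to stay live into the epilogue.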