From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test
Date: Wed, 26 Jun 2024 12:26:32 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 15.0
X-Bugzilla-Keywords: testsuite-fail
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 15.0
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #12 from Richard Biener ---
(In reply to Andrew Stubbs from comment #10)
> On 26/06/2024 12:05, rguenth at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> >
> > --- Comment #8 from Richard Biener ---
> > (In reply to Richard Biener from comment #7)
> >> I will have a look (and for run validation try to reproduce with gfx1036).
> >
> > OK, so with gfx1036 we end up using 16 byte vectors and the testcase
> > passes.  The difference with gfx908 is
> >
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   ==> examining statement: _14 = aa[_13];
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   vect_model_load_cost: aligned.
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   vect_model_load_cost: inside_cost = 2, prologue_cost = 0 .
> >
> > vs.
> >
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   ==> examining statement: _14 = aa[_13];
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > missed:   unsupported vect permute { 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
> > 10 11 11 12 12 13 13 14 14 15 15 }
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > missed:   unsupported load permutation
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:19:72:
> > missed:   not vectorized: relevant stmt not supported: _14 = aa[_13];
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   removing SLP instance operations starting from: REALPART_EXPR
> > <(*hadcur_24(D))[_2]> = _86;
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > missed:  unsupported SLP instances
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:  re-trying with SLP disabled
> >
> > so gfx1036 cannot do such permutes but gfx908 can?
>
> GFX10 has more limited permutation capabilities than GFX9 because it
> only has 32-lane vectors natively, even though we're using the 64-lane
> "compatibility" mode.
>
> However, in theory, the permutation capabilities on V32 and below should
> be the same, and some permutations on V64 are allowed, so I don't know
> why it doesn't use it. It's possible I broke the logic in
> gcn_vectorize_vec_perm_const:
>
>    /* RDNA devices can only do permutations within each group of 32-lanes.
>       Reject permutations that cross the boundary.  */
>    if (TARGET_RDNA2_PLUS)
>      for (unsigned int i = 0; i < nelt; i++)
>        if (i < 31 ? perm[i] > 31 : perm[i] < 32)
>          return false;
>
> It looks right to me though?

nelt == 32 so I think the last element has the wrong check applied?  It
should be

>        if (i < 32 ? perm[i] > 31 : perm[i] < 32)

I think.  With that the vectorization happens in a similar way but the
failure still doesn't reproduce (without the patch, of course).

> The vec_extract patterns that also use permutations are likewise
> supposedly still enabled for V32 and below.
>
> Andrew
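
To see why the `i < 31` comparison matters when nelt == 32, the two
conditions can be compared in isolation.  What follows is a standalone C
sketch, not the code in gcn_vectorize_vec_perm_const: the function names
rejects_quoted, rejects_proposed and fill_dup_pairs are hypothetical, and
the permutation is modeled as a plain array of unsigned ints rather than
GCC's own types.

```c
#include <stdbool.h>

/* Boundary test as quoted in the comment above (i < 31).
   Hypothetical helper: returns true when the permutation would be
   rejected.  Lane 31 takes the "upper group" branch here.  */
static bool rejects_quoted(const unsigned *perm, unsigned nelt)
{
  for (unsigned i = 0; i < nelt; i++)
    if (i < 31 ? perm[i] > 31 : perm[i] < 32)
      return true;
  return false;
}

/* Boundary test with the proposed comparison (i < 32).
   Lanes 0..31 must select from 0..31, lanes 32 and up from 32 and up.  */
static bool rejects_proposed(const unsigned *perm, unsigned nelt)
{
  for (unsigned i = 0; i < nelt; i++)
    if (i < 32 ? perm[i] > 31 : perm[i] < 32)
      return true;
  return false;
}

/* Build the permute { 0 0 1 1 2 2 ... } reported in the vectorizer dump.  */
static void fill_dup_pairs(unsigned *perm, unsigned nelt)
{
  for (unsigned i = 0; i < nelt; i++)
    perm[i] = i / 2;
}
```

For the 32-element permute { 0 0 1 1 ... 15 15 } from the dump, lane 31
selects element 15.  rejects_quoted sends lane 31 down the else branch,
sees 15 < 32 and reports a rejection, while rejects_proposed accepts the
permutation because every lane stays within 0..31.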