From: "ams at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test
Date: Wed, 26 Jun 2024 14:05:10 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #16 from Andrew Stubbs ---
On 26/06/2024 14:41, rguenther at suse dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
>
> --- Comment #15 from rguenther at suse dot de ---
>>> Btw, the above looks quite odd for nelt == 32 anyway - we are permuting
>>> two vectors src0 and src1 into one 32 element dst vector (it's no longer
>>> required that src0 and src1 line up with the dst vector size btw, they
>>> might have
>>> different nelt). So the loop would reject interleaving
>>> the low parts of two 32 element vectors, a permute that would look like
>>> { 0, 32, 1, 33, 2, 34 ... } so does "within each group of 32-lanes"
>>> mean you can never mix the two vector inputs? Or does GCN not have
>>> a two-to-one vector permute instruction?
>>
>> GCN does not have two-to-one vector permute in hardware, so we do two
>> permutes and a vec_merge to get the same effect.
>>
>> GFX9 can permute all the elements within a 64 lane vector arbitrarily.
>>
>> GFX10 and GFX11 can permute the low-32 and high-32 elements freely, but
>> no value may cross the boundary. AFAIK there's no way to do that via any
>> vector instruction (i.e. without writing to memory, or extracting values
>> element-wise).
>
> I see - so it cannot even swap low-32 and high-32? I'm thinking of
> what sub-part of permutes would be possible by extending the two-to-one
> vec_merge trick.

No(?)

The 64-lane compatibility mode works, under the hood, by allocating
double the number of 32-lane registers and then executing each
instruction twice. Mostly this is invisible, but it gets exposed for
permutations and the like. Logically, the microarchitecture could do a
vec_merge to DTRT, but I've not found a way to express that. It's
possible I missed something when RTFM.

> OTOH we restrict GFX10/11 to 32 lane vectors so in practice this
> restriction should be fine.

Yes, with the "31" fixed it should work.

Andrew
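To make the two ideas in this exchange concrete, here is a minimal scalar sketch in C, assuming plain int lanes and equal lane counts for both inputs and the output. It models (1) emulating a two-to-one permute with two single-input permutes plus a vec_merge, and (2) the GFX10/GFX11 rule that no value may cross between the low-32 and high-32 halves of a 64-lane vector. All names (permute1, vec_merge, permute2, gfx10_perm_ok) are hypothetical. This is not the code in GCC's GCN backend, and it does not reproduce whatever check the "31" above refers to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar model of the GCN permute strategy discussed above;
   not code from the GCC GCN backend.  Lanes are plain ints, and N (<= 64)
   is the lane count of both inputs and the output.  */
#define MAX_NELT 64

/* Single-input permute: dst[i] = src[idx[i]].  */
static void
permute1 (int *dst, const int *src, const unsigned *idx, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    dst[i] = src[idx[i]];
}

/* Modelled on RTL vec_merge: lane i comes from A if bit i of MASK is set,
   otherwise from B.  */
static void
vec_merge (int *dst, const int *a, const int *b, unsigned long long mask,
           unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    dst[i] = ((mask >> i) & 1) ? a[i] : b[i];
}

/* Two-to-one permute emulated with two single-input permutes and a
   vec_merge.  SEL[i] in [0, n) picks from SRC0, [n, 2n) from SRC1.  */
static void
permute2 (int *dst, const int *src0, const int *src1, const unsigned *sel,
          unsigned n)
{
  unsigned idx[MAX_NELT];
  int p0[MAX_NELT], p1[MAX_NELT];
  unsigned long long from_src1 = 0;

  assert (n <= MAX_NELT);
  for (unsigned i = 0; i < n; i++)
    {
      assert (sel[i] < 2 * n);
      /* Both permutes use the same in-vector index; the vec_merge then
         discards whichever result this lane does not need.  */
      idx[i] = sel[i] % n;
      if (sel[i] >= n)
        from_src1 |= 1ULL << i;
    }
  permute1 (p0, src0, idx, n);
  permute1 (p1, src1, idx, n);
  vec_merge (dst, p1, p0, from_src1, n);
}

/* The GFX10/GFX11 restriction as described above: in 64-lane mode, output
   lane i may only read an element from its own 32-lane group.  With
   n <= 32 every selector passes.  */
static bool
gfx10_perm_ok (const unsigned *sel, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    if ((sel[i] % n) / 32 != i / 32)
      return false;
  return true;
}
```

With two 64-lane inputs, the interleave-low selector { 0, 64, 1, 65, ... } can be built with the merge trick, but it fails the GFX10/11 check because output lane 32 needs element 16, which lives in the low half. The same pattern on 32-lane vectors passes, which matches the point that restricting GFX10/11 to 32 lanes sidesteps the problem. A reversal within each half (i ^ 31) passes on 64 lanes, while swapping the halves (i ^ 32) does not.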