From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 2124F3871036; Wed, 26 Jun 2024 14:05:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2124F3871036
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1719410711;
	bh=Jokalb0D1DPcN8etHvYsbMN765E8FquvqlXbdGuOJB4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=LJkOtJxr2PqaBWR/+LuuU2vKx5K6ISfPztnY0hZW/9kpp0MD+j1aCPw/Rdtr2VaSu
	 X0WQ2bTEWUvCOEmBY2ivSUzlAGwuhyP65MSlG+2xAIssQZpIUMZ1JHrvegrvbUAqv4
	 Ku3MEazP2L2Ag4wQ3eWL+i39psZM2WAyyNTsZFmY=
From: "ams at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/115640] [15 Regression] GCN: FAIL:
 gfortran.dg/vect/pr115528.f   -O  execution test
Date: Wed, 26 Jun 2024 14:05:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 15.0
X-Bugzilla-Keywords: testsuite-fail
X-Bugzilla-Severity: normal
X-Bugzilla-Who: ams at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 15.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-115640-4-hrq539HLvU@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-115640-4@http.gcc.gnu.org/bugzilla/>
References: <bug-115640-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
--- Comment #16 from Andrew Stubbs <ams at gcc dot gnu.org> ---
On 26/06/2024 14:41, rguenther at suse dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
>
> --- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
>>> Btw, the above looks quite odd for nelt == 32 anyway - we are permuting
>>> two vectors src0 and src1 into one 32 element dst vector (it's no longer
>>> required that src0 and src1 line up with the dst vector size btw, they
>>> might have different nelt).  So the loop would reject interleaving
>>> the low parts of two 32 element vectors, a permute that would look like
>>> { 0, 32, 1, 33, 2, 34 ... } so does "within each group of 32-lanes"
>>> mean you can never mix the two vector inputs?  Or does GCN not have
>>> a two-to-one vector permute instruction?
>>
>> GCN does not have two-to-one vector permute in hardware, so we do two
>> permutes and a vec_merge to get the same effect.
>>
>> GFX9 can permute all the elements within a 64 lane vector arbitrarily.
>>
>> GFX10 and GFX11 can permute the low-32 and high-32 elements freely, but
>> no value may cross the boundary. AFAIK there's no way to do that via any
>> vector instruction (i.e. without writing to memory, or extracting values
>> element-wise).
>
> I see - so it cannot even swap low-32 and high-32?  I'm thinking of
> what sub-part of permutes would be possible by extending the two-to-one
> vec_merge trick.

No, I don't think so.

The 64-lane compatibility mode works, under the hood, by allocating
double the number of 32-lane registers and then executing each
instruction twice. Mostly this is invisible, but it gets exposed for
permutations and the like. In principle, the microarchitecture could do a
vec_merge to do the right thing (DTRT), but I've not found a way to
express that with the available instructions.

It's possible I missed something in the manual.
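To make the constraint concrete, here is a rough Python model. It is a sketch, not GCC code, and it assumes the compatibility mode behaves exactly like two independent 32-lane halves. `HALF`, `crosses_half_boundary` and `permute_compat` are hypothetical names. The model accepts a 64-lane permute that stays within each half and rejects both a low/high swap and an interleave of the two halves.

```python
# Sketch only: models a GFX10/GFX11 64-lane vector as two independent
# 32-lane halves. Names are hypothetical, not GCC identifiers.
HALF = 32

def crosses_half_boundary(sel):
    """True if some output lane would need a value from the other half."""
    return any((i < HALF) != (s < HALF) for i, s in enumerate(sel))

def permute_compat(src, sel):
    """One-input 64-lane permute, executed as two separate 32-lane permutes."""
    if len(src) != 2 * HALF or len(sel) != 2 * HALF:
        raise ValueError("expected 64 lanes")
    if crosses_half_boundary(sel):
        raise ValueError("a lane crosses the 32-lane boundary")
    lo, hi = src[:HALF], src[HALF:]
    out = []
    for i, s in enumerate(sel):
        half = lo if i < HALF else hi  # each half only sees its own registers
        out.append(half[s % HALF])
    return out
```

Under this model, a permute that stays within each half is just two independent 32-lane permutes. A selector such as `[(i + 32) % 64 for i in range(64)]` (swap low and high) is rejected.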

> OTOH we restrict GFX10/11 to 32 lane vectors so in practice this
> restriction should be fine.

Yes, with the "31" fixed it should work.
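With 32-lane vectors, emulating a two-to-one permute with two one-input permutes plus a vec_merge never has to cross a lane boundary. Below is a minimal Python sketch of that emulation. It assumes both inputs and the output have the same lane count, which GCC no longer requires, as noted above. The names `permute1`, `vec_merge` and `permute2` are hypothetical, and this is not the GCN backend's implementation.

```python
# Sketch only: a two-input permute built from two one-input permutes
# plus a lane-wise merge. Assumes equal lane counts for all vectors.
def permute1(src, sel):
    """One-input permute: out[i] = src[sel[i] mod len(src)]."""
    n = len(src)
    return [src[s % n] for s in sel]

def vec_merge(a, b, mask):
    """Lane i comes from a when bit i of mask is set, else from b."""
    return [a[i] if (mask >> i) & 1 else b[i] for i in range(len(a))]

def permute2(src0, src1, sel):
    """sel[i] < n selects from src0; n <= sel[i] < 2n selects from src1."""
    n = len(src0)
    if len(src1) != n or len(sel) != n:
        raise ValueError("this sketch assumes equal lane counts")
    if any(not 0 <= s < 2 * n for s in sel):
        raise ValueError("selector out of range")
    t0 = permute1(src0, sel)  # lanes that will take their value from src0
    t1 = permute1(src1, sel)  # lanes that will take their value from src1
    mask = sum(1 << i for i, s in enumerate(sel) if s < n)
    return vec_merge(t0, t1, mask)
```

For example, interleaving the low parts of two 32-lane vectors with the selector { 0, 32, 1, 33, ... } gives a0, b0, a1, b1, ... up to a15, b15.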

Andrew