From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 32E363829954; Mon,  3 Jun 2024 13:51:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 32E363829954
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1717422704;
	bh=MX4ljvN9X5JZ4RWs10Utc/pFKiMwEtxJ/DUz+WkhqL0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=khtq1Ljscb5CNk8Y/kfe93Bvtfhot/magJb4QrDuIFdP4HaezkwldBh5RWL1TOSYy
	 GXPw5tbX8aAaOURR1FySKL+bPU+H3ttr/IF5eDGCGcvYkW56UFHfylj2kL2eJ13wey
	 wOO3UdQuvh7YcoR/ECnFkBF8jxitMRwfpaohcrAc=
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
Date: Mon, 03 Jun 2024 13:51:43 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 15.0
X-Bugzilla-Keywords: testsuite-fail
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 15.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-115304-4-f1cu8NphFx@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-115304-4@http.gcc.gnu.org/bugzilla/>
References: <bug-115304-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 3 Jun 2024, ams at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
>
> --- Comment #9 from Andrew Stubbs <ams at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #6)
> > The best strategy for GCN would be to gather V4QImode aka SImode into the
> > V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
> > doing consecutive loads isn't a good strategy here.
>
> I don't fully understand what you're trying to say here, so apologies if you
> knew all this already and I missed the point.....
>
> In general, on GCN V4QImode is not in any way equivalent to SImode (when the
> values are in registers). The vector registers are not one single string of
> re-interpretable bits.
>
> For the same reason, you can't load a value as V64QImode and then try to
> interpret it as V16SImode. GCN vector registers just don't work like
> SSE/Neon/etc.
>
> When you load a V64QImode vector, each lane is extended to 32 bits, so what you
> actually get in hardware is a V64SImode vector.
>
> Likewise, when you load a V4QImode vector the hardware representation is
> actually V4SImode (which in itself is just V64SImode with undefined values in
> the unused lanes).

I see.  I wonder whether there are one or two latent wrong-code bugs because
of this and the vectorizer's assumptions ;)  I suppose modes_tieable_p
will tell us whether a VIEW_CONVERT_EXPR will do the right thing?
Is GET_MODE_SIZE (V64QImode) == GET_MODE_SIZE (V64SImode) btw?
And is V64QImode really V64PSImode?
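
To make the representation difference described in the quoted text concrete,
here is a rough C model.  It is not GCC or GCN code: lane_reg, load_qi_lanes
and packed_view_si are made-up names, and zero-extension is an assumption
(the description only says each lane is extended to 32 bits).  It contrasts
a lane-based register, where every QI element sits in its own 32-bit lane,
with a packed view, where four QI elements alias one SI element.

```c
#include <stdint.h>
#include <string.h>

#define NLANES 64   /* lanes in the modelled vector register */

/* Hypothetical model of a lane-based register: one 32-bit slot per lane.  */
typedef struct { uint32_t lane[NLANES]; } lane_reg;

/* Load N bytes, extending each one into its own 32-bit lane.
   Zero-extension is an assumption of this sketch.  Unused lanes are
   undefined per the description above; they are zeroed here only to
   keep the model deterministic.  */
static lane_reg
load_qi_lanes (const uint8_t *p, int n)
{
  lane_reg r;
  memset (&r, 0, sizeof r);
  for (int i = 0; i < n && i < NLANES; i++)
    r.lane[i] = p[i];
  return r;
}

/* Packed (SSE/Neon-like) view: four adjacent bytes reinterpreted as one
   32-bit value.  The result depends on host endianness.  */
static uint32_t
packed_view_si (const uint8_t *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

In this model, lane 0 of a V4QImode-style load holds only the first byte,
whereas the packed view combines all four bytes, so the two cannot be
treated as the same bits.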

Still, for a V64QImode load on { c[0], c[1], c[2], c[3], c[32], c[33],
c[34], c[35], ... } it's probably best to use a single V64QImode gather
with GCN rather than four "consecutive" V64QImode loads followed by
element swizzling.
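
As a sketch of the access pattern above (plain C, not GCC internals; the
names gather_index and gather_qi and the constants are illustrative
assumptions), the per-lane element index for groups of 4 used elements
followed by a gap of 28 can be computed like this, with a scalar loop
standing in for the single gather:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLANES 64   /* lanes in one V64QImode-style vector */
#define GROUP   4   /* consecutive elements used per group */
#define STRIDE 32   /* distance between group starts: 4 used + 28 gap */

/* Element index read by LANE: c[0..3], c[32..35], c[64..67], ...  */
static size_t
gather_index (size_t lane)
{
  return (lane / GROUP) * STRIDE + lane % GROUP;
}

/* Scalar emulation of one gather that fills all NLANES lanes from C,
   which must hold at least C_LEN elements.  */
static void
gather_qi (uint8_t dst[NLANES], const uint8_t *c, size_t c_len)
{
  for (size_t lane = 0; lane < NLANES; lane++)
    {
      size_t idx = gather_index (lane);
      assert (idx < c_len);   /* the last lane reads c[483] */
      dst[lane] = c[idx];
    }
}
```

Each lane gets its element directly from its own index, so no element
swizzling is needed afterwards in this model.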