From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
Date: Mon, 03 Jun 2024 13:51:43 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 15.0
X-Bugzilla-Keywords: testsuite-fail
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 15.0
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #10 from rguenther at suse dot de ---
On Mon, 3 Jun 2024, ams at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
>
> --- Comment #9 from Andrew Stubbs ---
> (In reply to Richard Biener from comment #6)
> > The best strategy for GCN would be to gather V4QImode aka SImode into the
> > V64QImode (or V16SImode) vector. For pix2 we have a gap of 28 elements,
> > doing consecutive loads isn't a good strategy here.
>
> I don't fully understand what you're trying to say here, so apologies if you
> knew all this already and I missed the point.....
>
> In general, on GCN V4QImode is not in any way equivalent to SImode (when the
> values are in registers). The vector registers are not one single string of
> re-interpretable bits.
>
> For the same reason, you can't load a value as V64QImode and then try to
> interpret it as V16SImode. GCN vector registers just don't work like
> SSE/Neon/etc.
>
> When you load a V64QImode vector, each lane is extended to 32 bits, so what you
> actually get in hardware is a V64SImode vector.
>
> Likewise, when you load a V4QImode vector the hardware representation is
> actually V4SImode (which in itself is just V64SImode with undefined values in
> the unused lanes).

I see.  I wonder if there's not one or two latent wrong-code bugs because
of this and the vectorizer's assumptions ;)  I suppose modes_tieable_p
will tell us whether a VIEW_CONVERT_EXPR will do the right thing?
Is GET_MODE_SIZE (V64QImode) == GET_MODE_SIZE (V64SImode) btw?
And is V64QImode really V64PSImode?

Still, for a V64QImode load on { c[0], c[1], c[2], c[3], c[32], c[33],
c[34], c[35], ... } it's probably best to use a single V64QImode gather
with GCN then, rather than four "consecutive" V64QImode loads followed by
element swizzling.
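
For illustration only, here is a toy C model of the single-gather idea for the index pattern above (groups of 4 bytes, with the next group starting 32 elements later). It is not GCC or GCN code: the names gather_offset and gather_v64qi are hypothetical, and the zero-extension of each byte into its own 32-bit lane is an assumption (the comment quoted above only says each lane is extended to 32 bits).

```c
#include <stdint.h>

#define LANES  64  /* lanes in a V64QImode vector */
#define GROUP   4  /* consecutive elements per group: c[0..3] */
#define STRIDE 32  /* distance between group starts: c[0] -> c[32] */

/* Hypothetical helper: element index that a given lane reads from. */
static unsigned gather_offset(unsigned lane)
{
    return (lane / GROUP) * STRIDE + (lane % GROUP);
}

/* Toy model of one gather: each lane loads a single byte and keeps it in
   its own 32-bit slot, so the result looks like V64SImode rather than a
   packed 64-byte register.  Zero-extension is an assumption here.  */
static void gather_v64qi(const uint8_t *c, uint32_t out[LANES])
{
    for (unsigned lane = 0; lane < LANES; lane++)
        out[lane] = c[gather_offset(lane)];
}
```

In this model lane 4 reads c[32] and lane 63 reads c[483], so the source buffer must hold at least 484 elements. Because every lane owns a full 32-bit slot, there is no packed byte layout that could be reinterpreted as V16SImode afterwards.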