From: crazylht at gmail dot com
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107096] Fully masking vectorization with AVX512 ICEs gcc.dg/vect/vect-over-widen-*.c
Date: Thu, 16 Feb 2023 02:09:28 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096

--- Comment #13 from Hongtao.liu ---
(In reply to rguenther@suse.de from comment #12)
> On Wed, 15 Feb 2023, crazylht at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096
> >
> > --- Comment #11 from Hongtao.liu ---
> >
> > > There's no other way to do N bit to two N/2 bit hi/lo (un)packing
> > > (there's a 2x N/2 bit -> N bit operation, for whatever reason).
> > > There's also no way to transform the d rgroup mask into the
> > > f rgroup mask for the first example aka duplicate bits in place,
> > > { b0, b1, b2, ... bN } -> { b0, b0, b1, b1, b2, b2, ... bN, bN },
> > > nor the reverse.
> >
> > Can we just do VIEW_CONVERT_EXPR on the vectype instead of the mask_type?
> > I.e. we can do a VCE to transform V8SI to V16HI, then use mask_load for
> > V16HI with the same mask {b0, b0, b1, b1, b2, b2, ...}, then VCE it back
> > to V8SI; it should be OK as long as the bits are duplicated in place.
> > (Or VCE V16HI to V8SI, then use mask {b0, b1, b2, ..., bN}, and VCE V8SI
> > back to V16HI after the masked load/move.)
>
> Hmm, yes, if we arrange for the larger mask to be available that would
> work for loads and stores I guess. It wouldn't work for arithmetic
> cond_* IFNs though. It's also going to be a bit tricky within the
> masking framework - I'm going to see whether that works though, it might
> be a nice way to avoid an excessive number of masks for integer code
> at least.

There could be some limitation on nV (it should be a power of 2 for VCE?),
i.e. there may be no suitable vectype to VCE the src1 vectype to so that
the loop mask can be reused:

void
foo (int* __restrict dest, int* src1, int* src2)
{
  for (int i = 0; i != 10000; i++)
    dest[i] = src1[3*i] + src1[3*i + 1] + src1[3*i + 2];
}

Maybe AVX512 could use a gather instruction for .MASK_LOAD_LANES so that
LOOP_MASK can still be used?
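
To make the VCE idea above concrete, here is a rough user-level sketch in
AVX512 intrinsics (illustrative only, not the GCC-internal form; the helper
names dup_mask_bits/masked_load_as_v16hi are made up, and it assumes
-mavx512vl -mavx512bw -mbmi2):

#include <immintrin.h>
#include <stdint.h>

/* Duplicate each of the low 8 mask bits in place:
   b7 ... b1 b0  ->  b7 b7 ... b1 b1 b0 b0.  */
static inline __mmask16
dup_mask_bits (__mmask8 m)
{
  uint32_t spread = _pdep_u32 (m, 0x5555);      /* bit i -> bit 2*i   */
  return (__mmask16) (spread | (spread << 1));  /* also set bit 2*i+1 */
}

/* Load 8 ints under an 8-bit mask by doing a 16 x short masked load
   under the duplicated mask and reinterpreting (VIEW_CONVERT) the
   V16HI result as V8SI.  The result is bit-identical to
   _mm256_maskz_loadu_epi32 (m, p) because the duplicated mask keeps
   whole 32-bit lanes active or inactive.  */
static inline __m256i
masked_load_as_v16hi (const int *p, __mmask8 m)
{
  return _mm256_maskz_loadu_epi16 (dup_mask_bits (m), p);
}

As comment #12 notes, this only helps loads/stores; the cond_* arithmetic
IFNs still need a mask of the matching element width.

For the strided foo() example, one possible shape of the gather idea is to
feed the loop mask straight into a masked gather per interleaved element,
instead of a .MASK_LOAD_LANES whose 3x-wide mask has no power-of-two
vectype to VCE through (again just a sketch, with the index computation
written out naively):

/* Gather src1[3*(i0+k) + lane] for k = 0..7 under the loop mask.  */
static inline __m256i
gather_strided_lane (const int *src1, int i0, int lane, __mmask8 loop_mask)
{
  __m256i iv = _mm256_setr_epi32 (0, 3, 6, 9, 12, 15, 18, 21);
  iv = _mm256_add_epi32 (iv, _mm256_set1_epi32 (3 * i0 + lane));
  return _mm256_mmask_i32gather_epi32 (_mm256_setzero_si256 (),
                                       loop_mask, iv, src1, 4);
}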