From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=WGoq=6J=suse.de=rguenther@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
	by sourceware.org (Postfix) with ESMTPS id E795E3858C54
	for <gcc-patches@gcc.gnu.org>; Mon, 13 Feb 2023 09:48:10 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E795E3858C54
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id D9C58219BA;
	Mon, 13 Feb 2023 09:48:09 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1676281689; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Wv5gXaWkpQtNkcNN+0Tt5Mfjd7akgMBo2CvMNGl5oKc=;
	b=NlmtWbSHFwdfaE0SBR0ETX5dkOPTWUwdzq2nZb9P3oYYmk7PFU18NHQM/SmHcyhRlygogu
	kb2BFcO93s8vfAXeMeC1SucvymSNvv3LIKB1QtJMw5qHfjsiDYx9cd/FZcIEFvaCDnQ6sX
	MvNjsPx4d+e4evlsL2B4boX70LEZgHw=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1676281689;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Wv5gXaWkpQtNkcNN+0Tt5Mfjd7akgMBo2CvMNGl5oKc=;
	b=X/wdwAfkfLt6Ftde+4pniFwwSq5jP4olyAiVTEfpTACW9QNNFSSWKycSJ7evkq8AAIz476
	vITmm1WI56qh04Cw==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 88A212C141;
	Mon, 13 Feb 2023 09:48:09 +0000 (UTC)
Date: Mon, 13 Feb 2023 09:48:09 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
cc: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>, 
    "incarnation.p.lee" <incarnation.p.lee@outlook.com>, 
    gcc-patches <gcc-patches@gcc.gnu.org>, 
    "Kito.cheng" <kito.cheng@sifive.com>, ams <ams@codesourcery.com>
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
In-Reply-To: <mptr0utzyv8.fsf@arm.com>
Message-ID: <nycvar.YFH.7.77.849.2302130943540.9226@jbgna.fhfr.qr>
References: <BYAPR04MB48245075FF3DB049086E5E1BA4DF9@BYAPR04MB4824.namprd04.prod.outlook.com> <DB73CFD816159C02+2023021121065580303412@rivai.ai> <nycvar.YFH.7.77.849.2302130758210.9226@jbgna.fhfr.qr> <869B9CDC6210FDAD+20230213161951079936108@rivai.ai>
 <nycvar.YFH.7.77.849.2302130833310.9226@jbgna.fhfr.qr> <mptr0utzyv8.fsf@arm.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, 13 Feb 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 13 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >> Yes, I think so.
> >> 
> >> Let's explain RVV more clearly.
> >> Let's suppose we have a vector length of 64 bits on an RVV CPU.
> >> VNx1BI is exactly 1 bit.
> >> VNx2BI is exactly 2 consecutive bits.
> >> VNx4BI is exactly 4 consecutive bits.
> >> VNx8BI is exactly 8 consecutive bits.
> >> 
> >> For VNx1BI (vbool64_t), we load it with this asm:
> >> vsetvl e8mf8
> >> vlm.v
> >> 
> >> For VNx2BI (vbool32_t), we load it with this asm:
> >> vsetvl e8mf4
> >> vlm.v
> >> 
> >> For VNx4BI (vbool16_t), we load it with this asm:
> >> vsetvl e8mf2
> >> vlm.v
> >> 
> >> For VNx8BI (vbool8_t), we load it with this asm:
> >> vsetvl e8m1
> >> vlm.v
> >> 
> >> In case of this code sequence:
> >> vbool16_t v4 = *(vbool16_t *)in;
> >> vbool8_t v3 = *(vbool8_t*)in;
> >> 
> >> Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
> >> we can't just use the data loaded by VNx4BI (vbool16_t) in VNx8BI (vbool8_t).
> >> But we can use the data loaded by VNx8BI (vbool8_t) in VNx4BI (vbool16_t).
> >>
> >> In this example, GCC thinks the data loaded for vbool8_t v3 can be replaced by vbool16_t v4, which is already loaded.
> >> That is incorrect for RVV.
> >
> > OK, so the 'vlm.v' instruction will zero the padding bits (according to
> > vsetvl), but I doubt the memory subsystem will not load a whole byte.
> >
> > Then GET_MODE_PRECISION of VNx4BI has to be smaller than 
> > GET_MODE_PRECISION of VNx8BI, even if their size is the same.
> >
> > I suppose that ADJUST_NUNITS should be able to do this, but then we
> > have in aarch64-modes.def
> >
> > VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> > VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> > VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> > VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> >
> > ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> > ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> > ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
> > ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
> >
> > so all VNxMBI modes are 2 bytes in size but their component is always
> > BImode but IIRC the elements of VNx2BImode occupy 4 bits each?
> 
> Yeah.  Only the low bit is significant, so it's still a 1-bit element.
> But the padding is distributed evenly across the elements rather than
> being grouped at one end of the predicate.

I wonder what we'd do for a target that makes the high bit significant ;)

> > For riscv we have
> >
> > VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
> > ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
> >
> > so here it would be natural to set the mode precision to
> > a poly-int computed by the component precision times nunits?  OTOH
> > we have to look at the component precision vs. size as well and
> >
> > /* Single bit mode used for booleans.  */ 
> > BOOL_MODE (BI, 1, 1); 
> >
> > BOOL_MODE is not documented, but its arguments are precision and size, so BImode
> > has a size of 1.  That makes VECTOR_BOOL_MODE very special since
> > the layout isn't derived from the component mode.  Deriving the
> > layout from the precision would make aarch64 incorrect and
> > would need BI2 and BI4 modes at least.
> 
> I think the elements have to stay BI for AArch64.  Using BI2 (with a
> precision of 2) would make both bits significant.

I think what's "wrong" with a BImode component mode is not the
precision but the size - we don't support bit-precision component
types on the GENERIC side but for bool vector modes we pack the
components to a bit size and aarch64 has varying bit sizes here
(and thus components with padding).  I don't think we support
modes with sizes less than a unit but since bool modes are special
we could re-purpose their precision to mean bitsize.
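
To illustrate what "precision means bitsize" would have to express for
aarch64, here is a minimal C sketch.  It is not GCC code: the struct and
helper names are made up for illustration, and it takes the unadjusted
VECTOR_BOOL_MODE arguments quoted above at face value (2 bytes shared
evenly among the elements, only the low bit of each being significant).

```c
/* Hypothetical summary of a bool vector mode; the fields mirror the
   count and byte-size arguments of VECTOR_BOOL_MODE.  */
struct bool_vec_mode
{
  const char *name;
  unsigned int nunits;    /* number of BI elements */
  unsigned int bytesize;  /* size of the whole mode in bytes */
};

/* The four AArch64 predicate modes as quoted above, before ADJUST_NUNITS.  */
static const struct bool_vec_mode aarch64_pred_modes[] = {
  { "VNx16BI", 16, 2 },
  { "VNx8BI", 8, 2 },
  { "VNx4BI", 4, 2 },
  { "VNx2BI", 2, 2 },
};

/* Bits each element occupies when the padding is spread evenly
   across the elements.  */
static unsigned int
element_bitsize (const struct bool_vec_mode *m)
{
  return m->bytesize * 8 / m->nunits;
}

/* Padding bits per element, given that only the low bit is significant.  */
static unsigned int
element_padding (const struct bool_vec_mode *m)
{
  return element_bitsize (m) - 1;
}
```

Under these assumptions the element bitsize differs per mode while the
element precision stays at 1, which is the distinction a re-purposed
BImode-style precision would need to carry.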

> I'm not sure the RVV case fits into the existing mode layout scheme.
> AFAIK we don't currently support vector modes with padding at one end.
> If that's right, the fix is likely to involve more than just tweaking
> the mode parameters.
> 
> What's the byte size of VNx1BI, expressed as a function of N?
> If it's CEIL (N, 8) then we don't have a way of representing that yet.

PARTIAL_VECTOR_MODE?  (ick)
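
For reference, a small C sketch of the layout in question, assuming the
RVV mask elements are 1 bit each and packed at the low end with all the
padding at the high end.  The helper names are hypothetical, not
existing GCC functions; CEIL is written out the way GCC's macro of that
name rounds up.

```c
/* Round-up division for non-negative values.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical helper: byte size of a bool vector with NUNITS 1-bit
   elements packed at the low end and padding at the high end.  */
static unsigned int
packed_bool_bytesize (unsigned int nunits)
{
  return CEIL (nunits, 8);
}

/* Hypothetical helper: precision in bits if it were defined as
   NUNITS times the 1-bit component precision.  */
static unsigned int
packed_bool_precision (unsigned int nunits)
{
  return nunits * 1;
}
```

With N = 1, 2, 4 and 8 the byte size is 1 in every case, while the
precision would be 1, 2, 4 and 8, so only the precision could tell
VNx4BI from VNx8BI.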

Richard.