From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d])
 by sourceware.org (Postfix) with ESMTPS id BDFEE3857001
 for <gcc-patches@gcc.gnu.org>; Wed, 27 Jul 2022 12:56:24 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BDFEE3857001
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out2.suse.de (Postfix) with ESMTP id 7F3AD2085A;
 Wed, 27 Jul 2022 12:56:23 +0000 (UTC)
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id 5FC132C141;
 Wed, 27 Jul 2022 12:56:23 +0000 (UTC)
Date: Wed, 27 Jul 2022 12:56:23 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
cc: gcc-patches <gcc-patches@gcc.gnu.org>, vmakarov@redhat.com
Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
In-Reply-To: <2022072716263345190043@rivai.ai>+C57ADFE8A6050C80
Message-ID: <nycvar.YFH.7.77.849.2207271251520.6583@jbgna.fhfr.qr>
References: <20220727034437.154625-1-juzhe.zhong@rivai.ai>,
 <20220727034437.154625-2-juzhe.zhong@rivai.ai>,
 <nycvar.YFH.7.77.849.2207270644200.6583@jbgna.fhfr.qr>,
 <2022072715091952120116@rivai.ai>+0E606AA70DF6FCA5,
 <nycvar.YFH.7.77.849.2207270733530.6583@jbgna.fhfr.qr>, 
 <2022072716005642628531@rivai.ai>+297F71E4F3A41880,
 <nycvar.YFH.7.77.849.2207270809140.6583@jbgna.fhfr.qr>
 <2022072716263345190043@rivai.ai>+C57ADFE8A6050C80
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Jul 2022 12:56:26 -0000

On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:

> For vint8m1_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0)
> ADJUST_NUNITS (VNx16QI, riscv_vector_chunks * 8);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
> For vint8mf2_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 0)
> ADJUST_NUNITS (VNx8QI, riscv_vector_chunks * 4);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);

^^^

ADJUST_BYTES (VNx8QI, riscv_vector_chunks * 8);

probably.  As said, I'm not sure this is a good idea just for the sake
of spill slots.  Maybe there's a mode_for_spill one could use and
spill a paradoxical subreg of that mode ... (don't search for
mode_for_spill, I just made that up).

Maybe you can somehow forbit spilling of vint8mf2_t and instead
define spill_class for those so they spill first to vint8m1_t
and then those get spilled to the stack?

> riscv_vector_chunks is a compile-time unknown poly value, I use ADJUST_BYTES to specify these 2 machine_modes 
> with same bytesize but different element number.
> This way can tell GCC that vint8mf2_t has half element nunber of vint8m1_t but occupy the same size during the register spilling.
> 
> May be we can add a new function that call ADJUST_PRECISION ﻿that I can adjust these two precision?
> I am not sure. I am look forward another solution to deal with this issuse. Thank you so much.
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 16:12
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>  
> > Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. In 
> > RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, 
> > zero + vle8.v). You can see it in foo ﻿function.  In this case we don't 
> > need to confuse compiler the size of vint8mf2. However, The second case 
> > is I write assembly to occupy the vector register to generate register 
> > spilling cases. You can see LLVM implementation:  First use vsetvli + 
> > vle8.v load (vint8mf2_t) data from the base pointer, then spill the 
> > vector to memory using vs1r (this is the whole register store which 
> > store the whole vector to the memory and then use vl1r load the whole 
> > register and finally return it.  In LLVM implementation, it insert 
> > vsetvli instructions for RVV instruction using LLVM PASS before RA 
> > (register allocation).  So after RA, compiler generate the spilling 
> > loads/stores. We can either choose to use vsetvli + vle/vse (load/store 
> > fractional vector) or vl1r/vs1r (load/store whole vector which enlarge 
> > the spill size).
> > 
> > Because implementing insert vsetvli PASS after RA (register allocation) 
> > is so complicated, LLVM choose to use vl1r/vs1r. Frankly, I implement 
> > RVV GCC reference to LLVM. So that why I want to define the machine_mode 
> > for `vint8mf2` with smaller element-size but same byte-size from 
> > `vint8m1'.
>  
> The LLVM strathegy might not be the best one for GCC.  I still don't
> quite understand what issue you face with the code you try to patch.
> How does the machmode.def portion of the port look like for
> vint8m1 and vint8mf2?  What are the corresponding modes?
>  
> > Thank you for your reply.  
> > 
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-07-27 15:35
> > To: juzhe.zhong@rivai.ai
> > CC: gcc-patches
> > Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> >  
> > > Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?
> > > 
> > > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
> >  
> > Can you explain in terms of GCCs generic vectors what vint8m1_t and
> > vint8mf2_t are?
> >  
> > > Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
> > > fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
> > > so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
> > > The machine_mode PRECISION is calculate by component size which is different from BITSIZE
> > > 
> > > Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
> > > bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.
> > > 
> > >  Can you help me with this? This is important for the following RVV upstream support. Thanks.
> >  
> > So you have that vint8mf2_t type and registers + instructions to operate 
> > on them but no way to load/store?  How do you implement
> >  
> > vint8mf2_t foo (vint8mf2_t *p)
> > {
> >   return *p;
> > }
> >  
> > ?
> >  
> > > 
> > > 
> > > 
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2022-07-27 14:46
> > > To: zhongjuzhe
> > > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> > > Subject: Re: [PATCH 1/1] Fix bit-position comparison
> > > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> > >  
> > > > From: zhongjuzhe <juzhe.zhong@rivai.ai>
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> > > > 
> > > > ---
> > > >  gcc/expr.cc | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > > index 80bb1b8a4c5..ac2b3c07df6 100644
> > > > --- a/gcc/expr.cc
> > > > +++ b/gcc/expr.cc
> > > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
> > > >  code contains an out-of-bounds access to a small array.  */
> > > >        if (!MEM_P (to_rtx)
> > > >    && GET_MODE (to_rtx) != BLKmode
> > > > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > > > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> > >  
> > > I think this has the chance to go wrong with regard to endianess.
> > > Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> > > is relative to the start of to_rtx as if it were in memory and bitsize
> > > determines the contiguous region affected.  But since we are actually
> > > storing into a non-memory endianess comes into play.
> > >  
> > > So no, I don't think the patch is correct, it would be way more
> > > complicated to get the desired improvement.
> > >  
> > > Richard.
> > >  
> > > >  {
> > > >    expand_normal (from);
> > > >    result = NULL;
> > > > 
> > >  
> > > 
> >  
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)