From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by sourceware.org (Postfix) with ESMTPS id BDFEE3857001 for ; Wed, 27 Jul 2022 12:56:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BDFEE3857001 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 7F3AD2085A; Wed, 27 Jul 2022 12:56:23 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 5FC132C141; Wed, 27 Jul 2022 12:56:23 +0000 (UTC) Date: Wed, 27 Jul 2022 12:56:23 +0000 (UTC) From: Richard Biener To: "juzhe.zhong@rivai.ai" cc: gcc-patches , vmakarov@redhat.com Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison In-Reply-To: <2022072716263345190043@rivai.ai>+C57ADFE8A6050C80 Message-ID: References: <20220727034437.154625-1-juzhe.zhong@rivai.ai>, <20220727034437.154625-2-juzhe.zhong@rivai.ai>, , <2022072715091952120116@rivai.ai>+0E606AA70DF6FCA5, , <2022072716005642628531@rivai.ai>+297F71E4F3A41880, <2022072716263345190043@rivai.ai>+C57ADFE8A6050C80 User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2022 12:56:26 -0000 On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote: > For vint8m1_t: > VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0) > ADJUST_NUNITS (VNx16QI, riscv_vector_chunks * 8); > ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8); > For vint8mf2_t: > VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 0) > ADJUST_NUNITS (VNx8QI, riscv_vector_chunks * 4); > ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8); ^^^ ADJUST_BYTES (VNx8QI, riscv_vector_chunks * 8); probably. As said, I'm not sure this is a good idea just for the sake of spill slots. Maybe there's a mode_for_spill one could use and spill a paradoxical subreg of that mode ... (don't search for mode_for_spill, I just made that up). Maybe you can somehow forbit spilling of vint8mf2_t and instead define spill_class for those so they spill first to vint8m1_t and then those get spilled to the stack? > riscv_vector_chunks is a compile-time unknown poly value, I use ADJUST_BYTES to specify these 2 machine_modes > with same bytesize but different element number. > This way can tell GCC that vint8mf2_t has half element nunber of vint8m1_t but occupy the same size during the register spilling. > > May be we can add a new function that call ADJUST_PRECISION that I can adjust these two precision? > I am not sure. I am look forward another solution to deal with this issuse. Thank you so much. > > > > juzhe.zhong@rivai.ai > > From: Richard Biener > Date: 2022-07-27 16:12 > To: juzhe.zhong@rivai.ai > CC: gcc-patches > Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote: > > > Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. In > > RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, > > zero + vle8.v). You can see it in foo function. In this case we don't > > need to confuse compiler the size of vint8mf2. However, The second case > > is I write assembly to occupy the vector register to generate register > > spilling cases. You can see LLVM implementation: First use vsetvli + > > vle8.v load (vint8mf2_t) data from the base pointer, then spill the > > vector to memory using vs1r (this is the whole register store which > > store the whole vector to the memory and then use vl1r load the whole > > register and finally return it. In LLVM implementation, it insert > > vsetvli instructions for RVV instruction using LLVM PASS before RA > > (register allocation). So after RA, compiler generate the spilling > > loads/stores. We can either choose to use vsetvli + vle/vse (load/store > > fractional vector) or vl1r/vs1r (load/store whole vector which enlarge > > the spill size). > > > > Because implementing insert vsetvli PASS after RA (register allocation) > > is so complicated, LLVM choose to use vl1r/vs1r. Frankly, I implement > > RVV GCC reference to LLVM. So that why I want to define the machine_mode > > for `vint8mf2` with smaller element-size but same byte-size from > > `vint8m1'. > > The LLVM strathegy might not be the best one for GCC. I still don't > quite understand what issue you face with the code you try to patch. > How does the machmode.def portion of the port look like for > vint8m1 and vint8mf2? What are the corresponding modes? > > > Thank you for your reply. > > > > > > > > juzhe.zhong@rivai.ai > > > > From: Richard Biener > > Date: 2022-07-27 15:35 > > To: juzhe.zhong@rivai.ai > > CC: gcc-patches > > Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison > > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote: > > > > > Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue? > > > > > > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2). > > > > Can you explain in terms of GCCs generic vectors what vint8m1_t and > > vint8mf2_t are? > > > > > Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of > > > fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) > > > so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition, > > > The machine_mode PRECISION is calculate by component size which is different from BITSIZE > > > > > > Now, here is the question. For array type: vint8mf2x4_t, if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by > > > bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array. > > > > > > Can you help me with this? This is important for the following RVV upstream support. Thanks. > > > > So you have that vint8mf2_t type and registers + instructions to operate > > on them but no way to load/store? How do you implement > > > > vint8mf2_t foo (vint8mf2_t *p) > > { > > return *p; > > } > > > > ? > > > > > > > > > > > > > > > > > juzhe.zhong@rivai.ai > > > > > > From: Richard Biener > > > Date: 2022-07-27 14:46 > > > To: zhongjuzhe > > > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson > > > Subject: Re: [PATCH 1/1] Fix bit-position comparison > > > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote: > > > > > > > From: zhongjuzhe > > > > > > > > gcc/ChangeLog: > > > > > > > > * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE > > > > > > > > --- > > > > gcc/expr.cc | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/gcc/expr.cc b/gcc/expr.cc > > > > index 80bb1b8a4c5..ac2b3c07df6 100644 > > > > --- a/gcc/expr.cc > > > > +++ b/gcc/expr.cc > > > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal) > > > > code contains an out-of-bounds access to a small array. */ > > > > if (!MEM_P (to_rtx) > > > > && GET_MODE (to_rtx) != BLKmode > > > > - && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx)))) > > > > + && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx)))) > > > > > > I think this has the chance to go wrong with regard to endianess. > > > Consider to_rtx with 32bit mode size but 12bit mode precision. bitpos > > > is relative to the start of to_rtx as if it were in memory and bitsize > > > determines the contiguous region affected. But since we are actually > > > storing into a non-memory endianess comes into play. > > > > > > So no, I don't think the patch is correct, it would be way more > > > complicated to get the desired improvement. > > > > > > Richard. > > > > > > > { > > > > expand_normal (from); > > > > result = NULL; > > > > > > > > > > > > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)