Date: Wed, 12 Apr 2023 07:53:22 +0000 (UTC)
From: Richard Biener
To: Kito Cheng
cc: "juzhe.zhong@rivai.ai", "richard.sandiford", jeffreyalaw, gcc-patches, palmer, jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend
 enum size from 8-bit to 16-bit
References: <20230410144808.324346-1-juzhe.zhong@rivai.ai> <89f088ec-8692-01f5-0395-5a66ddf085d7@gmail.com> <47D962C7C724E3A2+20230410231445834316202@rivai.ai> <0AEFD2378C3DF89B+202304111919556577872@rivai.ai>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Tue, 11 Apr 2023, Kito Cheng wrote:

> Let me give more explanation of why RISC-V vector needs so many more
> modes than AArch64.
>
> The following will use "RVV" as an abbreviation for "RISC-V Vector"
> instructions.
>
> There are two key points here:
>
> - RVV has a concept called LMUL - you can understand that as register
>   grouping: we can group up to 8 adjacent registers together and then
>   operate on them at once, e.g. one vadd can add two 8-reg groups
>   at once.
> - We have segment load/store that require vector tuple types.
>   AArch64 has similar stuff in both Neon and SVE, e.g. int32x2x2_t or
>   svint32x2_t.
>
> In order to model LMUL in the backend, we have to model the combination
> of scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 7
> different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF,
> so basically we'll have 7 (LMUL type) * 7 (scalar type) here.

Other archs have load/store-multiple instructions, IIRC those are
modeled with the appropriate set of operands. Do RVV LMUL group
inputs/outputs overlap with the non-LMUL grouped registers and can
they be used as aliases or is this supposed to be implemented
transparently on the register file level only?

But yes, implementing this as operations on multi-register ops with
large modes is probably the only sensible approach.

I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you
explain? Is that supposed to virtually increase the number of
registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first and
the third "virtual" register decomposed from r0) in GCC? To me the
natural way would be a subreg of r0?

Somehow RVV seems to have more knobs than necessary for tuning
the actual vector register layout (aka N axes but only N-1 dimensions
thus the axes are not orthogonal).

> Okay, let's talk about tuple types. AArch64 also has tuple types, but
> why does it not have such a huge number of modes? It is mainly caused
> by LMUL; let me use a concrete example to explain why this leads to a
> different design of machine modes, using scalable vector modes with
> SI mode tuples here:
>
> AArch64:
>   svint32_t   (VNx4SI)
>   svint32x2_t (VNx8SI)
>   svint32x3_t (VNx12SI)
>   svint32x4_t (VNx16SI)
>
> AArch64 only has up to 4-tuple, but RISC-V could have up to 8-tuple,
> so we already have 8 different types for each scalar mode even though
> we don't count the LMUL concept yet.
>
> RISC-V*:
>   vint32m1_t   (VNx4SI)
>   vint32m1x2_t (VNx8SI)
>   vint32m1x3_t (VNx12SI)
>   vint32m1x4_t (VNx16SI)
>   vint32m1x5_t (VNx20SI)
>   vint32m1x6_t (VNx24SI)
>   vint32m1x7_t (VNx28SI)
>   vint32m1x8_t (VNx32SI)
>
> Using VLEN=128 as the base type system, you can ignore it if you don't
> understand the meaning for now.
>
> And let's consider LMUL now: add the LMUL=2 case here. RVV has a
> constraint that LMUL * NF (NF-tuple) must be less than or equal to 8,
> so we have only 3 extra modes for LMUL=2.
>
> RISC-V*:
>   vint32m2_t   (VNx8SI)
>   vint32m2x2_t (VNx16SI)
>   vint32m2x3_t (VNx24SI)
>   vint32m2x4_t (VNx32SI)
>
> However, there is a big problem: RVV has different register
> constraints for different LMUL types. LMUL <= 1 can use any register,
> LMUL=2 types require registers aligned to a multiple of 2
> (v0, v2, ...), and LMUL=4 types require registers aligned to a
> multiple of 4 (v0, v4, ...).
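
The two rules quoted above (LMUL * NF <= 8, and register alignment by
LMUL) can be sketched in a few lines. This is only an illustration under
the quoted VLEN=128 base: the function names are hypothetical and are
not GCC or RVV identifiers, and si_size_label() produces a size label
only, not the name of an actual machine mode.

```python
# Hypothetical helpers illustrating the quoted rules; not GCC code.

BASE_NUNITS_SI = 4  # vint32m1_t is VNx4SI when VLEN=128 is the base


def tuple_is_valid(lmul, nf):
    """A tuple has 2..8 fields and must satisfy LMUL * NF <= 8."""
    return 2 <= nf <= 8 and lmul * nf <= 8


def register_alignment(lmul):
    """LMUL <= 1 may start at any register; larger LMUL must start
    at a register number that is a multiple of LMUL."""
    return 1 if lmul <= 1 else lmul


def si_size_label(lmul, nf=1):
    """Size label for an SI vector (nf=1) or an NF-tuple of them."""
    return f"VNx{BASE_NUNITS_SI * lmul * nf}SI"


if __name__ == "__main__":
    for lmul in (1, 2, 4):
        nfs = [nf for nf in range(2, 9) if tuple_is_valid(lmul, nf)]
        print(f"LMUL={lmul}: NF in {nfs}, "
              f"register alignment {register_alignment(lmul)}")
    # Same size label, different register alignment:
    print(si_size_label(1, nf=2), register_alignment(1))
    print(si_size_label(2), register_alignment(2))
```

With these helpers, an LMUL=1 two-field tuple and a plain LMUL=2 vector
get the same size label but a different alignment requirement.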
>
> So vint32m1x2_t (LMUL=1x2) and vint32m2_t (LMUL=2) have the same size
> and NUNIT, but they have different register constraints: vint32m1x2_t
> is LMUL 1, so it has no register constraint, but vint32m2_t is
> LMUL 2, so it has a register constraint: it must be aligned to a
> multiple of 2.
>
> For the above reason, those tuple types must have separate
> machine modes even if they have the same size and NUNIT.
>
> Why don't Neon and SVE have such an issue? Because SVE and Neon
> don't have the concept of LMUL, so tuple types in SVE and Neon won't
> have two vector types that have the same size but different register
> constraints or alignment - one size is one type.
>
> So based on LMUL and the register constraint issue of tuple types, we
> must have 37 types for vector tuples, plus 48 variable-length
> vector modes, and 42 scalar modes - so we have ~140 modes now. It
> sounds like that is still less than 256, so what happened?
>
>
> RVV has one more special thing in our type system due to the ISA
> design: the minimal vector length of RVV is 32 bits, unlike SVE,
> which guarantees a minimum of 128 bits. So we did some tricks in our
> type system: we have different modes depending on whether the minimal
> vector length (MIN_VLEN) is 32, 64, or larger than or equal to 128.
> This design is more friendly to the vectorizer, and also models
> things precisely for better code gen.
>
> e.g.
>
> vint32m1_t is VNx1SI in MIN_VLEN>=32
>
> vint32m1_t is VNx2SI in MIN_VLEN>=64
>
> vint32m1_t is VNx4SI in MIN_VLEN>=128
>
> So actually we will have 37 * 3 modes for vector tuple modes, and
> ~210 modes now (the result is a little different from JuZhe's number
> since I ignore some modes that aren't used in C but are defined as
> machine modes, because current GCC will always define all possible
> scalar modes for a vector mode).
>
> We also plan to add some traditional fixed-length vector types like
> V2SI in future, and apparently 256 modes aren't enough for this plan :(
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
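
The MIN_VLEN-dependent naming quoted in the message can be sketched as a
small lookup. This is a minimal illustration that assumes the label is
simply the MIN_VLEN bucket divided by the 32-bit element width, which
matches the three quoted cases; the function name is hypothetical and
this is not how GCC itself defines its modes.

```python
# Hypothetical sketch of the quoted MIN_VLEN buckets; not GCC code.

SEW_SI = 32  # element width of SI in bits


def vint32m1_label(min_vlen):
    """Label for vint32m1_t given the minimal vector length in bits."""
    if min_vlen >= 128:
        bucket = 128
    elif min_vlen >= 64:
        bucket = 64
    elif min_vlen >= 32:
        bucket = 32
    else:
        raise ValueError("the quoted text gives 32 bits as the minimum")
    return f"VNx{bucket // SEW_SI}SI"


if __name__ == "__main__":
    for min_vlen in (32, 64, 128, 256):
        print(min_vlen, vint32m1_label(min_vlen))
```

Since every tuple type needs one mode per bucket, the quoted count of 37
tuple types turns into 37 * 3 tuple modes.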