From: Kito Cheng
Date: Tue, 11 Apr 2023 21:50:40 +0800
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
To: "juzhe.zhong@rivai.ai"
Cc: "richard.sandiford", rguenther, jeffreyalaw, gcc-patches, palmer, jakub

Let me explain in more detail why RISC-V vector needs so many more modes
than AArch64.

The following uses "RVV" as an abbreviation for "RISC-V Vector"
instructions.

There are two key points here:

- RVV has a concept called LMUL. You can think of it as register
  grouping: we can group up to 8 adjacent registers together and
  operate on them at once, e.g. one vadd can add two 8-register groups
  at once.
- We have segment load/store instructions that require vector tuple
  types.
  - AArch64 has similar things in both Neon and SVE, e.g. int32x2x2_t
    or svint32x2_t.

In order to model LMUL in the backend, we have to model the combination
of scalar type and LMUL. The possible LMUL values are 1, 2, 4, 8, 1/2,
1/4 and 1/8 (7 different LMULs), and we have QI, HI, SI, DI, HF, SF and
DF, so basically we have 7 (LMUL types) * 7 (scalar types) here.

Okay, let's talk about tuple types. AArch64 also has tuple types, so why
doesn't it have such a huge number of modes? It is mainly caused by
LMUL. Let me use a concrete example to explain why this leads to a
different machine mode design, using scalable vector modes with SI
tuples:

AArch64:
svint32_t    (VNx4SI)
svint32x2_t  (VNx8SI)
svint32x3_t  (VNx12SI)
svint32x4_t  (VNx16SI)

AArch64 only has up to 4-tuple, but RISC-V can have up to 8-tuple, so
we already have 8 different types for each scalar mode even before
counting LMUL.

RISC-V*:
vint32m1_t    (VNx4SI)
vint32m1x2_t  (VNx8SI)
vint32m1x3_t  (VNx12SI)
vint32m1x4_t  (VNx16SI)
vint32m1x5_t  (VNx20SI)
vint32m1x6_t  (VNx24SI)
vint32m1x7_t  (VNx28SI)
vint32m1x8_t  (VNx32SI)

* Using VLEN=128 as the base for the type system; you can ignore this
  if you don't understand the meaning for now.

Now let's consider LMUL and add the LMUL=2 case. RVV has a constraint
that LMUL * NF (for an NF-tuple) must be less than or equal to 8, so we
have only 3 extra tuple modes for LMUL=2.

RISC-V*:
vint32m2_t    (VNx8SI)
vint32m2x2_t  (VNx16SI)
vint32m2x3_t  (VNx24SI)
vint32m2x4_t  (VNx32SI)

However, there is a big problem: RVV has different register constraints
for different LMULs. LMUL <= 1 can use any register, LMUL=2 types
require the register to be aligned to a multiple of 2 (v0, v2, …), and
LMUL=4 types require alignment to a multiple of 4 (v0, v4, …).

So vint32m1x2_t (LMUL=1x2) and vint32m2_t (LMUL=2) have the same size
and NUNITS, but they have different register constraints: vint32m1x2_t
is LMUL 1, so it has no register constraint, but vint32m2_t is LMUL 2,
so it must be aligned to a multiple of 2.
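
As a rough illustration of the two rules above (LMUL * NF <= 8 and
register alignment), here is a small Python sketch. The helper names
valid_nf, reg_alignment and size_in_vregs are made up for this
illustration and are not part of GCC. Extending the alignment rule to
LMUL=8 is an assumption that goes beyond the LMUL=2 and LMUL=4 cases
mentioned above.

```python
from fractions import Fraction

# LMUL values named in the text: 1/8, 1/4, 1/2, 1, 2, 4, 8.
LMULS = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
         Fraction(1), Fraction(2), Fraction(4), Fraction(8)]


def valid_nf(lmul):
    """Tuple sizes NF (2..8) that satisfy LMUL * NF <= 8 (hypothetical helper)."""
    return [nf for nf in range(2, 9) if lmul * nf <= 8]


def reg_alignment(lmul):
    """Required alignment of the first register number.

    LMUL <= 1 may start at any register; LMUL = 2 or 4 must start at a
    multiple of 2 or 4. Treating LMUL = 8 the same way is an assumption.
    """
    return max(1, int(lmul))


def size_in_vregs(lmul, nf=1):
    """Data size in units of one vector register (LMUL * NF)."""
    return lmul * nf


if __name__ == "__main__":
    for lmul in LMULS:
        print(f"LMUL={lmul}: NF in {valid_nf(lmul)}, align={reg_alignment(lmul)}")
    # vint32m1x2_t and vint32m2_t have the same size ...
    print(size_in_vregs(Fraction(1), 2) == size_in_vregs(Fraction(2)))
    # ... but only the LMUL=2 type needs an even starting register.
    print(reg_alignment(Fraction(1)), reg_alignment(Fraction(2)))
```

The last two prints show the point of the example: size alone cannot
tell vint32m1x2_t and vint32m2_t apart, while the alignment requirement
can.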

Because of this register constraint, those tuple types must have
separate machine modes even if they have the same size and NUNITS.

Why don't Neon and SVE have this issue? Because SVE and Neon don't have
the concept of LMUL, tuple types in SVE and Neon never include two
vector types that have the same size but different register constraints
or alignment: one size is one type.

So based on LMUL and the register constraint issue for tuple types, we
must have 37 types for vector tuples, plus 48 variable-length vector
modes and 42 scalar modes, so we have ~140 modes now. That still sounds
like less than 256, so what happened?

RVV has one more special thing in its type system due to the ISA
design: the minimal vector length of RVV is 32 bits, unlike SVE, which
guarantees a minimum of 128 bits. So we did a trick in our type system:
we have different modes depending on whether the minimal vector length
(MIN_VLEN) is 32, 64, or greater than or equal to 128. This design is
friendlier to the vectorizer and also models things precisely for
better code gen.

e.g.
vint32m1_t is VNx1SI in MIN_VLEN>=32
vint32m1_t is VNx2SI in MIN_VLEN>=64
vint32m1_t is VNx4SI in MIN_VLEN>=128

So we will actually have 37 * 3 modes for vector tuples, giving ~210
modes now. (The result is a little different from JuZhe's number since
I ignore some modes that aren't used in C but are still defined as
machine modes, because current GCC always defines all possible scalar
modes for a vector mode.)

We also plan to add some traditional fixed-length vector types like
V2SI in the future… and apparently 256 modes aren't enough for this
plan :(
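
The MIN_VLEN-dependent naming can be sketched in a few lines of Python.
This is only an illustration: si_mode_name is a hypothetical helper, and
the formula NUNITS = (MIN_VLEN / 32) * LMUL * NF is inferred from the
vint32m1_t examples and the VLEN=128 listings in this mail. Real GCC
mode names may differ.

```python
def si_mode_name(min_vlen, lmul=1, nf=1):
    """Illustrative mode name for an SI (32-bit) vector or tuple.

    NUNITS is taken as (MIN_VLEN / 32) * LMUL * NF, which matches the
    vint32m1_t examples in the text. Applying it to other LMUL/NF values
    is an assumption, not a statement about GCC's actual mode tables.
    """
    if min_vlen not in (32, 64, 128):
        raise ValueError("MIN_VLEN is expected to be 32, 64 or 128 here")
    nunits = (min_vlen // 32) * lmul * nf
    return f"VNx{nunits}SI"


if __name__ == "__main__":
    # The same C type maps to a different mode for each MIN_VLEN.
    for min_vlen in (32, 64, 128):
        print(f"vint32m1_t with MIN_VLEN>={min_vlen}: {si_mode_name(min_vlen)}")
    # A size-based name cannot separate vint32m1x2_t from vint32m2_t.
    print(si_mode_name(128, lmul=1, nf=2), si_mode_name(128, lmul=2))
```

Since every tuple type needs one mode per MIN_VLEN variant, the tuple
count is multiplied by 3, which is where the 37 * 3 figure comes from.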