From: Christoph Müllner
Date: Fri, 22 Mar 2024 08:41:18 +0100
Subject: Re: [PATCH] RISC-V: Don't add fractional LMUL types to V_VLS for XTheadVector
To: Bruce Hoult
Cc: gcc-patches@gcc.gnu.org, Kito Cheng, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Camel Coder, Juzhe-Zhong, Jun Sha, Xianmiao Qu, Jin Ma
References: <20240321234552.2140254-1-christoph.muellner@vrull.eu>

On Fri, Mar 22, 2024 at 4:43 AM Bruce Hoult wrote:
>
> > The effect is demonstrated by a new test case that shows
> that the by-pieces framework now emits `sb` instructions
> instead of triggering an ICE
>
> So these small memset() now don't use RVV at all if xtheadvector is enabled?

Yes, but not directly.
The patch just prevents fractional LMUL modes from being considered
for XTheadVector. That's necessary because memory moves with a
fractional LMUL mode cannot be lowered any further for XTheadVector
(that's the reason for the ICE).

> I don't have evidence whether the use of RVV (whether V or
> xtheadvector) for these memsets is a win or not, but the treatment
> should probably be consistent.
>
> I don't know why RVV 1.0 uses a fractional LMUL at all here. It would
> work perfectly well with LMUL=1 and just setting vl to the appropriate
> length (which is always less than 16 bytes). Use of fractional LMUL
> doesn't save any resources.

The compiler can consider fractional LMUL modes when expanding for RVV,
but that does not mean they will be used in the emitted instruction
sequence. Details like the cost model and data alignment also matter.

During testing, I observed that RVV and XTheadVector both emit
sequences of 'sd' for short memsets with known length, known data to set,
and unknown alignment of the data to be written.

However, I have not exhaustively tested all possible tuning
parameters, as my primary goal was to eliminate the cause of the ICE
with XTheadVector.
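
To illustrate why alignment matters for the number of emitted stores,
here is a toy model in C. It is not GCC code: the function name
count_store_pieces and the greedy 8/4/2/1-byte piece selection are
assumptions made purely for illustration, and GCC's real by-pieces
logic also consults the target cost model and may consider vector modes.

```c
#include <stddef.h>

/* Toy model, not GCC code: count the scalar stores needed to set LEN
   bytes when the destination is known to be aligned to ALIGN bytes.
   ALIGN is assumed to be a power of two; 1 models "unknown alignment".
   Pieces are chosen greedily from 8, 4, 2 and 1 bytes.  */
size_t
count_store_pieces (size_t len, size_t align)
{
  /* The widest piece is limited by the alignment and by 8 bytes.  */
  size_t max_piece = align < 8 ? align : 8;
  size_t stores = 0;

  /* Guard against a zero alignment, which would never make progress.  */
  if (max_piece == 0)
    max_piece = 1;

  while (len > 0)
    {
      /* Shrink the piece until it fits into the remaining length.  */
      size_t piece = max_piece;
      while (piece > len)
        piece /= 2;

      len -= piece;
      stores++;
    }

  return stores;
}
```

In this model, count_store_pieces (7, 1) yields seven stores, which
corresponds to the seven `sb` instructions expected for foo0_7 in the
test case, while a destination known to be 8-byte aligned would need
only three stores (4 + 2 + 1 bytes).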
>
> On Fri, Mar 22, 2024 at 12:46 PM Christoph Müllner wrote:
> >
> > The expansion of `memset` (via expand_builtin_memset_args())
> > uses clear_by_pieces() and store_by_pieces() to avoid calls
> > to the C runtime. To check if a type can be used for that purpose
> > the function by_pieces_mode_supported_p() tests if a `mov` and
> > a `vec_duplicate` INSN can be expanded by the backend.
> >
> > The `vec_duplicate` expansion takes arguments of type `V_VLS`.
> > The `mov` expansions take arguments of type `V`, `VB`, `VT`,
> > `VLS_AVL_IMM`, and `VLS_AVL_REG`. Some of these types (in fact
> > not types but type iterators) include fractional LMUL types.
> > E.g. `V_VLS` includes `V`, which includes `VI`, which includes
> > `RVVMF2QI`.
> >
> > This results in an attempt to use fractional LMUL-types for
> > the `memset` expansion resulting in an ICE for XTheadVector,
> > because that extension cannot handle fractional LMULs.
> >
> > This patch addresses this issue by splitting the definition
> > of the `VI` mode iterator into `VI_NOFRAC` (without fractional
> > LMUL types) and `VI_FRAC` (only fractional LMUL types).
> > Further, it defines `V_VLS` such that `VI_FRAC` types are only
> > included if XTheadVector is not enabled.
> >
> > The effect is demonstrated by a new test case that shows
> > that the by-pieces framework now emits `sb` instructions
> > instead of triggering an ICE.
> >
> > Signed-off-by: Christoph Müllner
> >
> > PR 114194
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/vector-iterators.md: Split VI into VI_FRAC and VI_NOFRAC.
> >         Only include VI_NOFRAC in V_VLS without TARGET_XTHEADVECTOR.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/rvv/xtheadvector/pr114194.c: New test.
> >
> > Signed-off-by: Christoph Müllner
> > ---
> >  gcc/config/riscv/vector-iterators.md          | 19 +++++--
> >  .../riscv/rvv/xtheadvector/pr114194.c         | 56 +++++++++++++++++++
> >  2 files changed, 69 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> >
> > diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> > index c2ea7e8b10a..a24e1bf078f 100644
> > --- a/gcc/config/riscv/vector-iterators.md
> > +++ b/gcc/config/riscv/vector-iterators.md
> > @@ -108,17 +108,24 @@ (define_c_enum "unspecv" [
> >    UNSPECV_FRM_RESTORE_EXIT
> >  ])
> >
> > -(define_mode_iterator VI [
> > -  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
> > -
> > -  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> > -
> > -  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
> > +;; Subset of VI with fractional LMUL types
> > +(define_mode_iterator VI_FRAC [
> > +  RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32")
> > +  RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> > +  (RVVMF2SI "TARGET_MIN_VLEN > 32")
> > +])
> >
> > +;; Subset of VI with non-fractional LMUL types
> > +(define_mode_iterator VI_NOFRAC [
> > +  RVVM8QI RVVM4QI RVVM2QI RVVM1QI
> > +  RVVM8HI RVVM4HI RVVM2HI RVVM1HI
> > +  RVVM8SI RVVM4SI RVVM2SI RVVM1SI
> >    (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
> >    (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
> >  ])
> >
> > +(define_mode_iterator VI [ VI_NOFRAC (VI_FRAC "!TARGET_XTHEADVECTOR") ])
> > +
> >  ;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
> >  ;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
> >  ;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> > new file mode 100644
> > index 00000000000..fc2d1349425
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
> > @@ -0,0 +1,56 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv32gc_xtheadvector" { target { rv32 } } } */
> > +/* { dg-options "-march=rv64gc_xtheadvector" { target { rv64 } } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +/*
> > +** foo0_1:
> > +**	sb\tzero,0([a-x0-9]+)
> > +**	ret
> > +*/
> > +void foo0_1 (void *p)
> > +{
> > +  __builtin_memset (p, 0, 1);
> > +}
> > +
> > +/*
> > +** foo0_7:
> > +**	sb\tzero,0([a-x0-9]+)
> > +**	sb\tzero,1([a-x0-9]+)
> > +**	sb\tzero,2([a-x0-9]+)
> > +**	sb\tzero,3([a-x0-9]+)
> > +**	sb\tzero,4([a-x0-9]+)
> > +**	sb\tzero,5([a-x0-9]+)
> > +**	sb\tzero,6([a-x0-9]+)
> > +**	ret
> > +*/
> > +void foo0_7 (void *p)
> > +{
> > +  __builtin_memset (p, 0, 7);
> > +}
> > +
> > +/*
> > +** foo1_1:
> > +**	li\t[a-x0-9]+,1
> > +**	sb\t[a-x0-9]+,0([a-x0-9]+)
> > +**	ret
> > +*/
> > +void foo1_1 (void *p)
> > +{
> > +  __builtin_memset (p, 1, 1);
> > +}
> > +
> > +/*
> > +** foo1_5:
> > +**	li\t[a-x0-9]+,1
> > +**	sb\t[a-x0-9]+,0([a-x0-9]+)
> > +**	sb\t[a-x0-9]+,1([a-x0-9]+)
> > +**	sb\t[a-x0-9]+,2([a-x0-9]+)
> > +**	sb\t[a-x0-9]+,3([a-x0-9]+)
> > +**	sb\t[a-x0-9]+,4([a-x0-9]+)
> > +**	ret
> > +*/
> > +void foo1_5 (void *p)
> > +{
> > +  __builtin_memset (p, 1, 5);
> > +}
> > --
> > 2.44.0
> >
> >
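
The effect of the conditional entry (VI_FRAC "!TARGET_XTHEADVECTOR")
in the new VI iterator can be sketched with a toy model in C. This is
not GCC code: the struct mode_entry, the table vi_qi_modes and both
functions are hypothetical names used only for illustration. The table
lists just the QI modes whose entries carry no further condition
(RVVMF8QI is left out because it also depends on TARGET_MIN_VLEN > 32).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC code: a mode name plus a flag that says whether
   the mode uses a fractional LMUL.  */
struct mode_entry
{
  const char *name;
  bool fractional;
};

/* Hypothetical table with the unconditional QI modes of VI.  */
static const struct mode_entry vi_qi_modes[] = {
  { "RVVM8QI", false },
  { "RVVM4QI", false },
  { "RVVM2QI", false },
  { "RVVM1QI", false },
  { "RVVMF2QI", true },
  { "RVVMF4QI", true },
};

/* A fractional mode is only usable when XTheadVector is disabled,
   mirroring the "!TARGET_XTHEADVECTOR" condition on VI_FRAC.  */
bool
mode_enabled (const struct mode_entry *mode, bool xtheadvector)
{
  return !(mode->fractional && xtheadvector);
}

/* Count the modes of the table that remain enabled.  */
size_t
count_enabled_modes (bool xtheadvector)
{
  size_t count = 0;
  size_t n = sizeof (vi_qi_modes) / sizeof (vi_qi_modes[0]);

  for (size_t i = 0; i < n; i++)
    if (mode_enabled (&vi_qi_modes[i], xtheadvector))
      count++;

  return count;
}
```

In this model all six modes are candidates without XTheadVector, and
only the four non-fractional ones remain when it is enabled, so a mode
such as RVVMF2QI is never offered to the by-pieces check in that case.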