From: Richard Biener
Date: Tue, 26 Sep 2023 09:46:04 +0200
Subject: Re: [PATCH] MATCH: Optimize COND_ADD_LEN reduction pattern
To: Juzhe-Zhong
Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com, rguenther@suse.de, pinskia@gmail.com
In-Reply-To: <20230926071257.129536-1-juzhe.zhong@rivai.ai>

On Tue, Sep 26, 2023 at 9:13 AM Juzhe-Zhong wrote:
>
> This patch leverages this commit: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=62b505a4d5fc89
> to optimize the COND_LEN_ADD reduction pattern.
>
> We are optimizing VEC_COND_EXPR + COND_LEN_ADD -> COND_LEN_ADD.
>
> Consider the following case:
>
> #include
>
> void
> pr11594 (uint64_t *restrict a, uint64_t *restrict b, int loop_size)
> {
>   uint64_t result = 0;
>
>   for (int i = 0; i < loop_size; i++)
>     {
>       if (b[i] <= a[i])
>         {
>           result += a[i];
>         }
>     }
>
>   a[0] = result;
> }
>
> Before this patch:
>         vsetvli a7,zero,e64,m1,ta,ma
>         vmv.v.i v2,0
>         vmv1r.v v3,v2                 --- redundant
> .L3:
>         vsetvli a5,a2,e64,m1,ta,ma
>         vle64.v v1,0(a3)
>         vle64.v v0,0(a1)
>         slli a6,a5,3
>         vsetvli a7,zero,e64,m1,ta,ma
>         sub a2,a2,a5
>         vmsleu.vv v0,v0,v1
>         add a1,a1,a6
>         vmerge.vvm v1,v3,v1,v0        ---- redundant.
>         add a3,a3,a6
>         vsetvli zero,a5,e64,m1,tu,ma
>         vadd.vv v2,v2,v1
>         bne a2,zero,.L3
>         li a5,0
>         vsetvli a4,zero,e64,m1,ta,ma
>         vmv.s.x v1,a5
>         vredsum.vs v2,v2,v1
>         vmv.x.s a5,v2
>         sd a5,0(a0)
>         ret
>
> After this patch:
>
>         vsetvli a6,zero,e64,m1,ta,ma
>         vmv.v.i v1,0
> .L3:
>         vsetvli a5,a2,e64,m1,ta,ma
>         vle64.v v2,0(a4)
>         vle64.v v0,0(a1)
>         slli a3,a5,3
>         vsetvli a6,zero,e64,m1,ta,ma
>         sub a2,a2,a5
>         vmsleu.vv v0,v0,v2
>         add a1,a1,a3
>         vsetvli zero,a5,e64,m1,tu,mu
>         add a4,a4,a3
>         vadd.vv v1,v1,v2,v0.t
>         bne a2,zero,.L3
>         li a5,0
>         vsetivli zero,1,e64,m1,ta,ma
>         vmv.s.x v2,a5
>         vsetvli a5,zero,e64,m1,ta,ma
>         vredsum.vs v1,v1,v2
>         vmv.x.s a5,v1
>         sd a5,0(a0)
>         ret
>
> Bootstrap and regression testing are running.
>
> OK for trunk when testing passes?
>
>         PR tree-optimization/111594
>         PR tree-optimization/110660
>
> gcc/ChangeLog:
>
>         * match.pd: Optimize COND_LEN_ADD reduction.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/cond/pr111594.c: New test.
>
> ---
>  gcc/match.pd                                  | 13 +++++++++
>  .../riscv/rvv/autovec/cond/cond_reduc-1.c     | 29 +++++++++++++++++++
>  .../riscv/rvv/autovec/cond/pr111594.c         | 22 ++++++++++++++
>  3 files changed, 64 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111594.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a17778fbaa6..af8d12c138e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8866,6 +8866,19 @@ and,
>    (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
>     (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
>
> +/* Detect simplication for a conditional length reduction where
> +
> +   a = mask ? b : 0
> +   c = i < len + bias ? d + a : d
> +
> +   is turned into
> +
> +   c = mask && i < len ? d + b : d.  */
> +(simplify
> + (IFN_COND_LEN_ADD integer_minus_onep @0 (vec_cond @1 @2 zerop) @0 @3 @4)

I think you want integer_truep instead of integer_minus_onep for
readability.  Since you use zerop here, can you also adjust the
preceding pattern?

> + (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))

It might be better to check

  ANY_INTEGRAL_TYPE_P (type)
  || fold_real_zero_addition_p (type, NULL_TREE, @5, 0)

Your change misses HONOR_SIGN_DEPENDENT_ROUNDING, I think.
> +  (IFN_COND_LEN_ADD @1 @0 @2 @0 @3 @4)))
> +
>  /* For pointers @0 and @2 and nonnegative constant offset @1, look for
>     expressions like:
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c
> new file mode 100644
> index 00000000000..db6f9d1ec6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_reduc-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -fno-vect-cost-model -ffast-math -fdump-tree-optimized" } */
> +
> +#include
> +
> +#define COND_REDUCTION(TYPE)                                              \
> +  TYPE foo##TYPE (TYPE *restrict a, TYPE *restrict b, int loop_size)      \
> +  {                                                                       \
> +    TYPE result = 0;                                                      \
> +    for (int i = 0; i < loop_size; i++)                                   \
> +      if (b[i] <= a[i])                                                   \
> +        result += a[i];                                                   \
> +    return result;                                                        \
> +  }
> +
> +COND_REDUCTION (int8_t)
> +COND_REDUCTION (int16_t)
> +COND_REDUCTION (int32_t)
> +COND_REDUCTION (int64_t)
> +COND_REDUCTION (uint8_t)
> +COND_REDUCTION (uint16_t)
> +COND_REDUCTION (uint32_t)
> +COND_REDUCTION (uint64_t)
> +COND_REDUCTION (_Float16)
> +COND_REDUCTION (float)
> +COND_REDUCTION (double)
> +
> +/* { dg-final { scan-tree-dump-not "VCOND_MASK" "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "COND_LEN_ADD" 11 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111594.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111594.c
> new file mode 100644
> index 00000000000..6d81b26fbd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111594.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv64gcv -mabi=lp64d -fno-vect-cost-model -ffast-math" } */
> +
> +#include
> +
> +void
> +pr11594 (uint64_t *restrict a, uint64_t *restrict b, int loop_size)
> +{
> +  uint64_t result = 0;
> +
> +  for (int i = 0; i < loop_size; i++)
> +    {
> +      if (b[i] <= a[i])
> +        {
> +          result += a[i];
> +        }
> +    }
> +
> +  a[0] = result;
> +}
> +
> +/* { dg-final { scan-assembler-not {vmerge} } } */
> --
> 2.36.3
>