From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by sourceware.org (Postfix) with ESMTPS id DED133858D28 for ; Mon, 31 Jul 2023 21:47:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DED133858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-307d58b3efbso4443605f8f.0 for ; Mon, 31 Jul 2023 14:47:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1690840024; x=1691444824; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=mSHddmI6jFk7vJ9ug0w467/TnW2F0/SqsrPq5KvLknA=; b=ML5AVa+TT1OK6VXBnkphx12ZUpV9d9MX2GaRVAs+NKGSK1IBDBoUcgaseOr17ahpmK QX5BWJwzFB0CNyk16/LSphk8QftVV9ax1h9Zct0vee7HRt09TvY2BZzIL8hOkMT7lKaA yd0/p9tAUcKjerhPuoKi+voUWdihJML5vpqyftUorv40Hm+GDUjksoCpxJl/DyLBxXJX fJUu0pB2EkfGo+0Rt30tGR2Ruy+sjpXR23UBWUkUDLlnEPAKKmwSYgmQk4dRG6JXP7kf t7Mb3D46DoLwBTW4UVuBNTjXB/EVEiq2YYrK5AK2gTsZ73YWhFkPevnpmCdXCGwoEzag DyRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690840024; x=1691444824; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mSHddmI6jFk7vJ9ug0w467/TnW2F0/SqsrPq5KvLknA=; b=d+eAMJ6pmzLOydCM+1AzGIztZgx/v1+N4kXermcP0NoF9t4gapaM9MeUxlZ+1HFQHq 8dQEO5ctW4fSJLQq9IujG5LNdq9t6YN4VcW9OJX6Z1F7soktbgwDXMejPK2OA8btgFLu LzBFaTkQAwSJeiLCb/TwATwL4u6d4MUbpz+sqTvhVacA7zvkugm6y5kRBEUC/hHTAD6E gCgrjjuSkC0cg+Bnm36tG/rFCjR2zJCOC+Fut6YHJfD7RO8RvcJslp4BlszC1zbBR/x2 pNmYksK+lgEpjpkoqVm8T0Jp6RifWJyq4nIc3WeduYzDnZ2YQk6Ydv2gGwad/mEveDAb VL1Q== X-Gm-Message-State: ABy/qLb5Uwi+bsAgoJZoL8TXpC/Qs0kEbdDgu1JJYNAAXuZiIeiBz9Se AbRW19/M6qfoca3p8uSvx0wvzApaUzPxe/NV/4ynYw== X-Google-Smtp-Source: APBJJlEplw2eWcj0orkwGZuCfBn050C+ayYCg6t/lff2GweDn4yoHs6sU+ZUTv8XJjBXF2csdEOf9jL/hq/VtBdgoJI= X-Received: by 2002:adf:cf07:0:b0:317:2574:c2b1 with SMTP id o7-20020adfcf07000000b003172574c2b1mr707584wrj.30.1690840023619; Mon, 31 Jul 2023 14:47:03 -0700 (PDT) MIME-Version: 1.0 References: <20230619142356.345159-1-stefansf@linux.ibm.com> In-Reply-To: From: Prathamesh Kulkarni Date: Tue, 1 Aug 2023 03:16:27 +0530 Message-ID: Subject: Re: [PATCH v2] combine: Narrow comparison of memory and constant To: Stefan Schulze Frielinghaus Cc: gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 1 Aug 2023 at 03:13, Prathamesh Kulkarni wrote: > > On Mon, 19 Jun 2023 at 19:59, Stefan Schulze Frielinghaus via > Gcc-patches wrote: > > > > Comparisons between memory and constants might be done in a smaller mod= e > > resulting in smaller constants which might finally end up as immediates > > instead of in the literal pool. > > > > For example, on s390x a non-symmetric comparison like > > x <=3D 0x3fffffffffffffff > > results in the constant being spilled to the literal pool and an 8 byte > > memory comparison is emitted. Ideally, an equivalent comparison > > x0 <=3D 0x3f > > where x0 is the most significant byte of x, is emitted where the > > constant is smaller and more likely to materialize as an immediate. > > > > Similarly, comparisons of the form > > x >=3D 0x4000000000000000 > > can be shortened into x0 >=3D 0x40. > > > > Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le. > > Note, the new tests show that for the mentioned little-endian targets > > the optimization does not materialize since either the costs of the new > > instructions are higher or they do not match. Still ok for mainline? > Hi Stefan, > Unfortunately this patch (committed in 7cdd0860949c6c3232e6cff1d7ca37bb52= 34074c) > caused the following ICE on armv8l-unknown-linux-gnu: Sorry I meant armv8l-unknown-linux-gnueabihf. > during RTL pass: combine > ../../../gcc/libgcc/fixed-bit.c: In function =E2=80=98__gnu_saturate1sq= =E2=80=99: > ../../../gcc/libgcc/fixed-bit.c:210:1: internal compiler error: in > decompose, at rtl.h:2297 > 210 | } > | ^ > 0xaa23e3 wi::int_traits > >::decompose(long long*, unsigned int, std::pair machine_mode> const&) > ../../gcc/gcc/rtl.h:2297 > 0xaf5ab3 wide_int_ref_storage true>::wide_int_ref_storage > >(std::pair const&) > ../../gcc/gcc/wide-int.h:1030 > 0xaf5023 generic_wide_int > >::generic_wide_int > >(std::pair const&) > ../../gcc/gcc/wide-int.h:788 > 0xf916f9 simplify_const_unary_operation(rtx_code, machine_mode, > rtx_def*, machine_mode) > ../../gcc/gcc/simplify-rtx.cc:2131 > 0xf8bad5 simplify_context::simplify_unary_operation(rtx_code, > machine_mode, rtx_def*, machine_mode) > ../../gcc/gcc/simplify-rtx.cc:889 > 0xf8a591 simplify_context::simplify_gen_unary(rtx_code, machine_mode, > rtx_def*, machine_mode) > ../../gcc/gcc/simplify-rtx.cc:360 > 0x9bd1b7 simplify_gen_unary(rtx_code, machine_mode, rtx_def*, machine_mod= e) > ../../gcc/gcc/rtl.h:3520 > 0x1bd5677 simplify_comparison > ../../gcc/gcc/combine.cc:13125 > 0x1bc2b2b simplify_set > ../../gcc/gcc/combine.cc:6848 > 0x1bc1647 combine_simplify_rtx > ../../gcc/gcc/combine.cc:6353 > 0x1bbf97f subst > ../../gcc/gcc/combine.cc:5609 > 0x1bb864b try_combine > ../../gcc/gcc/combine.cc:3302 > 0x1bb30fb combine_instructions > ../../gcc/gcc/combine.cc:1264 > 0x1bd8d25 rest_of_handle_combine > ../../gcc/gcc/combine.cc:15059 > 0x1bd8dd5 execute > ../../gcc/gcc/combine.cc:15103 > Please submit a full bug report, with preprocessed source (by using > -freport-bug). > Please include the complete backtrace with any bug report. > See for instructions. > > Could you please take a look ? > > Thanks, > Prathamesh > > > > gcc/ChangeLog: > > > > * combine.cc (simplify_compare_const): Narrow comparison of > > memory and constant. > > (try_combine): Adapt new function signature. > > (simplify_comparison): Adapt new function signature. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/cmp-mem-const-1.c: New test. > > * gcc.dg/cmp-mem-const-2.c: New test. > > * gcc.dg/cmp-mem-const-3.c: New test. > > * gcc.dg/cmp-mem-const-4.c: New test. > > * gcc.dg/cmp-mem-const-5.c: New test. > > * gcc.dg/cmp-mem-const-6.c: New test. > > * gcc.target/s390/cmp-mem-const-1.c: New test. > > --- > > gcc/combine.cc | 79 +++++++++++++++++-- > > gcc/testsuite/gcc.dg/cmp-mem-const-1.c | 17 ++++ > > gcc/testsuite/gcc.dg/cmp-mem-const-2.c | 17 ++++ > > gcc/testsuite/gcc.dg/cmp-mem-const-3.c | 17 ++++ > > gcc/testsuite/gcc.dg/cmp-mem-const-4.c | 17 ++++ > > gcc/testsuite/gcc.dg/cmp-mem-const-5.c | 17 ++++ > > gcc/testsuite/gcc.dg/cmp-mem-const-6.c | 17 ++++ > > .../gcc.target/s390/cmp-mem-const-1.c | 24 ++++++ > > 8 files changed, 200 insertions(+), 5 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-1.c > > create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-2.c > > create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-3.c > > create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-4.c > > create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-5.c > > create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-6.c > > create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c > > > > diff --git a/gcc/combine.cc b/gcc/combine.cc > > index 5aa0ec5c45a..56e15a93409 100644 > > --- a/gcc/combine.cc > > +++ b/gcc/combine.cc > > @@ -460,7 +460,7 @@ static rtx simplify_shift_const (rtx, enum rtx_code= , machine_mode, rtx, > > static int recog_for_combine (rtx *, rtx_insn *, rtx *); > > static rtx gen_lowpart_for_combine (machine_mode, rtx); > > static enum rtx_code simplify_compare_const (enum rtx_code, machine_mo= de, > > - rtx, rtx *); > > + rtx *, rtx *); > > static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *)= ; > > static void update_table_tick (rtx); > > static void record_value_for_reg (rtx, rtx_insn *, rtx); > > @@ -3185,7 +3185,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn= *i1, rtx_insn *i0, > > compare_code =3D orig_compare_code =3D GET_CODE (*cc_use_loc)= ; > > if (is_a (GET_MODE (i2dest), &mode)) > > compare_code =3D simplify_compare_const (compare_code, mode= , > > - op0, &op1); > > + &op0, &op1); > > target_canonicalize_comparison (&compare_code, &op0, &op1, 1)= ; > > } > > > > @@ -11796,13 +11796,14 @@ gen_lowpart_for_combine (machine_mode omode, = rtx x) > > (CODE OP0 const0_rtx) form. > > > > The result is a possibly different comparison code to use. > > - *POP1 may be updated. */ > > + *POP0 and *POP1 may be updated. */ > > > > static enum rtx_code > > simplify_compare_const (enum rtx_code code, machine_mode mode, > > - rtx op0, rtx *pop1) > > + rtx *pop0, rtx *pop1) > > { > > scalar_int_mode int_mode; > > + rtx op0 =3D *pop0; > > HOST_WIDE_INT const_op =3D INTVAL (*pop1); > > > > /* Get the constant we are comparing against and turn off all bits > > @@ -11987,6 +11988,74 @@ simplify_compare_const (enum rtx_code code, ma= chine_mode mode, > > break; > > } > > > > + /* Narrow non-symmetric comparison of memory and constant as e.g. > > + x0...x7 <=3D 0x3fffffffffffffff into x0 <=3D 0x3f where x0 is the= most > > + significant byte. Likewise, transform x0...x7 >=3D 0x40000000000= 00000 into > > + x0 >=3D 0x40. */ > > + if ((code =3D=3D LEU || code =3D=3D LTU || code =3D=3D GEU || code = =3D=3D GTU) > > + && is_a (GET_MODE (op0), &int_mode) > > + && MEM_P (op0) > > + && !MEM_VOLATILE_P (op0) > > + /* The optimization makes only sense for constants which are big= enough > > + so that we have a chance to chop off something at all. */ > > + && (unsigned HOST_WIDE_INT) const_op > 0xff > > + /* Ensure that we do not overflow during normalization. */ > > + && (code !=3D GTU || (unsigned HOST_WIDE_INT) const_op < HOST_WI= DE_INT_M1U)) > > + { > > + unsigned HOST_WIDE_INT n =3D (unsigned HOST_WIDE_INT) const_op; > > + enum rtx_code adjusted_code; > > + > > + /* Normalize code to either LEU or GEU. */ > > + if (code =3D=3D LTU) > > + { > > + --n; > > + adjusted_code =3D LEU; > > + } > > + else if (code =3D=3D GTU) > > + { > > + ++n; > > + adjusted_code =3D GEU; > > + } > > + else > > + adjusted_code =3D code; > > + > > + scalar_int_mode narrow_mode_iter; > > + FOR_EACH_MODE_UNTIL (narrow_mode_iter, int_mode) > > + { > > + unsigned nbits =3D GET_MODE_PRECISION (int_mode) > > + - GET_MODE_PRECISION (narrow_mode_iter); > > + unsigned HOST_WIDE_INT mask =3D (HOST_WIDE_INT_1U << nbits) -= 1; > > + unsigned HOST_WIDE_INT lower_bits =3D n & mask; > > + if ((adjusted_code =3D=3D LEU && lower_bits =3D=3D mask) > > + || (adjusted_code =3D=3D GEU && lower_bits =3D=3D 0)) > > + { > > + n >>=3D nbits; > > + break; > > + } > > + } > > + > > + if (narrow_mode_iter < int_mode) > > + { > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + { > > + fprintf ( > > + dump_file, "narrow comparison from mode %s to %s: (MEM = %s " > > + HOST_WIDE_INT_PRINT_HEX ") to (MEM %s " > > + HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode= ), > > + GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code), > > + (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME (adjuste= d_code), > > + n); > > + } > > + poly_int64 offset =3D (BYTES_BIG_ENDIAN > > + ? 0 > > + : (GET_MODE_SIZE (int_mode) > > + - GET_MODE_SIZE (narrow_mode_iter))); > > + *pop0 =3D adjust_address_nv (op0, narrow_mode_iter, offset); > > + *pop1 =3D GEN_INT (n); > > + return adjusted_code; > > + } > > + } > > + > > *pop1 =3D GEN_INT (const_op); > > return code; > > } > > @@ -12179,7 +12248,7 @@ simplify_comparison (enum rtx_code code, rtx *p= op0, rtx *pop1) > > > > /* Try to simplify the compare to constant, possibly changing th= e > > comparison op, and/or changing op1 to zero. */ > > - code =3D simplify_compare_const (code, raw_mode, op0, &op1); > > + code =3D simplify_compare_const (code, raw_mode, &op0, &op1); > > const_op =3D INTVAL (op1); > > > > /* Compute some predicates to simplify code below. */ > > diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-1.c b/gcc/testsuite/gcc= .dg/cmp-mem-const-1.c > > new file mode 100644 > > index 00000000000..263ad98af79 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/cmp-mem-const-1.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile { target { lp64 } } } */ > > +/* { dg-options "-O1 -fdump-rtl-combine-details" } */ > > +/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to QI" "= combine" } } */ > > + > > +typedef __UINT64_TYPE__ uint64_t; > > + > > +int > > +le_1byte_a (uint64_t *x) > > +{ > > + return *x <=3D 0x3fffffffffffffff; > > +} > > + > > +int > > +le_1byte_b (uint64_t *x) > > +{ > > + return *x < 0x4000000000000000; > > +} > > diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-2.c b/gcc/testsuite/gcc= .dg/cmp-mem-const-2.c > > new file mode 100644 > > index 00000000000..a7cc5348295 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/cmp-mem-const-2.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile { target { lp64 } } } */ > > +/* { dg-options "-O1 -fdump-rtl-combine-details" } */ > > +/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to QI" "= combine" } } */ > > + > > +typedef __UINT64_TYPE__ uint64_t; > > + > > +int > > +ge_1byte_a (uint64_t *x) > > +{ > > + return *x > 0x3fffffffffffffff; > > +} > > + > > +int > > +ge_1byte_b (uint64_t *x) > > +{ > > + return *x >=3D 0x4000000000000000; > > +} > > diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-3.c b/gcc/testsuite/gcc= .dg/cmp-mem-const-3.c > > new file mode 100644 > > index 00000000000..06f80bf72d8 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/cmp-mem-const-3.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile { target { lp64 } } } */ > > +/* { dg-options "-O1 -fdump-rtl-combine-details" } */ > > +/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to HI" "= combine" } } */ > > + > > +typedef __UINT64_TYPE__ uint64_t; > > + > > +int > > +le_2bytes_a (uint64_t *x) > > +{ > > + return *x <=3D 0x3ffdffffffffffff; > > +} > > + > > +int > > +le_2bytes_b (uint64_t *x) > > +{ > > + return *x < 0x3ffe000000000000; > > +} > > diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-4.c b/gcc/testsuite/gcc= .dg/cmp-mem-const-4.c > > new file mode 100644 > > index 00000000000..407999abf7e > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/cmp-mem-const-4.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile { target { lp64 } } } */ > > +/* { dg-options "-O1 -fdump-rtl-combine-details" } */ > > +/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to HI" "= combine" } } */ > > + > > +typedef __UINT64_TYPE__ uint64_t; > > + > > +int > > +ge_2bytes_a (uint64_t *x) > > +{ > > + return *x > 0x400cffffffffffff; > > +} > > + > > +int > > +ge_2bytes_b (uint64_t *x) > > +{ > > + return *x >=3D 0x400d000000000000; > > +} > > diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-5.c b/gcc/testsuite/gcc= .dg/cmp-mem-const-5.c > > new file mode 100644 > > index 00000000000..e16773f5bcf > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/cmp-mem-const-5.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile { target { lp64 } } } */ > > +/* { dg-options "-O1 -fdump-rtl-combine-details" } */ > > +/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to SI" "= combine" } } */ > > + > > +typedef __UINT64_TYPE__ uint64_t; > > + > > +int > > +le_4bytes_a (uint64_t *x) > > +{ > > + return *x <=3D 0x3ffffdffffffffff; > > +} > > + > > +int > > +le_4bytes_b (uint64_t *x) > > +{ > > + return *x < 0x3ffffe0000000000; > > +} > > diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-6.c b/gcc/testsuite/gcc= .dg/cmp-mem-const-6.c > > new file mode 100644 > > index 00000000000..8f53b5678bd > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/cmp-mem-const-6.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile { target { lp64 } } } */ > > +/* { dg-options "-O1 -fdump-rtl-combine-details" } */ > > +/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to SI" "= combine" } } */ > > + > > +typedef __UINT64_TYPE__ uint64_t; > > + > > +int > > +ge_4bytes_a (uint64_t *x) > > +{ > > + return *x > 0x4000cfffffffffff; > > +} > > + > > +int > > +ge_4bytes_b (uint64_t *x) > > +{ > > + return *x >=3D 0x4000d00000000000; > > +} > > diff --git a/gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c b/gcc/test= suite/gcc.target/s390/cmp-mem-const-1.c > > new file mode 100644 > > index 00000000000..309aafbec01 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c > > @@ -0,0 +1,24 @@ > > +/* { dg-do compile { target { lp64 } } } */ > > +/* { dg-options "-O1 -march=3Dz13 -mzarch -fdump-rtl-combine-details" = } */ > > +/* { dg-final { scan-assembler-not {\tclc\t} } } */ > > +/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to QI" "= combine" } } */ > > + > > +struct s > > +{ > > + long a; > > + unsigned b : 1; > > + unsigned c : 1; > > +}; > > + > > +int foo (struct s *x) > > +{ > > + /* Expression > > + x->b || x->c > > + is transformed into > > + _1 =3D BIT_FIELD_REF <*x_4(D), 64, 64>; > > + _2 =3D _1 > 0x3FFFFFFFFFFFFFFF; > > + where the constant may materialize in the literal pool and an 8 b= yte CLC > > + may be emitted. Ensure this is not the case. > > + */ > > + return x->b || x->c; > > +} > > -- > > 2.39.2 > >