Date: Fri, 15 Oct 2021 12:57:45 +0200 (CEST)
From: Richard Biener
To: Richard Earnshaw
cc: Tamar Christina, nd, "gcc-patches@gcc.gnu.org", Richard Earnshaw
Subject: Re: [PATCH]middle-end convert negate + right shift into compare greater.
In-Reply-To: <5ff934e8-96e4-ba3b-0ae6-178d4df3c593@arm.com>
Message-ID: <78p1518r-9q15-20or-2n7r-23s3n9748or@fhfr.qr>
References: <44765ccd-d2ce-7bff-2fe5-a6ddc49cfd56@foss.arm.com> <3rqo9r7o-9584-4n72-n252-o47ro31qn31@fhfr.qr> <5ff934e8-96e4-ba3b-0ae6-178d4df3c593@arm.com>
List-Id: Gcc-patches mailing list

On Fri, 15 Oct 2021, Richard Earnshaw wrote:

> On 15/10/2021 10:06, Richard Biener
> via Gcc-patches wrote:
> > On Fri, 15 Oct 2021, Tamar Christina wrote:
> >
> >>>>
> >>>> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> >>>> +(for cst (INTEGER_CST VECTOR_CST)
> >>>> + (simplify
> >>>> +  (rshift (negate:s @0) cst@1)
> >>>> +  (if (!flag_wrapv)
> >>>
> >>> Don't test flag_wrapv directly, instead use the appropriate
> >>> TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure what
> >>> we are protecting against?  Right-shift of signed integers is
> >>> implementation-defined and GCC treats it as you'd expect,
> >>> sign-extending the result.
> >>>
> >>
> >> It's protecting against the overflow of the negate on INT_MIN.  When
> >> wrapping overflows are enabled the results would be wrong.
> >
> > But -INT_MIN == INT_MIN in two's complement so I fail to see the wrong
> > result?  That is, both -INT_MIN >> 31 and INT_MIN >> 31 are -1.
>
> Exactly, so transforming the original testcase from (x = -a >> 31) into
> (x = -(a > 0)) is not valid in that case.

Hmm, but we're not doing that.  Actually, we inconsistently handle the
scalar and the vector variant here - maybe the (negate ..) is missing
around the (gt @0 { ...}) of the scalar case.

Btw, I would appreciate testcases for the cases that would go wrong,
indeed INT_MIN would be handled wrong.

Richard.

> R.
>
> >
> >>>> + (with { tree ctype = TREE_TYPE (@0);
> >>>> +	tree stype = TREE_TYPE (@1);
> >>>> +	tree bt = truth_type_for (ctype); }
> >>>> +  (switch
> >>>> +   /* Handle scalar case.  */
> >>>> +   (if (INTEGRAL_TYPE_P (ctype)
> >>>> +	&& !VECTOR_TYPE_P (ctype)
> >>>> +	&& !TYPE_UNSIGNED (ctype)
> >>>> +	&& canonicalize_math_after_vectorization_p ()
> >>>> +	&& wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> >>>> +    (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> >>>
> >>> I'm not sure why the result is of type 'bt' rather than the original
> >>> type of the expression?
> >> That was to satisfy some RTL check that expected results of
> >> comparisons to always be a Boolean, though for scalar that logically
> >> always is the case, I just added it for consistency.
> >>
> >>>
> >>> In that regard for non-vectors we'd have to add the sign extension
> >>> from unsigned bool, in the vector case we'd hope the type of the
> >>> comparison is correct.  I think in both cases it might be convenient
> >>> to use
> >>>
> >>> (cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst (ctype); }
> >>>  { build_zero_cst (ctype); })
> >>>
> >>> to compute the correct result and rely on (cond ..) simplifications
> >>> to simplify that if possible.
> >>>
> >>> Btw, 'stype' should be irrelevant - you need to look at the
> >>> precision of 'ctype', no?
> >>
> >> I was working under the assumption that both input types must have
> >> the same precision, but turns out that assumption doesn't need to
> >> hold.
> >>
> >> New version attached.
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> x86_64-pc-linux-gnu and no regressions.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >> 	* match.pd: New negate+shift pattern.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 	* gcc.dg/signbit-2.c: New test.
> >> 	* gcc.dg/signbit-3.c: New test.
> >> 	* gcc.target/aarch64/signbit-1.c: New test.
> >>
> >> --- inline copy of patch ---
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce95a9744a844e3c2 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
> >>     uniform_integer_cst_p
> >>     HONOR_NANS
> >>     uniform_vector_p
> >> -   bitmask_inv_cst_vector_p)
> >> +   bitmask_inv_cst_vector_p
> >> +   expand_vec_cmp_expr_p)
> >>
> >>  /* Operator lists.
> >> */
> >>  (define_operator_list tcc_comparison
> >> @@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>      { tree utype = unsigned_type_for (type); }
> >>      (convert (rshift (lshift (convert:utype @0) @2) @3))))))
> >> +
> >> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> >> +(for cst (INTEGER_CST VECTOR_CST)
> >> + (simplify
> >> +  (rshift (negate:s @0) cst@1)
> >> +  (if (!TYPE_OVERFLOW_WRAPS (type))
> >
> > as said, I don't think that's necessary but at least it's now
> > written correctly ;)
> >
> >> +   (with { tree ctype = TREE_TYPE (@0);
> >
> > Instead of 'ctype' you can use 'type' since the type of the expression
> > is the same as that of @0
> >
> >> +	   tree stype = TREE_TYPE (@1);
> >> +	   tree bt = truth_type_for (ctype);
> >> +	   tree zeros = build_zero_cst (ctype); }
> >> +    (switch
> >> +     /* Handle scalar case.  */
> >> +     (if (INTEGRAL_TYPE_P (ctype)
> >> +	  && !VECTOR_TYPE_P (ctype)
> >
> > INTEGRAL_TYPE_P does not include VECTOR_TYPE_P.
> >
> >> +	  && !TYPE_UNSIGNED (ctype)
> >> +	  && canonicalize_math_after_vectorization_p ()
> >> +	  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (ctype) - 1))
> >> +      (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); } { zeros; }))
> >> +     /* Handle vector case with a scalar immediate.  */
> >> +     (if (VECTOR_INTEGER_TYPE_P (ctype)
> >> +	  && !VECTOR_TYPE_P (stype)
> >> +	  && !TYPE_UNSIGNED (ctype)
> >> +	  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
> >> +      (with { HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); }
> >> +       (if (wi::eq_p (wi::to_wide (@1), bits - 1))
> >
> > You can use element_precision (@0) - 1 in both the scalar and vector
> > case.
> >
> >> +	(convert:bt (gt:bt @0 { zeros; })))))
> >> +     /* Handle vector case with a vector immediate.
> >> */
> >> +     (if (VECTOR_INTEGER_TYPE_P (ctype)
> >> +	  && VECTOR_TYPE_P (stype)
> >> +	  && !TYPE_UNSIGNED (ctype)
> >> +	  && uniform_vector_p (@1)
> >> +	  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
> >> +      (with { tree cst = vector_cst_elt (@1, 0);
> >> +	      HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); }
> >> +       (if (wi::eq_p (wi::to_wide (cst), bits - 1))
> >> +	(convert:bt (gt:bt @0 { zeros; }))))))))))
> >
> > The (convert:bt ...) are not necessary.  Note that 'ctype' is not a
> > valid mask type but we eventually will be forgiving enough here.
> > You are computing truth_type_for, asking for gt:bt but checking
> > expand_vec_cmp_expr_p (ctype, ctype, ...), that's inconsistent and
> > will lead to problems on targets with AVX512VL + AVX2 which will
> > have an AVX2 compare of say V4SImode producing a V4SImode mask
> > but truth_type_for will give you a QImode vector type.
> >
> > As said the most general transform would be to
> >
> > (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); }
> >  { build_zero_cst (ctype); })
> >
> > if you think that's not better than the negate + shift but only
> > when the compare will directly produce the desired result then
> > I think you have to either drop the truth_type_for and manually
> > build the truth type or check that the mode of the truth type
> > and the value type match.  And you need to produce
> >
> > (view_convert:ctype (gt:bt @0 { zeros; }))
> >
> > then.
> >
> >> +
> >> /* Fold (C1/X)*C2 into (C1*C2)/X.
> >> */
> >>  (simplify
> >>   (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
> >> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..fc0157cbc5c7996b481f2998bc30176c96a669bb
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> >> @@ -0,0 +1,19 @@
> >> +/* { dg-do assemble } */
> >> +/* { dg-options "-O3 --save-temps -fdump-tree-optimized" } */
> >> +
> >> +#include <stdint.h>
> >> +
> >> +void fun1(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 31;
> >> +}
> >> +
> >> +void fun2(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 30;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times {\s+>\s+\{ 0, 0, 0, 0 \}} 1 optimized } } */
> >> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> >> diff --git a/gcc/testsuite/gcc.dg/signbit-3.c b/gcc/testsuite/gcc.dg/signbit-3.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..19e9c06c349b3287610f817628f00938ece60bf7
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/signbit-3.c
> >> @@ -0,0 +1,13 @@
> >> +/* { dg-do assemble } */
> >> +/* { dg-options "-O1 --save-temps -fdump-tree-optimized" } */
> >> +
> >> +#include <stdint.h>
> >> +
> >> +void fun1(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 31;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times {\s+>\s+0;} 1 optimized } } */
> >> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> >> diff --git a/gcc/testsuite/gcc.target/aarch64/signbit-1.c b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..3ebfb0586f37de29cf58635b27fe48503714447e
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> >> @@ -0,0 +1,18 @@
> >> +/* { dg-do assemble } */
> >> +/* {
> >> dg-options "-O3 --save-temps" } */
> >> +
> >> +#include <stdint.h>
> >> +
> >> +void fun1(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 31;
> >> +}
> >> +
> >> +void fun2(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 30;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {\tcmgt\t} 1 } } */
> >>
> >

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)