Date: Fri, 15 Oct 2021 12:57:45 +0200 (CEST)
From: Richard Biener
To: Richard Earnshaw
cc: Tamar Christina, nd, "gcc-patches@gcc.gnu.org", Richard Earnshaw
Subject: Re: [PATCH]middle-end convert negate + right shift into compare greater.
In-Reply-To: <5ff934e8-96e4-ba3b-0ae6-178d4df3c593@arm.com>
Message-ID: <78p1518r-9q15-20or-2n7r-23s3n9748or@fhfr.qr>
References: <44765ccd-d2ce-7bff-2fe5-a6ddc49cfd56@foss.arm.com> <3rqo9r7o-9584-4n72-n252-o47ro31qn31@fhfr.qr> <5ff934e8-96e4-ba3b-0ae6-178d4df3c593@arm.com>
List-Id: Gcc-patches mailing list

On Fri, 15 Oct 2021, Richard Earnshaw wrote:

> On 15/10/2021 10:06, Richard Biener
> via Gcc-patches wrote:
> > On Fri, 15 Oct 2021, Tamar Christina wrote:
> >
> >>>>
> >>>> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> >>>> +(for cst (INTEGER_CST VECTOR_CST)
> >>>> + (simplify
> >>>> +  (rshift (negate:s @0) cst@1)
> >>>> +  (if (!flag_wrapv)
> >>>
> >>> Don't test flag_wrapv directly, instead use the appropriate
> >>> TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure what
> >>> we are protecting against?  Right-shift of signed integers is
> >>> implementation-defined and GCC treats it as you'd expect,
> >>> sign-extending the result.
> >>>
> >>
> >> It's protecting against the overflow of the negate on INT_MIN.  When
> >> wrapping overflows are enabled the results would be wrong.
> >
> > But -INT_MIN == INT_MIN in two's complement so I fail to see the wrong
> > result?  That is, both -INT_MIN >> 31 and INT_MIN >> 31 are -1.
>
> Exactly, so transforming the original testcase from (x = -a >> 31) into
> (x = -(a > 0)) is not valid in that case.

Hmm, but we're not doing that.  Actually, we inconsistently handle the
scalar and the vector variant here - maybe the (negate ..) is missing
around the (gt @0 { ...}) of the scalar case.

Btw, I would appreciate testcases for the cases that would go wrong,
indeed INT_MIN would be handled wrong.

Richard.

> R.
>
> >
> >>>> + (with { tree ctype = TREE_TYPE (@0);
> >>>> +	tree stype = TREE_TYPE (@1);
> >>>> +	tree bt = truth_type_for (ctype); }
> >>>> +  (switch
> >>>> +   /* Handle scalar case.  */
> >>>> +   (if (INTEGRAL_TYPE_P (ctype)
> >>>> +	&& !VECTOR_TYPE_P (ctype)
> >>>> +	&& !TYPE_UNSIGNED (ctype)
> >>>> +	&& canonicalize_math_after_vectorization_p ()
> >>>> +	&& wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> >>>> +    (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> >>>
> >>> I'm not sure why the result is of type 'bt' rather than the original
> >>> type of the expression?
> >> That was to satisfy some RTL check that expected results of
> >> comparisons to always be a Boolean, though for scalar that logically
> >> always is the case, I just added it for consistency.
> >>
> >>>
> >>> In that regard for non-vectors we'd have to add the sign extension
> >>> from unsigned bool, in the vector case we'd hope the type of the
> >>> comparison is correct.  I think in both cases it might be convenient
> >>> to use
> >>>
> >>> (cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst (ctype); }
> >>>  { build_zero_cst (ctype); })
> >>>
> >>> to compute the correct result and rely on (cond ..) simplifications
> >>> to simplify that if possible.
> >>>
> >>> Btw, 'stype' should be irrelevant - you need to look at the
> >>> precision of 'ctype', no?
> >>
> >> I was working under the assumption that both input types must have
> >> the same precision, but turns out that assumption doesn't need to
> >> hold.
> >>
> >> New version attached.
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> x86_64-pc-linux-gnu and no regressions.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >> 	* match.pd: New negate+shift pattern.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 	* gcc.dg/signbit-2.c: New test.
> >> 	* gcc.dg/signbit-3.c: New test.
> >> 	* gcc.target/aarch64/signbit-1.c: New test.
> >>
> >> --- inline copy of patch ---
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce95a9744a844e3c2 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
> >>     uniform_integer_cst_p
> >>     HONOR_NANS
> >>     uniform_vector_p
> >> -   bitmask_inv_cst_vector_p)
> >> +   bitmask_inv_cst_vector_p
> >> +   expand_vec_cmp_expr_p)
> >>
> >>  /* Operator lists.
> >> */
> >>  (define_operator_list tcc_comparison
> >> @@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>      { tree utype = unsigned_type_for (type); }
> >>      (convert (rshift (lshift (convert:utype @0) @2) @3))))))
> >> +
> >> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> >> +(for cst (INTEGER_CST VECTOR_CST)
> >> + (simplify
> >> +  (rshift (negate:s @0) cst@1)
> >> +  (if (!TYPE_OVERFLOW_WRAPS (type))
> >
> > as said, I don't think that's necessary but at least it's now
> > written correctly ;)
> >
> >> +   (with { tree ctype = TREE_TYPE (@0);
> >
> > Instead of 'ctype' you can use 'type' since the type of the expression
> > is the same as that of @0
> >
> >> +	   tree stype = TREE_TYPE (@1);
> >> +	   tree bt = truth_type_for (ctype);
> >> +	   tree zeros = build_zero_cst (ctype); }
> >> +    (switch
> >> +     /* Handle scalar case.  */
> >> +     (if (INTEGRAL_TYPE_P (ctype)
> >> +	  && !VECTOR_TYPE_P (ctype)
> >
> > INTEGRAL_TYPE_P does not include VECTOR_TYPE_P.
> >
> >> +	  && !TYPE_UNSIGNED (ctype)
> >> +	  && canonicalize_math_after_vectorization_p ()
> >> +	  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (ctype) - 1))
> >> +      (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); } { zeros; }))
> >> +     /* Handle vector case with a scalar immediate.  */
> >> +     (if (VECTOR_INTEGER_TYPE_P (ctype)
> >> +	  && !VECTOR_TYPE_P (stype)
> >> +	  && !TYPE_UNSIGNED (ctype)
> >> +	  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
> >> +      (with { HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); }
> >> +       (if (wi::eq_p (wi::to_wide (@1), bits - 1))
> >
> > You can use element_precision (@0) - 1 in both the scalar and vector
> > case.
> >
> >> +	(convert:bt (gt:bt @0 { zeros; })))))
> >> +     /* Handle vector case with a vector immediate.
> >> */
> >> +     (if (VECTOR_INTEGER_TYPE_P (ctype)
> >> +	  && VECTOR_TYPE_P (stype)
> >> +	  && !TYPE_UNSIGNED (ctype)
> >> +	  && uniform_vector_p (@1)
> >> +	  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
> >> +      (with { tree cst = vector_cst_elt (@1, 0);
> >> +	      HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); }
> >> +       (if (wi::eq_p (wi::to_wide (cst), bits - 1))
> >> +	(convert:bt (gt:bt @0 { zeros; }))))))))))
> >
> > The (convert:bt ...) are not necessary.  Note that 'ctype' is not a
> > valid mask type but we eventually will be forgiving enough here.
> > You are computing truth_type_for, asking for gt:bt but checking
> > expand_vec_cmp_expr_p (ctype, ctype, ...), that's inconsistent and
> > will lead to problems on targets with AVX512VL + AVX2 which will
> > have an AVX2 compare of say V4SImode producing a V4SImode mask
> > but truth_type_for will give you a QImode vector type.
> >
> > As said the most general transform would be to
> >
> > (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); }
> >  { build_zero_cst (ctype); })
> >
> > if you think that's not better than the negate + shift but only
> > when the compare will directly produce the desired result then
> > I think you have to either drop the truth_type_for and manually
> > build the truth type or check that the mode of the truth type
> > and the value type match.  And you need to produce
> >
> > (view_convert:ctype (gt:bt @0 { zeros; }))
> >
> > then.
> >
> >> +
> >> /* Fold (C1/X)*C2 into (C1*C2)/X.
> >> */
> >>  (simplify
> >>   (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
> >> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..fc0157cbc5c7996b481f2998bc30176c96a669bb
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> >> @@ -0,0 +1,19 @@
> >> +/* { dg-do assemble } */
> >> +/* { dg-options "-O3 --save-temps -fdump-tree-optimized" } */
> >> +
> >> +#include <stdint.h>
> >> +
> >> +void fun1(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 31;
> >> +}
> >> +
> >> +void fun2(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 30;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times {\s+>\s+\{ 0, 0, 0, 0 \}} 1 optimized } } */
> >> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> >> diff --git a/gcc/testsuite/gcc.dg/signbit-3.c b/gcc/testsuite/gcc.dg/signbit-3.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..19e9c06c349b3287610f817628f00938ece60bf7
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/signbit-3.c
> >> @@ -0,0 +1,13 @@
> >> +/* { dg-do assemble } */
> >> +/* { dg-options "-O1 --save-temps -fdump-tree-optimized" } */
> >> +
> >> +#include <stdint.h>
> >> +
> >> +void fun1(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 31;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times {\s+>\s+0;} 1 optimized } } */
> >> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> >> diff --git a/gcc/testsuite/gcc.target/aarch64/signbit-1.c b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..3ebfb0586f37de29cf58635b27fe48503714447e
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> >> @@ -0,0 +1,18 @@
> >> +/* { dg-do assemble } */
> >> +/* {
> >> dg-options "-O3 --save-temps" } */
> >> +
> >> +#include <stdint.h>
> >> +
> >> +void fun1(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 31;
> >> +}
> >> +
> >> +void fun2(int32_t *x, int n)
> >> +{
> >> +  for (int i = 0; i < (n & -16); i++)
> >> +    x[i] = (-x[i]) >> 30;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {\tcmgt\t} 1 } } */
> >>
> >

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)