Subject: Re: [PATCH]middle-end convert negate + right shift into compare greater.
From: Richard Earnshaw
To: Tamar Christina, gcc-patches@gcc.gnu.org
Cc: nd, rguenther@suse.de
Date: Thu, 7 Oct 2021 13:46:14 +0100

On 05/10/2021 14:56, Tamar Christina via Gcc-patches wrote:
>
>> -----Original Message-----
>> From: Richard Earnshaw
>> Sent: Tuesday, October 5, 2021 2:52 PM
>> To: Tamar Christina; gcc-patches@gcc.gnu.org
>> Cc: nd; rguenther@suse.de
>> Subject: Re: [PATCH]middle-end convert negate + right shift into compare greater.
>>
>> On 05/10/2021 14:49, Tamar Christina wrote:
>>>> -----Original Message-----
>>>> From: Richard Earnshaw
>>>> Sent: Tuesday, October 5, 2021 2:34 PM
>>>> To: Tamar Christina; gcc-patches@gcc.gnu.org
>>>> Cc: nd; rguenther@suse.de
>>>> Subject: Re: [PATCH]middle-end convert negate + right shift into compare greater.
>>>>
>>>> On 05/10/2021 14:30, Tamar Christina wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Richard Earnshaw
>>>>>> Sent: Tuesday, October 5, 2021 1:56 PM
>>>>>> To: Tamar Christina; gcc-patches@gcc.gnu.org
>>>>>> Cc: nd; rguenther@suse.de
>>>>>> Subject: Re: [PATCH]middle-end convert negate + right shift into compare greater.
>>>>>>
>>>>>> On 05/10/2021 13:50, Tamar Christina via Gcc-patches wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> This turns an inversion of the sign bit + arithmetic right shift
>>>>>>> into a comparison with 0.
>>>>>>>
>>>>>>> i.e.
>>>>>>>
>>>>>>> void fun1(int32_t *x, int n)
>>>>>>> {
>>>>>>>   for (int i = 0; i < (n & -16); i++)
>>>>>>>     x[i] = (-x[i]) >> 31;
>>>>>>> }
>>>>>>>
>>>>>> Notwithstanding that I think shifting a negative value right is
>>>>>> unspecified behaviour, I don't think this generates the same result
>>>>>> when x[i] is INT_MIN either, although negating that is also
>>>>>> unspecified since it can't be represented in an int.
>>>>>>
>>>>>
>>>>> You're right that they are implementation defined, but I think most
>>>>> ISAs do have a sane implementation of these two cases.  At least both
>>>>> x86 and AArch64 just replicate the signbit and for negate do two's
>>>>> complement negation.  So INT_MIN works as expected and results in 0.
>>>>
>>>> Which is not what the original code produces if you have wrapping
>>>> ints, because -INT_MIN is INT_MIN, and thus still negative.
>>>>
>>>
>>> True, but then you have a signed overflow which is undefined behaviour
>>> and not implementation defined:
>>>
>>> "If an exceptional condition occurs during the evaluation of an expression
>>> (that is, if the result is not mathematically defined or not in the range of
>>> representable values for its type), the behavior is undefined."
>>>
>>> So it should still be acceptable to do in this case.
>>
>> -fwrapv
>
> If I understand correctly, you're happy with this if I guard it on !flag_wrapv?

I did some more digging.  Right shift of a negative value is IMP_DEF
(not UNDEF - this keeps catching me out).  So yes, wrapping this with
!wrapv would address my concern.

I've not reviewed the patch itself, though.  I've never even written a
patch for match.pd, so don't feel qualified to do that.

R.

>
> Regards,
> Tamar
>>
>> R.
>>
>>>> R.
>>>>
>>>>> But I'm happy to guard this behind some sort of target guard.
>>>>>
>>>>> Regards,
>>>>> Tamar
>>>>>
>>>>>> R.
>>>>>>
>>>>>>> now generates:
>>>>>>>
>>>>>>> .L3:
>>>>>>>         ldr     q0, [x0]
>>>>>>>         cmgt    v0.4s, v0.4s, #0
>>>>>>>         str     q0, [x0], 16
>>>>>>>         cmp     x0, x1
>>>>>>>         bne     .L3
>>>>>>>
>>>>>>> instead of:
>>>>>>>
>>>>>>> .L3:
>>>>>>>         ldr     q0, [x0]
>>>>>>>         neg     v0.4s, v0.4s
>>>>>>>         sshr    v0.4s, v0.4s, 31
>>>>>>>         str     q0, [x0], 16
>>>>>>>         cmp     x0, x1
>>>>>>>         bne     .L3
>>>>>>>
>>>>>>> Bootstrapped and regtested on aarch64-none-linux-gnu,
>>>>>>> x86_64-pc-linux-gnu and no regressions.
>>>>>>>
>>>>>>> Ok for master?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tamar
>>>>>>>
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>>     * match.pd: New negate+shift pattern.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>>     * gcc.dg/signbit-2.c: New test.
>>>>>>>     * gcc.dg/signbit-3.c: New test.
>>>>>>>     * gcc.target/aarch64/signbit-1.c: New test.
>>>>>>>
>>>>>>> --- inline copy of patch --
>>>>>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>>>>>> index 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..581436fe36dbacdcb0c2720b7190c96d14398143 100644
>>>>>>> --- a/gcc/match.pd
>>>>>>> +++ b/gcc/match.pd
>>>>>>> @@ -826,6 +826,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>>>>>>      { tree utype = unsigned_type_for (type); }
>>>>>>>      (convert (rshift (lshift (convert:utype @0) @2) @3))))))
>>>>>>>
>>>>>>> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
>>>>>>> +(for cst (INTEGER_CST VECTOR_CST) (simplify
>>>>>>> +  (rshift (negate:s @0) cst@1)
>>>>>>> +   (with { tree ctype = TREE_TYPE (@0);
>>>>>>> +           tree stype = TREE_TYPE (@1);
>>>>>>> +           tree bt = truth_type_for (ctype); }
>>>>>>> +    (switch
>>>>>>> +     /* Handle scalar case.  */
>>>>>>> +     (if (INTEGRAL_TYPE_P (ctype)
>>>>>>> +          && !VECTOR_TYPE_P (ctype)
>>>>>>> +          && !TYPE_UNSIGNED (ctype)
>>>>>>> +          && canonicalize_math_after_vectorization_p ()
>>>>>>> +          && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
>>>>>>> +      (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
>>>>>>> +     /* Handle vector case with a scalar immediate.  */
>>>>>>> +     (if (VECTOR_INTEGER_TYPE_P (ctype)
>>>>>>> +          && !VECTOR_TYPE_P (stype)
>>>>>>> +          && !TYPE_UNSIGNED (ctype)
>>>>>>> +          && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
>>>>>>> +      (convert:bt (gt:bt @0 { build_zero_cst (ctype); })))
>>>>>>> +     /* Handle vector case with a vector immediate.  */
>>>>>>> +     (if (VECTOR_INTEGER_TYPE_P (ctype)
>>>>>>> +          && VECTOR_TYPE_P (stype)
>>>>>>> +          && !TYPE_UNSIGNED (ctype)
>>>>>>> +          && uniform_vector_p (@1))
>>>>>>> +      (with { tree cst = vector_cst_elt (@1, 0);
>>>>>>> +              tree t = TREE_TYPE (cst); }
>>>>>>> +       (if (wi::eq_p (wi::to_wide (cst), TYPE_PRECISION (t) - 1))
>>>>>>> +        (convert:bt (gt:bt @0 { build_zero_cst (ctype); })))))))))
>>>>>>> +
>>>>>>>  /* Fold (C1/X)*C2 into (C1*C2)/X.  */
>>>>>>>  (simplify
>>>>>>>   (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
>>>>>>> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000000000000000000000000000000000000..fc0157cbc5c7996b481f2998bc30176c96a669bb
>>>>>>> --- /dev/null
>>>>>>> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
>>>>>>> @@ -0,0 +1,19 @@
>>>>>>> +/* { dg-do assemble } */
>>>>>>> +/* { dg-options "-O3 --save-temps -fdump-tree-optimized" } */
>>>>>>> +
>>>>>>> +#include <stdint.h>
>>>>>>> +
>>>>>>> +void fun1(int32_t *x, int n)
>>>>>>> +{
>>>>>>> +  for (int i = 0; i < (n & -16); i++)
>>>>>>> +    x[i] = (-x[i]) >> 31;
>>>>>>> +}
>>>>>>> +
>>>>>>> +void fun2(int32_t *x, int n)
>>>>>>> +{
>>>>>>> +  for (int i = 0; i < (n & -16); i++)
>>>>>>> +    x[i] = (-x[i]) >> 30;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-tree-dump-times {\s+>\s+\{ 0, 0, 0, 0 \}} 1 optimized } } */
>>>>>>> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
>>>>>>> diff --git a/gcc/testsuite/gcc.dg/signbit-3.c b/gcc/testsuite/gcc.dg/signbit-3.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000000000000000000000000000000000000..19e9c06c349b3287610f817628f00938ece60bf7
>>>>>>> --- /dev/null
>>>>>>> +++ b/gcc/testsuite/gcc.dg/signbit-3.c
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do assemble } */
>>>>>>> +/* { dg-options "-O1 --save-temps -fdump-tree-optimized" } */
>>>>>>> +
>>>>>>> +#include <stdint.h>
>>>>>>> +
>>>>>>> +void fun1(int32_t *x, int n)
>>>>>>> +{
>>>>>>> +  for (int i = 0; i < (n & -16); i++)
>>>>>>> +    x[i] = (-x[i]) >> 31;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-tree-dump-times {\s+>\s+0;} 1 optimized } } */
>>>>>>> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
>>>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/signbit-1.c b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000000000000000000000000000000000000..3ebfb0586f37de29cf58635b27fe48503714447e
>>>>>>> --- /dev/null
>>>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
>>>>>>> @@ -0,0 +1,18 @@
>>>>>>> +/* { dg-do assemble } */
>>>>>>> +/* { dg-options "-O3 --save-temps" } */
>>>>>>> +
>>>>>>> +#include <stdint.h>
>>>>>>> +
>>>>>>> +void fun1(int32_t *x, int n)
>>>>>>> +{
>>>>>>> +  for (int i = 0; i < (n & -16); i++)
>>>>>>> +    x[i] = (-x[i]) >> 31;
>>>>>>> +}
>>>>>>> +
>>>>>>> +void fun2(int32_t *x, int n)
>>>>>>> +{
>>>>>>> +  for (int i = 0; i < (n & -16); i++)
>>>>>>> +    x[i] = (-x[i]) >> 30;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-assembler-times {\tcmgt\t} 1 } } */
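
For reference, a minimal standalone C sketch (not part of the patch) of the
INT_MIN corner case discussed in the thread above.  It assumes -fwrapv
wrapping semantics and an arithmetic right shift of negative values, as on
AArch64 and x86: under wrapping, (-x) >> 31 yields -1 for x == INT32_MIN,
while the folded comparison x > 0 yields 0.

/* Sketch only: compile with -fwrapv so the negation of INT32_MIN wraps;
   right-shifting a negative value is implementation-defined but is an
   arithmetic shift on the targets discussed here.  */
#include <stdint.h>
#include <stdio.h>

int main (void)
{
  int32_t x = INT32_MIN;

  /* Original form: -INT32_MIN wraps back to INT32_MIN, which is still
     negative, so the arithmetic shift by 31 gives -1.  */
  int32_t original = (-x) >> 31;

  /* Folded form: INT32_MIN > 0 is false, so the resulting 0/-1 mask is 0.  */
  int32_t folded = -(int32_t) (x > 0);

  printf ("original = %d, folded = %d\n", (int) original, (int) folded);
  return 0;
}

The two forms differ only when signed overflow wraps, which is why guarding
the fold on !flag_wrapv, as agreed above, addresses the concern.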