Date: Thu, 30 Sep 2021 12:26:57 +0200 (CEST)
From: Richard Biener
To: Tamar Christina
cc: "gcc-patches@gcc.gnu.org", nd
Subject: RE: [PATCH 5/7]middle-end Convert bitclear + cmp #0 into cm

On Thu, 30 Sep 2021, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener
> > Sent: Thursday, September 30, 2021 7:18 AM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd
> > Subject: Re: [PATCH 5/7]middle-end
> >   Convert bitclear + cmp #0 into cm
> >
> > On Wed, 29 Sep 2021, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This optimizes the case where a mask Y which fulfills ~Y + 1 == pow2
> > > is used to clear some bits and the result is then compared against 0,
> > > turning it into one without the masking and a compare against a
> > > different bit immediate.
> > >
> > > We can do this for all unsigned compares and for signed we can do it
> > > for comparisons of EQ and NE:
> > >
> > > (x & (~255)) == 0 becomes x <= 255, which leaves it to the target
> > > to optimally deal with the comparison.
> > >
> > > This transformation has to be done in the mid-end because in RTL you
> > > don't have the signs of the comparison operands and if the target
> > > needs an immediate this should be floated outside of the loop.
> > >
> > > The RTL loop invariant hoisting is done before split1.
> > >
> > > i.e.
> > >
> > > void fun1(int32_t *x, int n)
> > > {
> > >   for (int i = 0; i < (n & -16); i++)
> > >     x[i] = (x[i]&(~255)) == 0;
> > > }
> > >
> > > now generates:
> > >
> > > .L3:
> > >         ldr     q0, [x0]
> > >         cmhs    v0.4s, v2.4s, v0.4s
> > >         and     v0.16b, v1.16b, v0.16b
> > >         str     q0, [x0], 16
> > >         cmp     x0, x1
> > >         bne     .L3
> > >
> > > and floats the immediate out of the loop.
> > >
> > > instead of:
> > >
> > > .L3:
> > >         ldr     q0, [x0]
> > >         bic     v0.4s, #255
> > >         cmeq    v0.4s, v0.4s, #0
> > >         and     v0.16b, v1.16b, v0.16b
> > >         str     q0, [x0], 16
> > >         cmp     x0, x1
> > >         bne     .L3
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd: New bitmask compare pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/bic-bitmask-10.c: New test.
> > >         * gcc.dg/bic-bitmask-11.c: New test.
> > >         * gcc.dg/bic-bitmask-12.c: New test.
> > >         * gcc.dg/bic-bitmask-2.c: New test.
> > >         * gcc.dg/bic-bitmask-3.c: New test.
> > >         * gcc.dg/bic-bitmask-4.c: New test.
> > >         * gcc.dg/bic-bitmask-5.c: New test.
> > >         * gcc.dg/bic-bitmask-6.c: New test.
> > >         * gcc.dg/bic-bitmask-7.c: New test.
> > >         * gcc.dg/bic-bitmask-8.c: New test.
> > >         * gcc.dg/bic-bitmask-9.c: New test.
> > >         * gcc.dg/bic-bitmask.h: New test.
> > >         * gcc.target/aarch64/bic-bitmask-1.c: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 0fcfd0ea62c043dc217d0d560ce5b7e569b70e7d..df9212cb27d172856b9d43b0875262f96e8993c4 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -4288,6 +4288,56 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >       (if (ic == ncmp)
> > >        (ncmp @0 @1))))))
> > >
> > > +/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
> > > +   where ~Y + 1 == pow2 and Z = ~Y. */
> > > +(for cmp (simple_comparison)
> > > + (simplify
> > > +  (cmp (bit_and:c @0 VECTOR_CST@1) integer_zerop)
> >
> > Why not for INTEGER_CST as well? We do have a related folding (only for
> > INTEGER_CST) that does
> >
>
> Because of a slight concern about de-optimizing what targets currently
> generate for the flag setting variants.
> So for example AArch64 generates worse code for foo than it does for bar
>
> int foo (int x)
> {
>   if (x <= 0xFFFF)
>     return 1;
>
>   return 0;
> }
>
> int bar (int x)
> {
>   if (x & ~0xFFFF)
>     return 1;
>
>   return 0;
> }
>
> Because the flag setting bitmask was optimized more. I can of course do
> this and fix AArch64 but other targets may have the same issue. For
> vectors this was less of a concern since there's no flag setting there.
>
> Do you still want the scalar version?

Yes, the simplification result is simpler and thus more canonical
on GIMPLE. On x86 we generate

        xorl    %eax, %eax
        cmpl    $65535, %edi
        setle   %al
        ret

vs

        xorl    %eax, %eax
        andl    $-65536, %edi
        setne   %al
        ret

which are equivalent I think (and could easily be transformed using
a peephole if required).

Richard.
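As a sanity check of the arithmetic behind the fold discussed above, here is a minimal C sketch. It is not part of the patch: every function name in it is hypothetical, and the 0xFFFF mask is simply the value used in the foo/bar example. It assumes the precondition from the patch description (~Y + 1 is a power of two) and checks, on a handful of sample values, that the masked form and the folded unsigned compare agree, including the signed EQ case where the compare is done in the unsigned type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when ~mask + 1 is a power of two, which is
   the precondition stated in the patch description.  */
int mask_is_foldable (uint32_t mask)
{
  uint32_t p = ~mask + 1u;
  return p != 0 && (p & (p - 1u)) == 0;
}

/* Unsigned EQ: the masked form and the folded compare against ~mask.  */
int masked_eq0 (uint32_t x, uint32_t mask) { return (x & mask) == 0; }
int folded_le (uint32_t x, uint32_t mask) { return x <= ~mask; }

/* Unsigned NE: the negation of the pair above.  */
int masked_ne0 (uint32_t x, uint32_t mask) { return (x & mask) != 0; }
int folded_gt (uint32_t x, uint32_t mask) { return x > ~mask; }

/* Signed EQ with the 0xFFFF example mask: the folded form converts the
   operand to the unsigned type before comparing.  */
int signed_masked_eq0 (int32_t x) { return (x & ~0xFFFF) == 0; }
int signed_folded_le (int32_t x) { return (uint32_t) x <= 0xFFFFu; }

/* Count the samples on which a masked form and its folded form disagree.  */
int count_fold_mismatches (void)
{
  static const uint32_t us[] = { 0u, 1u, 0xFFFEu, 0xFFFFu, 0x10000u,
                                 0x10001u, 0x7FFFFFFFu, 0x80000000u,
                                 0xFFFFFFFFu };
  static const int32_t ss[] = { INT32_MIN, -65537, -65536, -1, 0, 1,
                                65535, 65536, INT32_MAX };
  const uint32_t mask = ~0xFFFFu;
  int bad = 0;

  assert (mask_is_foldable (mask));

  for (size_t i = 0; i < sizeof us / sizeof us[0]; i++)
    {
      bad += masked_eq0 (us[i], mask) != folded_le (us[i], mask);
      bad += masked_ne0 (us[i], mask) != folded_gt (us[i], mask);
    }
  for (size_t i = 0; i < sizeof ss / sizeof ss[0]; i++)
    bad += signed_masked_eq0 (ss[i]) != signed_folded_le (ss[i]);

  return bad;
}
```

Calling count_fold_mismatches () from a small driver is expected to return 0. A mask such as ~5 fails mask_is_foldable, which matches the bic-bitmask-9.c test expecting the masking to stay.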
> Thanks,
> Tamar
>
> > /* A & (2**N - 1) <= 2**K - 1 -> A & (2**N - 2**K) == 0
> >    A & (2**N - 1) >  2**K - 1 -> A & (2**N - 2**K) != 0
> >
> > which could be extended for integer vectors. That said, can you please place
> > the pattern next to the above?
> >
> > Why does the transform only work for uniform vector constants? (I see that
> > the implementation becomes simpler, but then you should also handle the
> > INTEGER_CST case at least)
> >
> > > +   (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@1))
> > > +        && uniform_vector_p (@1))
> > > +    (with { tree elt = vector_cst_elt (@1, 0); }
> > > +     (switch
> > > +      (if (TYPE_UNSIGNED (TREE_TYPE (@1)) && tree_fits_uhwi_p (elt))
> >
> > avoid tree_fits_uhwi_p and use wide_int here
> >
> > > +       (with { unsigned HOST_WIDE_INT diff = tree_to_uhwi (elt);
> > > +               tree tdiff = wide_int_to_tree (TREE_TYPE (elt), (~diff) + 1);
> > > +               tree newval = wide_int_to_tree (TREE_TYPE (elt), ~diff);
> > > +               tree newmask = build_uniform_cst (TREE_TYPE (@1), newval); }
> > > +        (if (integer_pow2p (tdiff))
> >
> > You don't seem to use 'tdiff' so please do this check in wide_int
> >
> > > +         (switch
> > > +          /* ((mask & x) < 0) -> 0. */
> > > +          (if (cmp == LT_EXPR)
> > > +           { build_zero_cst (TREE_TYPE (@1)); })
> > > +          /* ((mask & x) <= 0) -> x < mask. */
> > > +          (if (cmp == LE_EXPR)
> > > +           (lt @0 { newmask; }))
> > > +          /* ((mask & x) == 0) -> x < mask. */
> > > +          (if (cmp == EQ_EXPR)
> > > +           (le @0 { newmask; }))
> > > +          /* ((mask & x) != 0) -> x > mask. */
> > > +          (if (cmp == NE_EXPR)
> > > +           (gt @0 { newmask; }))
> > > +          /* ((mask & x) >= 0) -> x <= mask. */
> > > +          (if (cmp == GE_EXPR)
> > > +           (le @0 { newmask; }))
> > > +          /* ((mask & x) > 0) -> x < mask.
*/
> > > +          (if (cmp == GT_EXPR)
> > > +           (lt @0 { newmask; }))))))
> >
> > you can avoid this switch with a lock-step (for, that maps 'cmp'
> > to the result comparison code (for simplicity you can either keep the LT_EXPR
> > special-case or transform to an always true condition which will be simplified).
> >
> > > +      (if (!TYPE_UNSIGNED (TREE_TYPE (@1)) && tree_fits_shwi_p (elt))
> > > +       (with { unsigned HOST_WIDE_INT diff = tree_to_shwi (elt);
> > > +               tree ustype = unsigned_type_for (TREE_TYPE (elt));
> > > +               tree uvtype = unsigned_type_for (TREE_TYPE (@1));
> > > +               tree tdiff = wide_int_to_tree (ustype, (~diff) + 1);
> > > +               tree udiff = wide_int_to_tree (ustype, ~diff);
> > > +               tree cst = build_uniform_cst (uvtype, udiff); }
> > > +        (if (integer_pow2p (tdiff))
> > > +         (switch
> > > +          /* ((mask & x) == 0) -> x < mask. */
> > > +          (if (cmp == EQ_EXPR)
> > > +           (le (convert:uvtype @0) { cst; }))
> > > +          /* ((mask & x) != 0) -> x > mask. */
> > > +          (if (cmp == NE_EXPR)
> > > +           (gt (convert:uvtype @0) { cst; })))))))))))
> > > +
> > >  /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
> > >     ???
The transformation is valid for the other operators if overflow
> > >     is undefined for the type, but performing it here badly interacts
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-10.c b/gcc/testsuite/gcc.dg/bic-bitmask-10.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..76a22a2313137a2a75dd711c2c15c2d3a34e15aa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-10.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(int32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(int32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +#define TYPE int32_t
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump {<=\s*.+\{ 255,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967290,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-11.c b/gcc/testsuite/gcc.dg/bic-bitmask-11.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..32553d7ba2f823f7a21237451990d0a216d2f912
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-11.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) != 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa,
optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) != 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump {>\s*.+\{ 255,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967290,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-12.c b/gcc/testsuite/gcc.dg/bic-bitmask-12.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..e10cbf7fabe2dbf7ce436cdf37b0f8b207c58408
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-12.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-options "-O3 -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +typedef unsigned int v4si __attribute__ ((vector_size (16)));
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun(v4si *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump {<=\s*.+\{ 255,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967290,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-2.c b/gcc/testsuite/gcc.dg/bic-bitmask-2.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..da30fad89f6c8239baa4395b3ffaec0be577e13f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-2.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i <
(n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {<=\s*.+\{ 255,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-3.c b/gcc/testsuite/gcc.dg/bic-bitmask-3.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..da30fad89f6c8239baa4395b3ffaec0be577e13f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-3.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {<=\s*.+\{ 255,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-4.c b/gcc/testsuite/gcc.dg/bic-bitmask-4.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..1bcf23ccf1447d6c8c999ed1eb25ba0a450028e1
> > > ---
/dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-4.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) >= 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) >= 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {=\s*.+\{ 1,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-5.c b/gcc/testsuite/gcc.dg/bic-bitmask-5.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..6e5a2fca9992efbc01f8dbbc6f95936e86643028
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-5.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) > 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) > 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {>\s*.+\{ 255,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target
{ aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-6.c b/gcc/testsuite/gcc.dg/bic-bitmask-6.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..018e7a4348c9fc461106c3d9d01291325d3406c2
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-6.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) <= 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~255)) <= 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {<=\s*.+\{ 255,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-7.c b/gcc/testsuite/gcc.dg/bic-bitmask-7.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..798678fb7555052c93abc4ca34f617d640f73bb4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-7.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~1)) < 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~1)) <
0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {__builtin_memset} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-8.c b/gcc/testsuite/gcc.dg/bic-bitmask-8.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..1dabe834ed57dfa0be48c1dc3dbb226092c79a1a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-8.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~1)) != 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~1)) != 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {>\s*.+\{ 1,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967294,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-9.c b/gcc/testsuite/gcc.dg/bic-bitmask-9.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..9c1f8ee0adfc45d1b9fc212138ea26bb6b693e49
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-9.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] =
(x[i]&(~5)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1")))
> > > +void fun2(uint32_t *x, int n)
> > > +{
> > > +  for (int i = 0; i < (n & -16); i++)
> > > +    x[i] = (x[i]&(~5)) == 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-not {<=\s*.+\{ 4294967289,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump {&\s*.+\{ 4294967290,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask.h b/gcc/testsuite/gcc.dg/bic-bitmask.h
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..2b94065c025e0cbf71a21ac9b9d6314e24b0c2d9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask.h
> > > @@ -0,0 +1,43 @@
> > > +#include
> > > +
> > > +#ifndef N
> > > +#define N 50
> > > +#endif
> > > +
> > > +#ifndef TYPE
> > > +#define TYPE uint32_t
> > > +#endif
> > > +
> > > +#ifndef DEBUG
> > > +#define DEBUG 0
> > > +#endif
> > > +
> > > +#define BASE ((TYPE) -1 < 0 ?
-126 : 4)
> > > +
> > > +int main ()
> > > +{
> > > +  TYPE a[N];
> > > +  TYPE b[N];
> > > +
> > > +  for (int i = 0; i < N; ++i)
> > > +    {
> > > +      a[i] = BASE + i * 13;
> > > +      b[i] = BASE + i * 13;
> > > +      if (DEBUG)
> > > +        printf ("%d: 0x%x\n", i, a[i]);
> > > +    }
> > > +
> > > +  fun1 (a, N);
> > > +  fun2 (b, N);
> > > +
> > > +  for (int i = 0; i < N; ++i)
> > > +    {
> > > +      if (DEBUG)
> > > +        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
> > > +
> > > +      if (a[i] != b[i])
> > > +        __builtin_abort ();
> > > +    }
> > > +  return 0;
> > > +}
> > > +
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c b/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..568c1ffc8bc4148efaeeba7a45a75ecbd3a7a3dd
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-options "-O2 -save-temps" } */
> > > +
> > > +#include
> > > +
> > > +uint32x4_t foo (int32x4_t a)
> > > +{
> > > +  int32x4_t cst = vdupq_n_s32 (255);
> > > +  int32x4_t zero = vdupq_n_s32 (0);
> > > +  return vceqq_s32 (vbicq_s32 (a, cst), zero);
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not {\tbic\t} { xfail { aarch64*-*-* } } } } */
> >
> > --
> > Richard Biener
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)