* [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Drew Ross @ 2023-07-21 15:08 UTC
  To: gcc-patches; +Cc: Drew Ross

Simplifies (x << c) >> c, where x has a signed integral type of
width >= int and c = precision(type) - 1, into -(x & 1). Tested
successfully on x86_64 and x86 targets.
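
For illustration (a sketch; assumes 32-bit int, function name is
illustrative): the left shift by 31 moves bit 0 of x into the sign bit,
and the arithmetic right shift then replicates it, so the result is 0
when bit 0 is clear and -1 when it is set, i.e. exactly -(x & 1).

__attribute__((noipa)) int
sext_bit0 (int x)
{
  return (x << 31) >> 31;  /* folds to -(x & 1) with this patch */
}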

        PR middle-end/101955

gcc/ChangeLog:

        * match.pd (x << c) >> c -> -(x & 1): New simplification.

gcc/testsuite/ChangeLog:

        * gcc.dg/pr101955.c: New test.
---
 gcc/match.pd                    | 10 +++++
 gcc/testsuite/gcc.dg/pr101955.c | 69 +++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..820fc890e8e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
 
+/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
+   into -(X & 1).  */
+(simplify
+ (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
+ (with { tree cst = uniform_integer_cst_p (@1); }
+ (if (ANY_INTEGRAL_TYPE_P (type)
+      && !TYPE_UNSIGNED (type)
+      && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
+  (negate (bit_and (convert @0) { build_one_cst (type); })))))
+
 /* Optimize x >> x into 0 */
 (simplify
  (rshift @0 @0)
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 00000000000..386154911c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
+
+typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t2 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  int w = 31;
+  int y = x << w;
+  int z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t4 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t5 (long long x)
+{
+  long long y = x << 63;
+  long long z = y >> 63;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  int w = 63;
+  long long y = x << w;
+  long long z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) v4si
+t7 (v4si x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) v4si
+t8 (v4si x)
+{
+  v4si t = {31,31,31,31};
+  return (x << t) >> t;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
+/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
+/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
+
-- 
2.39.3



* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Andrew Pinski @ 2023-07-21 17:27 UTC
  To: Drew Ross; +Cc: gcc-patches

On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Simplifies (x << c) >> c where x is a signed integral type of
> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> on x86_64 and x86 targets.

Thinking about this some more, I think this should be handled in
expand rather than on the gimple level.
It is very much related to PR 110717 even. We are basically truncating
to a signed one bit integer and then sign extending that across the
whole code.
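
In C terms the idea is roughly this (a sketch; C has no standalone 1-bit
type, so a signed bit-field stands in, relying on GCC's truncating
bit-field stores):

struct sbit { signed int b : 1; };  /* holds only 0 and -1 */

int
via_sbit (int x)
{
  struct sbit t;
  t.b = x;      /* truncate: keep only bit 0 (implementation-defined) */
  return t.b;   /* sign-extend back to int: 0 or -1 */
}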

Thanks,
Andrew

>
>         PR middle-end/101955
>
> gcc/ChangeLog:
>
>         * match.pd (x << c) >> c -> -(x & 1): New simplification.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd                    | 10 +++++
>  gcc/testsuite/gcc.dg/pr101955.c | 69 +++++++++++++++++++++++++++++++++
>  2 files changed, 79 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..820fc890e8e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>        && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
>    (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>
> +/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
> +   into -(X & 1).  */
> +(simplify
> + (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
> + (with { tree cst = uniform_integer_cst_p (@1); }
> + (if (ANY_INTEGRAL_TYPE_P (type)
> +      && !TYPE_UNSIGNED (type)
> +      && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
> +  (negate (bit_and (convert @0) { build_one_cst (type); })))))
> +
>  /* Optimize x >> x into 0 */
>  (simplify
>   (rshift @0 @0)
> diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
> new file mode 100644
> index 00000000000..386154911c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
> +
> +typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
> +
> +__attribute__((noipa)) int
> +t1 (int x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) int
> +t2 (int x)
> +{
> +  int y = x << 31;
> +  int z = y >> 31;
> +  return z;
> +}
> +
> +__attribute__((noipa)) int
> +t3 (int x)
> +{
> +  int w = 31;
> +  int y = x << w;
> +  int z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t4 (long long x)
> +{
> +  return (x << 63) >> 63;
> +}
> +
> +__attribute__((noipa)) long long
> +t5 (long long x)
> +{
> +  long long y = x << 63;
> +  long long z = y >> 63;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t6 (long long x)
> +{
> +  int w = 63;
> +  long long y = x << w;
> +  long long z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) v4si
> +t7 (v4si x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) v4si
> +t8 (v4si x)
> +{
> +  v4si t = {31,31,31,31};
> +  return (x << t) >> t;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
> +/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
> +
> --
> 2.39.3
>


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Jeff Law @ 2023-07-22  6:09 UTC
  To: Andrew Pinski, Drew Ross; +Cc: gcc-patches



On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Simplifies (x << c) >> c where x is a signed integral type of
>> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
>> on x86_64 and x86 targets.
> 
> Thinking about this some more, I think this should be handled in
> expand rather than on the gimple level.
> It is very much related to PR 110717 even. We are basically truncating
> to a signed one bit integer and then sign extending that across the
> whole code.
But why defer it to expand?  This idiom is going to generate a -1,0 
result which is definitely interesting from a further simplification 
standpoint.
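
For instance (a sketch, assuming 32-bit int): the -1,0 mask turns a
following bit_and into a select,

  int m = (x << 31) >> 31;  /* -1 if bit 0 of x is set, else 0 */
  int r = m & y;            /* y if bit 0 of x is set, else 0 */

which is the kind of further simplification opportunity meant here.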

Jeff


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Richard Biener @ 2023-07-24  7:16 UTC
  To: Jeff Law; +Cc: Andrew Pinski, Drew Ross, gcc-patches

On Sat, Jul 22, 2023 at 8:09 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> > On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> Simplifies (x << c) >> c where x is a signed integral type of
> >> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> >> on x86_64 and x86 targets.
> >
> > Thinking about this some more, I think this should be handled in
> > expand rather than on the gimple level.
> > It is very much related to PR 110717 even. We are basically truncating
> > to a signed one bit integer and then sign extending that across the
> > whole code.
> But why defer it to expand?  This idiom is going to generate a -1,0
> result which is definitely interesting from a further simplification
> standpoint.

It's not 'simpler' so it would be a canonicalization.  We talked about
providing a SEXT_EXPR at some point (sign-extend from constant bit N).

Another canonicalization to existing ops would be

 (convert (convert:sbool @0))

where sbool is a 1-bit precision signed type.  I think that's a better
canonicalization
than -(x & 1)?  For zero-extensions we canonicalize such a conversion sequence
to x & const-mask.  For sign-extensions there's no single operation
representation.
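
Spelled out on GIMPLE that would read (a sketch, with sbool standing for
the 1-bit signed type):

  _1 = (sbool) x_2(D);   ;; truncate: keep only bit 0
  _3 = (int) _1;         ;; sign-extend: 0 or -1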

Richard.

>
> Jeff


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Drew Ross @ 2023-07-24 19:29 UTC
  To: Richard Biener; +Cc: Jeff Law, Andrew Pinski, gcc-patches

So would something like

(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (with { tree stype = build_nonstandard_integer_type (1, 0); }
 (if (INTEGRAL_TYPE_P (type)
      && !TYPE_UNSIGNED (type)
      && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
  (convert (convert:stype @0)))))

work?

Drew

On Mon, Jul 24, 2023 at 3:16 AM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Sat, Jul 22, 2023 at 8:09 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> > > On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >> Simplifies (x << c) >> c where x is a signed integral type of
> > >> width >= int and c = precision(type) - 1 into -(x & 1). Tested
> > >> successfully on x86_64 and x86 targets.
> > >
> > > Thinking about this some more, I think this should be handled in
> > > expand rather than on the gimple level.
> > > It is very much related to PR 110717 even. We are basically truncating
> > > to a signed one bit integer and then sign extending that across the
> > > whole code.
> > But why defer it to expand?  This idiom is going to generate a -1,0
> > result which is definitely interesting from a further simplification
> > standpoint.
>
> It's not 'simpler' so it would be a canonicalization.  We talked about
> providing a SEXT_EXPR at some point (sign-extend from constant bit N).
>
> Another canonicalization to existing ops would be
>
>  (convert (convert:sbool @0))
>
> where sbool is a 1-bit precision signed type.  I think that's a better
> canonicalization
> than -(x & 1)?  For zero-extensions we canonicalize such a conversion
> sequence
> to x & const-mask.  For sign-extensions there's no single operation
> representation.
>
> Richard.
>
> >
> > Jeff
>
>


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Jakub Jelinek @ 2023-07-24 19:42 UTC
  To: Drew Ross; +Cc: Richard Biener, Jeff Law, Andrew Pinski, gcc-patches

On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> So would something like
> 
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>  (if (INTEGRAL_TYPE_P (type)
>       && !TYPE_UNSIGNED (type)
>       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>   (convert (convert:stype @0)))))
> 
> work?

Certainly swap the if and with and the (with then should be indented by 1
column to the right of (if and (convert one further (the reason for the
swapping is not to call build_nonstandard_integer_type when it will not be
needed, which will probably be far more often than an actual match).
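
I.e. roughly (a sketch of the requested layout, untested):

(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (INTEGRAL_TYPE_P (type)
      && !TYPE_UNSIGNED (type)
      && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
  (with { tree stype = build_nonstandard_integer_type (1, 0); }
   (convert (convert:stype @0)))))
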
As discussed privately, the above isn't what we want for vectors and the 2
shifts are probably best on most arches because even when using -(x & 1) the
{ 1, 1, 1, ... } vector would often need to be loaded from memory.

	Jakub



* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Richard Biener @ 2023-07-25  6:54 UTC
  To: Jakub Jelinek; +Cc: Drew Ross, Jeff Law, Andrew Pinski, gcc-patches

On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> > So would something like
> >
> > (simplify
> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> >  (if (INTEGRAL_TYPE_P (type)
> >       && !TYPE_UNSIGNED (type)
> >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> >   (convert (convert:stype @0)))))
> >
> > work?
>
> Certainly swap the if and with and the (with then should be indented by 1
> column to the right of (if and (convert one further (the reason for the
> swapping is not to call build_nonstandard_integer_type when it will not be
> needed, which will probably be far more often than an actual match).

With that fixed I think for non-vector integrals the above is the most suitable
canonical form of a sign-extension.  Note it should also work for any other
constant shift amount - just use the appropriate intermediate precision for
the truncating type.  You might also want to verify what RTL expansion
produces before/after - it at least shouldn't be worse.  We _might_ want
to consider to only use the converts when the intermediate type has
mode precision (and as a special case allow one bit as in your above case)
so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).

> As discussed privately, the above isn't what we want for vectors and the 2
> shifts are probably best on most arches because even when using -(x & 1) the
> { 1, 1, 1, ... } vector would often need to be loaded from memory.

I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
producing the result.  Note that to reflect this on GIMPLE you'd need

  _2 = _1 < { 0,0...};
  res = _2 ? { -1, -1, ...} : { 0, 0,...};

because whether the ISA has a way to produce all-ones masks isn't known.

For scalars using -(T)(_1 < 0) would also be possible.
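
In GNU C vector-extension terms (a sketch; the v4si typedef matches the
testcase, and the vector comparison itself already yields -1/0 per lane):

typedef int v4si __attribute__((vector_size (4 * sizeof (int))));

v4si
sign_mask (v4si x)
{
  v4si y = x << 31;                /* _1: bit 0 moved into the sign bit */
  return y < (v4si) {0, 0, 0, 0};  /* lane is -1 where y is negative, else 0 */
}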

That said - do you have any testcase where the canonicalization is an enabler
for further transforms or was this requested stand-alone?

Thanks,
Richard.

>         Jakub
>


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Drew Ross @ 2023-07-25 19:25 UTC
  To: Richard Biener; +Cc: Jakub Jelinek, Jeff Law, Andrew Pinski, gcc-patches

> With that fixed I think for non-vector integrals the above is the most
> suitable canonical form of a sign-extension.  Note it should also work
> for any other constant shift amount - just use the appropriate
> intermediate precision for the truncating type.
> We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).

Here is a pattern that only matches truncations that result in mode
precision (or a precision of 1):

(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (INTEGRAL_TYPE_P (type)
      && !TYPE_UNSIGNED (type)
      && wi::gt_p (element_precision (type), wi::to_wide (@1),
                   TYPE_SIGN (TREE_TYPE (@1))))
  (with {
    int width = element_precision (type) - tree_to_uhwi (@1);
    tree stype = build_nonstandard_integer_type (width, 0);
   }
   (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
    (convert (convert:stype @0))))))

Look ok?

> You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse.

The RTL is slightly better for the mode precision cases and slightly worse
for the precision 1 case.

> That said - do you have any testcase where the canonicalization is an
> enabler for further transforms or was this requested stand-alone?

No, I don't have any specific test cases. This patch is just in response to
pr101955 <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101955>.

On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches
> > wrote:
> > > So would something like
> > >
> > > (simplify
> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> > >  (if (INTEGRAL_TYPE_P (type)
> > >       && !TYPE_UNSIGNED (type)
> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> > >   (convert (convert:stype @0)))))
> > >
> > > work?
> >
> > Certainly swap the if and with and the (with then should be indented by 1
> > column to the right of (if and (convert one further (the reason for the
> > swapping is not to call build_nonstandard_integer_type when it will not be
> > needed, which will probably be far more often than an actual match).
>
> With that fixed I think for non-vector integrals the above is the most
> suitable
> canonical form of a sign-extension.  Note it should also work for any other
> constant shift amount - just use the appropriate intermediate precision for
> the truncating type.  You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse.  We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>
> > As discussed privately, the above isn't what we want for vectors and the
> > 2 shifts are probably best on most arches because even when using -(x & 1)
> > the { 1, 1, 1, ... } vector would often need to be loaded from memory.
>
> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
> producing the result.  Note that to reflect this on GIMPLE you'd need
>
>   _2 = _1 < { 0,0...};
>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>
> because whether the ISA has a way to produce all-ones masks isn't known.
>
> For scalars using -(T)(_1 < 0) would also be possible.
>
> That said - do you have any testcase where the canonicalization is an
> enabler
> for further transforms or was this requested stand-alone?
>
> Thanks,
> Richard.
>
> >         Jakub
> >
>
>


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Jakub Jelinek @ 2023-07-25 19:43 UTC
  To: Drew Ross; +Cc: Richard Biener, Jeff Law, Andrew Pinski, gcc-patches

On Tue, Jul 25, 2023 at 03:25:57PM -0400, Drew Ross wrote:
> > With that fixed I think for non-vector integrals the above is the most
> > suitable canonical form of a sign-extension.  Note it should also work
> > for any other constant shift amount - just use the appropriate
> > intermediate precision for the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
> 
> Here is a pattern that only matches truncations that result in mode
> precision (or a precision of 1):
> 
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (INTEGRAL_TYPE_P (type)
>       && !TYPE_UNSIGNED (type)
>       && wi::gt_p (element_precision (type), wi::to_wide (@1),
>                    TYPE_SIGN (TREE_TYPE (@1))))

I'd use
     && wi::ltu_p (wi::to_wide (@1), element_precision (type))
If the shift count would be negative, you'd otherwise ICE in tree_to_uhwi on
it (sure, that is UB at runtime, but compiler shouldn't ICE on it).

>   (with {
>     int width = element_precision (type) - tree_to_uhwi (@1);
>     tree stype = build_nonstandard_integer_type (width, 0);
>    }
>    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>     (convert (convert:stype @0))))))

	Jakub



* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Richard Biener @ 2023-07-26  8:39 UTC
  To: Drew Ross; +Cc: Jakub Jelinek, Jeff Law, Andrew Pinski, gcc-patches

On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
>
> > With that fixed I think for non-vector integrals the above is the most suitable
> > canonical form of a sign-extension.  Note it should also work for any other
> > constant shift amount - just use the appropriate intermediate precision for
> > the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>
> > Here is a pattern that only matches truncations that result in mode precision (or a precision of 1):
>
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (INTEGRAL_TYPE_P (type)
>       && !TYPE_UNSIGNED (type)
>       && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
>   (with {
>     int width = element_precision (type) - tree_to_uhwi (@1);
>     tree stype = build_nonstandard_integer_type (width, 0);
>    }
>    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>     (convert (convert:stype @0))))))
>
> Look ok?

I suppose so.  Can you see to amend the existing

/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
   types.  */
(simplify
 (rshift (lshift @0 INTEGER_CST@1) @1)
 (if (TYPE_UNSIGNED (type)
      && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))

pattern?  You will get a duplicate pattern diagnostic otherwise.  It
also looks like this
one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
some cases of widening/truncating converts between the shifts?

Richard.

> > You might also want to verify what RTL expansion
> > produces before/after - it at least shouldn't be worse.
>
> The RTL is slightly better for the mode precision cases and slightly worse for the precision 1 case.
>
> > That said - do you have any testcase where the canonicalization is an enabler
> > for further transforms or was this requested stand-alone?
>
> No, I don't have any specific test cases. This patch is just in response to pr101955.
>
> On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>> >
>> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
>> > > So would something like
>> > >
>> > > (simplify
>> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> > >  (if (INTEGRAL_TYPE_P (type)
>> > >       && !TYPE_UNSIGNED (type)
>> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> > >   (convert (convert:stype @0)))))
>> > >
>> > > work?
>> >
>> > Certainly swap the if and with and the (with then should be indented by 1
>> > column to the right of (if and (convert one further (the reason for the
>> > swapping is not to call build_nonstandard_integer_type when it will not be
>> > needed, which will probably be far more often than an actual match).
>>
>> With that fixed I think for non-vector integrals the above is the most suitable
>> canonical form of a sign-extension.  Note it should also work for any other
>> constant shift amount - just use the appropriate intermediate precision for
>> the truncating type.  You might also want to verify what RTL expansion
>> produces before/after - it at least shouldn't be worse.  We _might_ want
>> to consider to only use the converts when the intermediate type has
>> mode precision (and as a special case allow one bit as in your above case)
>> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>>
>> > As discussed privately, the above isn't what we want for vectors and the 2
>> > shifts are probably best on most arches because even when using -(x & 1) the
>> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
>>
>> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
>> producing the result.  Note that to reflect this on GIMPLE you'd need
>>
>>   _2 = _1 < { 0,0...};
>>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>>
>> because whether the ISA has a way to produce all-ones masks isn't known.
>>
>> For scalars using -(T)(_1 < 0) would also be possible.
>>
>> That said - do you have any testcase where the canonicalization is an enabler
>> for further transforms or was this requested stand-alone?
>>
>> Thanks,
>> Richard.
>>
>> >         Jakub
>> >
>>


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Drew Ross @ 2023-07-26 18:18 UTC
  To: Richard Biener; +Cc: Jakub Jelinek, Jeff Law, Andrew Pinski, gcc-patches

Here is what I came up with for combining the two:

/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
   unsigned x OR truncate into the precision(type) - c lowest bits
   of signed x (if they have mode precision or a precision of 1)  */
(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
  (if (TYPE_UNSIGNED (type))
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
   (if (INTEGRAL_TYPE_P (type))
    (with {
      int width = element_precision (type) - tree_to_uhwi (@1);
      tree stype = build_nonstandard_integer_type (width, 0);
     }
     (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
      (convert (convert:stype @0))))))))

Let me know what you think.

> Btw, I wonder whether we can handle
> some cases of widening/truncating converts between the shifts?

I will look into this.

Drew

On Wed, Jul 26, 2023 at 4:40 AM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
> >
> > > With that fixed I think for non-vector integrals the above is the most
> > > suitable canonical form of a sign-extension.  Note it should also work
> > > for any other constant shift amount - just use the appropriate
> > > intermediate precision for the truncating type.
> > > the truncating type.
> > > We _might_ want
> > > to consider to only use the converts when the intermediate type has
> > > mode precision (and as a special case allow one bit as in your above
> > > case)
> > > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
> >
> > Here is a pattern that only matches truncations that result in
> > mode precision (or a precision of 1):
> >
> > (simplify
> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >  (if (INTEGRAL_TYPE_P (type)
> >       && !TYPE_UNSIGNED (type)
> >       && wi::gt_p (element_precision (type), wi::to_wide (@1),
> >                    TYPE_SIGN (TREE_TYPE (@1))))
> >   (with {
> >     int width = element_precision (type) - tree_to_uhwi (@1);
> >     tree stype = build_nonstandard_integer_type (width, 0);
> >    }
> >    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> >     (convert (convert:stype @0))))))
> >
> > Look ok?
>
> I suppose so.  Can you see to amend the existing
>
> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
>    types.  */
> (simplify
>  (rshift (lshift @0 INTEGER_CST@1) @1)
>  (if (TYPE_UNSIGNED (type)
>       && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
>   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>
> pattern?  You will get a duplicate pattern diagnostic otherwise.  It
> also looks like this
> one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
> some cases of widening/truncating converts between the shifts?
>
> Richard.
>
> > > You might also want to verify what RTL expansion
> > > produces before/after - it at least shouldn't be worse.
> >
> > The RTL is slightly better for the mode precision cases and slightly
> > worse for the precision 1 case.
> >
> > > That said - do you have any testcase where the canonicalization is an
> > > enabler for further transforms or was this requested stand-alone?
> >
> > No, I don't have any specific test cases. This patch is just in response
> > to pr101955.
> >
> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >>
> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >> >
> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches
> >> > wrote:
> >> > > So would something like
> >> > >
> >> > > (simplify
> >> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> >> > >  (if (INTEGRAL_TYPE_P (type)
> >> > >       && !TYPE_UNSIGNED (type)
> >> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> >> > >   (convert (convert:stype @0)))))
> >> > >
> >> > > work?
> >> >
> >> > Certainly swap the if and with and the (with then should be indented
> >> > by 1 column to the right of (if and (convert one further (the reason
> >> > for the swapping is not to call build_nonstandard_integer_type when it
> >> > will not be needed, which will probably be far more often than an
> >> > actual match).
> >>
> >> With that fixed I think for non-vector integrals the above is the most
> >> suitable canonical form of a sign-extension.  Note it should also work
> >> for any other constant shift amount - just use the appropriate
> >> intermediate precision for the truncating type.  You might also want to
> >> verify what RTL expansion produces before/after - it at least shouldn't
> >> be worse.  We _might_ want to consider to only use the converts when the
> >> intermediate type has mode precision (and as a special case allow one
> >> bit as in your above case) so it can expand to
> >> (sign_extend:<outer> (subreg:<inner> reg)).
> >>
> >> > As discussed privately, the above isn't what we want for vectors and
> >> > the 2 shifts are probably best on most arches because even when using
> >> > -(x & 1) the { 1, 1, 1, ... } vector would often need to be loaded
> >> > from memory.
> >>
> >> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
> >> producing the result.  Note that to reflect this on GIMPLE you'd need
> >>
> >>   _2 = _1 < { 0,0...};
> >>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
> >>
> >> because whether the ISA has a way to produce all-ones masks isn't known.
> >>
> >> For scalars using -(T)(_1 < 0) would also be possible.
> >>
> >> That said - do you have any testcase where the canonicalization is an
> >> enabler for further transforms or was this requested stand-alone?
> >>
> >> Thanks,
> >> Richard.
> >>
> >> >         Jakub
> >> >
> >>
>
>


* Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
From: Richard Biener @ 2023-07-28  6:30 UTC
  To: Drew Ross; +Cc: Jakub Jelinek, Jeff Law, Andrew Pinski, gcc-patches

On Wed, Jul 26, 2023 at 8:19 PM Drew Ross <drross@redhat.com> wrote:
>
> Here is what I came up with for combining the two:
>
> /* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
>    unsigned x OR truncate into the precision(type) - c lowest bits
>    of signed x (if they have mode precision or a precision of 1)  */
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
>   (if (TYPE_UNSIGNED (type))
>    (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
>    (if (INTEGRAL_TYPE_P (type))
>     (with {
>       int width = element_precision (type) - tree_to_uhwi (@1);
>       tree stype = build_nonstandard_integer_type (width, 0);
>      }
>      (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>       (convert (convert:stype @0))))))))
>
> Let me know what you think.

Looks good to me.

Thanks,
Richard.

> > Btw, I wonder whether we can handle
> > some cases of widening/truncating converts between the shifts?
>
> I will look into this.
>
> Drew
>
> On Wed, Jul 26, 2023 at 4:40 AM Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
>> >
>> > > With that fixed I think for non-vector integrals the above is the most suitable
>> > > canonical form of a sign-extension.  Note it should also work for any other
>> > > constant shift amount - just use the appropriate intermediate precision for
>> > > the truncating type.
>> > > We _might_ want
>> > > to consider to only use the converts when the intermediate type has
>> > > mode precision (and as a special case allow one bit as in your above case)
>> > > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>> >
>> > Here is a pattern that only matches truncations that result in mode precision (or a precision of 1):
>> >
>> > (simplify
>> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> >  (if (INTEGRAL_TYPE_P (type)
>> >       && !TYPE_UNSIGNED (type)
>> >       && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
>> >   (with {
>> >     int width = element_precision (type) - tree_to_uhwi (@1);
>> >     tree stype = build_nonstandard_integer_type (width, 0);
>> >    }
>> >    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>> >     (convert (convert:stype @0))))))
>> >
>> > Look ok?
>>
>> I suppose so.  Can you see to amend the existing
>>
>> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
>>    types.  */
>> (simplify
>>  (rshift (lshift @0 INTEGER_CST@1) @1)
>>  (if (TYPE_UNSIGNED (type)
>>       && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
>>   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>>
>> pattern?  You will get a duplicate pattern diagnostic otherwise.  It
>> also looks like this
>> one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
>> some cases of widening/truncating converts between the shifts?
>>
>> Richard.
>>
>> > > You might also want to verify what RTL expansion
>> > > produces before/after - it at least shouldn't be worse.
>> >
>> > The RTL is slightly better for the mode precision cases and slightly worse for the precision 1 case.
>> >
>> > > That said - do you have any testcase where the canonicalization is an enabler
>> > > for further transforms or was this requested stand-alone?
>> >
>> > No, I don't have any specific test cases. This patch is just in response to pr101955.
>> >
>> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
>> >>
>> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>> >> >
>> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
>> >> > > So would something like
>> >> > >
>> >> > > (simplify
>> >> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> >> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> >> > >  (if (INTEGRAL_TYPE_P (type)
>> >> > >       && !TYPE_UNSIGNED (type)
>> >> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> >> > >   (convert (convert:stype @0)))))
>> >> > >
>> >> > > work?
>> >> >
>> >> > Certainly swap the if and with and the (with then should be indented by 1
>> >> > column to the right of (if and (convert one further (the reason for the
>> >> > swapping is not to call build_nonstandard_integer_type when it will not be
>> >> > needed, which will probably be far more often than an actual match).
>> >>
>> >> With that fixed I think for non-vector integrals the above is the most suitable
>> >> canonical form of a sign-extension.  Note it should also work for any other
>> >> constant shift amount - just use the appropriate intermediate precision for
>> >> the truncating type.  You might also want to verify what RTL expansion
>> >> produces before/after - it at least shouldn't be worse.  We _might_ want
>> >> to consider to only use the converts when the intermediate type has
>> >> mode precision (and as a special case allow one bit as in your above case)
>> >> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>> >>
>> >> > As discussed privately, the above isn't what we want for vectors and the 2
>> >> > shifts are probably best on most arches because even when using -(x & 1) the
>> >> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
>> >>
>> >> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
>> >> producing the result.  Note that to reflect this on GIMPLE you'd need
>> >>
>> >>   _2 = _1 < { 0,0...};
>> >>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>> >>
>> >> because whether the ISA has a way to produce all-ones masks isn't known.
>> >>
>> >> For scalars using -(T)(_1 < 0) would also be possible.
>> >>
>> >> That said - do you have any testcase where the canonicalization is an enabler
>> >> for further transforms or was this requested stand-alone?
>> >>
>> >> Thanks,
>> >> Richard.
>> >>
>> >> >         Jakub
>> >> >
>> >>
>>


* [PATCH] match.pd: Canonicalize (signed x << c) >> c [PR101955]
From: Drew Ross @ 2023-08-01 19:20 UTC
  To: gcc-patches; +Cc: Drew Ross

Canonicalizes (signed x << c) >> c into the lowest
precision(type) - c bits of x, if those bits have mode precision or a
precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
((unsigned)-1 >> c) to prevent a duplicate pattern. Tested successfully on
x86_64 and x86 targets.
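
For instance (a sketch; assumes 32-bit int and 16-bit short, and the
function name is illustrative), t5 from the new test is canonicalized as
if it had been written:

__attribute__((noipa)) int
t5_equiv (int x)
{
  return (int) (short) x;  /* keep the low 16 bits, sign-extend */
}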

  PR middle-end/101955

gcc/ChangeLog:

  * match.pd ((signed x << c) >> c): New canonicalization.

gcc/testsuite/ChangeLog:

  * gcc.dg/pr101955.c: New test.
---
 gcc/match.pd                    | 20 +++++++----
 gcc/testsuite/gcc.dg/pr101955.c | 63 +++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..62f7c84f565 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 			- TYPE_PRECISION (TREE_TYPE (@2)))))
   (bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1))))
 
-/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
-   types.  */
+/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
+   unsigned x OR truncate into the precision(type) - c lowest bits
+   of signed x (if they have mode precision or a precision of 1)  */
 (simplify
- (rshift (lshift @0 INTEGER_CST@1) @1)
- (if (TYPE_UNSIGNED (type)
-      && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
-  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
+ (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
+ (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
+  (if (TYPE_UNSIGNED (type))
+   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
+   (if (INTEGRAL_TYPE_P (type))
+    (with {
+      int width = element_precision (type) - tree_to_uhwi (@1);
+      tree stype = build_nonstandard_integer_type (width, 0);
+     }
+     (if (width  == 1 || type_has_mode_precision_p (stype))
+      (convert (convert:stype @0))))))))
 
 /* Optimize x >> x into 0 */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 00000000000..8619661b291
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t2 (unsigned int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t4 (int x)
+{
+  return (x << 24) >> 24;
+}
+
+__attribute__((noipa)) int
+t5 (int x)
+{
+  return (x << 16) >> 16;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t7 (long long x)
+{
+  return (x << 56) >> 56;
+}
+
+__attribute__((noipa)) long long
+t8 (long long x)
+{
+  return (x << 48) >> 48;
+}
+
+__attribute__((noipa)) long long
+t9 (long long x)
+{
+  return (x << 32) >> 32;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " << " "optimized" } } */
-- 
2.39.3



* Re: [PATCH] match.pd: Canonicalize (signed x << c) >> c [PR101955]
From: Jakub Jelinek @ 2023-08-01 21:36 UTC
  To: Drew Ross; +Cc: gcc-patches

On Tue, Aug 01, 2023 at 03:20:33PM -0400, Drew Ross via Gcc-patches wrote:
> Canonicalizes (signed x << c) >> c into the lowest
> precision(type) - c bits of x, if those bits have mode precision or a
> precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
> ((unsigned)-1 >> c) to prevent a duplicate pattern. Tested successfully on
> x86_64 and x86 targets.
> 
>   PR middle-end/101955
> 
> gcc/ChangeLog:
> 
>   * match.pd ((signed x << c) >> c): New canonicalization.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd                    | 20 +++++++----
>  gcc/testsuite/gcc.dg/pr101955.c | 63 +++++++++++++++++++++++++++++++++
>  2 files changed, 77 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..62f7c84f565 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  			- TYPE_PRECISION (TREE_TYPE (@2)))))
>    (bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1))))
>  
> -/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
> -   types.  */
> +/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
> +   unsigned x OR truncate into the precision(type) - c lowest bits
> +   of signed x (if they have mode precision or a precision of 1)  */

There should be . between ) and "  */" above.

>  (simplify
> - (rshift (lshift @0 INTEGER_CST@1) @1)
> - (if (TYPE_UNSIGNED (type)
> -      && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
> -  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
> + (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> + (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
> +  (if (TYPE_UNSIGNED (type))
> +   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))

This needs to be (convert @0) instead of @0, because now that there is
the nop_convert? in between, @0 could have a different type than type.
I certainly see regressions on
gcc.c-torture/compile/950612-1.c
on i686-linux because of this:
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: error: type mismatch in binary expression
long long unsigned int

long long int

long long unsigned int

_346 = _3 & 4294967295;
during GIMPLE pass: forwprop
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: internal compiler error: verify_gimple failed
0x9018a4e verify_gimple_in_cfg(function*, bool, bool)
        ../../gcc/tree-cfg.cc:5646
0x8e81eb5 execute_function_todo
        ../../gcc/passes.cc:2088
0x8e8234c do_per_function
        ../../gcc/passes.cc:1687
0x8e82431 execute_todo
        ../../gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
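
I.e. the unsigned arm needs to read (a sketch of the fix):

   (bit_and (convert @0) (rshift { build_minus_one_cst (type); } @1))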

> +   (if (INTEGRAL_TYPE_P (type))
> +    (with {
> +      int width = element_precision (type) - tree_to_uhwi (@1);
> +      tree stype = build_nonstandard_integer_type (width, 0);
> +     }
> +     (if (width  == 1 || type_has_mode_precision_p (stype))
> +      (convert (convert:stype @0))))))))

just one space before == instead of two

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,63 @@
> +/* { dg-do compile } */

The above line should be
/* { dg-do compile { target int32 } } */
because the test relies on 32-bit int; some targets have just
16-bit int.
Of course, unless you want to make the testcase more portable, by
using say
#define CHAR_BITS __CHAR_BIT__
#define INT_BITS (__SIZEOF_INT__ * __CHAR_BIT__)
#define LLONG_BITS (__SIZEOF_LONGLONG__ * __CHAR_BIT__)
and replacing all the 31, 24, 56 etc. constants with (INT_BITS - 1),
(INT_BITS - CHAR_BITS), (LLONG_BITS - CHAR_BITS) etc.
Though it would still fail on some AVR configurations which have
(invalid for C) just 8-bit int, and the question is what to do with
that 16, because (INT_BITS - 2 * CHAR_BITS) is 0 on 16-bit ints, so
it would need to be (INT_BITS / 2) instead.  C requires that
long long is at least 64-bit, so that is less problematic (no known
target has > 64-bit long long, though theoretically possible).
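
E.g. for t5 that would look like (a sketch along those lines):

#define INT_BITS (__SIZEOF_INT__ * __CHAR_BIT__)

__attribute__((noipa)) int
t5 (int x)
{
  return (x << (INT_BITS / 2)) >> (INT_BITS / 2);
}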

> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +

	Jakub


