More type narrowing in match.pd

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* More type narrowing in match.pd
@ 2015-04-30  3:57 Jeff Law
  2015-04-30  7:28 ` Marc Glisse
  2015-04-30  9:22 ` Richard Biener
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff Law @ 2015-04-30  3:57 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 600 bytes --]


This is an incremental improvement to the type narrowing in match.pd. 
It's largely based on the pattern I added to fix 47477.

Basically if we have

(bit_and (arith_op (convert A) (convert B)) mask)

Where the conversions are widening and the mask turns off all bits 
outside the original types of A & B, then we can turn that into

(bit_and (arith_op A B) mask)

We may need to convert A & B to an unsigned type with the same 
width/precision as their original type, but that's still better than a 
widening conversion.

Bootstrapped and regression tested on x86_64-linux-gnu.

OK for the trunk?


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 4936 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5c7558a..51f68ab 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-04-29  Jeff Law  <law@redhat.com>
+
+	* match.pd (bit_and (plus/minus (convert @0) (convert @1) mask): New
+	simplifier to narrow arithmetic.
+
 2015-04-29  Mikhail Maltsev  <maltsevm@gmail.com>
 
 	* dojump.c (do_compare_rtx_and_jump): Use std::swap instead of
diff --git a/gcc/match.pd b/gcc/match.pd
index fc374de..357e767 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1068,3 +1068,41 @@ along with GCC; see the file COPYING3.  If not see
 	(convert (op @0 @1)))
       (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
 	(convert (op (convert:utype @0) (convert:utype @1)))))))
+
+/* This is another case of narrowing, specifically when there's an outer
+   BIT_AND_EXPR which masks off bits outside the type of the innermost
+   operands.   Like the previous case we have to convert the operands
+   to unsigned types to avoid introducing undefined behaviour for the
+   arithmetic operation.  */
+(for op (minus plus)
+  (simplify
+    (bit_and (op (convert@2 @0) (convert@3 @1)) INTEGER_CST@4)
+    (if (INTEGRAL_TYPE_P (type)
+	 /* We check for type compatibility between @0 and @1 below,
+	    so there's no need to check that @1/@3 are integral types.  */
+	 && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	 && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+	 /* The precision of the type of each operand must match the
+	    precision of the mode of each operand, similarly for the
+	    result.  */
+	 && (TYPE_PRECISION (TREE_TYPE (@0))
+	     == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
+	 && (TYPE_PRECISION (TREE_TYPE (@1))
+	     == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@1))))
+	 && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
+	 /* The inner conversion must be a widening conversion.  */
+	 && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0))
+	 && ((GENERIC 
+	      && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
+		  == TYPE_MAIN_VARIANT (TREE_TYPE (@1))))
+	     || (GIMPLE
+		 && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))))
+	 && (tree_int_cst_min_precision (@4, UNSIGNED)
+	     <= TYPE_PRECISION (TREE_TYPE (@0))))
+      (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+	(with { tree ntype = TREE_TYPE (@0); }
+	  (convert (bit_and (op @0 @1) (convert:ntype @4)))))
+      (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+	(convert (bit_and (op (convert:utype @0) (convert:utype @1))
+			  (convert:utype @4)))))))
+
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 7aebfec..9df76c6 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-04-29  Jeff Law  <law@redhat.com>
+
+	* gcc.dg/tree-ssa/shorten-1.c: New test.
+
 2015-04-29  Petar Jovanovic  <petar.jovanovic@rt-rk.com>
 
 	* gcc.target/mips/call-from-init.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/shorten-1.c b/gcc/testsuite/gcc.dg/tree-ssa/shorten-1.c
new file mode 100644
index 0000000..cecdd44
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/shorten-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+extern const unsigned char mode_ibit[];
+extern const unsigned char mode_fbit[];
+extern const signed char smode_ibit[];
+extern const signed char smode_fbit[];
+
+/* We use bit-and rather than modulo to ensure we're actually
+   testing the desired match.pd pattern.  */
+unsigned char
+muufubar (int indx)
+{
+  int ret = (mode_fbit [indx] - mode_ibit [indx]) & 3;
+  return ret;
+}
+
+signed char
+msufubar (int indx)
+{
+  int ret = (mode_fbit [indx] - mode_ibit [indx]) & 3;
+  return ret;
+}
+
+unsigned char
+musfubar (int indx)
+{
+  int ret = (smode_fbit [indx] - smode_ibit [indx]) & 3;
+  return ret;
+}
+
+signed char
+mssfubar (int indx)
+{
+  int ret = (smode_fbit [indx] - smode_ibit [indx]) & 3;
+  return ret;
+}
+
+
+unsigned char
+puufubar (int indx)
+{
+  int ret = (mode_fbit [indx] + mode_ibit [indx]) & 3;
+  return ret;
+}
+
+signed char
+psufubar (int indx)
+{
+  int ret = (mode_fbit [indx] + mode_ibit [indx]) & 3;
+  return ret;
+}
+
+unsigned char
+pusfubar (int indx)
+{
+  int ret = (smode_fbit [indx] + smode_ibit [indx]) & 3;
+  return ret;
+}
+
+signed char
+pssfubar (int indx)
+{
+  int ret = (smode_fbit [indx] + smode_ibit [indx]) & 3;
+  return ret;
+}
+
+/* The shortening patterns in match.pd should arrange to do the
+   arithmetic in char modes and thus any casts to ints should
+   have been removed.  */
+/* { dg-final {scan-tree-dump-not "\\(int\\)" "optimized"} } */
+
+/* We should have casted 4 operands from signed to unsigned char types.  */
+/* { dg-final {scan-tree-dump-times "\\(unsigned char\\)" 8 "optimized" } } */
+
+/* And two return values should have been casted from unsigned char to
+   a normal char.  */
+/* { dg-final {scan-tree-dump-times "\\(signed char\\)" 4 "optimized" } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30  3:57 More type narrowing in match.pd Jeff Law
@ 2015-04-30  7:28 ` Marc Glisse
  2015-04-30 17:29   ` Jeff Law
  2015-04-30  9:22 ` Richard Biener
  1 sibling, 1 reply; 10+ messages in thread
From: Marc Glisse @ 2015-04-30  7:28 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Wed, 29 Apr 2015, Jeff Law wrote:

> This is an incremental improvement to the type narrowing in match.pd. It's 
> largely based on the pattern I added to fix 47477.
>
> Basically if we have
>
> (bit_and (arith_op (convert A) (convert B)) mask)
>
> Where the conversions are widening and the mask turns off all bits outside 
> the original types of A & B, then we can turn that into
>
> (bit_and (arith_op A B) mask)
>
> We may need to convert A & B to an unsigned type with the same 
> width/precision as their original type, but that's still better than a 
> widening conversion.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.
>
> OK for the trunk?

+/* This is another case of narrowing, specifically when there's an outer
+   BIT_AND_EXPR which masks off bits outside the type of the innermost
+   operands.   Like the previous case we have to convert the operands
+   to unsigned types to avoid introducing undefined behaviour for the
+   arithmetic operation.  */
+(for op (minus plus)

No mult? or widen_mult with a different pattern? (maybe that's already 
done elsewhere)

+  (simplify
+    (bit_and (op (convert@2 @0) (convert@3 @1)) INTEGER_CST@4)

Maybe op@5 and then test single_use on @5? If I compute something, and 
before using it I test if the result is odd, I may not want to recompute 
it.

+    (if (INTEGRAL_TYPE_P (type)

Can this be false, or is it for documentation?

+        /* We check for type compatibility between @0 and @1 below,
+           so there's no need to check that @1/@3 are integral types.  */
+        && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+        && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+        /* The precision of the type of each operand must match the
+           precision of the mode of each operand, similarly for the
+           result.  */

A nicely named helper that does this test would be cool. Every time I
see it I have to think again why it is necessary, and if there was a
function, I could refer to the comment above its definition ;-)

+        && (TYPE_PRECISION (TREE_TYPE (@0))
+            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
+        && (TYPE_PRECISION (TREE_TYPE (@1))
+            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@1))))
+        && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
+        /* The inner conversion must be a widening conversion.  */
+        && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0))
+        && ((GENERIC 
+             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
+                 == TYPE_MAIN_VARIANT (TREE_TYPE (@1))))
+            || (GIMPLE
+                && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))))

We don't need to be that strict, but this probably covers the most
common case.

+        && (tree_int_cst_min_precision (@4, UNSIGNED)
+            <= TYPE_PRECISION (TREE_TYPE (@0))))
+      (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+       (with { tree ntype = TREE_TYPE (@0); }
+         (convert (bit_and (op @0 @1) (convert:ntype @4)))))
+      (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+       (convert (bit_and (op (convert:utype @0) (convert:utype @1))
+                         (convert:utype @4)))))))


-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30  3:57 More type narrowing in match.pd Jeff Law
  2015-04-30  7:28 ` Marc Glisse
@ 2015-04-30  9:22 ` Richard Biener
  2015-04-30 11:10   ` Marc Glisse
  2015-04-30 16:04   ` Jeff Law
  1 sibling, 2 replies; 10+ messages in thread
From: Richard Biener @ 2015-04-30  9:22 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Thu, Apr 30, 2015 at 5:52 AM, Jeff Law <law@redhat.com> wrote:
>
> This is an incremental improvement to the type narrowing in match.pd. It's
> largely based on the pattern I added to fix 47477.
>
> Basically if we have
>
> (bit_and (arith_op (convert A) (convert B)) mask)
>
> Where the conversions are widening and the mask turns off all bits outside
> the original types of A & B, then we can turn that into
>
> (bit_and (arith_op A B) mask)
>
> We may need to convert A & B to an unsigned type with the same
> width/precision as their original type, but that's still better than a
> widening conversion.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.
>
> OK for the trunk?

Without looking too close at this patch I'll note that we might want to
improve the previous one first to also handle a constant 2nd operand
for the operation (your new one also misses that).

I have in my local dev tree (so completely untested...)

@@ -1040,31 +1052,22 @@ (define_operator_list CBRT BUILT_IN_CBRT
    operation and convert the result to the desired type.  */
 (for op (plus minus)
   (simplify
-    (convert (op (convert@2 @0) (convert@3 @1)))
+    (convert (op:c@4 (convert@2 @0) (convert?@3 @1)))
     (if (INTEGRAL_TYPE_P (type)
-        /* We check for type compatibility between @0 and @1 below,
-           so there's no need to check that @1/@3 are integral types.  */
         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-        && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+        && INTEGRAL_TYPE_P (TREE_TYPE (@4))
         /* The precision of the type of each operand must match the
            precision of the mode of each operand, similarly for the
            result.  */
         && (TYPE_PRECISION (TREE_TYPE (@0))
             == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
-        && (TYPE_PRECISION (TREE_TYPE (@1))
-            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@1))))
-        && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
         /* The inner conversion must be a widening conversion.  */
         && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0))
-        && ((GENERIC
-             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
-                 == TYPE_MAIN_VARIANT (TREE_TYPE (@1)))
-             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
-                 == TYPE_MAIN_VARIANT (type)))
-            || (GIMPLE
-                && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))
-                && types_compatible_p (TREE_TYPE (@0), type))))
+        /* The final precision should match that of operand @0.  */
+        && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0))
+        /* Make sure the wide operation is dead after the transform.  */
+        && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
       (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
-       (convert (op @0 @1)))
+       (convert (op @0 (convert @1))))
       (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
        (convert (op (convert:utype @0) (convert:utype @1)))))))

and it was noticed multiple times that the type comparison boiler-plate
needs some helper function.  Like

Index: gimple-match-head.c
===================================================================
--- gimple-match-head.c (revision 222375)
+++ gimple-match-head.c (working copy)
@@ -861,3 +861,8 @@
   return op;
 }

+inline bool
+types_match (tree t1, tree t2)
+{
+  return types_compatible_p (t1, t2);
+}
Index: generic-match-head.c
===================================================================
--- generic-match-head.c        (revision 222375)
+++ generic-match-head.c        (working copy)
@@ -70,4 +70,8 @@
 #include "dumpfile.h"
 #include "generic-match.h"

-
+inline bool
+types_match (tree t1, tree t2)
+{
+  return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
+}

and then just use types_match (TREE_TYPE (...), TREE_TYPE (...)) everywhere.

I'll also add to the comment about using has_single_use - that will cause
missed optimizations for pattern uses via gimple_build which adds stmts
to a sequence and does not have SSA operands computed (so everything
will appear as having zero uses).  We need to add an abstraction for the
single-use test as well, like for generic-match-head.c

inline bool
single_use (tree t)
{
  return true;
}

and for gimple-match-head.c

inline bool
single use (tree t)
{
  return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t);
}

And if you'd like to lend helping hands to adding patterns then transitioning
patterns from fold-const.c to match.pd is more appreciated than inventing
new ones ;)

Thanks,
Richard.

>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 5c7558a..51f68ab 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2015-04-29  Jeff Law  <law@redhat.com>
> +
> +       * match.pd (bit_and (plus/minus (convert @0) (convert @1) mask): New
> +       simplifier to narrow arithmetic.
> +
>  2015-04-29  Mikhail Maltsev  <maltsevm@gmail.com>
>
>         * dojump.c (do_compare_rtx_and_jump): Use std::swap instead of
> diff --git a/gcc/match.pd b/gcc/match.pd
> index fc374de..357e767 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1068,3 +1068,41 @@ along with GCC; see the file COPYING3.  If not see
>         (convert (op @0 @1)))
>        (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
>         (convert (op (convert:utype @0) (convert:utype @1)))))))
> +
> +/* This is another case of narrowing, specifically when there's an outer
> +   BIT_AND_EXPR which masks off bits outside the type of the innermost
> +   operands.   Like the previous case we have to convert the operands
> +   to unsigned types to avoid introducing undefined behaviour for the
> +   arithmetic operation.  */
> +(for op (minus plus)
> +  (simplify
> +    (bit_and (op (convert@2 @0) (convert@3 @1)) INTEGER_CST@4)
> +    (if (INTEGRAL_TYPE_P (type)
> +        /* We check for type compatibility between @0 and @1 below,
> +           so there's no need to check that @1/@3 are integral types.  */
> +        && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +        && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +        /* The precision of the type of each operand must match the
> +           precision of the mode of each operand, similarly for the
> +           result.  */
> +        && (TYPE_PRECISION (TREE_TYPE (@0))
> +            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
> +        && (TYPE_PRECISION (TREE_TYPE (@1))
> +            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@1))))
> +        && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
> +        /* The inner conversion must be a widening conversion.  */
> +        && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE
> (@0))
> +        && ((GENERIC
> +             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
> +                 == TYPE_MAIN_VARIANT (TREE_TYPE (@1))))
> +            || (GIMPLE
> +                && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))))
> +        && (tree_int_cst_min_precision (@4, UNSIGNED)
> +            <= TYPE_PRECISION (TREE_TYPE (@0))))
> +      (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> +       (with { tree ntype = TREE_TYPE (@0); }
> +         (convert (bit_and (op @0 @1) (convert:ntype @4)))))
> +      (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
> +       (convert (bit_and (op (convert:utype @0) (convert:utype @1))
> +                         (convert:utype @4)))))))
> +
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 7aebfec..9df76c6 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2015-04-29  Jeff Law  <law@redhat.com>
> +
> +       * gcc.dg/tree-ssa/shorten-1.c: New test.
> +
>  2015-04-29  Petar Jovanovic  <petar.jovanovic@rt-rk.com>
>
>         * gcc.target/mips/call-from-init.c: New test.
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/shorten-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/shorten-1.c
> new file mode 100644
> index 0000000..cecdd44
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/shorten-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +extern const unsigned char mode_ibit[];
> +extern const unsigned char mode_fbit[];
> +extern const signed char smode_ibit[];
> +extern const signed char smode_fbit[];
> +
> +/* We use bit-and rather than modulo to ensure we're actually
> +   testing the desired match.pd pattern.  */
> +unsigned char
> +muufubar (int indx)
> +{
> +  int ret = (mode_fbit [indx] - mode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +signed char
> +msufubar (int indx)
> +{
> +  int ret = (mode_fbit [indx] - mode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +unsigned char
> +musfubar (int indx)
> +{
> +  int ret = (smode_fbit [indx] - smode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +signed char
> +mssfubar (int indx)
> +{
> +  int ret = (smode_fbit [indx] - smode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +
> +unsigned char
> +puufubar (int indx)
> +{
> +  int ret = (mode_fbit [indx] + mode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +signed char
> +psufubar (int indx)
> +{
> +  int ret = (mode_fbit [indx] + mode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +unsigned char
> +pusfubar (int indx)
> +{
> +  int ret = (smode_fbit [indx] + smode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +signed char
> +pssfubar (int indx)
> +{
> +  int ret = (smode_fbit [indx] + smode_ibit [indx]) & 3;
> +  return ret;
> +}
> +
> +/* The shortening patterns in match.pd should arrange to do the
> +   arithmetic in char modes and thus any casts to ints should
> +   have been removed.  */
> +/* { dg-final {scan-tree-dump-not "\\(int\\)" "optimized"} } */
> +
> +/* We should have casted 4 operands from signed to unsigned char types.  */
> +/* { dg-final {scan-tree-dump-times "\\(unsigned char\\)" 8 "optimized" } }
> */
> +
> +/* And two return values should have been casted from unsigned char to
> +   a normal char.  */
> +/* { dg-final {scan-tree-dump-times "\\(signed char\\)" 4 "optimized" } }
> */
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30  9:22 ` Richard Biener
@ 2015-04-30 11:10   ` Marc Glisse
  2015-04-30 11:18     ` Richard Biener
  2015-04-30 16:04   ` Jeff Law
  1 sibling, 1 reply; 10+ messages in thread
From: Marc Glisse @ 2015-04-30 11:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

On Thu, 30 Apr 2015, Richard Biener wrote:

> I have in my local dev tree (so completely untested...)
>
> @@ -1040,31 +1052,22 @@ (define_operator_list CBRT BUILT_IN_CBRT
>    operation and convert the result to the desired type.  */
> (for op (plus minus)
>   (simplify
> -    (convert (op (convert@2 @0) (convert@3 @1)))
> +    (convert (op:c@4 (convert@2 @0) (convert?@3 @1)))

I believe the :c here requires extra code further down, so we don't turn 
a-b into b-a.

>     (if (INTEGRAL_TYPE_P (type)
> -        /* We check for type compatibility between @0 and @1 below,
> -           so there's no need to check that @1/@3 are integral types.  */
>         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> -        && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +        && INTEGRAL_TYPE_P (TREE_TYPE (@4))
>         /* The precision of the type of each operand must match the
>            precision of the mode of each operand, similarly for the
>            result.  */
>         && (TYPE_PRECISION (TREE_TYPE (@0))
>             == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
> -        && (TYPE_PRECISION (TREE_TYPE (@1))
> -            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@1))))
> -        && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>         /* The inner conversion must be a widening conversion.  */
>         && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0))
> -        && ((GENERIC
> -             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
> -                 == TYPE_MAIN_VARIANT (TREE_TYPE (@1)))
> -             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
> -                 == TYPE_MAIN_VARIANT (type)))
> -            || (GIMPLE
> -                && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))
> -                && types_compatible_p (TREE_TYPE (@0), type))))
> +        /* The final precision should match that of operand @0.  */
> +        && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0))
> +        /* Make sure the wide operation is dead after the transform.  */
> +        && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
>       (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> -       (convert (op @0 @1)))
> +       (convert (op @0 (convert @1))))
>       (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
>        (convert (op (convert:utype @0) (convert:utype @1)))))))

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30 11:10   ` Marc Glisse
@ 2015-04-30 11:18     ` Richard Biener
  2015-05-01 18:57       ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2015-04-30 11:18 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jeff Law

On Thu, Apr 30, 2015 at 12:53 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Thu, 30 Apr 2015, Richard Biener wrote:
>
>> I have in my local dev tree (so completely untested...)
>>
>> @@ -1040,31 +1052,22 @@ (define_operator_list CBRT BUILT_IN_CBRT
>>    operation and convert the result to the desired type.  */
>> (for op (plus minus)
>>   (simplify
>> -    (convert (op (convert@2 @0) (convert@3 @1)))
>> +    (convert (op:c@4 (convert@2 @0) (convert?@3 @1)))
>
>
> I believe the :c here requires extra code further down, so we don't turn a-b
> into b-a.

Indeed.  I've added :c only for minus as 5 - x can't be canonicalized to
move the constant to 2nd position which is always possible for plus.

Might be cleaner to add a separate pattern for that case.

Richard.

>
>>     (if (INTEGRAL_TYPE_P (type)
>> -        /* We check for type compatibility between @0 and @1 below,
>> -           so there's no need to check that @1/@3 are integral types.  */
>>         && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>> -        && INTEGRAL_TYPE_P (TREE_TYPE (@2))
>> +        && INTEGRAL_TYPE_P (TREE_TYPE (@4))
>>         /* The precision of the type of each operand must match the
>>            precision of the mode of each operand, similarly for the
>>            result.  */
>>         && (TYPE_PRECISION (TREE_TYPE (@0))
>>             == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
>> -        && (TYPE_PRECISION (TREE_TYPE (@1))
>> -            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@1))))
>> -        && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>>         /* The inner conversion must be a widening conversion.  */
>>         && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE
>> (@0))
>> -        && ((GENERIC
>> -             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
>> -                 == TYPE_MAIN_VARIANT (TREE_TYPE (@1)))
>> -             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
>> -                 == TYPE_MAIN_VARIANT (type)))
>> -            || (GIMPLE
>> -                && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))
>> -                && types_compatible_p (TREE_TYPE (@0), type))))
>> +        /* The final precision should match that of operand @0.  */
>> +        && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0))
>> +        /* Make sure the wide operation is dead after the transform.  */
>> +        && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
>>       (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>> -       (convert (op @0 @1)))
>> +       (convert (op @0 (convert @1))))
>>       (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
>>        (convert (op (convert:utype @0) (convert:utype @1)))))))
>
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30  9:22 ` Richard Biener
  2015-04-30 11:10   ` Marc Glisse
@ 2015-04-30 16:04   ` Jeff Law
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff Law @ 2015-04-30 16:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 04/30/2015 03:00 AM, Richard Biener wrote:
>
> Without looking too close at this patch I'll note that we might want to
> improve the previous one first to also handle a constant 2nd operand
> for the operation (your new one also misses that).
Yea, I think you mentioned in that in the 47477 BZ as well.  If you've 
got testcases, pass them along so that we can build testcases around 
those forms as well.



> and it was noticed multiple times that the type comparison boiler-plate
> needs some helper function.  Like
Yes.  It's on the TODO list.  There's certainly more follow-ups in the 
pipeline.  If we want to factor out the boiler-plate now, that works for me.


>
> And if you'd like to lend helping hands to adding patterns then transitioning
> patterns from fold-const.c to match.pd is more appreciated than inventing
> new ones ;)
The next round of work is much more likely to be reimplementing the 
operand shortening code shared between the C/C++ front-ends in match.pd 
and removal of the C/C++ operand shortening code.

This patch didn't fit into that work terribly well and seemed 
self-contained enough to go forward now rather than waiting.

Jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30  7:28 ` Marc Glisse
@ 2015-04-30 17:29   ` Jeff Law
  2015-04-30 23:23     ` Marc Glisse
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2015-04-30 17:29 UTC (permalink / raw)
  To: gcc-patches

On 04/30/2015 01:17 AM, Marc Glisse wrote:
>
> +/* This is another case of narrowing, specifically when there's an outer
> +   BIT_AND_EXPR which masks off bits outside the type of the innermost
> +   operands.   Like the previous case we have to convert the operands
> +   to unsigned types to avoid introducing undefined behaviour for the
> +   arithmetic operation.  */
> +(for op (minus plus)
>
> No mult? or widen_mult with a different pattern? (maybe that's already
> done elsewhere)
No mult.  When I worked on the pattern for 47477, supporting mult 
clearly regressed the generated code -- presumably because we can often 
widen the operands for free.

>
> +  (simplify
> +    (bit_and (op (convert@2 @0) (convert@3 @1)) INTEGER_CST@4)
>
> Maybe op@5 and then test single_use on @5? If I compute something, and
> before using it I test if the result is odd, I may not want to recompute
> it.
Sure.  That ought to be easy to add.

>
> +    (if (INTEGRAL_TYPE_P (type)
>
> Can this be false, or is it for documentation?
Can't recall a case where we were presented with a non-integral type, 
but I haven't even tried to work though what might happen on 
non-integral types.  Better safe than sorry.

>
> +        /* We check for type compatibility between @0 and @1 below,
> +           so there's no need to check that @1/@3 are integral types.  */
> +        && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +        && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +        /* The precision of the type of each operand must match the
> +           precision of the mode of each operand, similarly for the
> +           result.  */
>
> A nicely named helper that does this test would be cool. Every time I
> see it I have to think again why it is necessary, and if there was a
> function, I could refer to the comment above its definition ;-)
Factoring helpers for this stuff is something I wanted to do a bit 
latter as Kai and I build up the necessary patterns to eliminate the 
C/C++ specific operand shortening and hopefully the set of helpers 
needed becomes clearer.

The type_same_p helper clearly already makes sense as there's these two 
shortening patterns and two others that need it.  Given that & Richi's 
request, I'll go ahead and factor that one out.



>
> +        && (TYPE_PRECISION (TREE_TYPE (@0))
> +            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
> +        && (TYPE_PRECISION (TREE_TYPE (@1))
> +            == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@1))))
> +        && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
> +        /* The inner conversion must be a widening conversion.  */
> +        && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE
> (@0))
> +        && ((GENERIC +             && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
> +                 == TYPE_MAIN_VARIANT (TREE_TYPE (@1))))
> +            || (GIMPLE
> +                && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))))
>
> We don't need to be that strict, but this probably covers the most
> common case.
Probably not.  The idea was to start with what we know is right & 
correct, then extend, particularly as we find/build testcases.  THe 
obvious extensions are those Richi pointed out in 47477, then again on 
this thread.  I'd like to tackle them as a follow-up and do so with both 
patterns.

[ They weren't tackled as part of 47477 as I wanted to focus on fixing
   the regression and didn't want to go much beyond what was necessary
   to fix the regression.  Obviously with stage1 open, it's time to
   tackle the cases Richi pointed out that we can/should handle. ]

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30 17:29   ` Jeff Law
@ 2015-04-30 23:23     ` Marc Glisse
  2015-05-01 18:41       ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Glisse @ 2015-04-30 23:23 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Thu, 30 Apr 2015, Jeff Law wrote:

> On 04/30/2015 01:17 AM, Marc Glisse wrote:
>> 
>> +/* This is another case of narrowing, specifically when there's an outer
>> +   BIT_AND_EXPR which masks off bits outside the type of the innermost
>> +   operands.   Like the previous case we have to convert the operands
>> +   to unsigned types to avoid introducing undefined behaviour for the
>> +   arithmetic operation.  */
>> +(for op (minus plus)
>> 
>> No mult? or widen_mult with a different pattern? (maybe that's already
>> done elsewhere)
> No mult.  When I worked on the pattern for 47477, supporting mult clearly 
> regressed the generated code -- presumably because we can often widen the 
> operands for free.

It would help with the testcase below, but I am willing to accept that the 
cases where it hurts are more common (and guessing if it will help or hurt 
may be hard), while with +- the cases that help are more common.

void f(short*a) {
   a = __builtin_assume_aligned(a,128);
   for (int i = 0; i < (1<<22); ++i) {
#ifdef EASY
     a[i] *= a[i];
#else
     int x = a[i];
     x *= x;
     a[i] = x;
#endif
   }
}

With EASY, a nice little loop:
.L2:
 	movdqa	(%rdi), %xmm0
 	addq	$16, %rdi
 	pmullw	%xmm0, %xmm0
 	movaps	%xmm0, -16(%rdi)
 	cmpq	%rdi, %rax
 	jne	.L2

while without EASY, we get the uglier:
.L2:
 	movdqa	(%rdi), %xmm0
 	addq	$16, %rdi
 	movdqa	%xmm0, %xmm2
 	movdqa	%xmm0, %xmm1
 	pmullw	%xmm0, %xmm2
 	pmulhw	%xmm0, %xmm1
 	movdqa	%xmm2, %xmm0
 	punpckhwd	%xmm1, %xmm2
 	punpcklwd	%xmm1, %xmm0
 	movdqa	%xmm2, %xmm1
 	movdqa	%xmm0, %xmm2
 	punpcklwd	%xmm1, %xmm0
 	punpckhwd	%xmm1, %xmm2
 	movdqa	%xmm0, %xmm1
 	punpcklwd	%xmm2, %xmm0
 	punpckhwd	%xmm2, %xmm1
 	punpcklwd	%xmm1, %xmm0
 	movaps	%xmm0, -16(%rdi)
 	cmpq	%rdi, %rax
 	jne	.L2

A small pattern like
(simplify
  (vec_pack_trunc (widen_mult_lo @0 @1) (widen_mult_hi:c @0 @1))
  (mult @0 @1))

probably with some tweaks (convert to unsigned? only do it before vector 
lowering?), would fix this particular case, but not as well as narrowing 
before vectorization.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30 23:23     ` Marc Glisse
@ 2015-05-01 18:41       ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2015-05-01 18:41 UTC (permalink / raw)
  To: gcc-patches

On 04/30/2015 03:38 PM, Marc Glisse wrote:
> On Thu, 30 Apr 2015, Jeff Law wrote:
>
>> On 04/30/2015 01:17 AM, Marc Glisse wrote:
>>>
>>> +/* This is another case of narrowing, specifically when there's an
>>> outer
>>> +   BIT_AND_EXPR which masks off bits outside the type of the innermost
>>> +   operands.   Like the previous case we have to convert the operands
>>> +   to unsigned types to avoid introducing undefined behaviour for the
>>> +   arithmetic operation.  */
>>> +(for op (minus plus)
>>>
>>> No mult? or widen_mult with a different pattern? (maybe that's already
>>> done elsewhere)
>> No mult.  When I worked on the pattern for 47477, supporting mult
>> clearly regressed the generated code -- presumably because we can
>> often widen the operands for free.
>
> It would help with the testcase below, but I am willing to accept that
> the cases where it hurts are more common (and guessing if it will help
> or hurt may be hard), while with +- the cases that help are more common.
>
> void f(short*a) {
>    a = __builtin_assume_aligned(a,128);
>    for (int i = 0; i < (1<<22); ++i) {
> #ifdef EASY
>      a[i] *= a[i];
> #else
>      int x = a[i];
>      x *= x;
>      a[i] = x;
> #endif
>    }
> }
Thanks.  I've filed a bug and attached it the operand 
shortening/narrowing BZ (65964).  I strongly suspect there's other bugs 
in BZ that need to be attached to 65964.

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: More type narrowing in match.pd
  2015-04-30 11:18     ` Richard Biener
@ 2015-05-01 18:57       ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2015-05-01 18:57 UTC (permalink / raw)
  To: Richard Biener, GCC Patches

On 04/30/2015 05:07 AM, Richard Biener wrote:
> On Thu, Apr 30, 2015 at 12:53 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> On Thu, 30 Apr 2015, Richard Biener wrote:
>>
>>> I have in my local dev tree (so completely untested...)
>>>
>>> @@ -1040,31 +1052,22 @@ (define_operator_list CBRT BUILT_IN_CBRT
>>>     operation and convert the result to the desired type.  */
>>> (for op (plus minus)
>>>    (simplify
>>> -    (convert (op (convert@2 @0) (convert@3 @1)))
>>> +    (convert (op:c@4 (convert@2 @0) (convert?@3 @1)))
>>
>>
>> I believe the :c here requires extra code further down, so we don't turn a-b
>> into b-a.
>
> Indeed.  I've added :c only for minus as 5 - x can't be canonicalized to
> move the constant to 2nd position which is always possible for plus.
>
> Might be cleaner to add a separate pattern for that case.
FWIW, this is the patch that's attached to 65084 (not 47477 as I 
initially reported).  It's supposed to address one or more of the 
example loops in that testcase.

I'd like to tackle it as a follow-up.
jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-05-01 18:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-30  3:57 More type narrowing in match.pd Jeff Law
2015-04-30  7:28 ` Marc Glisse
2015-04-30 17:29   ` Jeff Law
2015-04-30 23:23     ` Marc Glisse
2015-05-01 18:41       ` Jeff Law
2015-04-30  9:22 ` Richard Biener
2015-04-30 11:10   ` Marc Glisse
2015-04-30 11:18     ` Richard Biener
2015-05-01 18:57       ` Jeff Law
2015-04-30 16:04   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).