public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
@ 2015-07-24 14:46 Tom de Vries
  2015-07-26 17:00 ` Tom de Vries
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tom de Vries @ 2015-07-24 14:46 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 272 bytes --]

Hi,

this patch allows parallelization and vectorization of reduction 
operators that are guaranteed to not overflow (such as min and max 
operators), independent of the overflow behaviour of the type.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0002-Allow-non-overflow-ops-in-vect_is_simple_reduction_1.patch --]
[-- Type: text/x-patch, Size: 6542 bytes --]

Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-24  Tom de Vries  <tom@codesourcery.com>

	* tree.c (no_overflow_tree_code): New function.
	* tree.h (no_overflow_tree_code): Declare.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Use
	no_overflow_tree_code.

	* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").
	Add successful scans for 2 detected reductions.	 Add xfail scans for 3
	detected reductions.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c  (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").  Add successful scans for 2
	detected reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Expect succesful reductions for min
	and max loops.
---
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c     | 10 +++++++---
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c    | 10 ++++++----
 gcc/testsuite/gcc.dg/autopar/reduc-8.c         |  7 ++++---
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |  2 +-
 gcc/tree-vect-loop.c                           |  3 ++-
 gcc/tree.c                                     | 24 ++++++++++++++++++++++++
 gcc/tree.h                                     |  1 +
 7 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 14867f3..a2dad44 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -60,7 +61,10 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 7c19cc5..a50e14f 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -58,7 +59,8 @@ int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 1d05c48..18ba03d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -40,7 +40,8 @@ testmin (const T *c, T init, T result)
     abort ();
 }
 
-int main (void)
+int __attribute__((optimize ("-ftree-parallelize-loops=0")))
+main (void)
 { 
   static signed char A[N] = {
     0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
@@ -84,5 +85,5 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index 2129717..86f9b90 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -46,4 +46,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c31bfbd..42ba5f8 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2613,7 +2613,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			"reduction: unsafe fp math optimization: ");
       return NULL;
     }
-  else if (INTEGRAL_TYPE_P (type) && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction
+	   && !no_overflow_tree_code (code, type))
     {
       if (TYPE_OVERFLOW_TRAPS (type))
 	{
diff --git a/gcc/tree.c b/gcc/tree.c
index 94263af..5b8dd1a 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7541,6 +7541,30 @@ associative_tree_code (enum tree_code code)
   return false;
 }
 
+/* Return true if CODE represents an tree code that cannot overflow, given
+   operand type OP_TYPE.  Otherwise return false.  */
+bool
+no_overflow_tree_code (enum tree_code code, tree op_type)
+{
+  /* For now, just handle associative tree codes.  */
+  switch (code)
+    {
+    case BIT_IOR_EXPR:
+    case BIT_AND_EXPR:
+    case BIT_XOR_EXPR:
+      return true;
+
+    case MIN_EXPR:
+    case MAX_EXPR:
+      return (ANY_INTEGRAL_TYPE_P (op_type)
+	      && TREE_CODE (op_type) != COMPLEX_TYPE);
+
+    default:
+      break;
+    }
+  return false;
+}
+
 /* Return true if CODE represents a commutative tree code.  Otherwise
    return false.  */
 bool
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..360d13e 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4367,6 +4367,7 @@ extern tree get_file_function_name (const char *);
 extern tree get_callee_fndecl (const_tree);
 extern int type_num_arguments (const_tree);
 extern bool associative_tree_code (enum tree_code);
+extern bool no_overflow_tree_code (enum tree_code, tree);
 extern bool commutative_tree_code (enum tree_code);
 extern bool commutative_ternary_tree_code (enum tree_code);
 extern tree upper_bound_in_type (tree, tree);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-24 14:46 [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1 Tom de Vries
@ 2015-07-26 17:00 ` Tom de Vries
  2015-07-26 17:09   ` Tom de Vries
  2015-07-28  7:41 ` [committed, gomp4] Allow non-overflow ops in vect_is_simple_reduction_1 Tom de Vries
  2015-07-28  8:09 ` [PATCH] " Richard Biener
  2 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2015-07-26 17:00 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 330 bytes --]

On 24/07/15 16:39, Tom de Vries wrote:
> Hi,
>
> this patch allows parallelization and vectorization of reduction
> operators that are guaranteed to not overflow (such as min and max
> operators), independent of the overflow behaviour of the type.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
>
> Thanks,
> - Tom


[-- Attachment #2: 0002-Handle-non-overflow-reductions-in-graphite.patch --]
[-- Type: text/x-patch, Size: 971 bytes --]

Handle non-overflow reductions in graphite

2015-07-21  Tom de Vries  <tom@codesourcery.com>

	* graphite-sese-to-poly.c (is_reduction_operation_p): Allow operations
	that do not overflow.
---
 gcc/graphite-sese-to-poly.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..531c848 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,19 @@ is_reduction_operation_p (gimple stmt)
   if (FLOAT_TYPE_P (type))
     return flag_associative_math;
 
-  return (INTEGRAL_TYPE_P (type)
-	  && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+    {
+      if (INTEGRAL_TYPE_P (type)
+	  && TYPE_OVERFLOW_WRAPS (type))
+	return true;
+
+      if (no_overflow_tree_code (code, type))
+	return true;
+
+      return false;
+    }
+
+  return false;
 }
 
 /* Returns true when PHI contains an argument ARG.  */
-- 
1.9.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-26 17:00 ` Tom de Vries
@ 2015-07-26 17:09   ` Tom de Vries
  2015-07-28  7:44     ` [committed, gomp4] Handle non-overflow reductions in graphite Tom de Vries
  0 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2015-07-26 17:09 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 524 bytes --]

On 26/07/15 18:49, Tom de Vries wrote:
> On 24/07/15 16:39, Tom de Vries wrote:
>> Hi,
>>
>> this patch allows parallelization and vectorization of reduction
>> operators that are guaranteed to not overflow (such as min and max
>> operators), independent of the overflow behaviour of the type.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>>
>> Thanks,
>> - Tom

[ Slip-of-the-keyboard ]

This is the graphite version of this patch.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0002-Handle-non-overflow-reductions-in-graphite.patch --]
[-- Type: text/x-patch, Size: 971 bytes --]

Handle non-overflow reductions in graphite

2015-07-21  Tom de Vries  <tom@codesourcery.com>

	* graphite-sese-to-poly.c (is_reduction_operation_p): Allow operations
	that do not overflow.
---
 gcc/graphite-sese-to-poly.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..531c848 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,19 @@ is_reduction_operation_p (gimple stmt)
   if (FLOAT_TYPE_P (type))
     return flag_associative_math;
 
-  return (INTEGRAL_TYPE_P (type)
-	  && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+    {
+      if (INTEGRAL_TYPE_P (type)
+	  && TYPE_OVERFLOW_WRAPS (type))
+	return true;
+
+      if (no_overflow_tree_code (code, type))
+	return true;
+
+      return false;
+    }
+
+  return false;
 }
 
 /* Returns true when PHI contains an argument ARG.  */
-- 
1.9.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [committed, gomp4] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-24 14:46 [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1 Tom de Vries
  2015-07-26 17:00 ` Tom de Vries
@ 2015-07-28  7:41 ` Tom de Vries
  2015-07-28  8:09 ` [PATCH] " Richard Biener
  2 siblings, 0 replies; 12+ messages in thread
From: Tom de Vries @ 2015-07-28  7:41 UTC (permalink / raw)
  To: gcc-patches

On 24/07/15 16:39, Tom de Vries wrote:
> Hi,
>
> this patch allows parallelization and vectorization of reduction
> operators that are guaranteed to not overflow (such as min and max
> operators), independent of the overflow behaviour of the type.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
>

Committed to gomp-4_0-branch.

Thanks,
- Tom

> 0002-Allow-non-overflow-ops-in-vect_is_simple_reduction_1.patch
>
>
> Allow non-overflow ops in vect_is_simple_reduction_1
>
> 2015-07-24  Tom de Vries<tom@codesourcery.com>
>
> 	* tree.c (no_overflow_tree_code): New function.
> 	* tree.h (no_overflow_tree_code): Declare.
> 	* tree-vect-loop.c (vect_is_simple_reduction_1): Use
> 	no_overflow_tree_code.
>
> 	* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute
> 	optimize ("-ftree-parallelize-loops=0").
> 	Add successful scans for 2 detected reductions.	 Add xfail scans for 3
> 	detected reductions.
> 	* gcc.dg/autopar/reduc-2short.c: Same.
> 	* gcc.dg/autopar/reduc-8.c  (init_arrays): Mark with attribute
> 	optimize ("-ftree-parallelize-loops=0").  Add successful scans for 2
> 	detected reductions.
> 	* gcc.dg/vect/trapv-vect-reduc-4.c: Expect succesful reductions for min
> 	and max loops.
> ---
>   gcc/testsuite/gcc.dg/autopar/reduc-2char.c     | 10 +++++++---
>   gcc/testsuite/gcc.dg/autopar/reduc-2short.c    | 10 ++++++----
>   gcc/testsuite/gcc.dg/autopar/reduc-8.c         |  7 ++++---
>   gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |  2 +-
>   gcc/tree-vect-loop.c                           |  3 ++-
>   gcc/tree.c                                     | 24 ++++++++++++++++++++++++
>   gcc/tree.h                                     |  1 +
>   7 files changed, 45 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
> index 14867f3..a2dad44 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
> @@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
>       abort ();
>   }
>
> - __attribute__((noinline))
> - void init_arrays ()
> +void __attribute__((noinline))
> +  __attribute__((optimize ("-ftree-parallelize-loops=0")))
> +init_arrays ()
>    {
>      int i;
>
> @@ -60,7 +61,10 @@ int main (void)
>   }
>
>
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
> +
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
>   /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
>
>
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
> index 7c19cc5..a50e14f 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
> @@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result)
>       abort ();
>   }
>
> - __attribute__((noinline))
> - void init_arrays ()
> +void __attribute__((noinline))
> +  __attribute__((optimize ("-ftree-parallelize-loops=0")))
> +init_arrays ()
>    {
>      int i;
>
> @@ -58,7 +59,8 @@ int main (void)
>     return 0;
>   }
>
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
>
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
>   /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
> -
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
> index 1d05c48..18ba03d 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
> @@ -40,7 +40,8 @@ testmin (const T *c, T init, T result)
>       abort ();
>   }
>
> -int main (void)
> +int __attribute__((optimize ("-ftree-parallelize-loops=0")))
> +main (void)
>   {
>     static signed char A[N] = {
>       0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
> @@ -84,5 +85,5 @@ int main (void)
>   }
>
>
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
> index 2129717..86f9b90 100644
> --- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
> @@ -46,4 +46,4 @@ int main (void)
>     return 0;
>   }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c31bfbd..42ba5f8 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2613,7 +2613,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
>   			"reduction: unsafe fp math optimization: ");
>         return NULL;
>       }
> -  else if (INTEGRAL_TYPE_P (type) && check_reduction)
> +  else if (INTEGRAL_TYPE_P (type) && check_reduction
> +	   && !no_overflow_tree_code (code, type))
>       {
>         if (TYPE_OVERFLOW_TRAPS (type))
>   	{
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 94263af..5b8dd1a 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -7541,6 +7541,30 @@ associative_tree_code (enum tree_code code)
>     return false;
>   }
>
> +/* Return true if CODE represents an tree code that cannot overflow, given
> +   operand type OP_TYPE.  Otherwise return false.  */
> +bool
> +no_overflow_tree_code (enum tree_code code, tree op_type)
> +{
> +  /* For now, just handle associative tree codes.  */
> +  switch (code)
> +    {
> +    case BIT_IOR_EXPR:
> +    case BIT_AND_EXPR:
> +    case BIT_XOR_EXPR:
> +      return true;
> +
> +    case MIN_EXPR:
> +    case MAX_EXPR:
> +      return (ANY_INTEGRAL_TYPE_P (op_type)
> +	      && TREE_CODE (op_type) != COMPLEX_TYPE);
> +
> +    default:
> +      break;
> +    }
> +  return false;
> +}
> +
>   /* Return true if CODE represents a commutative tree code.  Otherwise
>      return false.  */
>   bool
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 6df2217..360d13e 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4367,6 +4367,7 @@ extern tree get_file_function_name (const char *);
>   extern tree get_callee_fndecl (const_tree);
>   extern int type_num_arguments (const_tree);
>   extern bool associative_tree_code (enum tree_code);
> +extern bool no_overflow_tree_code (enum tree_code, tree);
>   extern bool commutative_tree_code (enum tree_code);
>   extern bool commutative_ternary_tree_code (enum tree_code);
>   extern tree upper_bound_in_type (tree, tree);
> -- 1.9.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [committed, gomp4] Handle non-overflow reductions in graphite
  2015-07-26 17:09   ` Tom de Vries
@ 2015-07-28  7:44     ` Tom de Vries
  0 siblings, 0 replies; 12+ messages in thread
From: Tom de Vries @ 2015-07-28  7:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek

On 26/07/15 18:53, Tom de Vries wrote:
> On 26/07/15 18:49, Tom de Vries wrote:
>> On 24/07/15 16:39, Tom de Vries wrote:
>>> Hi,
>>>
>>> this patch allows parallelization and vectorization of reduction
>>> operators that are guaranteed to not overflow (such as min and max
>>> operators), independent of the overflow behaviour of the type.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>>
>>> Thanks,
>>> - Tom
>
> [ Slip-of-the-keyboard ]
>
> This is the graphite version of this patch.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
>

Committed to gomp-4_0-branch.

Thanks,
- Tom

> 0002-Handle-non-overflow-reductions-in-graphite.patch
>
>
> Handle non-overflow reductions in graphite
>
> 2015-07-21  Tom de Vries<tom@codesourcery.com>
>
> 	* graphite-sese-to-poly.c (is_reduction_operation_p): Allow operations
> 	that do not overflow.
> ---
>   gcc/graphite-sese-to-poly.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
> index c583f16..531c848 100644
> --- a/gcc/graphite-sese-to-poly.c
> +++ b/gcc/graphite-sese-to-poly.c
> @@ -2614,8 +2614,19 @@ is_reduction_operation_p (gimple stmt)
>     if (FLOAT_TYPE_P (type))
>       return flag_associative_math;
>
> -  return (INTEGRAL_TYPE_P (type)
> -	  && TYPE_OVERFLOW_WRAPS (type));
> +  if (ANY_INTEGRAL_TYPE_P (type))
> +    {
> +      if (INTEGRAL_TYPE_P (type)
> +	  && TYPE_OVERFLOW_WRAPS (type))
> +	return true;
> +
> +      if (no_overflow_tree_code (code, type))
> +	return true;
> +
> +      return false;
> +    }
> +
> +  return false;
>   }
>
>   /* Returns true when PHI contains an argument ARG.  */
> -- 1.9.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-24 14:46 [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1 Tom de Vries
  2015-07-26 17:00 ` Tom de Vries
  2015-07-28  7:41 ` [committed, gomp4] Allow non-overflow ops in vect_is_simple_reduction_1 Tom de Vries
@ 2015-07-28  8:09 ` Richard Biener
  2015-07-28 12:30   ` Tom de Vries
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2015-07-28  8:09 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Hi,
>
> this patch allows parallelization and vectorization of reduction operators
> that are guaranteed to not overflow (such as min and max operators),
> independent of the overflow behaviour of the type.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?

Hmm, I don't like that no_overflow_tree_code function.  We have a much more
clear understanding which codes may overflow or trap.  Thus please add
a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like

bool
operation_overflow_traps (tree type, enum tree_code code)
{
  if (!ANY_INTEGRAL_TYPE_P (type)
     || !TYPE_OVERFLOW_TRAPS (type))
    return false;
  switch (code)
    {
    case PLUS_EXPR:
    case MINUS_EXPR:
    case MULT_EXPR:
    case LSHIFT_EXPR:
       /* Can overflow in various ways */
    case TRUNC_DIV_EXPR:
    case EXACT_DIV_EXPR:
    case FLOOR_DIV_EXPR:
    case CEIL_DIV_EXPR:
       /* For INT_MIN / -1 */
    case NEGATE_EXPR:
    case ABS_EXPR:
       /* For -INT_MIN */
       return true;
    default:
       return false;
   }
}

and similar variants for _wraps and _undefined.  I think we decided at
some point
the compiler should not take advantage of the fact that lshift or
*_div have undefined
behavior on signed integer overflow, similar we only take advantage of
integral-type
overflow behavior, not vector or complex.  So we could reduce the
number of cases
the functions return true if we document that it returns true only for
the cases where
the compiler needs to / may assume wrapping behavior does not take place.
As for _traps for example we only have optabs and libfuncs for
plus,minus,mult,negate
and abs.

Thanks,
Richard.

> Thanks,
> - Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-28  8:09 ` [PATCH] " Richard Biener
@ 2015-07-28 12:30   ` Tom de Vries
  2015-07-29  8:28     ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2015-07-28 12:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

On 28/07/15 09:59, Richard Biener wrote:
> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> Hi,
>>
>> this patch allows parallelization and vectorization of reduction operators
>> that are guaranteed to not overflow (such as min and max operators),
>> independent of the overflow behaviour of the type.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>
> Hmm, I don't like that no_overflow_tree_code function.  We have a much more
> clear understanding which codes may overflow or trap.  Thus please add
> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like
>

Done.

> bool
> operation_overflow_traps (tree type, enum tree_code code)
> {
>    if (!ANY_INTEGRAL_TYPE_P (type)

I've changed this test into a gcc_checking_assert.

>       || !TYPE_OVERFLOW_TRAPS (type))
>      return false;
>    switch (code)
>      {
>      case PLUS_EXPR:
>      case MINUS_EXPR:
>      case MULT_EXPR:
>      case LSHIFT_EXPR:
>         /* Can overflow in various ways */
>      case TRUNC_DIV_EXPR:
>      case EXACT_DIV_EXPR:
>      case FLOOR_DIV_EXPR:
>      case CEIL_DIV_EXPR:
>         /* For INT_MIN / -1 */
>      case NEGATE_EXPR:
>      case ABS_EXPR:
>         /* For -INT_MIN */
>         return true;
>      default:
>         return false;
>     }
> }
>
> and similar variants for _wraps and _undefined.  I think we decided at
> some point
> the compiler should not take advantage of the fact that lshift or
> *_div have undefined
> behavior on signed integer overflow, similar we only take advantage of
> integral-type
> overflow behavior, not vector or complex.  So we could reduce the
> number of cases
> the functions return true if we document that it returns true only for
> the cases where
> the compiler needs to / may assume wrapping behavior does not take place.
> As for _traps for example we only have optabs and libfuncs for
> plus,minus,mult,negate
> and abs.

I've tried to capture all of this in the three new functions:
- operation_overflows_and_traps
- operation_no_overflow_or_wraps
- operation_overflows_and_undefined (unused atm)

I've also added the graphite bit.

OK for trunk, if bootstrap and reg-test succeeds?

Thanks,
- Tom

[-- Attachment #2: 0001-Allow-non-overflow-ops-in-vect_is_simple_reduction_1.patch --]
[-- Type: text/x-patch, Size: 10627 bytes --]

Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-28  Tom de Vries  <tom@codesourcery.com>

	* tree.c (operation_overflows_and_traps, operation_no_overflow_or_wraps)
	(operation_overflows_and_undefined): New function.
	* tree.h (operation_overflows_and_traps, operation_no_overflow_or_wraps)
	(operation_overflows_and_undefined): Declare.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Use
	operation_overflows_and_traps and operation_overflows_and_wraps.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Same.

	* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").
	Add successful scans for 2 detected reductions.	 Add xfail scans for 3
	detected reductions.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").  Add successful scans for 2
	detected reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Update scan to match vectorized min
	and max reductions.
---
 gcc/graphite-sese-to-poly.c                    |   6 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c     |  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c    |  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-8.c         |   7 +-
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |   2 +-
 gcc/tree-vect-loop.c                           |   5 +-
 gcc/tree.c                                     | 125 +++++++++++++++++++++++++
 gcc/tree.h                                     |   3 +
 8 files changed, 153 insertions(+), 15 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..b57dc9c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,10 @@ is_reduction_operation_p (gimple stmt)
   if (FLOAT_TYPE_P (type))
     return flag_associative_math;
 
-  return (INTEGRAL_TYPE_P (type)
-	  && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+    return operation_no_overflow_or_wraps (type, code);
+
+  return false;
 }
 
 /* Returns true when PHI contains an argument ARG.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 14867f3..a2dad44 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -60,7 +61,10 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 7c19cc5..a50e14f 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -58,7 +59,8 @@ int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 1d05c48..18ba03d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -40,7 +40,8 @@ testmin (const T *c, T init, T result)
     abort ();
 }
 
-int main (void)
+int __attribute__((optimize ("-ftree-parallelize-loops=0")))
+main (void)
 { 
   static signed char A[N] = {
     0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
@@ -84,5 +85,5 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index 2129717..86f9b90 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -46,4 +46,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c31bfbd..38eab77 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2615,7 +2615,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     }
   else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      if (TYPE_OVERFLOW_TRAPS (type))
+      if (operation_overflows_and_traps (type, code))
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
@@ -2624,7 +2624,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			    " (overflow traps): ");
 	  return NULL;
 	}
-      if (need_wrapping_integral_overflow && !TYPE_OVERFLOW_WRAPS (type))
+      if (need_wrapping_integral_overflow
+	  && !operation_no_overflow_or_wraps (type, code))
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
diff --git a/gcc/tree.c b/gcc/tree.c
index 94263af..8708727 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7597,6 +7597,131 @@ commutative_ternary_tree_code (enum tree_code code)
   return false;
 }
 
+/* Returns true if CODE operating on operands of type TYPE can overflow, and
+   fwrapv generates trapping insns for CODE.  */
+
+bool
+operation_overflows_and_traps (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return true;
+
+  if (!TYPE_OVERFLOW_TRAPS (type))
+    return false;
+
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Can overflow in various ways.  */
+      return true;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* These operators can overflow, but -fwrapv only generates trapping code
+	 for addition, subtraction and multiplication operations.  */
+      return false;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return true;
+    default:
+      return false;
+    }
+}
+
+/* Returns true if CODE operating on operands of type TYPE cannot overflow, or
+   wraps on overflow.  */
+
+bool
+operation_no_overflow_or_wraps (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return false;
+
+  if (TYPE_OVERFLOW_WRAPS (type))
+    return true;
+
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Can overflow in various ways.  */
+      return false;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* For INT_MIN / -1.  */
+      return false;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return false;
+    default:
+      return true;
+    }
+}
+
+/* Returns true if CODE operating on operands of type TYPE can overflow, and
+   overflow is undefined.  */
+
+bool
+operation_overflow_and_undefined (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return false;
+
+  if (!TYPE_OVERFLOW_UNDEFINED (type))
+    return false;
+
+  switch (code)
+    {
+    case LSHIFT_EXPR:
+      /* LSHIFT_EXPR can overflow, but we don't take advantage of that:
+	 GCC manual, C Implementation-defined behavior, Integers implementation:
+	 GCC does not use the latitude given in C99 and C11 only to treat
+	 certain aspects of signed << as undefined, but this is subject to
+	 change.  */
+      return false;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* These operators can overflow, but we don't take advantage of that.
+	 FIXME: where has this been documented?  */
+      return false;
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+      /* Can overflow in various ways.  */
+      return true;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return true;
+    default:
+      return false;
+    }
+}
+
 namespace inchash
 {
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..1e44f55 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4369,6 +4369,9 @@ extern int type_num_arguments (const_tree);
 extern bool associative_tree_code (enum tree_code);
 extern bool commutative_tree_code (enum tree_code);
 extern bool commutative_ternary_tree_code (enum tree_code);
+extern bool operation_overflows_and_traps (tree, enum tree_code);
+extern bool operation_no_overflow_or_wraps (tree, enum tree_code);
+extern bool operation_overflows_and_undefined (tree, enum tree_code);
 extern tree upper_bound_in_type (tree, tree);
 extern tree lower_bound_in_type (tree, tree);
 extern int operand_equal_for_phi_arg_p (const_tree, const_tree);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-28 12:30   ` Tom de Vries
@ 2015-07-29  8:28     ` Richard Biener
  2015-07-29 11:59       ` Tom de Vries
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2015-07-29  8:28 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 28/07/15 09:59, Richard Biener wrote:
>>
>> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> this patch allows parallelization and vectorization of reduction
>>> operators
>>> that are guaranteed to not overflow (such as min and max operators),
>>> independent of the overflow behaviour of the type.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> Hmm, I don't like that no_overflow_tree_code function.  We have a much
>> more
>> clear understanding which codes may overflow or trap.  Thus please add
>> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like
>>
>
> Done.
>
>> bool
>> operation_overflow_traps (tree type, enum tree_code code)
>> {
>>    if (!ANY_INTEGRAL_TYPE_P (type)
>
>
> I've changed this test into a gcc_checking_assert.
>
>
>>       || !TYPE_OVERFLOW_TRAPS (type))
>>      return false;
>>    switch (code)
>>      {
>>      case PLUS_EXPR:
>>      case MINUS_EXPR:
>>      case MULT_EXPR:
>>      case LSHIFT_EXPR:
>>         /* Can overflow in various ways */
>>      case TRUNC_DIV_EXPR:
>>      case EXACT_DIV_EXPR:
>>      case FLOOR_DIV_EXPR:
>>      case CEIL_DIV_EXPR:
>>         /* For INT_MIN / -1 */
>>      case NEGATE_EXPR:
>>      case ABS_EXPR:
>>         /* For -INT_MIN */
>>         return true;
>>      default:
>>         return false;
>>     }
>> }
>>
>> and similar variants for _wraps and _undefined.  I think we decided at
>> some point
>> the compiler should not take advantage of the fact that lshift or
>> *_div have undefined
>> behavior on signed integer overflow, similar we only take advantage of
>> integral-type
>> overflow behavior, not vector or complex.  So we could reduce the
>> number of cases
>> the functions return true if we document that it returns true only for
>> the cases where
>> the compiler needs to / may assume wrapping behavior does not take place.
>> As for _traps for example we only have optabs and libfuncs for
>> plus,minus,mult,negate
>> and abs.
>
>
> I've tried to capture all of this in the three new functions:
> - operation_overflows_and_traps
> - operation_no_overflow_or_wraps
> - operation_overflows_and_undefined (unused atm)
>
> I've also added the graphite bit.
>
> OK for trunk, if bootstrap and reg-test succeeds?

+/* Returns true if CODE operating on operands of type TYPE can overflow, and
+   fwrapv generates trapping insns for CODE.  */

ftrapv

+bool
+operation_overflows_and_traps (tree type, enum tree_code code)
+{

operation_overflow_traps

is better wording.  Meaning that when the operation overflows then it traps.

+  /* We don't take advantage of integral type overflow behaviour for
complex and
+     vector types.  */

We don't generate instructions that trap on overflow for complex or vector types

+  if (!INTEGRAL_TYPE_P (type))
+    return true;

+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Can overflow in various ways.  */

we don't have a trapping optab for lshift

+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:

nor division.  See optabs.c:optab_for_tree_code.  I suggest to only return true
for those.

+/* Returns true if CODE operating on operands of type TYPE cannot overflow, or
+   wraps on overflow.  */
+
+bool
+operation_no_overflow_or_wraps (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));

operation_overflow_wraps ()

is still my preferred name.

+bool
+operation_overflow_and_undefined (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+

operation_overflow_undefined ()

If you like to keep an explicit operation_can_overflow then there is the
opportunity to split out the switch statement from operation_overflow_wraps
and operation_overflow_undefined.

Thanks,
Richard.




> Thanks,
> - Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-29  8:28     ` Richard Biener
@ 2015-07-29 11:59       ` Tom de Vries
  2015-07-29 12:13         ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2015-07-29 11:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6008 bytes --]

On 29/07/15 10:09, Richard Biener wrote:
> On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 28/07/15 09:59, Richard Biener wrote:
>>>
>>> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries <Tom_deVries@mentor.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> this patch allows parallelization and vectorization of reduction
>>>> operators
>>>> that are guaranteed to not overflow (such as min and max operators),
>>>> independent of the overflow behaviour of the type.
>>>>
>>>> Bootstrapped and reg-tested on x86_64.
>>>>
>>>> OK for trunk?
>>>
>>>
>>> Hmm, I don't like that no_overflow_tree_code function.  We have a much
>>> more
>>> clear understanding which codes may overflow or trap.  Thus please add
>>> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like
>>>
>>
>> Done.
>>
>>> bool
>>> operation_overflow_traps (tree type, enum tree_code code)
>>> {
>>>     if (!ANY_INTEGRAL_TYPE_P (type)
>>
>>
>> I've changed this test into a gcc_checking_assert.
>>
>>
>>>        || !TYPE_OVERFLOW_TRAPS (type))
>>>       return false;
>>>     switch (code)
>>>       {
>>>       case PLUS_EXPR:
>>>       case MINUS_EXPR:
>>>       case MULT_EXPR:
>>>       case LSHIFT_EXPR:
>>>          /* Can overflow in various ways */
>>>       case TRUNC_DIV_EXPR:
>>>       case EXACT_DIV_EXPR:
>>>       case FLOOR_DIV_EXPR:
>>>       case CEIL_DIV_EXPR:
>>>          /* For INT_MIN / -1 */
>>>       case NEGATE_EXPR:
>>>       case ABS_EXPR:
>>>          /* For -INT_MIN */
>>>          return true;
>>>       default:
>>>          return false;
>>>      }
>>> }
>>>
>>> and similar variants for _wraps and _undefined.  I think we decided at
>>> some point
>>> the compiler should not take advantage of the fact that lshift or
>>> *_div have undefined
>>> behavior on signed integer overflow, similar we only take advantage of
>>> integral-type
>>> overflow behavior, not vector or complex.  So we could reduce the
>>> number of cases
>>> the functions return true if we document that it returns true only for
>>> the cases where
>>> the compiler needs to / may assume wrapping behavior does not take place.
>>> As for _traps for example we only have optabs and libfuncs for
>>> plus,minus,mult,negate
>>> and abs.
>>
>>
>> I've tried to capture all of this in the three new functions:
>> - operation_overflows_and_traps
>> - operation_no_overflow_or_wraps
>> - operation_overflows_and_undefined (unused atm)
>>
>> I've also added the graphite bit.
>>
>> OK for trunk, if bootstrap and reg-test succeeds?
>
> +/* Returns true if CODE operating on operands of type TYPE can overflow, and
> +   fwrapv generates trapping insns for CODE.  */
>
> ftrapv
>

Done.

> +bool
> +operation_overflows_and_traps (tree type, enum tree_code code)
> +{
>
> operation_overflow_traps
>
> is better wording.  Meaning that when the operation overflows then it traps.
>

AFAIU, the purpose of the function is to enable optimizations when it 
returns false, that is:
- if the operation doesn't overflow, or
- if the operation overflows, but doesn't trap.

The name operation_overflow_traps does not make clear what it returns 
when the operation doesn't overflow. If the name doesn't make it clear, 
you need to be conservative, that is, return true. Which defies the 
purpose of the function.

I've changed the name to operation_no_trapping_overflow (and inverted 
logic in the function).

But perhaps you want operation_overflow_traps with a conservative return 
for non-overflow operations, and use it like this:
...
   else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
       if (operation_overflows (type, code)
           && operation_overflow_traps (type, code))
         {
           /* Changing the order of operations changes the semantics.  */
...
?

> +  /* We don't take advantage of integral type overflow behaviour for
> complex and
> +     vector types.  */
>
> We don't generate instructions that trap on overflow for complex or vector types
>

Done.

> +  if (!INTEGRAL_TYPE_P (type))
> +    return true;
>
> +  switch (code)
> +    {
> +    case PLUS_EXPR:
> +    case MINUS_EXPR:
> +    case MULT_EXPR:
> +    case LSHIFT_EXPR:
> +      /* Can overflow in various ways.  */
>
> we don't have a trapping optab for lshift
>
> +    case TRUNC_DIV_EXPR:
> +    case EXACT_DIV_EXPR:
> +    case FLOOR_DIV_EXPR:
> +    case CEIL_DIV_EXPR:
>
> nor division.  See optabs.c:optab_for_tree_code.  I suggest to only return true
> for those.
>

Before the logic inversion, we return false for these (And also for 
operators that do not overflow).

> +/* Returns true if CODE operating on operands of type TYPE cannot overflow, or
> +   wraps on overflow.  */
> +
> +bool
> +operation_no_overflow_or_wraps (tree type, enum tree_code code)
> +{
> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
>
> operation_overflow_wraps ()
>
> is still my preferred name.
>

The name operation_overflow_wraps doesn't make clear what it returns if 
the operation doesn't overflow. And I didn't manage to come up with a 
better name sofar.

Btw, I wonder about something like vector max operation. The current 
implementation of operation_no_overflow_or_wraps returns false. Could we do:
...
  /* We don't take advantage of integral type overflow behaviour for
     complex and vector types.  */
   if (!INTEGRAL_TYPE_P (type))
     return !operation_overflows (type, code);
...
?

> +bool
> +operation_overflow_and_undefined (tree type, enum tree_code code)
> +{
> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
> +
>
> operation_overflow_undefined ()
>

The name operation_overflow_undefined doesn't make clear what it returns 
if the operation doesn't overflow. I've changed it into 
operation_undefined_overflow.

> If you like to keep an explicit operation_can_overflow then there is the
> opportunity to split out the switch statement from operation_overflow_wraps
> and operation_overflow_undefined.
>

Done.

OK for trunk if bootstrap and reg-test succeeds?

Thanks,
- Tom

[-- Attachment #2: 0001-Allow-non-overflow-ops-in-reductions.patch --]
[-- Type: text/x-patch, Size: 10841 bytes --]

Allow non-overflow ops in reductions

2015-07-28  Tom de Vries  <tom@codesourcery.com>

	* tree.c (operations_overflows, operation_no_trapping_overflow)
	(operation_no_overflow_or_wraps, operation_undefined_overflow): New
	function.
	* tree.h (operations_overflows, operation_no_trapping_overflow)
	(operation_no_overflow_or_wraps, operation_undefined_overflow): Declare.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Use
	operation_no_trapping_overflow and operation_no_overflow_or_wraps.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Use
	operation_no_overflow_or_wraps.

	* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").
	Add successful scans for 2 detected reductions.	 Add xfail scans for 3
	detected reductions.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").  Add successful scans for 2
	detected reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Update scan to match vectorized min
	and max reductions.
---
 gcc/graphite-sese-to-poly.c                    |   6 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c     |  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c    |  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-8.c         |   7 +-
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |   2 +-
 gcc/tree-vect-loop.c                           |   5 +-
 gcc/tree.c                                     | 127 +++++++++++++++++++++++++
 gcc/tree.h                                     |   4 +
 8 files changed, 156 insertions(+), 15 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..b57dc9c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,10 @@ is_reduction_operation_p (gimple stmt)
   if (FLOAT_TYPE_P (type))
     return flag_associative_math;
 
-  return (INTEGRAL_TYPE_P (type)
-	  && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+    return operation_no_overflow_or_wraps (type, code);
+
+  return false;
 }
 
 /* Returns true when PHI contains an argument ARG.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 14867f3..a2dad44 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -60,7 +61,10 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 7c19cc5..a50e14f 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -58,7 +59,8 @@ int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 1d05c48..18ba03d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -40,7 +40,8 @@ testmin (const T *c, T init, T result)
     abort ();
 }
 
-int main (void)
+int __attribute__((optimize ("-ftree-parallelize-loops=0")))
+main (void)
 { 
   static signed char A[N] = {
     0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
@@ -84,5 +85,5 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index 2129717..86f9b90 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -46,4 +46,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c31bfbd..b89c09b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2615,7 +2615,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     }
   else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      if (TYPE_OVERFLOW_TRAPS (type))
+      if (!operation_no_trapping_overflow (type, code))
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
@@ -2624,7 +2624,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			    " (overflow traps): ");
 	  return NULL;
 	}
-      if (need_wrapping_integral_overflow && !TYPE_OVERFLOW_WRAPS (type))
+      if (need_wrapping_integral_overflow
+	  && !operation_no_overflow_or_wraps (type, code))
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
diff --git a/gcc/tree.c b/gcc/tree.c
index 94263af..08fba14 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7597,6 +7597,133 @@ commutative_ternary_tree_code (enum tree_code code)
   return false;
 }
 
+/* Returns true if CODE operating on operands of type TYPE can overflow.  */
+
+bool
+operation_overflows (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Can overflow in various ways.  */
+      return true;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* For INT_MIN / -1.  */
+      return true;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return true;
+    default:
+      /* These operators cannot overflow.  */
+      return false;
+    }
+}
+
+/* Returns true if CODE operating on operands of type TYPE doesn't overflow, or
+   ftrapv doesn't generate trapping insns for CODE.  */
+
+bool
+operation_no_trapping_overflow (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't generate instructions that trap on overflow for complex or vector
+     types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return true;
+
+  if (!TYPE_OVERFLOW_TRAPS (type))
+    return true;
+
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* These operators can overflow, and -ftrapv generates trapping code for
+	 these.  */
+      return false;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+    case LSHIFT_EXPR:
+      /* These operators can overflow, but -ftrapv does not generate trapping
+	 code for these.  */
+      return true;
+    default:
+      /* These operators cannot overflow.  */
+      return true;
+    }
+}
+
+/* Returns true if CODE operating on operands of type TYPE cannot overflow, or
+   wraps on overflow.  */
+
+bool
+operation_no_overflow_or_wraps (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return false;
+
+  if (TYPE_OVERFLOW_WRAPS (type))
+    return true;
+
+  return !operation_overflows (type, code);
+}
+
+/* Returns true if CODE operating on operands of type TYPE can overflow, and
+   overflow is undefined.  */
+
+bool
+operation_undefined_overflow (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't take advantage of integral type overflow behaviour for complex and
+     vector types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return false;
+
+  if (!TYPE_OVERFLOW_UNDEFINED (type))
+    return false;
+
+  switch (code)
+    {
+    case LSHIFT_EXPR:
+      /* LSHIFT_EXPR can overflow, but we don't take advantage of that:
+	 GCC manual, C Implementation-defined behavior, Integers implementation:
+	 GCC does not use the latitude given in C99 and C11 only to treat
+	 certain aspects of signed << as undefined, but this is subject to
+	 change.  */
+      return false;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* These operators can overflow, but we don't take advantage of that.
+	 FIXME: where has this been documented?  */
+      return false;
+    default:
+      return operation_overflows (type, code);
+    }
+}
+
 namespace inchash
 {
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..76a004e 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4369,6 +4369,10 @@ extern int type_num_arguments (const_tree);
 extern bool associative_tree_code (enum tree_code);
 extern bool commutative_tree_code (enum tree_code);
 extern bool commutative_ternary_tree_code (enum tree_code);
+extern bool operation_overflows (tree, enum tree_code);
+extern bool operation_no_trapping_overflow (tree, enum tree_code);
+extern bool operation_no_overflow_or_wraps (tree, enum tree_code);
+extern bool operation_undefined_overflow (tree, enum tree_code);
 extern tree upper_bound_in_type (tree, tree);
 extern tree lower_bound_in_type (tree, tree);
 extern int operand_equal_for_phi_arg_p (const_tree, const_tree);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-29 11:59       ` Tom de Vries
@ 2015-07-29 12:13         ` Richard Biener
  2015-07-29 17:29           ` Tom de Vries
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2015-07-29 12:13 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Wed, Jul 29, 2015 at 1:22 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 29/07/15 10:09, Richard Biener wrote:
>>
>> On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> On 28/07/15 09:59, Richard Biener wrote:
>>>>
>>>>
>>>> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> this patch allows parallelization and vectorization of reduction
>>>>> operators
>>>>> that are guaranteed to not overflow (such as min and max operators),
>>>>> independent of the overflow behaviour of the type.
>>>>>
>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>
>>>>> OK for trunk?
>>>>
>>>>
>>>>
>>>> Hmm, I don't like that no_overflow_tree_code function.  We have a much
>>>> more
>>>> clear understanding which codes may overflow or trap.  Thus please add
>>>> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED}
>>>> like
>>>>
>>>
>>> Done.
>>>
>>>> bool
>>>> operation_overflow_traps (tree type, enum tree_code code)
>>>> {
>>>>     if (!ANY_INTEGRAL_TYPE_P (type)
>>>
>>>
>>>
>>> I've changed this test into a gcc_checking_assert.
>>>
>>>
>>>>        || !TYPE_OVERFLOW_TRAPS (type))
>>>>       return false;
>>>>     switch (code)
>>>>       {
>>>>       case PLUS_EXPR:
>>>>       case MINUS_EXPR:
>>>>       case MULT_EXPR:
>>>>       case LSHIFT_EXPR:
>>>>          /* Can overflow in various ways */
>>>>       case TRUNC_DIV_EXPR:
>>>>       case EXACT_DIV_EXPR:
>>>>       case FLOOR_DIV_EXPR:
>>>>       case CEIL_DIV_EXPR:
>>>>          /* For INT_MIN / -1 */
>>>>       case NEGATE_EXPR:
>>>>       case ABS_EXPR:
>>>>          /* For -INT_MIN */
>>>>          return true;
>>>>       default:
>>>>          return false;
>>>>      }
>>>> }
>>>>
>>>> and similar variants for _wraps and _undefined.  I think we decided at
>>>> some point
>>>> the compiler should not take advantage of the fact that lshift or
>>>> *_div have undefined
>>>> behavior on signed integer overflow, similar we only take advantage of
>>>> integral-type
>>>> overflow behavior, not vector or complex.  So we could reduce the
>>>> number of cases
>>>> the functions return true if we document that it returns true only for
>>>> the cases where
>>>> the compiler needs to / may assume wrapping behavior does not take
>>>> place.
>>>> As for _traps for example we only have optabs and libfuncs for
>>>> plus,minus,mult,negate
>>>> and abs.
>>>
>>>
>>>
>>> I've tried to capture all of this in the three new functions:
>>> - operation_overflows_and_traps
>>> - operation_no_overflow_or_wraps
>>> - operation_overflows_and_undefined (unused atm)
>>>
>>> I've also added the graphite bit.
>>>
>>> OK for trunk, if bootstrap and reg-test succeeds?
>>
>>
>> +/* Returns true if CODE operating on operands of type TYPE can overflow,
>> and
>> +   fwrapv generates trapping insns for CODE.  */
>>
>> ftrapv
>>
>
> Done.
>
>> +bool
>> +operation_overflows_and_traps (tree type, enum tree_code code)
>> +{
>>
>> operation_overflow_traps
>>
>> is better wording.  Meaning that when the operation overflows then it
>> traps.
>>
>
> AFAIU, the purpose of the function is to enable optimizations when it
> returns false, that is:
> - if the operation doesn't overflow, or
> - if the operation overflows, but doesn't trap.
>
> The name operation_overflow_traps does not make clear what it returns when
> the operation doesn't overflow. If the name doesn't make it clear, you need
> to be conservative, that is, return true. Which defies the purpose of the
> function.
>
> I've changed the name to operation_no_trapping_overflow (and inverted logic
> in the function).
>
> But perhaps you want operation_overflow_traps with a conservative return for
> non-overflow operations, and use it like this:
> ...
>   else if (INTEGRAL_TYPE_P (type) && check_reduction)
>     {
>       if (operation_overflows (type, code)
>           && operation_overflow_traps (type, code))
>         {
>           /* Changing the order of operations changes the semantics.  */
> ...
> ?

I think operation_no_trapping_overflow has the same wording issue as
operation_overflow_traps but I'm not a native speaker so I'll take your
word that operation_no_trapping_overflow is non-ambiguous iff the
operation cannot overflow.

And no, I didn't mean to use it in combination with operation_overflows.

>> +  /* We don't take advantage of integral type overflow behaviour for
>> complex and
>> +     vector types.  */
>>
>> We don't generate instructions that trap on overflow for complex or vector
>> types
>>
>
> Done.
>
>> +  if (!INTEGRAL_TYPE_P (type))
>> +    return true;
>>
>> +  switch (code)
>> +    {
>> +    case PLUS_EXPR:
>> +    case MINUS_EXPR:
>> +    case MULT_EXPR:
>> +    case LSHIFT_EXPR:
>> +      /* Can overflow in various ways.  */
>>
>> we don't have a trapping optab for lshift
>>
>> +    case TRUNC_DIV_EXPR:
>> +    case EXACT_DIV_EXPR:
>> +    case FLOOR_DIV_EXPR:
>> +    case CEIL_DIV_EXPR:
>>
>> nor division.  See optabs.c:optab_for_tree_code.  I suggest to only return
>> true
>> for those.
>>
>
> Before the logic inversion, we return false for these (And also for
> operators that do not overflow).
>
>> +/* Returns true if CODE operating on operands of type TYPE cannot
>> overflow, or
>> +   wraps on overflow.  */
>> +
>> +bool
>> +operation_no_overflow_or_wraps (tree type, enum tree_code code)
>> +{
>> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
>>
>> operation_overflow_wraps ()
>>
>> is still my preferred name.
>>
>
> The name operation_overflow_wraps doesn't make clear what it returns if the
> operation doesn't overflow. And I didn't manage to come up with a better
> name sofar.
>
> Btw, I wonder about something like vector max operation. The current
> implementation of operation_no_overflow_or_wraps returns false. Could we do:
> ...
>  /* We don't take advantage of integral type overflow behaviour for
>     complex and vector types.  */
>   if (!INTEGRAL_TYPE_P (type))
>     return !operation_overflows (type, code);
> ...
> ?

Yes, we can use operation_overflows and the existing overflow macros as well:

  if (!INTEGRAL_TYPE_P (type)
     || TYPE_OVERFLOW_WRAPS (type)
     || !operation_overflows (type, code))

and get rid of operation_overflow_{wraps,undefined} unless we want to take
advantage of the cases the compiler doesn't take advantage of the overflow
behavior.  I think keeping the traps variant separate makes sense because
of the clear facts on what trapping optabs we implement.

>> +bool
>> +operation_overflow_and_undefined (tree type, enum tree_code code)
>> +{
>> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
>> +
>>
>> operation_overflow_undefined ()
>>
>
> The name operation_overflow_undefined doesn't make clear what it returns if
> the operation doesn't overflow. I've changed it into
> operation_undefined_overflow.
>
>> If you like to keep an explicit operation_can_overflow then there is the
>> opportunity to split out the switch statement from
>> operation_overflow_wraps
>> and operation_overflow_undefined.
>>

Why 'operation_overflows' ...  it's operation_can_overflow, it clearly doesn't
always overflow...  Also it doesn't need its type argument (the assert
doesn't make much sense, for example fixed-point types can overflow
as well, likewise real types).

I'm fine with operation_no_trapping_overflow as in the patch, likewise
with operation_overflows apart from its name.

operation_no_overflow_or_wraps can be replaced by
ANY_INTEGRAL_TYPE_P ()
&& (TYPE_OVERFLOW_WRAPS () || !operation_can_overflows (code))

conservatively operation_undefined_overflow could be treated the same
and I'd prefer to do it that way for now.

Richard.

> Done.
>
> OK for trunk if bootstrap and reg-test succeeds?
>
> Thanks,
> - Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-29 12:13         ` Richard Biener
@ 2015-07-29 17:29           ` Tom de Vries
  2015-07-31 10:26             ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2015-07-29 17:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8883 bytes --]

On 29/07/15 14:00, Richard Biener wrote:
> On Wed, Jul 29, 2015 at 1:22 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 29/07/15 10:09, Richard Biener wrote:
>>>
>>> On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries <Tom_deVries@mentor.com>
>>> wrote:
>>>>
>>>> On 28/07/15 09:59, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> this patch allows parallelization and vectorization of reduction
>>>>>> operators
>>>>>> that are guaranteed to not overflow (such as min and max operators),
>>>>>> independent of the overflow behaviour of the type.
>>>>>>
>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>
>>>>>> OK for trunk?
>>>>>
>>>>>
>>>>>
>>>>> Hmm, I don't like that no_overflow_tree_code function.  We have a much
>>>>> more
>>>>> clear understanding which codes may overflow or trap.  Thus please add
>>>>> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED}
>>>>> like
>>>>>
>>>>
>>>> Done.
>>>>
>>>>> bool
>>>>> operation_overflow_traps (tree type, enum tree_code code)
>>>>> {
>>>>>      if (!ANY_INTEGRAL_TYPE_P (type)
>>>>
>>>>
>>>>
>>>> I've changed this test into a gcc_checking_assert.
>>>>
>>>>
>>>>>         || !TYPE_OVERFLOW_TRAPS (type))
>>>>>        return false;
>>>>>      switch (code)
>>>>>        {
>>>>>        case PLUS_EXPR:
>>>>>        case MINUS_EXPR:
>>>>>        case MULT_EXPR:
>>>>>        case LSHIFT_EXPR:
>>>>>           /* Can overflow in various ways */
>>>>>        case TRUNC_DIV_EXPR:
>>>>>        case EXACT_DIV_EXPR:
>>>>>        case FLOOR_DIV_EXPR:
>>>>>        case CEIL_DIV_EXPR:
>>>>>           /* For INT_MIN / -1 */
>>>>>        case NEGATE_EXPR:
>>>>>        case ABS_EXPR:
>>>>>           /* For -INT_MIN */
>>>>>           return true;
>>>>>        default:
>>>>>           return false;
>>>>>       }
>>>>> }
>>>>>
>>>>> and similar variants for _wraps and _undefined.  I think we decided at
>>>>> some point
>>>>> the compiler should not take advantage of the fact that lshift or
>>>>> *_div have undefined
>>>>> behavior on signed integer overflow, similar we only take advantage of
>>>>> integral-type
>>>>> overflow behavior, not vector or complex.  So we could reduce the
>>>>> number of cases
>>>>> the functions return true if we document that it returns true only for
>>>>> the cases where
>>>>> the compiler needs to / may assume wrapping behavior does not take
>>>>> place.
>>>>> As for _traps for example we only have optabs and libfuncs for
>>>>> plus,minus,mult,negate
>>>>> and abs.
>>>>
>>>>
>>>>
>>>> I've tried to capture all of this in the three new functions:
>>>> - operation_overflows_and_traps
>>>> - operation_no_overflow_or_wraps
>>>> - operation_overflows_and_undefined (unused atm)
>>>>
>>>> I've also added the graphite bit.
>>>>
>>>> OK for trunk, if bootstrap and reg-test succeeds?
>>>
>>>
>>> +/* Returns true if CODE operating on operands of type TYPE can overflow,
>>> and
>>> +   fwrapv generates trapping insns for CODE.  */
>>>
>>> ftrapv
>>>
>>
>> Done.
>>
>>> +bool
>>> +operation_overflows_and_traps (tree type, enum tree_code code)
>>> +{
>>>
>>> operation_overflow_traps
>>>
>>> is better wording.  Meaning that when the operation overflows then it
>>> traps.
>>>
>>
>> AFAIU, the purpose of the function is to enable optimizations when it
>> returns false, that is:
>> - if the operation doesn't overflow, or
>> - if the operation overflows, but doesn't trap.
>>
>> The name operation_overflow_traps does not make clear what it returns when
>> the operation doesn't overflow. If the name doesn't make it clear, you need
>> to be conservative, that is, return true. Which defies the purpose of the
>> function.
>>
>> I've changed the name to operation_no_trapping_overflow (and inverted logic
>> in the function).
>>
>> But perhaps you want operation_overflow_traps with a conservative return for
>> non-overflow operations, and use it like this:
>> ...
>>    else if (INTEGRAL_TYPE_P (type) && check_reduction)
>>      {
>>        if (operation_overflows (type, code)
>>            && operation_overflow_traps (type, code))
>>          {
>>            /* Changing the order of operations changes the semantics.  */
>> ...
>> ?
>
> I think operation_no_trapping_overflow has the same wording issue as
> operation_overflow_traps but I'm not a native speaker

Hmm, I'm also not a native speaker. I think I understand what you mean, 
if operation_no_trapping_overflow is read with stress on 'trapping', we 
have the same ambiguity issue.

[ Possibility: a more verbose variant, but I hope no ambiguity: 
operation_can_overflow_and_trap ? ]

 > so I'll take your
> word that operation_no_trapping_overflow is non-ambiguous iff the
> operation cannot overflow.
>
> And no, I didn't mean to use it in combination with operation_overflows.
>
>>> +  /* We don't take advantage of integral type overflow behaviour for
>>> complex and
>>> +     vector types.  */
>>>
>>> We don't generate instructions that trap on overflow for complex or vector
>>> types
>>>
>>
>> Done.
>>
>>> +  if (!INTEGRAL_TYPE_P (type))
>>> +    return true;
>>>
>>> +  switch (code)
>>> +    {
>>> +    case PLUS_EXPR:
>>> +    case MINUS_EXPR:
>>> +    case MULT_EXPR:
>>> +    case LSHIFT_EXPR:
>>> +      /* Can overflow in various ways.  */
>>>
>>> we don't have a trapping optab for lshift
>>>
>>> +    case TRUNC_DIV_EXPR:
>>> +    case EXACT_DIV_EXPR:
>>> +    case FLOOR_DIV_EXPR:
>>> +    case CEIL_DIV_EXPR:
>>>
>>> nor division.  See optabs.c:optab_for_tree_code.  I suggest to only return
>>> true
>>> for those.
>>>
>>
>> Before the logic inversion, we return false for these (And also for
>> operators that do not overflow).
>>
>>> +/* Returns true if CODE operating on operands of type TYPE cannot
>>> overflow, or
>>> +   wraps on overflow.  */
>>> +
>>> +bool
>>> +operation_no_overflow_or_wraps (tree type, enum tree_code code)
>>> +{
>>> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
>>>
>>> operation_overflow_wraps ()
>>>
>>> is still my preferred name.
>>>
>>
>> The name operation_overflow_wraps doesn't make clear what it returns if the
>> operation doesn't overflow. And I didn't manage to come up with a better
>> name sofar.
>>
>> Btw, I wonder about something like vector max operation. The current
>> implementation of operation_no_overflow_or_wraps returns false. Could we do:
>> ...
>>   /* We don't take advantage of integral type overflow behaviour for
>>      complex and vector types.  */
>>    if (!INTEGRAL_TYPE_P (type))
>>      return !operation_overflows (type, code);
>> ...
>> ?
>
> Yes, we can use operation_overflows and the existing overflow macros as well:
>
>    if (!INTEGRAL_TYPE_P (type)
>       || TYPE_OVERFLOW_WRAPS (type)
>       || !operation_overflows (type, code))
>

For an unsigned vector add, this would evaluate to true.

You remarked "similar we only take advantage of integral-type overflow 
behavior, not vector or complex". Was that remark then limited to 
exploiting undefined behaviour?

> and get rid of operation_overflow_{wraps,undefined}

That's not what I meant, but ok.

 > unless we want to take
> advantage of the cases the compiler doesn't take advantage of the overflow
> behavior.

I thought the purpose of the functions was to specify in one location 
and clearly documented the situations that the compiler is allowed to 
take advantage of the overflow behavior.

>  I think keeping the traps variant separate makes sense because
> of the clear facts on what trapping optabs we implement.
>
>>> +bool
>>> +operation_overflow_and_undefined (tree type, enum tree_code code)
>>> +{
>>> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
>>> +
>>>
>>> operation_overflow_undefined ()
>>>
>>
>> The name operation_overflow_undefined doesn't make clear what it returns if
>> the operation doesn't overflow. I've changed it into
>> operation_undefined_overflow.
>>
>>> If you like to keep an explicit operation_can_overflow then there is the
>>> opportunity to split out the switch statement from
>>> operation_overflow_wraps
>>> and operation_overflow_undefined.
>>>
>
> Why 'operation_overflows' ...  it's operation_can_overflow, it clearly doesn't
> always overflow...

Done.

> Also it doesn't need its type argument (the assert
> doesn't make much sense, for example fixed-point types can overflow
> as well, likewise real types).

Done.

>
> I'm fine with operation_no_trapping_overflow as in the patch, likewise
> with operation_overflows apart from its name.
>
> operation_no_overflow_or_wraps can be replaced by
> ANY_INTEGRAL_TYPE_P ()
> && (TYPE_OVERFLOW_WRAPS () || !operation_can_overflows (code))
>
> conservatively operation_undefined_overflow could be treated the same
> and I'd prefer to do it that way for now.
>

Dropped wrap/undefined functions.

OK like this?

Thanks,
- Tom

[-- Attachment #2: 0001-Allow-non-overflow-ops-in-reductions.patch --]
[-- Type: text/x-patch, Size: 8868 bytes --]

Allow non-overflow ops in reductions

2015-07-28  Tom de Vries  <tom@codesourcery.com>

	* tree.c (operation_can_overflow, operation_no_trapping_overflow): New
	function.
	* tree.h (operation_can_overflow, operation_no_trapping_overflow):
	Declare.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Use
	operation_no_trapping_overflow.  Allow non-overflow operations.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Allow non-overflow
	operations.

	* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").
	Add successful scans for 2 detected reductions.	 Add xfail scans for 3
	detected reductions.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").  Add successful scans for 2
	detected reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Update scan to match vectorized min
	and max reductions.
---
 gcc/graphite-sese-to-poly.c                    |  7 ++-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c     | 10 ++--
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c    | 10 ++--
 gcc/testsuite/gcc.dg/autopar/reduc-8.c         |  7 +--
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |  2 +-
 gcc/tree-vect-loop.c                           |  6 ++-
 gcc/tree.c                                     | 69 ++++++++++++++++++++++++++
 gcc/tree.h                                     |  2 +
 8 files changed, 98 insertions(+), 15 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..fdcc790 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,11 @@ is_reduction_operation_p (gimple stmt)
   if (FLOAT_TYPE_P (type))
     return flag_associative_math;
 
-  return (INTEGRAL_TYPE_P (type)
-	  && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+    return (TYPE_OVERFLOW_WRAPS (type)
+	    || !operation_can_overflow (code));
+
+  return false;
 }
 
 /* Returns true when PHI contains an argument ARG.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 14867f3..a2dad44 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -60,7 +61,10 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 7c19cc5..a50e14f 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result)
     abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
    int i;
 
@@ -58,7 +59,8 @@ int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 1d05c48..18ba03d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -40,7 +40,8 @@ testmin (const T *c, T init, T result)
     abort ();
 }
 
-int main (void)
+int __attribute__((optimize ("-ftree-parallelize-loops=0")))
+main (void)
 { 
   static signed char A[N] = {
     0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
@@ -84,5 +85,5 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index 2129717..86f9b90 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -46,4 +46,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c31bfbd..59c75af 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2615,7 +2615,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     }
   else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      if (TYPE_OVERFLOW_TRAPS (type))
+      if (!operation_no_trapping_overflow (type, code))
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
@@ -2624,7 +2624,9 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			    " (overflow traps): ");
 	  return NULL;
 	}
-      if (need_wrapping_integral_overflow && !TYPE_OVERFLOW_WRAPS (type))
+      if (need_wrapping_integral_overflow
+	  && !TYPE_OVERFLOW_WRAPS (type)
+	  && operation_can_overflow (code))
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
diff --git a/gcc/tree.c b/gcc/tree.c
index 94263af..3c2c20a 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7597,6 +7597,75 @@ commutative_ternary_tree_code (enum tree_code code)
   return false;
 }
 
+/* Returns true if CODE can overflow.  */
+
+bool
+operation_can_overflow (enum tree_code code)
+{
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Can overflow in various ways.  */
+      return true;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+      /* For INT_MIN / -1.  */
+      return true;
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* For -INT_MIN.  */
+      return true;
+    default:
+      /* These operators cannot overflow.  */
+      return false;
+    }
+}
+
+/* Returns true if CODE operating on operands of type TYPE doesn't overflow, or
+   ftrapv doesn't generate trapping insns for CODE.  */
+
+bool
+operation_no_trapping_overflow (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
+
+  /* We don't generate instructions that trap on overflow for complex or vector
+     types.  */
+  if (!INTEGRAL_TYPE_P (type))
+    return true;
+
+  if (!TYPE_OVERFLOW_TRAPS (type))
+    return true;
+
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case NEGATE_EXPR:
+    case ABS_EXPR:
+      /* These operators can overflow, and -ftrapv generates trapping code for
+	 these.  */
+      return false;
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case CEIL_DIV_EXPR:
+    case LSHIFT_EXPR:
+      /* These operators can overflow, but -ftrapv does not generate trapping
+	 code for these.  */
+      return true;
+    default:
+      /* These operators cannot overflow.  */
+      return true;
+    }
+}
+
 namespace inchash
 {
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..d280ea7 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4369,6 +4369,8 @@ extern int type_num_arguments (const_tree);
 extern bool associative_tree_code (enum tree_code);
 extern bool commutative_tree_code (enum tree_code);
 extern bool commutative_ternary_tree_code (enum tree_code);
+extern bool operation_can_overflow (enum tree_code);
+extern bool operation_no_trapping_overflow (tree, enum tree_code);
 extern tree upper_bound_in_type (tree, tree);
 extern tree lower_bound_in_type (tree, tree);
 extern int operand_equal_for_phi_arg_p (const_tree, const_tree);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1
  2015-07-29 17:29           ` Tom de Vries
@ 2015-07-31 10:26             ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2015-07-31 10:26 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Wed, Jul 29, 2015 at 7:18 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 29/07/15 14:00, Richard Biener wrote:
>>
>> On Wed, Jul 29, 2015 at 1:22 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> On 29/07/15 10:09, Richard Biener wrote:
>>>>
>>>>
>>>> On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 28/07/15 09:59, Richard Biener wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>>>
>>>>>>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> this patch allows parallelization and vectorization of reduction
>>>>>>> operators
>>>>>>> that are guaranteed to not overflow (such as min and max operators),
>>>>>>> independent of the overflow behaviour of the type.
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>>
>>>>>>> OK for trunk?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hmm, I don't like that no_overflow_tree_code function.  We have a much
>>>>>> more
>>>>>> clear understanding which codes may overflow or trap.  Thus please add
>>>>>> a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED}
>>>>>> like
>>>>>>
>>>>>
>>>>> Done.
>>>>>
>>>>>> bool
>>>>>> operation_overflow_traps (tree type, enum tree_code code)
>>>>>> {
>>>>>>      if (!ANY_INTEGRAL_TYPE_P (type)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I've changed this test into a gcc_checking_assert.
>>>>>
>>>>>
>>>>>>         || !TYPE_OVERFLOW_TRAPS (type))
>>>>>>        return false;
>>>>>>      switch (code)
>>>>>>        {
>>>>>>        case PLUS_EXPR:
>>>>>>        case MINUS_EXPR:
>>>>>>        case MULT_EXPR:
>>>>>>        case LSHIFT_EXPR:
>>>>>>           /* Can overflow in various ways */
>>>>>>        case TRUNC_DIV_EXPR:
>>>>>>        case EXACT_DIV_EXPR:
>>>>>>        case FLOOR_DIV_EXPR:
>>>>>>        case CEIL_DIV_EXPR:
>>>>>>           /* For INT_MIN / -1 */
>>>>>>        case NEGATE_EXPR:
>>>>>>        case ABS_EXPR:
>>>>>>           /* For -INT_MIN */
>>>>>>           return true;
>>>>>>        default:
>>>>>>           return false;
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> and similar variants for _wraps and _undefined.  I think we decided at
>>>>>> some point
>>>>>> the compiler should not take advantage of the fact that lshift or
>>>>>> *_div have undefined
>>>>>> behavior on signed integer overflow, similar we only take advantage of
>>>>>> integral-type
>>>>>> overflow behavior, not vector or complex.  So we could reduce the
>>>>>> number of cases
>>>>>> the functions return true if we document that it returns true only for
>>>>>> the cases where
>>>>>> the compiler needs to / may assume wrapping behavior does not take
>>>>>> place.
>>>>>> As for _traps for example we only have optabs and libfuncs for
>>>>>> plus,minus,mult,negate
>>>>>> and abs.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I've tried to capture all of this in the three new functions:
>>>>> - operation_overflows_and_traps
>>>>> - operation_no_overflow_or_wraps
>>>>> - operation_overflows_and_undefined (unused atm)
>>>>>
>>>>> I've also added the graphite bit.
>>>>>
>>>>> OK for trunk, if bootstrap and reg-test succeeds?
>>>>
>>>>
>>>>
>>>> +/* Returns true if CODE operating on operands of type TYPE can
>>>> overflow,
>>>> and
>>>> +   fwrapv generates trapping insns for CODE.  */
>>>>
>>>> ftrapv
>>>>
>>>
>>> Done.
>>>
>>>> +bool
>>>> +operation_overflows_and_traps (tree type, enum tree_code code)
>>>> +{
>>>>
>>>> operation_overflow_traps
>>>>
>>>> is better wording.  Meaning that when the operation overflows then it
>>>> traps.
>>>>
>>>
>>> AFAIU, the purpose of the function is to enable optimizations when it
>>> returns false, that is:
>>> - if the operation doesn't overflow, or
>>> - if the operation overflows, but doesn't trap.
>>>
>>> The name operation_overflow_traps does not make clear what it returns
>>> when
>>> the operation doesn't overflow. If the name doesn't make it clear, you
>>> need
>>> to be conservative, that is, return true. Which defies the purpose of the
>>> function.
>>>
>>> I've changed the name to operation_no_trapping_overflow (and inverted
>>> logic
>>> in the function).
>>>
>>> But perhaps you want operation_overflow_traps with a conservative return
>>> for
>>> non-overflow operations, and use it like this:
>>> ...
>>>    else if (INTEGRAL_TYPE_P (type) && check_reduction)
>>>      {
>>>        if (operation_overflows (type, code)
>>>            && operation_overflow_traps (type, code))
>>>          {
>>>            /* Changing the order of operations changes the semantics.  */
>>> ...
>>> ?
>>
>>
>> I think operation_no_trapping_overflow has the same wording issue as
>> operation_overflow_traps but I'm not a native speaker
>
>
> Hmm, I'm also not a native speaker. I think I understand what you mean, if
> operation_no_trapping_overflow is read with stress on 'trapping', we have
> the same ambiguity issue.
>
> [ Possibility: a more verbose variant, but I hope no ambiguity:
> operation_can_overflow_and_trap ? ]

I'd say we choose a small name and simply refer to documentation.

>
>> so I'll take your
>>
>> word that operation_no_trapping_overflow is non-ambiguous iff the
>> operation cannot overflow.
>>
>> And no, I didn't mean to use it in combination with operation_overflows.
>>
>>>> +  /* We don't take advantage of integral type overflow behaviour for
>>>> complex and
>>>> +     vector types.  */
>>>>
>>>> We don't generate instructions that trap on overflow for complex or
>>>> vector
>>>> types
>>>>
>>>
>>> Done.
>>>
>>>> +  if (!INTEGRAL_TYPE_P (type))
>>>> +    return true;
>>>>
>>>> +  switch (code)
>>>> +    {
>>>> +    case PLUS_EXPR:
>>>> +    case MINUS_EXPR:
>>>> +    case MULT_EXPR:
>>>> +    case LSHIFT_EXPR:
>>>> +      /* Can overflow in various ways.  */
>>>>
>>>> we don't have a trapping optab for lshift
>>>>
>>>> +    case TRUNC_DIV_EXPR:
>>>> +    case EXACT_DIV_EXPR:
>>>> +    case FLOOR_DIV_EXPR:
>>>> +    case CEIL_DIV_EXPR:
>>>>
>>>> nor division.  See optabs.c:optab_for_tree_code.  I suggest to only
>>>> return
>>>> true
>>>> for those.
>>>>
>>>
>>> Before the logic inversion, we return false for these (And also for
>>> operators that do not overflow).
>>>
>>>> +/* Returns true if CODE operating on operands of type TYPE cannot
>>>> overflow, or
>>>> +   wraps on overflow.  */
>>>> +
>>>> +bool
>>>> +operation_no_overflow_or_wraps (tree type, enum tree_code code)
>>>> +{
>>>> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
>>>>
>>>> operation_overflow_wraps ()
>>>>
>>>> is still my preferred name.
>>>>
>>>
>>> The name operation_overflow_wraps doesn't make clear what it returns if
>>> the
>>> operation doesn't overflow. And I didn't manage to come up with a better
>>> name sofar.
>>>
>>> Btw, I wonder about something like vector max operation. The current
>>> implementation of operation_no_overflow_or_wraps returns false. Could we
>>> do:
>>> ...
>>>   /* We don't take advantage of integral type overflow behaviour for
>>>      complex and vector types.  */
>>>    if (!INTEGRAL_TYPE_P (type))
>>>      return !operation_overflows (type, code);
>>> ...
>>> ?
>>
>>
>> Yes, we can use operation_overflows and the existing overflow macros as
>> well:
>>
>>    if (!INTEGRAL_TYPE_P (type)
>>       || TYPE_OVERFLOW_WRAPS (type)
>>       || !operation_overflows (type, code))
>>
>
> For an unsigned vector add, this would evaluate to true.

Yes, so?

> You remarked "similar we only take advantage of integral-type overflow
> behavior, not vector or complex". Was that remark then limited to exploiting
> undefined behaviour?

Yes, but it's also not documented anywhere.

>> and get rid of operation_overflow_{wraps,undefined}
>
>
> That's not what I meant, but ok.
>
>> unless we want to take
>>
>> advantage of the cases the compiler doesn't take advantage of the overflow
>> behavior.
>
>
> I thought the purpose of the functions was to specify in one location and
> clearly documented the situations that the compiler is allowed to take
> advantage of the overflow behavior.

Yeah, but we're iterating over names here and that's not good progress ;)

We can always followup with that.

>>  I think keeping the traps variant separate makes sense because
>> of the clear facts on what trapping optabs we implement.
>>
>>>> +bool
>>>> +operation_overflow_and_undefined (tree type, enum tree_code code)
>>>> +{
>>>> +  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
>>>> +
>>>>
>>>> operation_overflow_undefined ()
>>>>
>>>
>>> The name operation_overflow_undefined doesn't make clear what it returns
>>> if
>>> the operation doesn't overflow. I've changed it into
>>> operation_undefined_overflow.
>>>
>>>> If you like to keep an explicit operation_can_overflow then there is the
>>>> opportunity to split out the switch statement from
>>>> operation_overflow_wraps
>>>> and operation_overflow_undefined.
>>>>
>>
>> Why 'operation_overflows' ...  it's operation_can_overflow, it clearly
>> doesn't
>> always overflow...
>
>
> Done.
>
>> Also it doesn't need its type argument (the assert
>> doesn't make much sense, for example fixed-point types can overflow
>> as well, likewise real types).
>
>
> Done.
>
>>
>> I'm fine with operation_no_trapping_overflow as in the patch, likewise
>> with operation_overflows apart from its name.
>>
>> operation_no_overflow_or_wraps can be replaced by
>> ANY_INTEGRAL_TYPE_P ()
>> && (TYPE_OVERFLOW_WRAPS () || !operation_can_overflows (code))
>>
>> conservatively operation_undefined_overflow could be treated the same
>> and I'd prefer to do it that way for now.
>>
>
> Dropped wrap/undefined functions.
>
> OK like this?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-07-31 10:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-24 14:46 [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1 Tom de Vries
2015-07-26 17:00 ` Tom de Vries
2015-07-26 17:09   ` Tom de Vries
2015-07-28  7:44     ` [committed, gomp4] Handle non-overflow reductions in graphite Tom de Vries
2015-07-28  7:41 ` [committed, gomp4] Allow non-overflow ops in vect_is_simple_reduction_1 Tom de Vries
2015-07-28  8:09 ` [PATCH] " Richard Biener
2015-07-28 12:30   ` Tom de Vries
2015-07-29  8:28     ` Richard Biener
2015-07-29 11:59       ` Tom de Vries
2015-07-29 12:13         ` Richard Biener
2015-07-29 17:29           ` Tom de Vries
2015-07-31 10:26             ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).