public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR84512
@ 2018-02-27 12:42 Richard Biener
  2018-03-16 11:48 ` Tom de Vries
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2018-02-27 12:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubicka


With Honzas change to make the x86 backend consider the actual operation
for costing vector stmts it becomes apparent that 
vect_compute_single_scalar_iteration_cost uses the old-style target
cost hook which doesn't get enough information to distinguish different
operations.  This means instead of actual scalar multiplication cost
we cost a general scalar-stmt cost for the testcase for the scalar
iteration but cost a vector multiplication for the vectorized body
resulting in an apples-to-oranges comparison in the end.

Fixed as follows.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-02-27  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/84512
	* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
	Do not use the estimate returned from record_stmt_cost for
	the scalar iteration cost but sum properly using add_stmt_cost.

	* gcc.dg/tree-ssa/pr84512.c: New testcase.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 258030)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -1384,16 +1384,10 @@ vect_compute_single_scalar_iteration_cos
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes, factor, scalar_single_iter_cost = 0;
+  int nbbs = loop->num_nodes, factor;
   int innerloop_iters, i;
 
-  /* Count statements in scalar loop.  Using this as scalar cost for a single
-     iteration for now.
-
-     TODO: Add outer loop support.
-
-     TODO: Consider assigning different costs to different scalar
-     statements.  */
+  /* Gather costs for statements in the scalar loop.  */
 
   /* FORNOW.  */
   innerloop_iters = 1;
@@ -1437,13 +1431,28 @@ vect_compute_single_scalar_iteration_cos
           else
             kind = scalar_stmt;
 
-	  scalar_single_iter_cost
-	    += record_stmt_cost (&LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-				 factor, kind, stmt_info, 0, vect_prologue);
+	  record_stmt_cost (&LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+			    factor, kind, stmt_info, 0, vect_prologue);
         }
     }
-  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
-    = scalar_single_iter_cost;
+
+  /* Now accumulate cost.  */
+  void *target_cost_data = init_cost (loop);
+  stmt_info_for_cost *si;
+  int j;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+		    j, si)
+    {
+      struct _stmt_vec_info *stmt_info
+	= si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+      (void) add_stmt_cost (target_cost_data, si->count,
+			    si->kind, stmt_info, si->misalign,
+			    vect_body);
+    }
+  unsigned dummy, body_cost = 0;
+  finish_cost (target_cost_data, &dummy, &body_cost, &dummy);
+  destroy_cost_data (target_cost_data);
+  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo) = body_cost;
 }
 
 
Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int foo()
+{
+  int a[10];
+  for(int i = 0; i < 10; ++i)
+    a[i] = i*i;
+  int res = 0;
+  for(int i = 0; i < 10; ++i)
+    res += a[i];
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-02-27 12:42 [PATCH] Fix PR84512 Richard Biener
@ 2018-03-16 11:48 ` Tom de Vries
  2018-03-16 11:56   ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Tom de Vries @ 2018-03-16 11:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

On 02/27/2018 01:42 PM, Richard Biener wrote:
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int foo()
> +{
> +  int a[10];
> +  for(int i = 0; i < 10; ++i)
> +    a[i] = i*i;
> +  int res = 0;
> +  for(int i = 0; i < 10; ++i)
> +    res += a[i];
> +  return res;
> +}
> +
> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */

This fails for nvptx, because it doesn't have the required vector 
operations. To fix the fail, I've added requiring effective target 
vect_int_mult.

Thanks,
- Tom

[-- Attachment #2: 0002-testsuite-Require-vect_int_mult-in-pr84512.c.patch --]
[-- Type: text/x-patch, Size: 630 bytes --]

[testsuite] Require vect_int_mult in pr84512.c

2018-03-16  Tom de Vries  <tom@codesourcery.com>

	* gcc.dg/tree-ssa/pr84512.c: Require effective target vect_int_mult.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
index 288fa5d..41b6c06 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-require-effective-target vect_int_mult } */
 
 int foo()
 {

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-16 11:48 ` Tom de Vries
@ 2018-03-16 11:56   ` Richard Biener
  2018-03-16 15:30     ` Tom de Vries
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2018-03-16 11:56 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Jan Hubicka

On Fri, 16 Mar 2018, Tom de Vries wrote:

> On 02/27/2018 01:42 PM, Richard Biener wrote:
> > Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > +
> > +int foo()
> > +{
> > +  int a[10];
> > +  for(int i = 0; i < 10; ++i)
> > +    a[i] = i*i;
> > +  int res = 0;
> > +  for(int i = 0; i < 10; ++i)
> > +    res += a[i];
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
> 
> This fails for nvptx, because it doesn't have the required vector operations.
> To fix the fail, I've added requiring effective target vect_int_mult.

On targets that do not vectorize you should see the scalar loops unrolled
instead.  Or do you have only one loop vectorized?  That's precisely
what the PR was about...  which means it isn't fixed for nvptx :/

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-16 11:56   ` Richard Biener
@ 2018-03-16 15:30     ` Tom de Vries
  2018-03-19 11:55       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Tom de Vries @ 2018-03-16 15:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jan Hubicka

On 03/16/2018 12:55 PM, Richard Biener wrote:
> On Fri, 16 Mar 2018, Tom de Vries wrote:
> 
>> On 02/27/2018 01:42 PM, Richard Biener wrote:
>>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
>>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
>>> @@ -0,0 +1,15 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3 -fdump-tree-optimized" } */
>>> +
>>> +int foo()
>>> +{
>>> +  int a[10];
>>> +  for(int i = 0; i < 10; ++i)
>>> +    a[i] = i*i;
>>> +  int res = 0;
>>> +  for(int i = 0; i < 10; ++i)
>>> +    res += a[i];
>>> +  return res;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
>>
>> This fails for nvptx, because it doesn't have the required vector operations.
>> To fix the fail, I've added requiring effective target vect_int_mult.
> 
> On targets that do not vectorize you should see the scalar loops unrolled
> instead.  Or do you have only one loop vectorized?

Sort of. Loop vectorization has no effect, and the scalar loops are 
completely unrolled. But then slp vectorization vectorizes the stores.

So at optimized we have:
...
   MEM[(int *)&a] = { 0, 1 };
   MEM[(int *)&a + 8B] = { 4, 9 };
   MEM[(int *)&a + 16B] = { 16, 25 };
   MEM[(int *)&a + 24B] = { 36, 49 };
   MEM[(int *)&a + 32B] = { 64, 81 };
   _6 = a[0];
   _28 = a[1];
   res_29 = _6 + _28;
   _35 = a[2];
   res_36 = res_29 + _35;
   _42 = a[3];
   res_43 = res_36 + _42;
   _49 = a[4];
   res_50 = res_43 + _49;
   _56 = a[5];
   res_57 = res_50 + _56;
   _63 = a[6];
   res_64 = res_57 + _63;
   _70 = a[7];
   res_71 = res_64 + _70;
   _77 = a[8];
   res_78 = res_71 + _77;
   _2 = a[9];
   res_11 = _2 + res_78;
   a ={v} {CLOBBER};
   return res_11;
...

The stores and loads are eliminated by dse1 in the rtl phase, and in the 
end we have:
...
.visible .func (.param.u32 %value_out) foo
{
         .reg.u32 %value;
         .local .align 16 .b8 %frame_ar[48];
         .reg.u64 %frame;
         cvta.local.u64 %frame, %frame_ar;
         mov.u32 %value, 285;
         st.param.u32    [%value_out], %value;
         ret;
}
...

> That's precisely
> what the PR was about...  which means it isn't fixed for nvptx :/

Indeed the assembly is not optimal, and would be optimal if we'd have 
optimal code at optimized.

FWIW, using this patch we generate optimal code at optimized:
...
diff --git a/gcc/passes.def b/gcc/passes.def
index 3ebcfc30349..6b64f600c4a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
        NEXT_PASS (pass_tracer);
        NEXT_PASS (pass_thread_jumps);
        NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
+      NEXT_PASS (pass_fre);
        NEXT_PASS (pass_strlen);
        NEXT_PASS (pass_thread_jumps);
        NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
...

and we get:
...
.visible .func (.param.u32 %value_out) foo
{
         .reg.u32 %value;
         mov.u32 %value, 285;
         st.param.u32    [%value_out], %value;
         ret;
}
...

I could file a missing optimization PR for nvptx, but I'm not sure where 
this should be fixed.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-16 15:30     ` Tom de Vries
@ 2018-03-19 11:55       ` Richard Biener
  2018-03-20 12:42         ` Tom de Vries
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2018-03-19 11:55 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Jan Hubicka

On Fri, 16 Mar 2018, Tom de Vries wrote:

> On 03/16/2018 12:55 PM, Richard Biener wrote:
> > On Fri, 16 Mar 2018, Tom de Vries wrote:
> > 
> > > On 02/27/2018 01:42 PM, Richard Biener wrote:
> > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> > > > ===================================================================
> > > > --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
> > > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
> > > > @@ -0,0 +1,15 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > > > +
> > > > +int foo()
> > > > +{
> > > > +  int a[10];
> > > > +  for(int i = 0; i < 10; ++i)
> > > > +    a[i] = i*i;
> > > > +  int res = 0;
> > > > +  for(int i = 0; i < 10; ++i)
> > > > +    res += a[i];
> > > > +  return res;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
> > > 
> > > This fails for nvptx, because it doesn't have the required vector
> > > operations.
> > > To fix the fail, I've added requiring effective target vect_int_mult.
> > 
> > On targets that do not vectorize you should see the scalar loops unrolled
> > instead.  Or do you have only one loop vectorized?
> 
> Sort of. Loop vectorization has no effect, and the scalar loops are completely
> unrolled. But then slp vectorization vectorizes the stores.
> 
> So at optimized we have:
> ...
>   MEM[(int *)&a] = { 0, 1 };
>   MEM[(int *)&a + 8B] = { 4, 9 };
>   MEM[(int *)&a + 16B] = { 16, 25 };
>   MEM[(int *)&a + 24B] = { 36, 49 };
>   MEM[(int *)&a + 32B] = { 64, 81 };
>   _6 = a[0];
>   _28 = a[1];
>   res_29 = _6 + _28;
>   _35 = a[2];
>   res_36 = res_29 + _35;
>   _42 = a[3];
>   res_43 = res_36 + _42;
>   _49 = a[4];
>   res_50 = res_43 + _49;
>   _56 = a[5];
>   res_57 = res_50 + _56;
>   _63 = a[6];
>   res_64 = res_57 + _63;
>   _70 = a[7];
>   res_71 = res_64 + _70;
>   _77 = a[8];
>   res_78 = res_71 + _77;
>   _2 = a[9];
>   res_11 = _2 + res_78;
>   a ={v} {CLOBBER};
>   return res_11;
> ...
> 
> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
> we have:
> ...
> .visible .func (.param.u32 %value_out) foo
> {
>         .reg.u32 %value;
>         .local .align 16 .b8 %frame_ar[48];
>         .reg.u64 %frame;
>         cvta.local.u64 %frame, %frame_ar;
>         mov.u32 %value, 285;
>         st.param.u32    [%value_out], %value;
>         ret;
> }
> ...
> 
> > That's precisely
> > what the PR was about...  which means it isn't fixed for nvptx :/
> 
> Indeed the assembly is not optimal, and would be optimal if we'd have optimal
> code at optimized.
> 
> FWIW, using this patch we generate optimal code at optimized:
> ...
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 3ebcfc30349..6b64f600c4a 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> +      NEXT_PASS (pass_fre);
>        NEXT_PASS (pass_strlen);
>        NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
> ...
> 
> and we get:
> ...
> .visible .func (.param.u32 %value_out) foo
> {
>         .reg.u32 %value;
>         mov.u32 %value, 285;
>         st.param.u32    [%value_out], %value;
>         ret;
> }
> ...
> 
> I could file a missing optimization PR for nvptx, but I'm not sure where this
> should be fixed.

Ah, yeah... the usual issue then.

Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-19 11:55       ` Richard Biener
@ 2018-03-20 12:42         ` Tom de Vries
  2018-03-20 20:15           ` Rainer Orth
  0 siblings, 1 reply; 10+ messages in thread
From: Tom de Vries @ 2018-03-20 12:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 3813 bytes --]

On 03/19/2018 10:11 AM, Richard Biener wrote:
> On Fri, 16 Mar 2018, Tom de Vries wrote:
> 
>> On 03/16/2018 12:55 PM, Richard Biener wrote:
>>> On Fri, 16 Mar 2018, Tom de Vries wrote:
>>>
>>>> On 02/27/2018 01:42 PM, Richard Biener wrote:
>>>>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
>>>>> ===================================================================
>>>>> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
>>>>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
>>>>> @@ -0,0 +1,15 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O3 -fdump-tree-optimized" } */
>>>>> +
>>>>> +int foo()
>>>>> +{
>>>>> +  int a[10];
>>>>> +  for(int i = 0; i < 10; ++i)
>>>>> +    a[i] = i*i;
>>>>> +  int res = 0;
>>>>> +  for(int i = 0; i < 10; ++i)
>>>>> +    res += a[i];
>>>>> +  return res;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
>>>>
>>>> This fails for nvptx, because it doesn't have the required vector
>>>> operations.
>>>> To fix the fail, I've added requiring effective target vect_int_mult.
>>>
>>> On targets that do not vectorize you should see the scalar loops unrolled
>>> instead.  Or do you have only one loop vectorized?
>>
>> Sort of. Loop vectorization has no effect, and the scalar loops are completely
>> unrolled. But then slp vectorization vectorizes the stores.
>>
>> So at optimized we have:
>> ...
>>    MEM[(int *)&a] = { 0, 1 };
>>    MEM[(int *)&a + 8B] = { 4, 9 };
>>    MEM[(int *)&a + 16B] = { 16, 25 };
>>    MEM[(int *)&a + 24B] = { 36, 49 };
>>    MEM[(int *)&a + 32B] = { 64, 81 };
>>    _6 = a[0];
>>    _28 = a[1];
>>    res_29 = _6 + _28;
>>    _35 = a[2];
>>    res_36 = res_29 + _35;
>>    _42 = a[3];
>>    res_43 = res_36 + _42;
>>    _49 = a[4];
>>    res_50 = res_43 + _49;
>>    _56 = a[5];
>>    res_57 = res_50 + _56;
>>    _63 = a[6];
>>    res_64 = res_57 + _63;
>>    _70 = a[7];
>>    res_71 = res_64 + _70;
>>    _77 = a[8];
>>    res_78 = res_71 + _77;
>>    _2 = a[9];
>>    res_11 = _2 + res_78;
>>    a ={v} {CLOBBER};
>>    return res_11;
>> ...
>>
>> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
>> we have:
>> ...
>> .visible .func (.param.u32 %value_out) foo
>> {
>>          .reg.u32 %value;
>>          .local .align 16 .b8 %frame_ar[48];
>>          .reg.u64 %frame;
>>          cvta.local.u64 %frame, %frame_ar;
>>          mov.u32 %value, 285;
>>          st.param.u32    [%value_out], %value;
>>          ret;
>> }
>> ...
>>
>>> That's precisely
>>> what the PR was about...  which means it isn't fixed for nvptx :/
>>
>> Indeed the assembly is not optimal, and would be optimal if we'd have optimal
>> code at optimized.
>>
>> FWIW, using this patch we generate optimal code at optimized:
>> ...
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index 3ebcfc30349..6b64f600c4a 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
>>         NEXT_PASS (pass_tracer);
>>         NEXT_PASS (pass_thread_jumps);
>>         NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
>> +      NEXT_PASS (pass_fre);
>>         NEXT_PASS (pass_strlen);
>>         NEXT_PASS (pass_thread_jumps);
>>         NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>> ...
>>
>> and we get:
>> ...
>> .visible .func (.param.u32 %value_out) foo
>> {
>>          .reg.u32 %value;
>>          mov.u32 %value, 285;
>>          st.param.u32    [%value_out], %value;
>>          ret;
>> }
>> ...
>>
>> I could file a missing optimization PR for nvptx, but I'm not sure where this
>> should be fixed.
> 
> Ah, yeah... the usual issue then.
> 
> Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?
> 

Done.

Committed at attached.

Thanks,
- Tom

[-- Attachment #2: 0001-testsuite-Add-nvptx-xfail-to-pr84512.c.patch --]
[-- Type: text/x-patch, Size: 975 bytes --]

[testsuite] Add nvptx xfail to pr84512.c

2018-03-19  Tom de Vries  <tom@codesourcery.com>

	* gcc.dg/tree-ssa/pr84512.c: Don't require effective target
	vect_int_mult.  Add nvptx xfail for PR84958.

---
 gcc/testsuite/ChangeLog                 | 5 +++++
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
index 41b6c06..9560160 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-optimized" } */
-/* { dg-require-effective-target vect_int_mult } */
 
 int foo()
 {
@@ -13,4 +12,5 @@ int foo()
   return res;
 }
 
-/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
+/* Target nvptx xfail due to PR84958.  */
+/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail nvptx*-*-* } } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-20 12:42         ` Tom de Vries
@ 2018-03-20 20:15           ` Rainer Orth
  2018-03-21  8:20             ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Rainer Orth @ 2018-03-20 20:15 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Richard Biener, gcc-patches, Jan Hubicka

Hi Tom,

> On 03/19/2018 10:11 AM, Richard Biener wrote:
>> On Fri, 16 Mar 2018, Tom de Vries wrote:
>>
>>> On 03/16/2018 12:55 PM, Richard Biener wrote:
>>>> On Fri, 16 Mar 2018, Tom de Vries wrote:
>>>>
>>>>> On 02/27/2018 01:42 PM, Richard Biener wrote:
>>>>>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
>>>>>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
>>>>>> @@ -0,0 +1,15 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O3 -fdump-tree-optimized" } */
>>>>>> +
>>>>>> +int foo()
>>>>>> +{
>>>>>> +  int a[10];
>>>>>> +  for(int i = 0; i < 10; ++i)
>>>>>> +    a[i] = i*i;
>>>>>> +  int res = 0;
>>>>>> +  for(int i = 0; i < 10; ++i)
>>>>>> +    res += a[i];
>>>>>> +  return res;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
>>>>>
>>>>> This fails for nvptx, because it doesn't have the required vector
>>>>> operations.
>>>>> To fix the fail, I've added requiring effective target vect_int_mult.
>>>>
>>>> On targets that do not vectorize you should see the scalar loops unrolled
>>>> instead.  Or do you have only one loop vectorized?
>>>
>>> Sort of. Loop vectorization has no effect, and the scalar loops are completely
>>> unrolled. But then slp vectorization vectorizes the stores.
>>>
>>> So at optimized we have:
>>> ...
>>>    MEM[(int *)&a] = { 0, 1 };
>>>    MEM[(int *)&a + 8B] = { 4, 9 };
>>>    MEM[(int *)&a + 16B] = { 16, 25 };
>>>    MEM[(int *)&a + 24B] = { 36, 49 };
>>>    MEM[(int *)&a + 32B] = { 64, 81 };
>>>    _6 = a[0];
>>>    _28 = a[1];
>>>    res_29 = _6 + _28;
>>>    _35 = a[2];
>>>    res_36 = res_29 + _35;
>>>    _42 = a[3];
>>>    res_43 = res_36 + _42;
>>>    _49 = a[4];
>>>    res_50 = res_43 + _49;
>>>    _56 = a[5];
>>>    res_57 = res_50 + _56;
>>>    _63 = a[6];
>>>    res_64 = res_57 + _63;
>>>    _70 = a[7];
>>>    res_71 = res_64 + _70;
>>>    _77 = a[8];
>>>    res_78 = res_71 + _77;
>>>    _2 = a[9];
>>>    res_11 = _2 + res_78;
>>>    a ={v} {CLOBBER};
>>>    return res_11;
>>> ...
>>>
>>> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
>>> we have:
>>> ...
>>> .visible .func (.param.u32 %value_out) foo
>>> {
>>>          .reg.u32 %value;
>>>          .local .align 16 .b8 %frame_ar[48];
>>>          .reg.u64 %frame;
>>>          cvta.local.u64 %frame, %frame_ar;
>>>          mov.u32 %value, 285;
>>>          st.param.u32    [%value_out], %value;
>>>          ret;
>>> }
>>> ...
>>>
>>>> That's precisely
>>>> what the PR was about...  which means it isn't fixed for nvptx :/
>>>
>>> Indeed the assembly is not optimal, and would be optimal if we'd have optimal
>>> code at optimized.
>>>
>>> FWIW, using this patch we generate optimal code at optimized:
>>> ...
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index 3ebcfc30349..6b64f600c4a 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
>>>         NEXT_PASS (pass_tracer);
>>>         NEXT_PASS (pass_thread_jumps);
>>>         NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
>>> +      NEXT_PASS (pass_fre);
>>>         NEXT_PASS (pass_strlen);
>>>         NEXT_PASS (pass_thread_jumps);
>>>         NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>>> ...
>>>
>>> and we get:
>>> ...
>>> .visible .func (.param.u32 %value_out) foo
>>> {
>>>          .reg.u32 %value;
>>>          mov.u32 %value, 285;
>>>          st.param.u32    [%value_out], %value;
>>>          ret;
>>> }
>>> ...
>>>
>>> I could file a missing optimization PR for nvptx, but I'm not sure where this
>>> should be fixed.
>>
>> Ah, yeah... the usual issue then.
>>
>> Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?
>>
>
> Done.
>
> Committed at attached.

this caused the test to FAIL on 64-bit (only) sparc-sun-solaris2.11:

FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"

where it was UNSUPPORTED before.

The dump has

;; Function foo (foo, funcdef_no=0, decl_uid=1557, cgraph_uid=0, symbol_order=0)

foo ()
{
  int res;
  int a[10];
  int _2;
  int _6;
  int _28;
  int _35;
  int _42;
  int _49;
  int _56;
  int _63;
  int _70;
  int _77;

  <bb 2> [local count: 97603132]:
  MEM[(int *)&a] = { 0, 1 };
  MEM[(int *)&a + 8B] = { 4, 9 };
  MEM[(int *)&a + 16B] = { 16, 25 };
  MEM[(int *)&a + 24B] = { 36, 49 };
  MEM[(int *)&a + 32B] = { 64, 81 };
  _6 = a[0];
  _28 = a[1];
  res_29 = _6 + _28;
  _35 = a[2];
  res_36 = res_29 + _35;
  _42 = a[3];
  res_43 = res_36 + _42;
  _49 = a[4];
  res_50 = res_43 + _49;
  _56 = a[5];
  res_57 = res_50 + _56;
  _63 = a[6];
  res_64 = res_57 + _63;
  _70 = a[7];
  res_71 = res_64 + _70;
  _77 = a[8];
  res_78 = res_71 + _77;
  _2 = a[9];
  res_11 = _2 + res_78;
  a ={v} {CLOBBER};
  return res_11;

}

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-20 20:15           ` Rainer Orth
@ 2018-03-21  8:20             ` Richard Biener
  2018-03-21  9:07               ` Eric Botcazou
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2018-03-21  8:20 UTC (permalink / raw)
  To: Rainer Orth; +Cc: Tom de Vries, gcc-patches, Jan Hubicka

On Tue, 20 Mar 2018, Rainer Orth wrote:

> Hi Tom,
> 
> > On 03/19/2018 10:11 AM, Richard Biener wrote:
> >> On Fri, 16 Mar 2018, Tom de Vries wrote:
> >>
> >>> On 03/16/2018 12:55 PM, Richard Biener wrote:
> >>>> On Fri, 16 Mar 2018, Tom de Vries wrote:
> >>>>
> >>>>> On 02/27/2018 01:42 PM, Richard Biener wrote:
> >>>>>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> >>>>>> ===================================================================
> >>>>>> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
> >>>>>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
> >>>>>> @@ -0,0 +1,15 @@
> >>>>>> +/* { dg-do compile } */
> >>>>>> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> >>>>>> +
> >>>>>> +int foo()
> >>>>>> +{
> >>>>>> +  int a[10];
> >>>>>> +  for(int i = 0; i < 10; ++i)
> >>>>>> +    a[i] = i*i;
> >>>>>> +  int res = 0;
> >>>>>> +  for(int i = 0; i < 10; ++i)
> >>>>>> +    res += a[i];
> >>>>>> +  return res;
> >>>>>> +}
> >>>>>> +
> >>>>>> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
> >>>>>
> >>>>> This fails for nvptx, because it doesn't have the required vector
> >>>>> operations.
> >>>>> To fix the fail, I've added requiring effective target vect_int_mult.
> >>>>
> >>>> On targets that do not vectorize you should see the scalar loops unrolled
> >>>> instead.  Or do you have only one loop vectorized?
> >>>
> >>> Sort of. Loop vectorization has no effect, and the scalar loops are completely
> >>> unrolled. But then slp vectorization vectorizes the stores.
> >>>
> >>> So at optimized we have:
> >>> ...
> >>>    MEM[(int *)&a] = { 0, 1 };
> >>>    MEM[(int *)&a + 8B] = { 4, 9 };
> >>>    MEM[(int *)&a + 16B] = { 16, 25 };
> >>>    MEM[(int *)&a + 24B] = { 36, 49 };
> >>>    MEM[(int *)&a + 32B] = { 64, 81 };
> >>>    _6 = a[0];
> >>>    _28 = a[1];
> >>>    res_29 = _6 + _28;
> >>>    _35 = a[2];
> >>>    res_36 = res_29 + _35;
> >>>    _42 = a[3];
> >>>    res_43 = res_36 + _42;
> >>>    _49 = a[4];
> >>>    res_50 = res_43 + _49;
> >>>    _56 = a[5];
> >>>    res_57 = res_50 + _56;
> >>>    _63 = a[6];
> >>>    res_64 = res_57 + _63;
> >>>    _70 = a[7];
> >>>    res_71 = res_64 + _70;
> >>>    _77 = a[8];
> >>>    res_78 = res_71 + _77;
> >>>    _2 = a[9];
> >>>    res_11 = _2 + res_78;
> >>>    a ={v} {CLOBBER};
> >>>    return res_11;
> >>> ...
> >>>
> >>> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
> >>> we have:
> >>> ...
> >>> .visible .func (.param.u32 %value_out) foo
> >>> {
> >>>          .reg.u32 %value;
> >>>          .local .align 16 .b8 %frame_ar[48];
> >>>          .reg.u64 %frame;
> >>>          cvta.local.u64 %frame, %frame_ar;
> >>>          mov.u32 %value, 285;
> >>>          st.param.u32    [%value_out], %value;
> >>>          ret;
> >>> }
> >>> ...
> >>>
> >>>> That's precisely
> >>>> what the PR was about...  which means it isn't fixed for nvptx :/
> >>>
> >>> Indeed the assembly is not optimal, and would be optimal if we'd have optimal
> >>> code at optimized.
> >>>
> >>> FWIW, using this patch we generate optimal code at optimized:
> >>> ...
> >>> diff --git a/gcc/passes.def b/gcc/passes.def
> >>> index 3ebcfc30349..6b64f600c4a 100644
> >>> --- a/gcc/passes.def
> >>> +++ b/gcc/passes.def
> >>> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
> >>>         NEXT_PASS (pass_tracer);
> >>>         NEXT_PASS (pass_thread_jumps);
> >>>         NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> >>> +      NEXT_PASS (pass_fre);
> >>>         NEXT_PASS (pass_strlen);
> >>>         NEXT_PASS (pass_thread_jumps);
> >>>         NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
> >>> ...
> >>>
> >>> and we get:
> >>> ...
> >>> .visible .func (.param.u32 %value_out) foo
> >>> {
> >>>          .reg.u32 %value;
> >>>          mov.u32 %value, 285;
> >>>          st.param.u32    [%value_out], %value;
> >>>          ret;
> >>> }
> >>> ...
> >>>
> >>> I could file a missing optimization PR for nvptx, but I'm not sure where this
> >>> should be fixed.
> >>
> >> Ah, yeah... the usual issue then.
> >>
> >> Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?
> >>
> >
> > Done.
> >
> > Committed at attached.
> 
> this caused the test to FAIL on 64-bit (only) sparc-sun-solaris2.11:
> 
> FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
> 
> where it was UNSUPPORTED before.

So it failed before Toms original patch.  Please add sparc-solaris
to the list of XFAILed targets.

> The dump has
> 
> ;; Function foo (foo, funcdef_no=0, decl_uid=1557, cgraph_uid=0, symbol_order=0)
> 
> foo ()
> {
>   int res;
>   int a[10];
>   int _2;
>   int _6;
>   int _28;
>   int _35;
>   int _42;
>   int _49;
>   int _56;
>   int _63;
>   int _70;
>   int _77;
> 
>   <bb 2> [local count: 97603132]:
>   MEM[(int *)&a] = { 0, 1 };
>   MEM[(int *)&a + 8B] = { 4, 9 };
>   MEM[(int *)&a + 16B] = { 16, 25 };
>   MEM[(int *)&a + 24B] = { 36, 49 };
>   MEM[(int *)&a + 32B] = { 64, 81 };
>   _6 = a[0];
>   _28 = a[1];
>   res_29 = _6 + _28;
>   _35 = a[2];
>   res_36 = res_29 + _35;
>   _42 = a[3];
>   res_43 = res_36 + _42;
>   _49 = a[4];
>   res_50 = res_43 + _49;
>   _56 = a[5];
>   res_57 = res_50 + _56;
>   _63 = a[6];
>   res_64 = res_57 + _63;
>   _70 = a[7];
>   res_71 = res_64 + _70;
>   _77 = a[8];
>   res_78 = res_71 + _77;
>   _2 = a[9];
>   res_11 = _2 + res_78;
>   a ={v} {CLOBBER};
>   return res_11;
> 
> }
> 
> 	Rainer
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-21  8:20             ` Richard Biener
@ 2018-03-21  9:07               ` Eric Botcazou
  2018-03-21 18:37                 ` Rainer Orth
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Botcazou @ 2018-03-21  9:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Rainer Orth, Tom de Vries, Jan Hubicka

> So it failed before Toms original patch.  Please add sparc-solaris
> to the list of XFAILed targets.

SPARC/Linux is affected too so sparc*-*-* instead.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Fix PR84512
  2018-03-21  9:07               ` Eric Botcazou
@ 2018-03-21 18:37                 ` Rainer Orth
  0 siblings, 0 replies; 10+ messages in thread
From: Rainer Orth @ 2018-03-21 18:37 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Biener, gcc-patches, Tom de Vries, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

Hi Eric,

>> So it failed before Toms original patch.  Please add sparc-solaris
>> to the list of XFAILed targets.
>
> SPARC/Linux is affected too so sparc*-*-* instead.

actually, it's sparc*-*-* && lp64 only.  Done like this after testing on
sparc-sun-solaris2.11 and i386-pc-solaris2.11.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-03-21  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	* gcc.dg/tree-ssa/pr84512.c: xfail on 64-bit SPARC.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: testsuite-pr84512-sparc-xfail.patch --]
[-- Type: text/x-patch, Size: 586 bytes --]

# HG changeset patch
# Parent  50996d41bbbc78ab2cf0002ba6479559089a2337
xfail gcc.dg/tree-ssa/pr84512.c on 64-bit sparc

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -13,4 +13,4 @@ int foo()
 }
 
 /* Target nvptx xfail due to PR84958.  */
-/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail nvptx*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { nvptx*-*-* || { sparc*-*-* && lp64 } } } } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-03-21 18:01 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-27 12:42 [PATCH] Fix PR84512 Richard Biener
2018-03-16 11:48 ` Tom de Vries
2018-03-16 11:56   ` Richard Biener
2018-03-16 15:30     ` Tom de Vries
2018-03-19 11:55       ` Richard Biener
2018-03-20 12:42         ` Tom de Vries
2018-03-20 20:15           ` Rainer Orth
2018-03-21  8:20             ` Richard Biener
2018-03-21  9:07               ` Eric Botcazou
2018-03-21 18:37                 ` Rainer Orth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).