public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR56118
@ 2015-11-10  9:34 Richard Biener
  2015-11-13 18:05 ` Alan Lawrence
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2015-11-10  9:34 UTC (permalink / raw)
  To: gcc-patches


The following fixes PR56118 by adjusting the cost model handling of
basic-block vectorization to favor the vectorized version in case
estimated cost is the same as the estimated cost of the scalar
version.  This makes sense because we over-estimate the vectorized
cost in several places.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-10  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/56118
	* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
	cost favor vectorized version.

	* gcc.target/i386/pr56118.c: New testcase.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 230020)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -2317,9 +2317,12 @@ vect_bb_vectorization_profitable_p (bb_v
       dump_printf (MSG_NOTE, "  Scalar cost of basic block: %d\n", scalar_cost);
     }
 
-  /* Vectorization is profitable if its cost is less than the cost of scalar
-     version.  */
-  if (vec_outside_cost + vec_inside_cost >= scalar_cost)
+  /* Vectorization is profitable if its cost is more than the cost of scalar
+     version.  Note that we err on the vector side for equal cost because
+     the cost estimate is otherwise quite pessimistic (constant uses are
+     free on the scalar side but cost a load on the vector side for
+     example).  */
+  if (vec_outside_cost + vec_inside_cost > scalar_cost)
     return false;
 
   return true;
Index: gcc/testsuite/gcc.target/i386/pr56118.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr56118.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr56118.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2" } */
+
+#include <emmintrin.h>
+
+__m128d f()
+{
+  __m128d r={3,4};
+  r[0]=1;
+  r[1]=2;
+  return r;
+}
+
+/* We want to "vectorize" this to a aligned vector load from the
+   constant pool.  */
+
+/* { dg-final { scan-assembler "movapd" } } */

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix PR56118
  2015-11-10  9:34 [PATCH] Fix PR56118 Richard Biener
@ 2015-11-13 18:05 ` Alan Lawrence
  2015-11-16  8:39   ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Alan Lawrence @ 2015-11-13 18:05 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

On 10/11/15 09:34, Richard Biener wrote:
>
> The following fixes PR56118 by adjusting the cost model handling of
> basic-block vectorization to favor the vectorized version in case
> estimated cost is the same as the estimated cost of the scalar
> version.  This makes sense because we over-estimate the vectorized
> cost in several places.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2015-11-10  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/56118
> 	* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
> 	cost favor vectorized version.
>
> 	* gcc.target/i386/pr56118.c: New testcase.

On AArch64 and ARM targets, this causes PASS->FAIL of

gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2 
"vectorization is not profitable"

....that sounds like a good thing ;), so I imagine the xfail directive may just 
need updating. The test also looks to be failing on powerpc64 (according to 
https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg01327.html).

Regards, Alan

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix PR56118
  2015-11-13 18:05 ` Alan Lawrence
@ 2015-11-16  8:39   ` Richard Biener
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Biener @ 2015-11-16  8:39 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On Fri, 13 Nov 2015, Alan Lawrence wrote:

> On 10/11/15 09:34, Richard Biener wrote:
> > 
> > The following fixes PR56118 by adjusting the cost model handling of
> > basic-block vectorization to favor the vectorized version in case
> > estimated cost is the same as the estimated cost of the scalar
> > version.  This makes sense because we over-estimate the vectorized
> > cost in several places.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > 
> > Richard.
> > 
> > 2015-11-10  Richard Biener  <rguenther@suse.de>
> > 
> > 	PR tree-optimization/56118
> > 	* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
> > 	cost favor vectorized version.
> > 
> > 	* gcc.target/i386/pr56118.c: New testcase.
> 
> On AArch64 and ARM targets, this causes PASS->FAIL of
> 
> gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
> gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2
> "vectorization is not profitable"
> 
> ....that sounds like a good thing ;)

It depends ;)  You may want to look at the generated code with/without
vectorization and decide for yourselves.  The testcase is

int foo (int *p)
{
  int x[4];
  int tem0, tem1, tem2, tem3;
  tem0 = p[0] + 1;
  x[0] = tem0;
  tem1 = p[1] + 2;
  x[1] = tem1;
  tem2 = p[2] + 3;
  x[2] = tem2;
  tem3 = p[3] + 4;
  x[3] = tem3;
  bar (x);
  return tem0 + tem1 + tem2 + tem3;
}

which was added to cover the situation where we vectorize the store
to x[] but have to keep the scalar computations for tem[0-3] for
the final reduction.  The scalar cost for this kernel is 3*4
while the vector cost is unaligned load + aligned load + vector op
+ aligned vector store.  We compensate for the out-of-kernel uses
by making the scalar cost just the stores (4).  Now if all of the
vector cost pieces are 1 then we have 4 vs. 4 here.  On x86_64
the unaligned load cost is 2 and thus vectorization is deemed
non-profitable.

In reality this depends on the actual constants used in the plus
as that tells you whether the constant is free for the scalar
plus or not (for vectors it almost always comes from the constant
pool).

On x86_64 the assembler difference is

        movl    (%rdi), %eax
        movdqu  (%rdi), %xmm0
        leal    1(%rax), %r13d
        movl    4(%rdi), %eax
        paddd   .LC0(%rip), %xmm0
        movaps  %xmm0, (%rsp)
        leal    2(%rax), %r12d
        movl    8(%rdi), %eax
        addl    %r13d, %r12d
        leal    3(%rax), %ebp
        movl    12(%rdi), %eax
        movq    %rsp, %rdi
        addl    %r12d, %ebp
        leal    4(%rax), %ebx
        call    bar
        leal    0(%rbp,%rbx), %eax

vs.

        movl    (%rdi), %eax
        movl    12(%rdi), %ebx
        leal    1(%rax), %r13d
        movl    4(%rdi), %eax
        addl    $4, %ebx
        movl    %ebx, 12(%rsp)
        movl    %r13d, (%rsp)
        leal    2(%rax), %r12d
        movl    8(%rdi), %eax
        movq    %rsp, %rdi
        movl    %r12d, 4(%rsp)
        leal    3(%rax), %ebp
        movl    %ebp, 8(%rsp)
        call    bar
        leal    0(%r13,%r12), %eax
        addl    %ebp, %eax
        addl    %ebx, %eax

clearly the testcase could need adjustment to be not so on the
edge of the individual targets.  I'm considering changing it.

The testcase also shows the lack of reduction vectorization in BBs
(I have partial finished patches for this but got distracted...).

>, so I imagine the xfail directive may
> just need updating. The test also looks to be failing on powerpc64 (according
> to https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg01327.html).

I'll try making the testcase more complicated instead.

Richard.

> Regards, Alan
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-11-16  8:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-10  9:34 [PATCH] Fix PR56118 Richard Biener
2015-11-13 18:05 ` Alan Lawrence
2015-11-16  8:39   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).