* [PATCH] Fix PR56118
@ 2015-11-10 9:34 Richard Biener
2015-11-13 18:05 ` Alan Lawrence
0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2015-11-10 9:34 UTC (permalink / raw)
To: gcc-patches
The following fixes PR56118 by adjusting the cost model handling of
basic-block vectorization to favor the vectorized version in case
estimated cost is the same as the estimated cost of the scalar
version. This makes sense because we over-estimate the vectorized
cost in several places.
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
Richard.
2015-11-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/56118
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
cost favor vectorized version.
* gcc.target/i386/pr56118.c: New testcase.
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 230020)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2317,9 +2317,12 @@ vect_bb_vectorization_profitable_p (bb_v
dump_printf (MSG_NOTE, " Scalar cost of basic block: %d\n", scalar_cost);
}
- /* Vectorization is profitable if its cost is less than the cost of scalar
- version. */
- if (vec_outside_cost + vec_inside_cost >= scalar_cost)
+ /* Vectorization is profitable if its cost is more than the cost of scalar
+ version. Note that we err on the vector side for equal cost because
+ the cost estimate is otherwise quite pessimistic (constant uses are
+ free on the scalar side but cost a load on the vector side for
+ example). */
+ if (vec_outside_cost + vec_inside_cost > scalar_cost)
return false;
return true;
Index: gcc/testsuite/gcc.target/i386/pr56118.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr56118.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr56118.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2" } */
+
+#include <emmintrin.h>
+
+__m128d f()
+{
+ __m128d r={3,4};
+ r[0]=1;
+ r[1]=2;
+ return r;
+}
+
+/* We want to "vectorize" this to a aligned vector load from the
+ constant pool. */
+
+/* { dg-final { scan-assembler "movapd" } } */
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Fix PR56118
2015-11-10 9:34 [PATCH] Fix PR56118 Richard Biener
@ 2015-11-13 18:05 ` Alan Lawrence
2015-11-16 8:39 ` Richard Biener
0 siblings, 1 reply; 3+ messages in thread
From: Alan Lawrence @ 2015-11-13 18:05 UTC (permalink / raw)
To: Richard Biener, gcc-patches
On 10/11/15 09:34, Richard Biener wrote:
>
> The following fixes PR56118 by adjusting the cost model handling of
> basic-block vectorization to favor the vectorized version in case
> estimated cost is the same as the estimated cost of the scalar
> version. This makes sense because we over-estimate the vectorized
> cost in several places.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2015-11-10 Richard Biener <rguenther@suse.de>
>
> PR tree-optimization/56118
> * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
> cost favor vectorized version.
>
> * gcc.target/i386/pr56118.c: New testcase.
On AArch64 and ARM targets, this causes PASS->FAIL of
gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2
"vectorization is not profitable"
....that sounds like a good thing ;), so I imagine the xfail directive may just
need updating. The test also looks to be failing on powerpc64 (according to
https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg01327.html).
Regards, Alan
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Fix PR56118
2015-11-13 18:05 ` Alan Lawrence
@ 2015-11-16 8:39 ` Richard Biener
0 siblings, 0 replies; 3+ messages in thread
From: Richard Biener @ 2015-11-16 8:39 UTC (permalink / raw)
To: Alan Lawrence; +Cc: gcc-patches
On Fri, 13 Nov 2015, Alan Lawrence wrote:
> On 10/11/15 09:34, Richard Biener wrote:
> >
> > The following fixes PR56118 by adjusting the cost model handling of
> > basic-block vectorization to favor the vectorized version in case
> > estimated cost is the same as the estimated cost of the scalar
> > version. This makes sense because we over-estimate the vectorized
> > cost in several places.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> > Richard.
> >
> > 2015-11-10 Richard Biener <rguenther@suse.de>
> >
> > PR tree-optimization/56118
> > * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
> > cost favor vectorized version.
> >
> > * gcc.target/i386/pr56118.c: New testcase.
>
> On AArch64 and ARM targets, this causes PASS->FAIL of
>
> gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
> gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2
> "vectorization is not profitable"
>
> ....that sounds like a good thing ;)
It depends ;) You may want to look at the generated code with/without
vectorization and decide for yourselves. The testcase is
int foo (int *p)
{
int x[4];
int tem0, tem1, tem2, tem3;
tem0 = p[0] + 1;
x[0] = tem0;
tem1 = p[1] + 2;
x[1] = tem1;
tem2 = p[2] + 3;
x[2] = tem2;
tem3 = p[3] + 4;
x[3] = tem3;
bar (x);
return tem0 + tem1 + tem2 + tem3;
}
which was added to cover the situation where we vectorize the store
to x[] but have to keep the scalar computations for tem[0-3] for
the final reduction. The scalar cost for this kernel is 3*4
while the vector cost is unaligned load + aligned load + vector op
+ aligned vector store. We compensate for the out-of-kernel uses
by making the scalar cost just the stores (4). Now if all of the
vector cost pieces are 1 then we have 4 vs. 4 here. On x86_64
the unaligned load cost is 2 and thus vectorization is deemed
non-profitable.
In reality this depends on the actual constants used in the plus
as that tells you whether the constant is free for the scalar
plus or not (for vectors it almost always comes from the constant
pool).
On x86_64 the assembler difference is
movl (%rdi), %eax
movdqu (%rdi), %xmm0
leal 1(%rax), %r13d
movl 4(%rdi), %eax
paddd .LC0(%rip), %xmm0
movaps %xmm0, (%rsp)
leal 2(%rax), %r12d
movl 8(%rdi), %eax
addl %r13d, %r12d
leal 3(%rax), %ebp
movl 12(%rdi), %eax
movq %rsp, %rdi
addl %r12d, %ebp
leal 4(%rax), %ebx
call bar
leal 0(%rbp,%rbx), %eax
vs.
movl (%rdi), %eax
movl 12(%rdi), %ebx
leal 1(%rax), %r13d
movl 4(%rdi), %eax
addl $4, %ebx
movl %ebx, 12(%rsp)
movl %r13d, (%rsp)
leal 2(%rax), %r12d
movl 8(%rdi), %eax
movq %rsp, %rdi
movl %r12d, 4(%rsp)
leal 3(%rax), %ebp
movl %ebp, 8(%rsp)
call bar
leal 0(%r13,%r12), %eax
addl %ebp, %eax
addl %ebx, %eax
clearly the testcase could need adjustment to be not so on the
edge of the individual targets. I'm considering changing it.
The testcase also shows the lack of reduction vectorization in BBs
(I have partial finished patches for this but got distracted...).
>, so I imagine the xfail directive may
> just need updating. The test also looks to be failing on powerpc64 (according
> to https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg01327.html).
I'll try making the testcase more complicated instead.
Richard.
> Regards, Alan
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-11-16 8:39 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-10 9:34 [PATCH] Fix PR56118 Richard Biener
2015-11-13 18:05 ` Alan Lawrence
2015-11-16 8:39 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).