public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Alan Lawrence <alan.lawrence@arm.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Fix PR56118
Date: Mon, 16 Nov 2015 08:39:00 -0000	[thread overview]
Message-ID: <alpine.LSU.2.11.1511160931100.4884@t29.fhfr.qr> (raw)
In-Reply-To: <56462679.5040908@arm.com>

On Fri, 13 Nov 2015, Alan Lawrence wrote:

> On 10/11/15 09:34, Richard Biener wrote:
> > 
> > The following fixes PR56118 by adjusting the cost model handling of
> > basic-block vectorization to favor the vectorized version in case
> > estimated cost is the same as the estimated cost of the scalar
> > version.  This makes sense because we over-estimate the vectorized
> > cost in several places.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > 
> > Richard.
> > 
> > 2015-11-10  Richard Biener  <rguenther@suse.de>
> > 
> > 	PR tree-optimization/56118
> > 	* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
> > 	cost favor vectorized version.
> > 
> > 	* gcc.target/i386/pr56118.c: New testcase.
> 
> On AArch64 and ARM targets, this causes PASS->FAIL of
> 
> gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
> gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2
> "vectorization is not profitable"
> 
> ....that sounds like a good thing ;)

It depends ;)  You may want to look at the generated code with/without
vectorization and decide for yourselves.  The testcase is

int foo (int *p)
{
  int x[4];
  int tem0, tem1, tem2, tem3;
  tem0 = p[0] + 1;
  x[0] = tem0;
  tem1 = p[1] + 2;
  x[1] = tem1;
  tem2 = p[2] + 3;
  x[2] = tem2;
  tem3 = p[3] + 4;
  x[3] = tem3;
  bar (x);
  return tem0 + tem1 + tem2 + tem3;
}

which was added to cover the situation where we vectorize the store
to x[] but have to keep the scalar computations for tem[0-3] for
the final reduction.  The scalar cost for this kernel is 3*4
while the vector cost is unaligned load + aligned load + vector op
+ aligned vector store.  We compensate for the out-of-kernel uses
by making the scalar cost just the stores (4).  Now if all of the
vector cost pieces are 1 then we have 4 vs. 4 here.  On x86_64
the unaligned load cost is 2 and thus vectorization is deemed
non-profitable.

In reality this depends on the actual constants used in the plus
as that tells you whether the constant is free for the scalar
plus or not (for vectors it almost always comes from the constant
pool).

On x86_64 the assembler difference is

        movl    (%rdi), %eax
        movdqu  (%rdi), %xmm0
        leal    1(%rax), %r13d
        movl    4(%rdi), %eax
        paddd   .LC0(%rip), %xmm0
        movaps  %xmm0, (%rsp)
        leal    2(%rax), %r12d
        movl    8(%rdi), %eax
        addl    %r13d, %r12d
        leal    3(%rax), %ebp
        movl    12(%rdi), %eax
        movq    %rsp, %rdi
        addl    %r12d, %ebp
        leal    4(%rax), %ebx
        call    bar
        leal    0(%rbp,%rbx), %eax

vs.

        movl    (%rdi), %eax
        movl    12(%rdi), %ebx
        leal    1(%rax), %r13d
        movl    4(%rdi), %eax
        addl    $4, %ebx
        movl    %ebx, 12(%rsp)
        movl    %r13d, (%rsp)
        leal    2(%rax), %r12d
        movl    8(%rdi), %eax
        movq    %rsp, %rdi
        movl    %r12d, 4(%rsp)
        leal    3(%rax), %ebp
        movl    %ebp, 8(%rsp)
        call    bar
        leal    0(%r13,%r12), %eax
        addl    %ebp, %eax
        addl    %ebx, %eax

clearly the testcase could need adjustment to be not so on the
edge of the individual targets.  I'm considering changing it.

The testcase also shows the lack of reduction vectorization in BBs
(I have partial finished patches for this but got distracted...).

>, so I imagine the xfail directive may
> just need updating. The test also looks to be failing on powerpc64 (according
> to https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg01327.html).

I'll try making the testcase more complicated instead.

Richard.

> Regards, Alan
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

      reply	other threads:[~2015-11-16  8:39 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-10  9:34 Richard Biener
2015-11-13 18:05 ` Alan Lawrence
2015-11-16  8:39   ` Richard Biener [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1511160931100.4884@t29.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=alan.lawrence@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).