[Bug middle-end/55266] New: vector expansion: 36 movs for 4 adds

From: "glisse at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/55266] New: vector expansion: 36 movs for 4 adds
Date: Sat, 10 Nov 2012 15:10:00 -0000
Message-ID: <bug-55266-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266

             Bug #: 55266
           Summary: vector expansion: 36 movs for 4 adds
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: glisse@gcc.gnu.org
            Target: x86_64-linux-gnu

I already mentioned this example, but I don't think it is in any PR:

typedef double vec __attribute__((vector_size(4*sizeof(double))));
void f(vec*x){
  *x+=*x+*x;
}

Compiled with -S -O3 -msse4, this produces 4 add insns (as expected) and 36 mov
insns, which is a bit much. For comparison, it should be equivalent to the
following code, which generates only 6 mov insns:

typedef double vec __attribute__((vector_size(2*sizeof(double))));
void f(vec*x){
  x[0]+=x[0]+x[0];
  x[1]+=x[1]+x[1];
}
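
To check the claimed equivalence, the two test cases can be put side by side in
one file. This is only a sketch: the names vec4, vec2, f_wide and f_split are
made up here so that both versions can coexist, they are not in the testcases
above.

```c
/* Hypothetical names: the report calls both types "vec" and both functions "f". */
typedef double vec4 __attribute__((vector_size(4 * sizeof(double))));
typedef double vec2 __attribute__((vector_size(2 * sizeof(double))));

/* First testcase: one 32-byte vector, wider than an SSE register. */
void f_wide(vec4 *x) {
  *x += *x + *x;
}

/* Second testcase: the same data viewed as two 16-byte vectors. */
void f_split(vec2 *x) {
  x[0] += x[0] + x[0];
  x[1] += x[1] + x[1];
}
```

Both functions triple each of the four doubles, so any difference in the
generated code comes from how the wide vector is expanded, not from the
arithmetic.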

One minor enhancement would be to have fold_ternary handle BIT_FIELD_REF of
CONSTRUCTOR of vectors (I think it is already tracked elsewhere, though I
couldn't find it).
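
As a rough source-level analogy of that folding (an assumption about how it
maps to the IR, not a dump from the compiler): building a vector and
immediately reading a piece of it back should simplify to the piece itself.
The sketch uses scalar elements for brevity, whereas the case above has vector
elements; second_lane is a made-up name.

```c
typedef double vec4 __attribute__((vector_size(4 * sizeof(double))));

/* Hypothetical helper: build a vector, then extract one lane from it. */
double second_lane(double a, double b, double c, double d) {
  vec4 v = {a, b, c, d};  /* plays the role of the CONSTRUCTOR */
  return v[1];            /* plays the role of the BIT_FIELD_REF; ideally folds to b */
}
```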

But the main issue is with copying these fake vectors. Their fake "registers"
live in memory, and copies between them (4 times 2 movs going through rax in
DImode; I assume that is faster than going through xmm registers?) aren't
optimized away. In this example, the content of *x is first copied to a fake
register. Then the V2DF parts are extracted, added, and stored to memory. That
fake register is then copied to a new fake register. The V2DF parts are taken
from it, added to the V2DF values that were still there, and stored to memory.
Finally, that result is copied to the memory location x.
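
The sequence just described can be modelled in C. This is a hand-written
illustration of the steps, not compiler output: fake_reg, f_model and the
r0..r3 temporaries are hypothetical names, and each fake_reg stands for one
32-byte in-memory "register".

```c
#include <string.h>

typedef double v2df __attribute__((vector_size(2 * sizeof(double))));

/* Hypothetical model of a fake 32-byte register: two V2DF parts in memory. */
typedef struct { v2df part[2]; } fake_reg;

void f_model(double *x) {  /* x points at 4 doubles */
  fake_reg r0, r1, r2, r3;

  memcpy(&r0, x, sizeof r0);                  /* *x copied to a fake register */
  for (int i = 0; i < 2; i++)
    r1.part[i] = r0.part[i] + r0.part[i];     /* parts extracted, added, stored */

  r2 = r1;                                    /* fake register copied to a new one */
  for (int i = 0; i < 2; i++)
    r3.part[i] = r2.part[i] + r0.part[i];     /* added to the values still there */

  memcpy(x, &r3, sizeof r3);                  /* result copied to location x */
}
```

Every whole-register copy in this model (the two memcpy calls and r2 = r1) is
pure data movement that the second testcase never needs.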

I don't know how that should be improved. Maybe the vector lowering pass should
go even further, turn the first program into the second one, and not leave any
extra-long vectors for the back-end to handle? This doesn't seem easy to
optimize in the back-end, where it is too late. Or maybe something can be done
at expansion time?
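
At the source level, "turning the first program into the second one" would
amount to something like the sketch below. It keeps the 4-double interface but
only ever operates on 2-double halves. The names f_lowered, lo and hi are
invented for the illustration, and memcpy is used just to sidestep aliasing
questions; a real lowering pass would work on the IR, not like this.

```c
#include <string.h>

typedef double vec4 __attribute__((vector_size(4 * sizeof(double))));
typedef double vec2 __attribute__((vector_size(2 * sizeof(double))));

/* Hypothetical hand-lowered version of the first testcase. */
void f_lowered(vec4 *x) {
  vec2 lo, hi;

  /* Load the two halves directly; no 32-byte temporary is created. */
  memcpy(&lo, (const char *)x, sizeof lo);
  memcpy(&hi, (const char *)x + sizeof lo, sizeof hi);

  lo += lo + lo;
  hi += hi + hi;

  /* Store each half straight back to its place in *x. */
  memcpy((char *)x, &lo, sizeof lo);
  memcpy((char *)x + sizeof lo, &hi, sizeof hi);
}
```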

Thread overview: 6+ messages
2012-11-10 15:10 glisse at gcc dot gnu.org [this message]
2012-11-13 10:23 ` [Bug middle-end/55266] " glisse at gcc dot gnu.org
2012-11-28 10:11 ` glisse at gcc dot gnu.org
2012-12-09  2:08 ` [Bug middle-end/55266] vector expansion: 24 " pinskia at gcc dot gnu.org
2013-03-03 11:58 ` vincenzo.innocente at cern dot ch
2023-07-21 12:12 ` rguenth at gcc dot gnu.org