public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/99971] New: GCC generates partially vectorized and scalar code at once
@ 2021-04-08 14:41 andysem at mail dot ru
  2021-04-08 14:45 ` [Bug tree-optimization/99971] " andysem at mail dot ru
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: andysem at mail dot ru @ 2021-04-08 14:41 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

            Bug ID: 99971
           Summary: GCC generates partially vectorized and scalar code at
                    once
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following code sample:

struct A
{
    unsigned int a, b, c, d;

    A& operator+= (A const& that)
    {
        a += that.a;
        b += that.b;
        c += that.c;
        d += that.d;
        return *this;
    }

    A& operator-= (A const& that)
    {
        a -= that.a;
        b -= that.b;
        c -= that.c;
        d -= that.d;
        return *this;
    }
};

void test(A& x, A const& y1, A const& y2)
{
    x += y1;
    x -= y2;
}

When compiled with the options "-O3 -march=nehalem", this code produces:

test(A&, A const&, A const&):
        pushq   %rbp
        movdqu  (%rdi), %xmm1
        pushq   %rbx
        movl    4(%rsi), %r8d
        movdqu  (%rsi), %xmm0
        movl    (%rsi), %r9d
        paddd   %xmm1, %xmm0
        movl    8(%rsi), %ecx
        movl    12(%rsi), %eax
        movl    %r8d, %esi
        movl    (%rdi), %ebp
        movl    4(%rdi), %ebx
        movl    8(%rdi), %r11d
        movl    12(%rdi), %r10d
        movups  %xmm0, (%rdi)
        subl    (%rdx), %r9d
        subl    4(%rdx), %esi
        subl    8(%rdx), %ecx
        subl    12(%rdx), %eax
        addl    %ebp, %r9d
        addl    %ebx, %esi
        movl    %r9d, (%rdi)
        popq    %rbx
        addl    %r11d, %ecx
        popq    %rbp
        movl    %esi, 4(%rdi)
        addl    %r10d, %eax
        movl    %ecx, 8(%rdi)
        movl    %eax, 12(%rdi)
        ret

https://gcc.godbolt.org/z/Mzchj8bxG

Here you can see that the compiler has partially vectorized the test function:
it converted "x += y1" to paddd, as expected, but failed to vectorize "x -=
y2". At the same time, it also generated scalar code that recomputes the whole
result, including the already vectorized "x += y1" part, and the scalar stores
then overwrite the vector result. The "x += y1" work is basically duplicated.
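To make the duplication concrete, here is a rough C++ paraphrase of what the listing above computes. It is a hand reading of the assembly, not compiler output; `test_as_compiled` is a hypothetical name, and the struct keeps only the data members of `A` from the report.

```cpp
struct A
{
    unsigned int a, b, c, d;
};

// Hypothetical paraphrase of the -O3 -march=nehalem listing for test().
void test_as_compiled(A& x, A const& y1, A const& y2)
{
    // The listing loads x and y1 into scalar registers before any store.
    A const old_x = x;
    A const old_y1 = y1;

    // Vector part (movdqu/paddd/movups): x = x + y1.
    x.a = old_x.a + old_y1.a;
    x.b = old_x.b + old_y1.b;
    x.c = old_x.c + old_y1.c;
    x.d = old_x.d + old_y1.d;

    // Scalar part (subl/addl/movl): (y1 - y2) + x recomputes the complete
    // result from the saved values and overwrites the vector store above.
    x.a = (old_y1.a - y2.a) + old_x.a;
    x.b = (old_y1.b - y2.b) + old_x.b;
    x.c = (old_y1.c - y2.c) + old_x.c;
    x.d = (old_y1.d - y2.d) + old_x.d;
}
```

The final values equal those of the original `test()`, which is exactly why the vector add and its store are redundant.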

Note that when either "x += y1" or "x -= y2" is commented out, the compiler is
able to vectorize the remaining line. It is also able to vectorize both lines
when the += and -= operators are applied to different objects instead of x.
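The variants described above can be written out side by side for comparison on Compiler Explorer. This is one reading of the description: the function names are hypothetical, the "different objects" case is assumed to mean two distinct targets `x1` and `x2`, and whether each variant vectorizes is the reporter's observation, not something this sketch verifies.

```cpp
struct A
{
    unsigned int a, b, c, d;

    A& operator+= (A const& that)
    {
        a += that.a; b += that.b; c += that.c; d += that.d;
        return *this;
    }

    A& operator-= (A const& that)
    {
        a -= that.a; b -= that.b; c -= that.c; d -= that.d;
        return *this;
    }
};

// Only the addition: reported to vectorize.
void test_add_only(A& x, A const& y1)
{
    x += y1;
}

// Only the subtraction: reported to vectorize.
void test_sub_only(A& x, A const& y2)
{
    x -= y2;
}

// Both operators, applied to different objects: reported to vectorize.
void test_separate(A& x1, A& x2, A const& y1, A const& y2)
{
    x1 += y1;
    x2 -= y2;
}
```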

This is reproducible with gcc 8 through 10.2; gcc 7 does not vectorize this
code at all. With the current trunk on godbolt, the generated code is
different:

test(A&, A const&, A const&):
        movdqu  (%rsi), %xmm0
        movdqu  (%rdi), %xmm1
        paddd   %xmm1, %xmm0
        movups  %xmm0, (%rdi)
        movd    %xmm0, %eax
        subl    (%rdx), %eax
        movl    %eax, (%rdi)
        pextrd  $1, %xmm0, %eax
        subl    4(%rdx), %eax
        movl    %eax, 4(%rdi)
        pextrd  $2, %xmm0, %eax
        subl    8(%rdx), %eax
        movl    %eax, 8(%rdi)
        pextrd  $3, %xmm0, %eax
        subl    12(%rdx), %eax
        movl    %eax, 12(%rdi)
        ret

Here the compiler vectorizes "x += y1" but still not "x -= y2": each lane is
extracted (movd/pextrd) and subtracted with a scalar subl. At least the
duplicate scalar version of "x += y1" is gone.

Given that the compiler is able to vectorize each line in isolation, I would
expect it to vectorize them when combined as well. Generating duplicate
versions of the same code is certainly not expected.
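For reference, a hand-vectorized version shows the kind of code one would hope for: 128-bit loads, one paddd, one psubd, and no scalar lane work. This is a sketch using SSE2 intrinsics from `<emmintrin.h>`, assuming an x86-64 target and that `A` is four packed 32-bit `unsigned int`s (16 bytes); `test_sse2` is a hypothetical name, not GCC output.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)

struct A
{
    unsigned int a, b, c, d;
};

static_assert(sizeof(A) == 16, "A is assumed to be four packed 32-bit lanes");

// Hypothetical hand-vectorized equivalent of test(): x += y1; x -= y2;
void test_sse2(A& x, A const& y1, A const& y2)
{
    __m128i vx  = _mm_loadu_si128(reinterpret_cast<__m128i const*>(&x));
    __m128i vy1 = _mm_loadu_si128(reinterpret_cast<__m128i const*>(&y1));
    __m128i sum = _mm_add_epi32(vx, vy1);  // paddd: lane-wise x + y1, mod 2^32
    _mm_storeu_si128(reinterpret_cast<__m128i*>(&x), sum);

    // Load y2 only after storing x, so the result stays correct if y2 refers to x.
    __m128i vy2 = _mm_loadu_si128(reinterpret_cast<__m128i const*>(&y2));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(&x), _mm_sub_epi32(sum, vy2));  // psubd
}
```

The intermediate store mirrors the source order, since the compiler cannot assume that `x` and `y2` are distinct objects.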
