public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/63537] New: Missed optimization: Loop unrolling adds extra copy when returning aggregate
@ 2014-10-14 19:02 tavianator at gmail dot com
  2014-10-15  8:38 ` [Bug tree-optimization/63537] [4.9/5 Regression] " rguenth at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: tavianator at gmail dot com @ 2014-10-14 19:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

            Bug ID: 63537
           Summary: Missed optimization: Loop unrolling adds extra copy
                    when returning aggregate
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tavianator at gmail dot com

Created attachment 33715
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33715&action=edit
Reproducer

At -O2 and above on x86_64, this manually unrolled loop generates much better
code than the automatically unrolled one:

    struct vec {
        double n[3];
    };

    struct vec mul_unrolled(struct vec lhs, double rhs) {
        struct vec ret;
        ret.n[0] = lhs.n[0]*rhs;
        ret.n[1] = lhs.n[1]*rhs;
        ret.n[2] = lhs.n[2]*rhs;
        return ret;
    }

This generates the beautiful:

    movsd    16(%rsp), %xmm2
    movq    %rdi, %rax
    movsd    24(%rsp), %xmm1
    mulsd    %xmm0, %xmm2
    mulsd    %xmm0, %xmm1
    mulsd    8(%rsp), %xmm0
    movsd    %xmm2, 8(%rdi)
    movsd    %xmm1, 16(%rdi)
    movsd    %xmm0, (%rdi)
    ret

In contrast, at -O2 this:

    struct vec mul_loop(struct vec lhs, double rhs) {
        struct vec ret;
        for (int i = 0; i < 3; ++i) {
            ret.n[i] = lhs.n[i]*rhs;
        }
        return ret;
    }

generates this:

    movsd    8(%rsp), %xmm1
    movq    %rdi, %rax
    mulsd    %xmm0, %xmm1
    movsd    %xmm1, -40(%rsp)
    movq    -40(%rsp), %rdx
    movsd    16(%rsp), %xmm1
    mulsd    %xmm0, %xmm1
    movq    %rdx, (%rdi)
    mulsd    24(%rsp), %xmm0
    movsd    %xmm1, -32(%rsp)
    movq    -32(%rsp), %rdx
    movsd    %xmm0, -24(%rsp)
    movq    %rdx, 8(%rdi)
    movq    -24(%rsp), %rdx
    movq    %rdx, 16(%rdi)
    ret

which builds the result on the stack at -40(%rsp) and then copies it to the
return slot at (%rdi).  At -O3 the loop gets vectorized, but the extra copy is
still there:

    movapd    %xmm0, %xmm1
    mulsd    24(%rsp), %xmm0
    movupd    8(%rsp), %xmm2
    movq    %rdi, %rax
    unpcklpd    %xmm1, %xmm1
    mulpd    %xmm1, %xmm2
    movsd    %xmm0, -24(%rsp)
    movaps    %xmm2, -40(%rsp)
    movq    -40(%rsp), %rdx
    movq    %rdx, (%rdi)
    movq    -32(%rsp), %rdx
    movq    %rdx, 8(%rdi)
    movq    -24(%rsp), %rdx
    movq    %rdx, 16(%rdi)



* [Bug tree-optimization/63537] [4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2014-10-15  8:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-15
      Known to work|                            |4.7.3
   Target Milestone|---                         |4.9.2
            Summary|Missed optimization: Loop   |[4.9/5 Regression] Missed
                   |unrolling adds extra copy   |optimization: Loop
                   |when returning aggregate    |unrolling adds extra copy
                   |                            |when returning aggregate
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is because the outer loop is unrolled only after SRA gets a chance to
scalarize away the local aggregate.  With GCC 4.7 we unroll the loop during
early unrolling even at -O2.

With 4.9 we conclude:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated in peeled copies.
 BB: 3, after_exit: 1
  size:   1 _4 = lhs.n[i_1];
  size:   1 _6 = _4 * rhs_5(D);
  size:   1 ret.n[i_1] = _6;
  size:   1 i_8 = i_1 + 1;
   Induction variable computation will be folded away.

size: 6-3, last_iteration: 2-0
  Loop size: 6
  Estimated size after unrolling: 7
Not unrolling loop 1: size would grow.

while 4.7 had:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated.
 BB: 3, after_exit: 1
  size:   1 D.1593_3 = lhs.n[i_1];
  size:   1 D.1594_5 = D.1593_3 * rhs_4(D);
  size:   1 ret.n[i_1] = D.1594_5;
  size:   1 i_6 = i_1 + 1;
   Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-2
  Loop size: 6
  Estimated size after unrolling: 6

so the difference is in last_iteration handling.
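
The two estimates above are consistent with a simple formula, sketched here as an assumption rather than a quote of the GCC sources: each unrolled copy costs the loop size minus what peeling eliminates, the last iteration adds whatever is not eliminated there, and the total is scaled by 2/3 to allow for later simplification. The function and parameter names are illustrative, and the unroll count of 3 is inferred from the loop bound in the reproducer.

```c
/* Sketch of the size estimate implied by the dumps above.
   Assumption: this mirrors, but is not copied from, the estimate GCC's
   complete-unrolling code computes; all names here are illustrative. */
long estimated_unrolled_size(long overall, long eliminated_by_peeling,
                             long last_iteration,
                             long last_iteration_eliminated,
                             long nunroll)
{
    /* Each unrolled copy keeps only what peeling cannot eliminate. */
    long insns = nunroll * (overall - eliminated_by_peeling);

    /* The final iteration contributes whatever is not eliminated there. */
    insns += last_iteration - last_iteration_eliminated;

    /* Assume about a third of the result folds away afterwards. */
    insns = insns * 2 / 3;
    return insns <= 0 ? 1 : insns;
}
```

With the 4.9 numbers (size: 6-3, last_iteration: 2-0) this gives (3*3 + 2) * 2/3 = 7, which exceeds the loop size of 6. With the 4.7 numbers (last_iteration: 2-2) it gives 3*3 * 2/3 = 6, which does not.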

Honza?

Otherwise this is an optimization pass ordering issue.

Eventually a simple pass could handle

  <retval> = ret;
  ret ={v} {CLOBBER};
  return <retval>;

and back-propagate <retval> into all stores/loads of ret.
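
At the source level, the effect of such a pass can be pictured roughly as follows. This is a hand-written illustration, not GCC output: the hidden return slot is modeled as an explicit pointer parameter, and the function names mul_loop_copy and mul_loop_direct are made up for the example.

```c
struct vec {
    double n[3];
};

/* Models the current code: fill the local `ret`, then copy it into the
   return slot (the `<retval> = ret;` assignment). */
void mul_loop_copy(struct vec *retval, const struct vec *lhs, double rhs)
{
    struct vec ret;
    for (int i = 0; i < 3; ++i) {
        ret.n[i] = lhs->n[i] * rhs;
    }
    *retval = ret;
}

/* Models the result of back-propagating <retval> into the stores of
   `ret`: every store goes straight to the return slot, no copy remains. */
void mul_loop_direct(struct vec *retval, const struct vec *lhs, double rhs)
{
    for (int i = 0; i < 3; ++i) {
        retval->n[i] = lhs->n[i] * rhs;
    }
}
```

Both functions produce the same values; the second simply never materializes the temporary aggregate.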



* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2014-10-15  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.2                       |4.8.4
            Summary|[4.9/5 Regression] Missed   |[4.8/4.9/5 Regression]
                   |optimization: Loop          |Missed optimization: Loop
                   |unrolling adds extra copy   |unrolling adds extra copy
                   |when returning aggregate    |when returning aggregate
      Known to fail|                            |4.8.3, 4.9.1, 5.0



* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: tavianator at gmail dot com @ 2014-10-15 15:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

--- Comment #2 from Tavian Barnes <tavianator at gmail dot com> ---
Is it possible to make SRA work even if the loop isn't unrolled?  If the array
size is increased to 4 then -O2 doesn't unroll the loop at all, resulting in:

    movq    %rdi, %rax
    xorl    %edx, %edx
.L3:
    movsd    8(%rsp,%rdx), %xmm1
    mulsd    %xmm0, %xmm1
    movsd    %xmm1, -40(%rsp,%rdx)
    addq    $8, %rdx
    cmpq    $32, %rdx
    jne    .L3
    movq    -40(%rsp), %rdx
    movq    %rdx, (%rax)
    movq    -32(%rsp), %rdx
    movq    %rdx, 8(%rax)
    movq    -24(%rsp), %rdx
    movq    %rdx, 16(%rax)
    movq    -16(%rsp), %rdx
    movq    %rdx, 24(%rax)
    ret

which would be a lot prettier as something like:

    movq    %rdi, %rax
    xorl    %edx, %edx
.L3:
    movsd    8(%rsp,%rdx), %xmm1
    mulsd    %xmm0, %xmm1
    movsd    %xmm1, (%rax,%rdx)
    addl    $8, %edx
    cmpl    $32, %edx
    jne    .L3
    ret
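
For reference, a reproducer for the four-element case described above might look like this. It is a sketch: the names vec4 and mul_loop4 are not from the attachment, and the exact code generated depends on the compiler version and flags.

```c
/* Four-element variant of the reproducer; names are illustrative. */
struct vec4 {
    double n[4];
};

struct vec4 mul_loop4(struct vec4 lhs, double rhs)
{
    struct vec4 ret;
    for (int i = 0; i < 4; ++i) {
        ret.n[i] = lhs.n[i] * rhs;
    }
    return ret;
}
```

Compiling this with `gcc -O2 -S` on x86_64 is one way to check whether the stores still go through a stack temporary.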



* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2014-11-24 13:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: jakub at gcc dot gnu.org @ 2014-12-19 13:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.4 has been released.



* [Bug tree-optimization/63537] [4.8/4.9/5/6 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2015-06-23  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.5                       |4.9.3

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.



* [Bug tree-optimization/63537] [4.9/5/6 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: jakub at gcc dot gnu.org @ 2015-06-26 20:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.



* [Bug tree-optimization/63537] [4.9/5/6 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: jakub at gcc dot gnu.org @ 2015-06-26 20:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.3                       |4.9.4

