public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/63537] New: Missed optimization: Loop unrolling adds extra copy when returning aggregate
@ 2014-10-14 19:02 tavianator at gmail dot com
2014-10-15 8:38 ` [Bug tree-optimization/63537] [4.9/5 Regression] " rguenth at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: tavianator at gmail dot com @ 2014-10-14 19:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Bug ID: 63537
Summary: Missed optimization: Loop unrolling adds extra copy
when returning aggregate
Product: gcc
Version: 4.9.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tavianator at gmail dot com
Created attachment 33715
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33715&action=edit
Reproducer
At -O2 and above on x86_64, this manually unrolled loop generates much better
code than the automatically unrolled one:
struct vec {
    double n[3];
};

struct vec mul_unrolled(struct vec lhs, double rhs) {
    struct vec ret;
    ret.n[0] = lhs.n[0]*rhs;
    ret.n[1] = lhs.n[1]*rhs;
    ret.n[2] = lhs.n[2]*rhs;
    return ret;
}
This generates the beautiful:
    movsd   16(%rsp), %xmm2
    movq    %rdi, %rax
    movsd   24(%rsp), %xmm1
    mulsd   %xmm0, %xmm2
    mulsd   %xmm0, %xmm1
    mulsd   8(%rsp), %xmm0
    movsd   %xmm2, 8(%rdi)
    movsd   %xmm1, 16(%rdi)
    movsd   %xmm0, (%rdi)
    ret
In contrast, at -O2 this:
struct vec mul_loop(struct vec lhs, double rhs) {
    struct vec ret;
    for (int i = 0; i < 3; ++i) {
        ret.n[i] = lhs.n[i]*rhs;
    }
    return ret;
}
generates this:
    movsd   8(%rsp), %xmm1
    movq    %rdi, %rax
    mulsd   %xmm0, %xmm1
    movsd   %xmm1, -40(%rsp)
    movq    -40(%rsp), %rdx
    movsd   16(%rsp), %xmm1
    mulsd   %xmm0, %xmm1
    movq    %rdx, (%rdi)
    mulsd   24(%rsp), %xmm0
    movsd   %xmm1, -32(%rsp)
    movq    -32(%rsp), %rdx
    movsd   %xmm0, -24(%rsp)
    movq    %rdx, 8(%rdi)
    movq    -24(%rsp), %rdx
    movq    %rdx, 16(%rdi)
    ret
which builds the result in a stack temporary at -40(%rsp) and then copies it
to the return slot at (%rdi). At -O3 the loop gets vectorized, but the extra
copy is still there:
    movapd  %xmm0, %xmm1
    mulsd   24(%rsp), %xmm0
    movupd  8(%rsp), %xmm2
    movq    %rdi, %rax
    unpcklpd %xmm1, %xmm1
    mulpd   %xmm1, %xmm2
    movsd   %xmm0, -24(%rsp)
    movaps  %xmm2, -40(%rsp)
    movq    -40(%rsp), %rdx
    movq    %rdx, (%rdi)
    movq    -32(%rsp), %rdx
    movq    %rdx, 8(%rdi)
    movq    -24(%rsp), %rdx
    movq    %rdx, 16(%rdi)
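
For anyone who wants to reproduce the comparison, the two functions from the report can be placed in one translation unit. The following is a minimal sketch; the helper vec_results_match is a hypothetical addition (it is not part of the attached reproducer) and only confirms that both forms compute the same values, so any difference is purely in the generated code.

```c
/* Both functions from the report, side by side. */
struct vec {
    double n[3];
};

/* Manually unrolled form: each product is stored straight into ret. */
struct vec mul_unrolled(struct vec lhs, double rhs) {
    struct vec ret;
    ret.n[0] = lhs.n[0] * rhs;
    ret.n[1] = lhs.n[1] * rhs;
    ret.n[2] = lhs.n[2] * rhs;
    return ret;
}

/* Loop form: same arithmetic, indexed by the loop counter. */
struct vec mul_loop(struct vec lhs, double rhs) {
    struct vec ret;
    for (int i = 0; i < 3; ++i) {
        ret.n[i] = lhs.n[i] * rhs;
    }
    return ret;
}

/* Hypothetical helper (not in the report): returns 1 when both forms
   produce identical elements for the given input, 0 otherwise. */
int vec_results_match(struct vec v, double s) {
    struct vec a = mul_unrolled(v, s);
    struct vec b = mul_loop(v, s);
    for (int i = 0; i < 3; ++i) {
        if (a.n[i] != b.n[i]) {
            return 0;
        }
    }
    return 1;
}
```

Compiling this with gcc -O2 -S (and again with -O3) should allow the assembly of the two functions to be compared; the exact output depends on the GCC version and target.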
* [Bug tree-optimization/63537] [4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2014-10-15 8:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-15
      Known to work|                            |4.7.3
   Target Milestone|---                         |4.9.2
            Summary|Missed optimization: Loop   |[4.9/5 Regression] Missed
                   |unrolling adds extra copy   |optimization: Loop
                   |when returning aggregate    |unrolling adds extra copy
                   |                            |when returning aggregate
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is because the outer loop is unrolled only after SRA has already had its
chance to scalarize away the local aggregate, so SRA still sees ret accessed
through the variable index i. With GCC 4.7 we unroll the loop during early
unrolling even at -O2.
With 4.9 we conclude:
Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated in peeled copies.
 BB: 3, after_exit: 1
  size:   1 _4 = lhs.n[i_1];
  size:   1 _6 = _4 * rhs_5(D);
  size:   1 ret.n[i_1] = _6;
  size:   1 i_8 = i_1 + 1;
   Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-0
  Loop size: 6
  Estimated size after unrolling: 7
Not unrolling loop 1: size would grow.
while 4.7 had:
Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated.
 BB: 3, after_exit: 1
  size:   1 D.1593_3 = lhs.n[i_1];
  size:   1 D.1594_5 = D.1593_3 * rhs_4(D);
  size:   1 ret.n[i_1] = D.1594_5;
  size:   1 i_6 = i_1 + 1;
   Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-2
  Loop size: 6
  Estimated size after unrolling: 6
so the difference is in last_iteration handling.
Honza?
Otherwise this is an optimization pass ordering issue.
Eventually a simple pass could handle
  <retval> = ret;
  ret ={v} {CLOBBER};
  return <retval>;
and back-propagate <retval> into all stores/loads of ret.
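
At the source level, the effect of back-propagating <retval> into the stores to ret is roughly what one gets by writing through a caller-supplied result pointer. The sketch below illustrates that analogy only. The function name mul_loop_into and its out parameter are hypothetical, and this is not how the suggested GIMPLE pass would be implemented.

```c
struct vec {
    double n[3];
};

/* Hypothetical out-parameter variant (not from the report): `out` plays
   the role of <retval>, so every store goes straight to the final
   destination and no local aggregate needs to be copied afterwards. */
void mul_loop_into(struct vec *out, const struct vec *lhs, double rhs) {
    for (int i = 0; i < 3; ++i) {
        out->n[i] = lhs->n[i] * rhs;
    }
}
```

A caller would declare the destination itself, for example struct vec r; mul_loop_into(&r, &v, 2.0);, which mirrors how the by-value version receives a hidden result pointer in %rdi on x86_64.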
* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2014-10-15 8:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.2                       |4.8.4
            Summary|[4.9/5 Regression] Missed   |[4.8/4.9/5 Regression]
                   |optimization: Loop          |Missed optimization: Loop
                   |unrolling adds extra copy   |unrolling adds extra copy
                   |when returning aggregate    |when returning aggregate
      Known to fail|                            |4.8.3, 4.9.1, 5.0
* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: tavianator at gmail dot com @ 2014-10-15 15:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
--- Comment #2 from Tavian Barnes <tavianator at gmail dot com> ---
Is it possible to make SRA work even if the loop isn't unrolled? If the array
size is increased to 4 then -O2 doesn't unroll the loop at all, resulting in:
    movq    %rdi, %rax
    xorl    %edx, %edx
.L3:
    movsd   8(%rsp,%rdx), %xmm1
    mulsd   %xmm0, %xmm1
    movsd   %xmm1, -40(%rsp,%rdx)
    addq    $8, %rdx
    cmpq    $32, %rdx
    jne     .L3
    movq    -40(%rsp), %rdx
    movq    %rdx, (%rax)
    movq    -32(%rsp), %rdx
    movq    %rdx, 8(%rax)
    movq    -24(%rsp), %rdx
    movq    %rdx, 16(%rax)
    movq    -16(%rsp), %rdx
    movq    %rdx, 24(%rax)
    ret
which would be a lot prettier as something like:
    movq    %rdi, %rax
    xorl    %edx, %edx
.L3:
    movsd   8(%rsp,%rdx), %xmm1
    mulsd   %xmm0, %xmm1
    movsd   %xmm1, (%rax,%rdx)
    addl    $8, %edx
    cmpl    $32, %edx
    jne     .L3
    ret
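
The four-element variant described in this comment is not spelled out in the thread. A minimal sketch of what it presumably looks like follows, assuming the only change from the original reproducer is the array size and loop bound; the names vec4 and mul_loop4 are hypothetical.

```c
/* Assumed variant of the reproducer with the array size raised to 4. */
struct vec4 {
    double n[4];
};

/* Same loop as mul_loop in the report, with the bound raised to match. */
struct vec4 mul_loop4(struct vec4 lhs, double rhs) {
    struct vec4 ret;
    for (int i = 0; i < 4; ++i) {
        ret.n[i] = lhs.n[i] * rhs;
    }
    return ret;
}
```

With this version the question in the comment can be checked directly: compile at -O2 and see whether the stores inside the loop target the stack temporary or the return slot.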
* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2014-11-24 13:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
* [Bug tree-optimization/63537] [4.8/4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: jakub at gcc dot gnu.org @ 2014-12-19 13:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.4 has been released.
* [Bug tree-optimization/63537] [4.8/4.9/5/6 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: rguenth at gcc dot gnu.org @ 2015-06-23 8:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.5                       |4.9.3
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
* [Bug tree-optimization/63537] [4.9/5/6 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: jakub at gcc dot gnu.org @ 2015-06-26 20:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.
* [Bug tree-optimization/63537] [4.9/5/6 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
From: jakub at gcc dot gnu.org @ 2015-06-26 20:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.3                       |4.9.4