public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/54116] New: suboptimal code for tight loops
@ 2012-07-29 9:09 neleai at seznam dot cz
2012-07-29 10:13 ` [Bug tree-optimization/54116] " pinskia at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: neleai at seznam dot cz @ 2012-07-29 9:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116
Bug #: 54116
Summary: suboptimal code for tight loops
Classification: Unclassified
Product: gcc
Version: 4.7.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: neleai@seznam.cz
Consider the following loop.
int recal(int *x)
{
  int i;
  for (i = 0;; i += 4) {
    if (__builtin_expect((x[i] | x[i+1]) | (x[i+2] | x[i+3]), 0))
      break;
  }
  return (x[i] | x[i+1]) * (x[i+2] | x[i+3]);
}
On x86-64 the orl instruction is destructive: it overwrites its destination
operand. GCC keeps the intermediate results in registers instead of
recalculating them after the loop exits, which makes the loop run slower.
The relevant assembly output follows:
gcc-4.7 -O3 -S
.file "recal.c"
.text
.p2align 4,,15
.globl recal
.type recal, @function
recal:
.LFB0:
.cfi_startproc
movl 12(%rdi), %edx
orl 8(%rdi), %edx
movl 4(%rdi), %ecx
orl (%rdi), %ecx
movl %edx, %eax
orl %ecx, %eax
jne .L2
leaq 16(%rdi), %rax
.p2align 4,,10
.p2align 3
.L3:
movl 12(%rax), %edx
orl 8(%rax), %edx
movl 4(%rax), %ecx
orl (%rax), %ecx
addq $16, %rax
movl %edx, %esi
orl %ecx, %esi
je .L3
.L2:
movl %ecx, %eax
imull %edx, %eax
ret
.cfi_endproc
.LFE0:
.size recal, .-recal
.ident "GCC: (Debian 4.7.1-2) 4.7.1"
.section .note.GNU-stack,"",@progbits
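The behaviour asked for above can be approximated by hand. The following is only a sketch and does not come from GCC itself. It assumes that an empty GNU C asm statement with a "memory" clobber acts as a compiler barrier, so the four values have to be reloaded after the loop instead of being carried out of it in registers. The names recal_reload and recal_check are made up for illustration, and whether this variant is actually faster has not been measured here.

```c
#include <assert.h>

/* Original test case from the report. */
int recal(int *x)
{
    int i;
    for (i = 0;; i += 4) {
        if (__builtin_expect((x[i] | x[i + 1]) | (x[i + 2] | x[i + 3]), 0))
            break;
    }
    return (x[i] | x[i + 1]) * (x[i + 2] | x[i + 3]);
}

/* Hypothetical variant: the empty asm with a "memory" clobber tells the
   compiler that memory may have changed, so the loads after the loop
   cannot be shared with the loads inside it (assumption: GNU C). */
int recal_reload(int *x)
{
    int i;
    for (i = 0;; i += 4) {
        if (__builtin_expect((x[i] | x[i + 1]) | (x[i + 2] | x[i + 3]), 0))
            break;
    }
    __asm__ __volatile__("" ::: "memory");
    return (x[i] | x[i + 1]) * (x[i + 2] | x[i + 3]);
}

/* Both versions must return the same value. */
void recal_check(void)
{
    int buf[12] = {0, 0, 0, 0,  0, 0, 0, 0,  1, 2, 4, 8};
    assert(recal(buf) == (1 | 2) * (4 | 8));
    assert(recal_reload(buf) == recal(buf));
}
```

Comparing the output of gcc -O3 -S for the two functions shows whether the extra register copy inside the loop disappears with a given compiler version.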
* [Bug tree-optimization/54116] suboptimal code for tight loops
From: pinskia at gcc dot gnu.org @ 2012-07-29 10:13 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-07-29 10:13:41 UTC ---
The tree level looks correct:
<bb 4>:
# ivtmp.23_69 = PHI <ivtmp.23_68(5), ivtmp.23_66(3)>
D.1771_65 = (void *) ivtmp.23_69;
D.1717_6 = MEM[base: D.1771_65, offset: 0B];
D.1722_10 = MEM[base: D.1771_65, offset: 4B];
D.1723_11 = D.1722_10 | D.1717_6;
D.1727_15 = MEM[base: D.1771_65, offset: 8B];
D.1731_19 = MEM[base: D.1771_65, offset: 12B];
D.1732_20 = D.1731_19 | D.1727_15;
D.1733_21 = D.1732_20 | D.1723_11;
ivtmp.23_68 = ivtmp.23_69 + 16;
if (D.1733_21 != 0)
goto <bb 6>;
else
goto <bb 5>;
<bb 5>:
goto <bb 4>;
<bb 6>:
# D.1723_22 = PHI <D.1723_11(4), D.1723_36(2)>
# D.1732_23 = PHI <D.1732_20(4), D.1732_45(2)>
D.1737_25 = D.1723_22 * D.1732_23;
return D.1737_25;
--- CUT ----
Are you trying to say GCC should copy the loop header in this case?
Or that keeping the results of x[i]|x[i+1] and x[i+2]|x[i+3] live increases
register pressure and exposes issues with 2-operand machines in some cases?
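Read back into C, the dump above corresponds roughly to the sketch below. This is an interpretation and not compiler output: the names recal_as_dumped, lo and hi are invented, and the peeled first iteration (the bb 2 arguments of the PHI nodes) is folded back into the loop for brevity. Both partial results stay live across the loop exit because the multiply uses them, so on a 2-operand machine the combined test hi | lo needs a copy of one input first. That copy appears to be the movl %edx, %esi in the reported assembly.

```c
#include <assert.h>

/* Hypothetical C rendering of the GIMPLE shown above. */
int recal_as_dumped(const int *x)
{
    const int *p = x;   /* plays the role of ivtmp.23 */
    int lo, hi;

    for (;;) {
        lo = p[1] | p[0];       /* D.1723_11 */
        hi = p[3] | p[2];       /* D.1732_20 */
        p += 4;                 /* ivtmp.23_68 = ivtmp.23_69 + 16 bytes */
        if ((hi | lo) != 0)     /* D.1733_21 != 0 */
            break;
    }
    return lo * hi;             /* D.1737_25, fed by the two PHI nodes */
}
```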
* [Bug tree-optimization/54116] suboptimal code for tight loops
From: neleai at seznam dot cz @ 2012-07-29 10:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116
--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-07-29 10:30:46 UTC ---
On Sun, Jul 29, 2012 at 10:13:41AM +0000, pinskia at gcc dot gnu.org wrote:
> Are you trying to say GCC should copy the loop header in this case?
> Or that keeping the results of x[i]|x[i+1] and x[i+2]|x[i+3] live increases
> register pressure and exposes issues with 2-operand machines in some cases?
The latter: it increases register pressure.
* [Bug tree-optimization/54116] suboptimal code for tight loops
From: pinskia at gcc dot gnu.org @ 2021-08-07 5:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-08-07
             Status|UNCONFIRMED                 |WAITING
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC, clang and ICC all optimize it this way.
Do you have a testcase where the increased register pressure causes a
performance problem?