public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
@ 2015-01-21 5:11 vmakarov at gcc dot gnu.org
2015-01-23 2:14 ` [Bug tree-optimization/64705] " amker at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-01-21 5:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
Bug ID: 64705
Summary: Bad code generation of sieve on x86-64 because of too
aggressive IV optimizations
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vmakarov at gcc dot gnu.org
Created attachment 34510
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34510&action=edit
preprocessed sieve program from aburto benchmarks
GCC on trunk generates a bad x86-64 code for the hotest loop in sieve
(preprocessed file in the attachment) for -Ofast -march=core-avx2:
for(k = i + prime ; k<=size ; k+=prime)
{
ci++;
*(flags+k)=0;
}
The GCC generated code is
.L82:
movb $0, (%rax)
addq %rsi, %rax
addq $1, %rbx
leaq (%rax,%rcx), %rdx
cmpq %rdx, %rbp
jge .L82
Here is the code generated by LLVM-3.5 using the same options:
.LBB41_51: # %for.body26.unr
# Parent Loop BB41_45 Depth=1
# => This Inner Loop Header: Depth=2
incq %r12
movb $0, (%rbx,%rax)
addq %r14, %rax
cmpq %r15, %rax
jle .LBB41_51
LLVM generates 5 insns loop instead of 6 insns loop in GCC.
It is achieved by using base+index addressing instead of just base
addressing in GCC which is a result of induction of flags+k expression.
I tried to make base+index addressing with zero cost by the following
patch.
Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c (revision 219705)
+++ tree-ssa-loop-ivopts.c (working copy)
@@ -3458,7 +3458,7 @@ get_address_cost (bool symbol_present, b
end_sequence ();
acost = seq_cost (seq, speed);
- acost += address_cost (addr, mem_mode, as, speed);
+ acost = 0;
if (!acost)
acost = 1;
I got base+index addressing but still it is 6 insn loop becuase of
induction of other expressions.
.L82:
movb $0, (%r8,%rax)
addq %rcx, %rax
addq $1, %rbx
leaq (%rsi,%rax), %rdx
cmpq %rdx, %r14
jge .L82
Again, too aggressive iv optimization results in worse code generated
by GCC. The code can be better which is demonstrated by LLVM-3.5.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
@ 2015-01-23 2:14 ` amker at gcc dot gnu.org
2015-01-23 4:01 ` amker at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: amker at gcc dot gnu.org @ 2015-01-23 2:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
amker at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2015-01-23
CC| |amker at gcc dot gnu.org
Assignee|unassigned at gcc dot gnu.org |amker at gcc dot gnu.org
Ever confirmed|0 |1
--- Comment #1 from amker at gcc dot gnu.org ---
Confirmed. I shall have a look.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
2015-01-23 2:14 ` [Bug tree-optimization/64705] " amker at gcc dot gnu.org
@ 2015-01-23 4:01 ` amker at gcc dot gnu.org
2015-01-23 7:24 ` amker at gcc dot gnu.org
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: amker at gcc dot gnu.org @ 2015-01-23 4:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
--- Comment #2 from amker at gcc dot gnu.org ---
Loop dump before IVOPT is like below:
Loop 4, basic blocks 28/30;
<bb 26>:
count_54 = count_172 + 1;
_55 = i_161 + i_161;
prime_56 = _55 + 3;
k_57 = prime_56 + i_161;
if (size_26 >= k_57)
goto <bb 27>;
else
goto <bb 31>;
<bb 27>:
<bb 28>:
# k_167 = PHI <k_57(27), k_62(30)>
# ci_168 = PHI <ci_169(27), ci_58(30)>
ci_58 = ci_168 + 1;
k.19_59 = (sizetype) k_167;
_60 = flags_30 + k.19_59;
*_60 = 0;
k_62 = prime_56 + k_167;
if (size_26 >= k_62)
goto <bb 30>;
else
goto <bb 29>;
<bb 29>:
# ci_154 = PHI <ci_58(28)>
goto <bb 31>;
<bb 30>:
goto <bb 28>;
The IV uses found by IVOPT is like below:
use 0
address
defined in statement
used in statement *_60 = 0;
at position *_60
type char *
base flags_30 + (sizetype) k_57
step (sizetype) prime_56
base object (void *) flags_30
related candidates
use 1
compare
defined in statement
used in statement if (size_26 >= k_62)
at position
type long int
base (_55 + 3) + k_57
step prime_56
is a biv
related candidates
use 2
generic (computed on exit edge)
defined in statement ci_58 = ci_168 + 1;
used in statement ci_154 = PHI <ci_58(28)>
at position
type long int
base ci_169 + 1
step 1
is a biv
related candidates
Root cause is IVOPT expands use 1 from {prime_56 + k_57, prime_56}_loop to
{(_55 + _3) + k_57, prime_56}_loop. Thus information of "iv.step == prime_56
== (_55+_3)" is lost during costs computation and uses rewrting, resulting in
wrong candidate selected and bloated loop after IVOPT.
The related code is in function find_givs_in_stmt_scev, specifically,
if (!simple_iv (loop, loop_containing_stmt (stmt), lhs, iv, true))
return false;
iv->base = expand_simple_operations (iv->base); // <--- expansion
I will see how to fix the issue by skipping expansion in case like this.
Thanks,
bin
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
2015-01-23 2:14 ` [Bug tree-optimization/64705] " amker at gcc dot gnu.org
2015-01-23 4:01 ` amker at gcc dot gnu.org
@ 2015-01-23 7:24 ` amker at gcc dot gnu.org
2015-02-05 6:39 ` amker at gcc dot gnu.org
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: amker at gcc dot gnu.org @ 2015-01-23 7:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
amker at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target|x86_64-*-* |x86_64-*-*, aarch64
--- Comment #3 from amker at gcc dot gnu.org ---
Also it's a target independent issue.
Though IVOPT chooses base+index addressing mode, it needs one more instruction
to calculate the condition.
LLVM's assembly:
.LBB34_7:
add x26, x26, #1
strb wzr, [x22, x9]
add x9, x9, x24
cmp x9, x28
b.le .LBB34_7
GCC's assembly:
.L71:
strb wzr, [x27, x0]
add x0, x0, x2
add x19, x19, 1
add x1, x4, x0
cmp x21, x1
bge .L71
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
` (2 preceding siblings ...)
2015-01-23 7:24 ` amker at gcc dot gnu.org
@ 2015-02-05 6:39 ` amker at gcc dot gnu.org
2015-02-13 5:45 ` amker at gcc dot gnu.org
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: amker at gcc dot gnu.org @ 2015-02-05 6:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
--- Comment #4 from amker at gcc dot gnu.org ---
I had a patch.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
` (3 preceding siblings ...)
2015-02-05 6:39 ` amker at gcc dot gnu.org
@ 2015-02-13 5:45 ` amker at gcc dot gnu.org
2015-02-13 5:56 ` amker at gcc dot gnu.org
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: amker at gcc dot gnu.org @ 2015-02-13 5:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
--- Comment #5 from amker at gcc dot gnu.org ---
Author: amker
Date: Fri Feb 13 05:44:46 2015
New Revision: 220676
URL: https://gcc.gnu.org/viewcvs?rev=220676&root=gcc&view=rev
Log:
PR tree-optimization/64705
* tree-ssa-loop-niter.h (expand_simple_operations): New parameter.
* tree-ssa-loop-niter.c (expand_simple_operations): New parameter.
* tree-ssa-loop-ivopts.c (extract_single_var_from_expr): New.
(find_bivs, find_givs_in_stmt_scev): Pass new argument to
expand_simple_operations.
testsuite
PR tree-optimization/64705
* gcc.dg/tree-ssa/pr64705.c: New test.
Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-loop-ivopts.c
trunk/gcc/tree-ssa-loop-niter.c
trunk/gcc/tree-ssa-loop-niter.h
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
` (4 preceding siblings ...)
2015-02-13 5:45 ` amker at gcc dot gnu.org
@ 2015-02-13 5:56 ` amker at gcc dot gnu.org
2015-03-11 15:02 ` vmakarov at gcc dot gnu.org
2015-03-12 1:39 ` amker at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: amker at gcc dot gnu.org @ 2015-02-13 5:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
--- Comment #6 from amker at gcc dot gnu.org ---
Since it works on gcc 3.4, so I consider this as a regression and applied the
patch. Should be fixed now.
Hi Vlad, could you please help me verify that the original benchmark is fixed
too? Thanks very much!
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
` (5 preceding siblings ...)
2015-02-13 5:56 ` amker at gcc dot gnu.org
@ 2015-03-11 15:02 ` vmakarov at gcc dot gnu.org
2015-03-12 1:39 ` amker at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-03-11 15:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to amker from comment #6)
> Since it works on gcc 3.4, so I consider this as a regression and applied
> the patch. Should be fixed now.
>
> Hi Vlad, could you please help me verify that the original benchmark is
> fixed too? Thanks very much!
Yes, it was fixed. Thanks for working on this.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
` (6 preceding siblings ...)
2015-03-11 15:02 ` vmakarov at gcc dot gnu.org
@ 2015-03-12 1:39 ` amker at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: amker at gcc dot gnu.org @ 2015-03-12 1:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705
amker at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #8 from amker at gcc dot gnu.org ---
Fixed according to Vlad's input.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-03-12 1:39 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-21 5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
2015-01-23 2:14 ` [Bug tree-optimization/64705] " amker at gcc dot gnu.org
2015-01-23 4:01 ` amker at gcc dot gnu.org
2015-01-23 7:24 ` amker at gcc dot gnu.org
2015-02-05 6:39 ` amker at gcc dot gnu.org
2015-02-13 5:45 ` amker at gcc dot gnu.org
2015-02-13 5:56 ` amker at gcc dot gnu.org
2015-03-11 15:02 ` vmakarov at gcc dot gnu.org
2015-03-12 1:39 ` amker at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).