public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/95760] New: ivopts with loop variables
@ 2020-06-19 6:17 hailey.chiu at sifive dot com
2020-06-21 6:12 ` [Bug tree-optimization/95760] " wilson at gcc dot gnu.org
2020-06-22 23:40 ` wilson at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: hailey.chiu at sifive dot com @ 2020-06-19 6:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760
Bug ID: 95760
Summary: ivopts with loop variables
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hailey.chiu at sifive dot com
Target Milestone: ---
C Source:
int **matrix;
int n;
void foo()
{
static int row;
static int col;
static int sum = 0;
for( row = 0 ; row < n ; row++ )
{
for( col = 0 ; col < n ; col++ )
{
sum += matrix[row][col];
}
}
}
Compiling:
$./RISCV-GCC-10.1/bin/riscv64-unknown-elf-gcc foo.c -Os -S -o foo.s
-march=rv32imac -mabi=ilp32
Asm:
foo:
...skip load/store variables...
.L5:
slli a7,a1,2 #row*4
add a7,t3,a7 #matrix + (row*4)
li a0,0
.L3:
bgt a5,a0,.L4
addi a1,a1,1 #row++
mv a0,a5
li a7,1
j .L2
.L4:
lw t1,0(a7)
slli t4,a0,2 #col*4
addi a0,a0,1 #col++
add t1,t1,t4 #*matrix + (col*4)
lw t1,0(t1)
add a6,a6,t1
li t1,1
j .L3
The calculation of matrix offset is not increasing by 4 after each iteration. I
also check that with RISCV-GCC-8.3, it can be emitted code like "add a7, a7, 4"
after each iteration. GCC-10.1 takes two instructions to do this, while GCC-8.3
takes one. I think it might affect code size / performance slightly.
I am also wondering if "col" can be optimized to the add by 4 operation,
because gcc-8.3 doesn't optimize it too.
Thanks.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug tree-optimization/95760] ivopts with loop variables
2020-06-19 6:17 [Bug tree-optimization/95760] New: ivopts with loop variables hailey.chiu at sifive dot com
@ 2020-06-21 6:12 ` wilson at gcc dot gnu.org
2020-06-22 23:40 ` wilson at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-06-21 6:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760
Jim Wilson <wilson at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wilson at gcc dot gnu.org
--- Comment #1 from Jim Wilson <wilson at gcc dot gnu.org> ---
You are compiling with -Os. I get the expected result if I compile with -O2.
Looking at tree dumps, I see the first difference between -O2 and -Os dumps is
in the ch2 (copy loop header 2) pass, which explicitly disables loop header
copying when -Os is used. Note the optimize_loop_for_size_p check in
should_duplicate_loop_header_p in tree-ssa-loop-ch.c. You can see the
difference if you add -ftree-dump-ch2-all. In the -O2 ch2 dump file, I see
Loop 1 is not do-while loop: latch is not empty.
Will duplicate bb 7
Not duplicating bb 3: it is single succ.
Duplicating header of the loop 1 up to edge 7->3, 4 insns.
Loop 1 is do-while loop
Loop 1 is now do-while loop.
and in the -Os ch2 dump file, I see
Loop 1 is not do-while loop: latch is not empty.
Not duplicating bb 7: optimizing for size.
The difference in loop optimization here then affects the later ivopt pass.
Normally, duplicating basic blocks will make code bigger. But in this case the
duplicated blocks enable better loop optimization which results in smaller code
at the end. This kind of thing is hard to handle with the heuristics. We
would have to optimize both ways and check to see which one is smaller at the
end to get this right every time, and the compiler doesn't work that way
currently.
I haven't checked older sources to see if/when a heuristic changed.
This isn't risc-v specific. I see the same issue with x86_64.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug tree-optimization/95760] ivopts with loop variables
2020-06-19 6:17 [Bug tree-optimization/95760] New: ivopts with loop variables hailey.chiu at sifive dot com
2020-06-21 6:12 ` [Bug tree-optimization/95760] " wilson at gcc dot gnu.org
@ 2020-06-22 23:40 ` wilson at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-06-22 23:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760
--- Comment #2 from Jim Wilson <wilson at gcc dot gnu.org> ---
I took another look, and it turns out that the should_duplicate_loop_header_p
for size/speed is not the only issue. There is also an issue in
tree-ssa-loop-ivopts.c when computing iv costs. With speed, the +4 iv is
computed as cheaper than the +1 iv. With size, the +4 iv and +1 iv have the
exact same cost, and since the +1 iv was looked at first that one was chosen.
If I hack adjust_setup_cost to use to always use the speed cost calculation,
and retain the should_duplicate_loop_header_p hack, then both the inner and
outer loops get the +4 iv with -Os.
Looking at gcc-8.3, I see that the outer loop has the +4 iv and the inner loop
as the +1 iv. This looks similar to the result I get with the
adjust_setup_cost hack but not the should_duplicate_loop_header_p hack. So I
think the regression is solely due to some change in the cost calculation.
There is a lot of code involved in cost calculations. This could have even
been a riscv backend change. I would suggest doing a bisect over the gcc git
tree if you want to see exactly where and how the cost calculation changed.
The -Os and -O2 optimization diverges in try_improve_iv_set where it does "if
(acost < best_cost)". Maybe this could be improved to handle the case where
acost == best_cost, and use some other criteria to choose which one is better,
e.g. maybe a giv is better than a biv if they have the same cost. I haven't
tried looking into this.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-06-22 23:40 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-19 6:17 [Bug tree-optimization/95760] New: ivopts with loop variables hailey.chiu at sifive dot com
2020-06-21 6:12 ` [Bug tree-optimization/95760] " wilson at gcc dot gnu.org
2020-06-22 23:40 ` wilson at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).