public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/33291] New: a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
@ 2007-09-03 11:24 wouter dot vermaelen at scarlet dot be
2007-09-03 11:29 ` [Bug tree-optimization/33291] " pinskia at gcc dot gnu dot org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: wouter dot vermaelen at scarlet dot be @ 2007-09-03 11:24 UTC (permalink / raw)
To: gcc-bugs
I triggered this in the inner loop of the CPU emulation code of openMSX
(http://openmsx.sf.net/). I tried to reduce the code; below is the smallest
version I could come up with that still shows the problem:
-------------------------------------------
struct Clock {
    void f();
    void add(unsigned n) { a += n; }
    int a;
};
struct CPU : Clock {
    virtual ~CPU();
    unsigned char readSlow();
    void execute();
    void delay() { add(2); }
    unsigned char readFast() {
        if (unsigned char* p = ptrs[addr >> 8]) {
            // fast-path
            delay(); // ### 1
            delay(); // ### 2
            return p[addr & 255];
        } else {
            // slow-path
            return readSlow();
        }
    }
    typedef void (CPU::*FuncPtr)();
    static FuncPtr tab[256];
    unsigned char* ptrs[256];
    unsigned addr;
};
void CPU::execute() {
    f();
    while (true) {
        unsigned char b = readFast();
        delay(); // # 3
        (this->*tab[b])();
    }
}
----------------------------------------
When compiled with SVN revision 128037 on a linux x86_64 machine:
> g++ -O3 -S CPU.ii
> cat -n CPU.s
1 .file "CPU.ii"
2 .text
3 .align 2
4 .p2align 4,,15
5 .globl _ZN3CPU7executeEv
6 .type _ZN3CPU7executeEv, @function
7 _ZN3CPU7executeEv:
8 .LFB5:
9 pushq %rbp
10 .LCFI0:
11 leaq 8(%rdi), %rbp
12 pushq %rbx
13 .LCFI1:
14 movq %rdi, %rbx
15 movq %rbp, %rdi
16 subq $8, %rsp
17 .LCFI2:
18 call _ZN5Clock1fEv
19 .p2align 4,,10
20 .p2align 3
21 .L6:
22 movl 2064(%rbx), %eax
23 shrl $8, %eax
24 mov %eax, %eax
25 movq 16(%rbx,%rax,8), %rdx
26 testq %rdx, %rdx
27 je .L2
28 movl 8(%rbx), %eax ###
29 addl $2, %eax ### 1
30 movl %eax, (%rbp) ###
31 movl 8(%rbx), %eax ###
32 addl $2, %eax ### 2
33 movl %eax, (%rbp) ###
34 movzbl 2064(%rbx), %eax
35 movzbl (%rdx,%rax), %edx
36 .L3:
37 movl 8(%rbx), %eax #
38 addl $2, %eax # 3
39 movl %eax, (%rbp) #
40 movzbl %dl, %eax
41 salq $4, %rax
42 movq _ZN3CPU3tabE(%rax), %rdx
43 testb $1, %dl
44 jne .L4
45 movq %rbx, %rdi
46 addq _ZN3CPU3tabE+8(%rax), %rdi
47 call *%rdx
48 jmp .L6
49 .p2align 4,,10
50 .p2align 3
51 .L4:
52 movq %rbx, %rdi
53 addq _ZN3CPU3tabE+8(%rax), %rdi
54 movq (%rdi), %rax
55 movq -1(%rdx,%rax), %rdx
56 call *%rdx
57 jmp .L6
58 .L2:
59 movq %rbx, %rdi
60 call _ZN3CPU8readSlowEv
61 movl %eax, %edx
62 .p2align 4,,4
63 .p2align 3
64 jmp .L3
[skipped the rest of the output]
The missed optimization is visible in lines 28-33. It also seems strange to me
that the variable is read via 8(%rbx) but written via (%rbp).
gcc-4.2.1 does a better job here: it optimizes the two consecutive delay()
calls to just: addl $4, 8(%rbx)
Additionally, I would have preferred that all three delay() calls be
collapsed into a single instruction in the fast code path (and partly
duplicated as a+=4; readSlow(); a+=2; in the slow path). But I understand
this might be more difficult to implement.
--
Summary: a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: wouter dot vermaelen at scarlet dot be
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33291
* [Bug tree-optimization/33291] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: pinskia at gcc dot gnu dot org @ 2007-09-03 11:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from pinskia at gcc dot gnu dot org 2007-09-03 11:28 -------
# VUSE <tab_56, SMT.9_58, SMT.10_60>
D.2581_35 = this_2(D)->D.2503.a;
D.2582_36 = (unsigned int) D.2581_35;
D.2583_37 = D.2582_36 + 2;
D.2584_38 = (int) D.2583_37;
# tab_76 = VDEF <tab_56>
# SMT.9_77 = VDEF <SMT.9_58>
D.2529_3->a = D.2584_38;
# VUSE <tab_76, SMT.9_77, SMT.10_60>
D.2586_40 = this_2(D)->D.2503.a;
D.2587_41 = (unsigned int) D.2586_40;
D.2588_42 = D.2587_41 + 2;
D.2589_43 = (int) D.2588_42;
# tab_78 = VDEF <tab_76>
# SMT.9_79 = VDEF <SMT.9_77>
D.2529_3->a = D.2589_43;
hmmm, aliasing
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|minor |normal
* [Bug tree-optimization/33291] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: rguenth at gcc dot gnu dot org @ 2007-09-03 12:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2007-09-03 12:01 -------
The problem is that forwprop no longer propagates ADDR_EXPRs into memory
reference stmts in early optimization (due to the volatile issues), and
value numbering cannot deal with load/store addresses that are written
differently but refer to the same location:
D.2605_34 = &this_2(D)->D.2527;
# VUSE <tab_57, SMT.9_59, SMT.11_63>
D.2606_35 = this_2(D)->D.2527.a;
...
# tab_77 = VDEF <tab_57>
# SMT.11_78 = VDEF <SMT.11_63>
D.2605_34->a = D.2609_38;
(aliasing also computes different answers here, for whatever reason). When
an extra forwprop pass is scheduled before FRE, the second _load_ of a is
eliminated, but DSE still cannot identify the dead store:
# VUSE <tab_57, SMT.9_59, SMT.11_63>
D.2606_35 = this_2(D)->D.2527.a;
D.2607_36 = (unsigned int) D.2606_35;
D.2608_37 = D.2607_36 + 2;
D.2609_38 = (int) D.2608_37;
# tab_77 = VDEF <tab_57>
# SMT.9_93 = VDEF <SMT.9_59>
# SMT.11_78 = VDEF <SMT.11_63>
this_2(D)->D.2527.a = D.2609_38;
D.2612_41 = (unsigned int) D.2609_38;
D.2613_42 = D.2612_41 + 2;
D.2614_43 = (int) D.2613_42;
# tab_79 = VDEF <tab_77>
# SMT.9_94 = VDEF <SMT.9_93>
# SMT.11_80 = VDEF <SMT.11_78>
this_2(D)->D.2527.a = D.2614_43;
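
To make the "different but same" addresses concrete, here is a small sketch of the access pattern in source form. It is only an approximation of the GIMPLE above, with hypothetical, trimmed types and helper names (add_two_mixed, same_location): the load goes through the derived object, while the store goes through a separately computed pointer to the base subobject.

```cpp
#include <cassert>

// Hypothetical, trimmed versions of the types in the test case.
struct Clock { int a = 0; };
struct CPU : Clock { unsigned addr = 0; };

// Load through the object, store through a pointer to the base
// subobject, roughly mirroring the shape of the dump above.
inline void add_two_mixed(CPU* self) {
    Clock* base = self;   // like D.2605_34 = &this->D.2527
    int v = self->a;      // like the load of this->D.2527.a
    base->a = static_cast<int>(static_cast<unsigned>(v) + 2u);  // like D.2605_34->a = ...
}

// Both access paths name the same int object.
inline bool same_location(CPU* self) {
    Clock* base = self;
    return &self->a == &base->a;
}

// Small self-check that can be called from any test driver.
inline void self_check() {
    CPU c;
    assert(same_location(&c));
}
```

Once the address computation is propagated into the store, both statements use the same textual reference, which is the form value numbering can match.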
--
rguenth at gcc dot gnu dot org changed:
What                 |Removed            |Added
----------------------------------------------------------------------------
CC                   |                   |rguenth at gcc dot gnu dot org
Status               |UNCONFIRMED        |NEW
Ever Confirmed       |0                  |1
Keywords             |                   |alias, missed-optimization
Last reconfirmed date|0000-00-00 00:00:00|2007-09-03 12:01:47
* [Bug tree-optimization/33291] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: rguenth at gcc dot gnu dot org @ 2007-09-03 12:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rguenth at gcc dot gnu dot org 2007-09-03 12:04 -------
That is, rtl level DSE removes the dead store:
_ZN3CPU7executeEv:
.LFB5:
pushq %rbx
.LCFI0:
movq %rdi, %rbx
leaq 8(%rdi), %rdi
call _ZN5Clock1fEv
.p2align 4,,10
.p2align 3
.L6:
movl 2064(%rbx), %eax
shrl $8, %eax
mov %eax, %eax
movq 16(%rbx,%rax,8), %rdx
testq %rdx, %rdx
je .L2
movzbl 2064(%rbx), %eax
addl $4, 8(%rbx)
movzbl (%rdx,%rax), %eax
.L3:
movzbl %al, %eax
addl $2, 8(%rbx)
salq $4, %rax
movq _ZN3CPU3tabE(%rax), %rdx
testb $1, %dl
jne .L4
movq %rbx, %rdi
addq _ZN3CPU3tabE+8(%rax), %rdi
call *%rdx
jmp .L6
.p2align 4,,10
.p2align 3
.L4:
movq %rbx, %rdi
addq _ZN3CPU3tabE+8(%rax), %rdi
movq (%rdi), %rax
movq -1(%rdx,%rax), %rdx
call *%rdx
jmp .L6
.L2:
movq %rbx, %rdi
call _ZN3CPU8readSlowEv
.p2align 4,,6
.p2align 3
jmp .L3
.LFE5:
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |TREE
* [Bug tree-optimization/33291] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: rguenth at gcc dot gnu dot org @ 2007-09-03 14:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2007-09-03 14:36 -------
I have a patch that makes it work apart from the tree level DSE issue.
--
rguenth at gcc dot gnu dot org changed:
What                 |Removed                          |Added
----------------------------------------------------------------------------
AssignedTo           |unassigned at gcc dot gnu dot org|rguenth at gcc dot gnu dot org
Status               |NEW                              |ASSIGNED
Last reconfirmed date|2007-09-03 12:01:47              |2007-09-03 14:36:17
* [Bug tree-optimization/33291] [4.3 Regression] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: rguenth at gcc dot gnu dot org @ 2007-09-03 15:22 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What            |Removed |Added
----------------------------------------------------------------------------
Known to fail   |        |4.3.0
Known to work   |        |4.2.1
Summary         |a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1) |[4.3 Regression] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
Target Milestone|---     |4.3.0
* [Bug tree-optimization/33291] [4.3 Regression] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: rguenth at gcc dot gnu dot org @ 2007-09-04 8:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from rguenth at gcc dot gnu dot org 2007-09-04 08:39 -------
Subject: Bug 33291
Author: rguenth
Date: Tue Sep 4 08:38:56 2007
New Revision: 128068
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128068
Log:
2007-09-04 Richard Guenther <rguenther@suse.de>
PR tree-optimization/33291
* tree-pretty-print.c (dump_generic_node): Dump all
qualifiers for pointer types, not only first. Dump
qualifiers for aggregate types as well.
* tree-ssa-ccp.c (maybe_fold_offset_to_array_ref): Always
use the canonical type for building ARRAY_REFs.
* gimplify.c (canonicalize_addr_expr): Clean up. The
correct validness check is compatibility of the pointer
types. Always use the canonical type for building
ARRAY_REFs and ADDR_EXPRs.
* tree-ssa-forwprop.c (forward_propagate_addr_expr): Revert
change that disabled propagation of ADDR_EXPRs into statements
with volatile ops.
* gcc.dg/volatile2.c: New testcase.
* gcc.dg/pr32721.c: Adjust volatile reference pattern.
* gcc.dg/tree-ssa/forwprop-1.c: Remove xfail.
* gcc.dg/tree-ssa/forwprop-2.c: Likewise.
* gcc.dg/tree-ssa/pr17141-1.c: Likewise.
Added:
trunk/gcc/testsuite/gcc.dg/volatile2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/pr32721.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-1.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr17141-1.c
trunk/gcc/tree-pretty-print.c
trunk/gcc/tree-ssa-ccp.c
trunk/gcc/tree-ssa-forwprop.c
* [Bug tree-optimization/33291] [4.3 Regression] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: rguenth at gcc dot gnu dot org @ 2007-09-04 8:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from rguenth at gcc dot gnu dot org 2007-09-04 08:41 -------
Fixed.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
* [Bug tree-optimization/33291] [4.3 Regression] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: wouter dot vermaelen at scarlet dot be @ 2007-09-04 12:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from wouter dot vermaelen at scarlet dot be 2007-09-04 12:11 -------
Thanks for looking into this so quickly!
I confirm the problem is solved for the reduced testcase. However in my
original code the dead-store is not eliminated. Do you want me to file a
separate bug report for that?
....
mov (%rbx),%edx
movzbl %cl,%edi
lea 0x3(%rdx),%r8d
add $0x5,%edx
mov %r8d,(%rbx)
movzbl (%rsi,%rdi,1),%eax
mov %edx,(%rbx)
....
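
For readers unfamiliar with the term, the snippet above stores a+3 and then a+5 to the same location, so the first store is "dead" if nothing can observe it in between. The sketch below shows that pattern in source form. The types and function names are hypothetical, and the byte read between the stores is only an assumption about what the original code does; the thread does not say why the store survives. One possible factor is that a read through an unsigned char pointer may legally alias the counter, so the compiler would have to prove it does not before dropping the first store.

```cpp
#include <cassert>

// Hypothetical counter type, not taken from openMSX.
struct Counter { int a = 0; };

// Two stores to the same location with a byte load in between.
// The first store is only removable if *p cannot point into c.a.
inline unsigned char two_stores_with_read(Counter& c, const unsigned char* p) {
    int v = c.a;
    c.a = v + 3;           // candidate dead store
    unsigned char b = *p;  // assumption: unrelated memory, but may alias in general
    c.a = v + 5;           // overwrites the value stored above
    return b;
}

// The form the reporter would like to see: a single store.
inline void one_store(Counter& c) {
    c.a = c.a + 5;
}

// Small self-check that can be called from any test driver.
inline void self_check() {
    Counter d;
    one_store(d);
    assert(d.a == 5);
}
```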
--
wouter dot vermaelen at scarlet dot be changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |minor
* [Bug tree-optimization/33291] [4.3 Regression] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1)
From: rguenth at gcc dot gnu dot org @ 2007-09-04 12:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from rguenth at gcc dot gnu dot org 2007-09-04 12:13 -------
Yes please.