public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/46763] New: gcc 4.5: missed optimization: copy global to local, prefetch
From: edwintorok at gmail dot com @ 2010-12-02 9:56 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46763
Summary: gcc 4.5: missed optimization: copy global to local, prefetch
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: edwintorok@gmail.com
Created attachment 22601
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22601
gy.i.bz2
I made a simple change to OCaml's GC: copy a global to a local variable
(restoring it before calling an external function), and add a prefetchnta.
The global-to-local change alone gives a ~4% speedup, the prefetchnta alone
~8%, and both together ~10%.
I would expect GCC to do this optimization by itself (at least the
global-to-register one).
Attached is a testcase that shows the missed optimization; the relevant
function is sweep_slice (and its manually optimized variants sweep_slice2, ...):
$ gcc-4.5 gy.i -O2 -lm
$ ./a.out
default: 1.325195s ( 100.0%)
glob2loc: 1.268875s ( 95.8% +- 1.024%)
prefetchnta: 1.207342s ( 91.1% +- 0.4986%)
prefetch: 1.277638s ( 96.4% +- 0.1179%)
glob2loc+prefetchnta: 1.199906s ( 90.5% +- 0.3629%)
default is the original function (sweep_slice); glob2loc is my manual
global-to-local optimization of caml_gc_sweep_hp; prefetchnta and prefetch are
__builtin_prefetch calls added by me (the non-temporal prefetch works very well
here); the last one combines both manual optimizations, resulting in a 9.5%
speedup.
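A minimal sketch of the two manual changes, assuming a made-up heap layout (a header word holding the block size in words shifted left by one, with the low bit meaning "free this block"). The names sweep_cursor, release_block, sweep_global and sweep_local are hypothetical and only loosely mimic the shape of sweep_slice and caml_gc_sweep_hp; they are not OCaml's real code. __builtin_prefetch with locality 0 requests a prefetch with no temporal locality, which GCC typically lowers to prefetchnta on x86.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified GC state; NOT OCaml's real layout. */
size_t *sweep_cursor;      /* global cursor, loosely like caml_gc_sweep_hp */
size_t *heap_end;
size_t freed_words;
size_t *last_seen_cursor;

/* Hypothetical external function. It reads the global cursor, so the
   cursor must be up to date whenever it is called. */
void release_block(size_t words)
{
  last_seen_cursor = sweep_cursor;
  freed_words += words;
}

/* "default": every access goes through the global, so the compiler must
   reload it after each call that might modify it. */
void sweep_global(long budget)
{
  while (budget > 0 && sweep_cursor < heap_end) {
    size_t words = *sweep_cursor >> 1;
    if (*sweep_cursor & 1)
      release_block(words);
    sweep_cursor += words + 1;
    budget -= (long)words + 1;
  }
}

/* "glob2loc+prefetchnta": keep the cursor in a local, write it back
   before the external call and at the end, and prefetch the next header.
   Only valid because release_block is assumed never to MODIFY
   sweep_cursor; assumes well-formed headers that tile the heap exactly. */
void sweep_local(long budget)
{
  size_t *p = sweep_cursor;              /* copy global to local */
  while (budget > 0 && p < heap_end) {
    size_t hd = *p;
    size_t words = hd >> 1;
    size_t *next = p + words + 1;
    if (next < heap_end)
      __builtin_prefetch(next, 0, 0);    /* read access, no temporal locality */
    if (hd & 1) {
      sweep_cursor = p;                  /* restore before the external call */
      release_block(words);
    }
    p = next;
    budget -= (long)words + 1;
  }
  sweep_cursor = p;                      /* final write-back */
}
```

The write-back before release_block is what keeps the two versions observably identical: the callee sees the same cursor value in both.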
The attached testcase is quite large because I dumped the sizes of all objects
from the GC to get a realistic GC run; I also included all the functions the
GC needs to run.
Both gcc-4.5 and gcc-4.4 miss this optimization; I didn't try older versions.
BTW, OCaml compiles with just -O -fno-defer-pop rather than -O2, but -O vs. -O2
doesn't make much difference on this testcase, so I used -O2.
$ gcc-4.5 -v
Using built-in specs.
COLLECT_GCC=gcc-4.5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.5.1/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.5.1-11'
--with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.5 --enable-shared --enable-multiarch
--enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib --enable-nls
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold
--enable-objc-gc --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 4.5.1 (Debian 4.5.1-11)
CPU: AMD Phenom(tm) II X6 1090T Processor
uname -a: Linux debian 2.6.36-phenom #107 SMP PREEMPT Sat Oct 23 10:30:01 EEST 2010 x86_64 GNU/Linux
* [Bug tree-optimization/46763] gcc 4.5: missed optimization: copy global to local, prefetch
From: amonakov at gcc dot gnu.org @ 2010-12-02 11:03 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46763
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> 2010-12-02 11:03:00 UTC ---
Small testcase for the global load/store issue:
int g;
extern int bar(int);
void foo(int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (g)
        {
          g++;
          g = bar(i);
        }
      else
        g = i;
    }
}
Trunk at -O3 does not optimize away the stores to g (at -O2, it also loads g
on each iteration):
.L3:
movl %ebx, g(%rip)
movl %ebx, %eax
addl $1, %ebx
cmpl %ebp, %ebx
je .L1
.L5:
testl %eax, %eax
je .L3
addl $1, %eax
movl %ebx, %edi
addl $1, %ebx
movl %eax, g(%rip)
call bar
cmpl %ebp, %ebx
movl %eax, g(%rip)
jne .L5
* [Bug tree-optimization/46763] gcc 4.5: missed optimization: copy global to local, prefetch
From: rguenth at gcc dot gnu.org @ 2010-12-02 13:26 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46763
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2010.12.02 13:26:31
CC| |rguenth at gcc dot gnu.org
Depends on| |41490
Ever Confirmed|0 |1
Severity|normal |enhancement
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-02 13:26:31 UTC ---
GCC has to preserve the stores and loads around the call to bar(), as bar()
might read or change the value of g. So transforming it to
int g;
extern int bar(int);
void foo(int n)
{
  int i;
  int tem = g;
  for (i = 0; i < n; i++)
    {
      if (tem)
        {
          tem++;
          tem = bar(i);
        }
      else
        tem = i;
    }
  g = tem;
}
(if that is what you did in your source-to-source transformation) isn't valid.
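A small self-contained check makes the invalidity concrete. The definition of bar() below is hypothetical (the testcase only declares it): it records each value of g it observes and returns i & 1. With g starting at 0 and n = 4, both versions end with g == 3, but bar() observes g == 2 in the original and a stale 0 in the rewrite; foo_tem is a made-up name for the rewritten loop.

```c
int g;
int seen[8];   /* hypothetical: values of g observed by bar() */
int nseen;

/* Hypothetical bar(): it reads g, which is why the store to g before
   the call cannot be dropped. */
int bar(int i)
{
  if (nseen < 8)
    seen[nseen++] = g;
  return i & 1;   /* arbitrary result */
}

/* The original loop. */
void foo(int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (g)
        {
          g++;
          g = bar(i);
        }
      else
        g = i;
    }
}

/* The invalid rewrite: g is only written after the loop, so bar()
   sees a stale value of g. */
void foo_tem(int n)
{
  int i;
  int tem = g;
  for (i = 0; i < n; i++)
    {
      if (tem)
        {
          tem++;
          tem = bar(i);
        }
      else
        tem = i;
    }
  g = tem;
}
```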
GCC can't do conditional store motion, that is, transform it to
int g;
extern int bar(int);
void foo(int n)
{
  int i;
  int tem = g;
  for (i = 0; i < n; i++)
    {
      if (tem)
        {
          tem++;
          g = tem;
          tem = bar(i);
        }
      else
        tem = i;
    }
  g = tem;
}
which would be valid. An enabling transform, sinking the store to g, is
missing as well:
int g;
extern int bar(int);
void foo(int n)
{
  int i, tem;
  for (i = 0; i < n; i++)
    {
      if (g)
        {
          g++;
          tem = bar(i);
        }
      else
        tem = i;
      g = tem;
    }
}
which would then allow us to do the load part of the partial store motion
by PRE. That is, you'd get
int g;
extern int bar(int);
void foo(int n)
{
  int i, tem;
  tem = g;
  for (i = 0; i < n; i++)
    {
      if (tem)
        {
          tem++;
          g = tem;
          tem = bar(i);
        }
      else
        tem = i;
      g = tem;
    }
}
but we don't understand that we can sink the store out of the loop, because
we don't understand the combined effect of g = tem; tem = bar (i); on g.
You also get the above with -O3, because there we see a partial-partial
redundancy, but then three stores are retained (we still miss both sinking
opportunities). Fixing PR41490 might fix both.
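The two valid forms from this comment can be sanity-checked against the original loop with a small harness. bar() is a hypothetical definition (the testcase only declares it) that records each value of g it observes and returns i & 1; foo_csm (conditional store motion) and foo_pre (store sinking plus PRE) are made-up names. Matching results on a few inputs are a sanity check, not a proof of equivalence.

```c
int g;
int seen[8];   /* hypothetical: values of g observed by bar() */
int nseen;

/* Hypothetical bar() that reads g. */
int bar(int i)
{
  if (nseen < 8)
    seen[nseen++] = g;
  return i & 1;
}

/* Original loop. */
void foo(int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (g)
        {
          g++;
          g = bar(i);
        }
      else
        g = i;
    }
}

/* Conditional store motion: store only right before the call and once
   after the loop. */
void foo_csm(int n)
{
  int i;
  int tem = g;
  for (i = 0; i < n; i++)
    {
      if (tem)
        {
          tem++;
          g = tem;
          tem = bar(i);
        }
      else
        tem = i;
    }
  g = tem;
}

/* After store sinking and PRE: the load is hoisted, but stores remain
   inside the loop. */
void foo_pre(int n)
{
  int i, tem;
  tem = g;
  for (i = 0; i < n; i++)
    {
      if (tem)
        {
          tem++;
          g = tem;
          tem = bar(i);
        }
      else
        tem = i;
      g = tem;
    }
}
```

One difference the check does not exercise: when n <= 0, foo_csm still writes g = tem after the loop, a store the original never performs. That is harmless in this single-threaded test, but it is the kind of speculative store a compiler has to be careful about in multithreaded code.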
* [Bug tree-optimization/46763] gcc 4.5: missed optimization: copy global to local, prefetch
From: rguenth at gcc dot gnu.org @ 2011-03-15 11:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46763
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |matz at gcc dot gnu.org
--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-15 11:32:49 UTC ---
Store sinking now works and exposes an if-conversion opportunity:
<bb 5>:
g.1_6 = i_19 + 1;
g = g.1_6;
i_7 = bar (i_16);
g = i_7;
goto <bb 7>;
<bb 6>:
g = i_16;
<bb 7>:
# i_5 = PHI <i_7(5), i_16(6)>
The stores to g can be if-converted by reusing the existing PHI, like so:
<bb 5>:
g.1_6 = i_19 + 1;
g = g.1_6;
i_7 = bar (i_16);
goto <bb 7>;
<bb 6>:
;
<bb 7>:
# i_5 = PHI <i_7(5), i_16(6)>
g = i_5;
That eventually fits into the cs_elim (conditional store elimination)
framework, but cs_elim runs too early. Micha, do you remember why it runs
where it runs?
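At the C level, the if-conversion of the stores corresponds roughly to the following hand-written sketch: each arm of the branch only computes the value to store (the role played by the PHI node), and a single store to g follows the join. This illustrates the idea; it is not the literal output of the cs_elim pass. bar() is a hypothetical definition that reads g, and foo_sunk/foo_cselim are made-up names.

```c
int g;
int seen[8];   /* hypothetical: values of g observed by bar() */
int nseen;

/* Hypothetical bar() that reads g. */
int bar(int i)
{
  if (nseen < 8)
    seen[nseen++] = g;
  return i & 1;
}

/* Shape after store sinking: each arm ends with its own store to g. */
void foo_sunk(int n)
{
  int i, t;
  for (i = 0; i < n; i++)
    {
      int cur = g;
      if (cur)
        {
          g = cur + 1;
          t = bar(i);
          g = t;
        }
      else
        g = i;
    }
}

/* Shape after if-converting the stores: one store after the merge,
   like g = i_5 in the GIMPLE above. */
void foo_cselim(int n)
{
  int i, t;
  for (i = 0; i < n; i++)
    {
      int cur = g;
      if (cur)
        {
          g = cur + 1;   /* must stay: bar() may read g */
          t = bar(i);
        }
      else
        t = i;
      g = t;
    }
}
```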
* [Bug tree-optimization/46763] gcc 4.5: missed optimization: copy global to local, prefetch
From: pinskia at gcc dot gnu.org @ 2021-09-12 5:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46763
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
-O3 produces:
jmp .L6
.p2align 4,,10
.p2align 3
.L3:
leal 1(%rbx), %edx
movl %ebx, g(%rip)
cmpl %edx, %ebp
je .L1
.L4:
movl %ebx, %eax
movl %edx, %ebx
.L6:
testl %eax, %eax
je .L3
addl $1, %eax
movl %ebx, %edi
movl %eax, g(%rip)
call bar(int)
leal 1(%rbx), %edx
movl %eax, g(%rip)
cmpl %edx, %ebp
je .L1
movl %eax, %ebx
jmp .L4