public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/46854] New: PowerPC optimization regression
From: joakim.tjernlund at transmode dot se @ 2010-12-08 19:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854
Summary: PowerPC optimization regression
Product: gcc
Version: 4.4.5
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: joakim.tjernlund@transmode.se
I have noticed that gcc 4.4.5 often produces less optimized code
than the old 3.4.6. Below is the latest example. I am
starting to wonder if I need to rebuild gcc 4.4.5 and/or
add new options to gcc when I compile. Any insight?
Jocke
const char *test(int i)
{
    const char *p = "abc\0def\0gef";
    for(; i; --i)
        while(*++p);
    return p;
}
/* gcc 4.4.5 -O2 -S
.section ".text"
.align 2
.globl test
.type test, @function
test:
mr. 0,3
mtctr 0
beq- 0,.L10
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
.L8:
lbzu 0,1(3)
cmpwi 7,0,0
bne+ 7,.L8
bdnz .L8
blr
.L10:
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
blr
.size test, .-test
.section .rodata
.align 2
.set .LANCHOR0,. + 0
.LC0:
.string "abc"
.string "def"
.string "gef"
.ident "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
*/
/* gcc 4.4.5 -Os -S
.globl test
.type test, @function
test:
mr 9,3
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
b .L2
.L5:
lbzu 0,1(3)
cmpwi 7,0,0
bne+ 7,.L5
addi 9,9,-1
.L2:
cmpwi 7,9,0
bne+ 7,.L5
blr
.size test, .-test
.section .rodata
.set .LANCHOR0,. + 0
.LC0:
.string "abc"
.string "def"
.string "gef"
.ident "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
*/
/* gcc 3.4.6 -Os -S and gcc -O2 -S
.section .rodata
.align 2
.LC0:
.string "abc"
.string "def"
.string "gef"
.section ".text"
.align 2
.globl test
.type test, @function
test:
mr. 0,3
lis 9,.LC0@ha
la 3,.LC0@l(9)
mtctr 0
beqlr- 0
.L13:
lbzu 0,1(3)
cmpwi 7,0,0
bne- 7,.L13
bdnz .L13
blr
.size test, .-test
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
*/
* [Bug rtl-optimization/46854] PowerPC optimization regression
From: joakim.tjernlund at transmode dot se @ 2010-12-09 9:11 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854
--- Comment #1 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 09:10:50 UTC ---
Somewhat related observation:
It would be nice if gcc could optimize
static inline const char *test(int i)
{
    const char *p = "abc\0def\0gef";
    for(; i; --i)
        while(*++p);
    return p;
}
const char * myfun(void)
{
    return test(2);
}
into
const char * myfun(void)
{
    return "gef";
}
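A minimal sketch for checking what test(2) actually evaluates to, which is
the value the requested constant folding would have to produce. The names
strings, test_from, test_offset and check_offsets are illustrative and not
part of the report. As written, the inner loop stops on the NUL terminator,
so test(2) appears to land on the NUL that follows "def" (offset 7), one
byte before "gef"; a folded form would then be the literal plus 7.

```c
#include <assert.h>
#include <stddef.h>

/* The literal from the report, kept in a named array so that
   offsets into it can be measured (illustrative, not in the report). */
static const char strings[] = "abc\0def\0gef";

/* Same loop as test() in the report, but with the start pointer
   passed in instead of being taken from the literal. */
static const char *test_from(const char *p, int i)
{
    for (; i; --i)
        while (*++p)
            ;
    return p;
}

/* Offset of the returned pointer from the start of the literal. */
static ptrdiff_t test_offset(int i)
{
    return test_from(strings, i) - strings;
}

/* Checks the offsets reached for small iteration counts. */
static void check_offsets(void)
{
    assert(test_offset(0) == 0);  /* start of "abc" */
    assert(test_offset(1) == 3);  /* NUL after "abc" */
    assert(test_offset(2) == 7);  /* NUL after "def" */
    assert(strings[8] == 'g');    /* "gef" starts one byte later */
}
```

Calling check_offsets() from main() is enough to run the checks.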
* [Bug rtl-optimization/46854] PowerPC optimization regression
From: meissner at gcc dot gnu.org @ 2010-12-09 17:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854
Michael Meissner <meissner at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2010.12.09 17:56:39
CC| |meissner at gcc dot gnu.org
Ever Confirmed|0 |1
--- Comment #2 from Michael Meissner <meissner at gcc dot gnu.org> 2010-12-09 17:56:39 UTC ---
Note, -O2 generates mostly the code you want, except that it loads the address
of the string twice:
Here is the code generated with a 4.4.4 based compiler (the compiler happens to
be the IBM Advance Toolchain, version 3.0-1) using -O2 -m32 (-O1/-O3 generate
the same code):
test:
mr. 0,3
mtctr 0
beq 0,.L10
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
.p2align 4,,15
.L8:
lbzu 0,1(3)
cmpwi 7,0,0
bne 7,.L8
bdnz .L8
blr
.L10:
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
blr
The SLES 11SP1 system compiler, which is based on GCC 4.3.4, generates the same
code.
However, the GCC 4.6 trunk seems to have regressed slightly with -O2 or -O3, in
that it does not track that the lbzu updates the pointer, but maintains its own
copy:
mr. 0,3
mtctr 0
beq- 0,.L5
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
.L4:
mr 9,3
.L3:
lbzu 0,1(9)
addi 3,3,1
cmpwi 7,0,0
bne+ 7,.L3
bdnz .L4
blr
.L5:
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
blr
Trunk with -Os does generate the two comparisons:
mr 9,3
lis 3,.LANCHOR0@ha
la 3,.LANCHOR0@l(3)
b .L2
.L5:
mr 11,3
addi 3,3,1
lbz 0,1(11)
cmpwi 7,0,0
bne+ 7,.L5
addi 9,9,-1
.L2:
cmpwi 7,9,0
bne+ 7,.L5
blr
So, there are two bugs in this. One that -Os generates larger code than -O2,
and the code regression for GCC 4.6.
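To make the 4.6 difference concrete, here is a rough C rendering of the two
loop shapes seen in the -O2 listings above. It is only a sketch: the function
names and the sample array are made up for illustration, and the register
comments are a best-effort mapping onto the listings, not compiler output.
Both functions compute the same pointer; the second one does the extra
per-byte increment that the trunk code performs.

```c
#include <assert.h>

/* Shape of the 4.4.x -O2 loop: a single pointer, advanced by the
   load itself (one lbzu per byte). */
static const char *scan_update_form(const char *p, int i)
{
    for (; i; --i)
        while (*++p)
            ;
    return p;
}

/* Shape of the 4.6 trunk -O2 loop: the load goes through a copy q,
   while p is advanced by a separate add. */
static const char *scan_separate_copy(const char *p, int i)
{
    for (; i; --i) {
        const char *q = p;   /* mr 9,3 */
        char c;
        do {
            c = *++q;        /* lbzu 0,1(9) */
            ++p;             /* addi 3,3,1 */
        } while (c);
    }
    return p;
}

/* Illustrative input, same contents as the literal in the report. */
static const char sample[] = "abc\0def\0gef";

/* Both shapes must agree for every iteration count that stays
   inside the array. */
static void check_scan_shapes(void)
{
    for (int i = 0; i <= 3; ++i)
        assert(scan_update_form(sample, i) == scan_separate_copy(sample, i));
}
```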
* [Bug rtl-optimization/46854] PowerPC optimization regression
From: joakim.tjernlund at transmode dot se @ 2010-12-09 18:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854
--- Comment #3 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 18:21:50 UTC ---
(In reply to comment #2)
> Note, -O2 generates mostly the code you want, except that it loads the address
> of the string twice:
>
> Here is the code generated with a 4.4.4 based compiler (the compiler happens to
> be the IBM advance toolchain, version 3.0-1) using -O2 -m32 (-O1/-O3 generate
> the same code):
>
> test:
> mr. 0,3
> mtctr 0
> beq 0,.L10
> lis 3,.LANCHOR0@ha
> la 3,.LANCHOR0@l(3)
> .p2align 4,,15
> .L8:
> lbzu 0,1(3)
> cmpwi 7,0,0
> bne 7,.L8
> bdnz .L8
> blr
> .L10:
> lis 3,.LANCHOR0@ha
> la 3,.LANCHOR0@l(3)
> blr
>
> The SLES 11SP1 system compiler, which is based on GCC 4.3.4 generates the same
> code.
>
> However, the GCC 4.6 trunk seems to have regressed slightly with -O2 or -O3, in
> that it does not track that the lbzu updates the pointer, but maintains its own
I have seen more mistakes like this, such as not using lwzu/stwu at all. I will
add a copy of a mail I sent earlier about that.
> copy:
>
> mr. 0,3
> mtctr 0
> beq- 0,.L5
> lis 3,.LANCHOR0@ha
> la 3,.LANCHOR0@l(3)
> .L4:
> mr 9,3
> .L3:
> lbzu 0,1(9)
> addi 3,3,1
> cmpwi 7,0,0
> bne+ 7,.L3
> bdnz .L4
> blr
> .L5:
> lis 3,.LANCHOR0@ha
> la 3,.LANCHOR0@l(3)
> blr
>
> Trunk with -Os does generate the two comparisons:
>
> mr 9,3
> lis 3,.LANCHOR0@ha
> la 3,.LANCHOR0@l(3)
> b .L2
> .L5:
> mr 11,3
> addi 3,3,1
> lbz 0,1(11)
> cmpwi 7,0,0
> bne+ 7,.L5
> addi 9,9,-1
> .L2:
> cmpwi 7,9,0
> bne+ 7,.L5
> blr
>
> So, there are two bugs in this. One that -Os generates larger code than -O2,
> and the code regression for GCC 4.6.
And gcc 4.4.4/4.4.5. I suspect this started much earlier though.
* [Bug rtl-optimization/46854] PowerPC optimization regression
From: joakim.tjernlund at transmode dot se @ 2010-12-09 18:24 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854
--- Comment #4 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 18:23:59 UTC ---
Here is a copy of an earlier mail I sent to the list in November:
Using gcc 4.4.4 -Os on
void loop(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}
I get
/* gcc 4.4.4 -Os
loop:
addi 5,5,1
li 9,0
mtctr 5
b .L2
.L3:
lwzx 0,4,9
stwx 0,3,9
.L2:
addi 9,9,4
bdnz .L3
blr
*/
gcc 3.4.6 has:
/* gcc 3.4.6 -Os
loop:
mr. 0,5
mtctr 0
beqlr- 0
.L8:
lwzu 0,4(4)
stwu 0,4(3)
bdnz .L8
blr
*/
It doesn't matter which cpu type I use. It seems impossible
to make newer gcc versions produce smaller/faster code.
Perhaps lwzx/stwx is faster on bigger Power cpus, but this
can't be true for all cpus, can it?
That shouldn't matter though, because I asked gcc to produce smaller
code with -Os.
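For comparison, the two listings correspond roughly to the two C forms
below: a pointer pre-increment form (which is what lwzu/stwu implement
directly) and an indexed form with one shared offset (lwzx/stwx plus a
separate addi). This is only a sketch; the function names and the check
helper are made up for illustration. Going by the listings, the 3.4.6 loop
body is three instructions per element (lwzu, stwu, bdnz) and the 4.4.4 one
is four (lwzx, stwx, addi, bdnz).

```c
#include <assert.h>

/* Pointer form from the report: each access pre-increments its
   pointer, matching lwzu/stwu. */
static void loop_update(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}

/* Indexed form, mirroring the gcc 4.4.4 -Os output: both base
   pointers stay fixed and one shared index is advanced. */
static void loop_indexed(long *to, long *from, long len)
{
    long i = 0;
    for (; len; --len) {
        ++i;               /* addi 9,9,4 (byte offset in the asm) */
        to[i] = from[i];   /* lwzx 0,4,9 / stwx 0,3,9 */
    }
}

/* Both forms copy elements 1..len and leave element 0 alone. */
static void check_loops(void)
{
    long src[5] = {10, 11, 12, 13, 14};
    long a[5] = {0};
    long b[5] = {0};

    loop_update(a, src, 4);
    loop_indexed(b, src, 4);
    for (int k = 0; k < 5; ++k)
        assert(a[k] == b[k]);
    assert(a[0] == 0);
    assert(a[4] == 14);
}
```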