* [Bug rtl-optimization/46854] New: PowerPC optimization regression
@ 2010-12-08 19:59 joakim.tjernlund at transmode dot se
  2010-12-09  9:11 ` [Bug rtl-optimization/46854] " joakim.tjernlund at transmode dot se
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: joakim.tjernlund at transmode dot se @ 2010-12-08 19:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854

           Summary: PowerPC optimization regression
           Product: gcc
           Version: 4.4.5
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: joakim.tjernlund@transmode.se


I have noticed that gcc 4.4.5 often produces less optimized code
than the old 3.4.6. Below is the latest example. I am
starting to wonder if I need to rebuild gcc 4.4.5 and/or
add new options to gcc when I compile. Any insight?

 Jocke

const char *test(int i)
{
    const char *p = "abc\0def\0gef";
    for(; i; --i)
        while(*++p);
    return p;
}

/* gcc 4.4.5 -O2 -S
       .section        ".text"
        .align 2
        .globl test
        .type   test, @function
test:
        mr. 0,3
        mtctr 0
        beq- 0,.L10
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
.L8:
        lbzu 0,1(3)
        cmpwi 7,0,0
        bne+ 7,.L8
        bdnz .L8
        blr
.L10:
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
        blr
        .size   test, .-test
        .section        .rodata
        .align 2
        .set    .LANCHOR0,. + 0
.LC0:
        .string "abc"
        .string "def"
        .string "gef"
        .ident  "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
 */
/* gcc 4.4.5 -Os -S
.globl test
        .type   test, @function
test:
        mr 9,3
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
        b .L2
.L5:
        lbzu 0,1(3)
        cmpwi 7,0,0
        bne+ 7,.L5
        addi 9,9,-1
.L2:
        cmpwi 7,9,0
        bne+ 7,.L5
        blr
        .size   test, .-test
        .section        .rodata
        .set    .LANCHOR0,. + 0
.LC0:
        .string "abc"
        .string "def"
        .string "gef"
        .ident  "GCC: (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5"
 */

/* gcc 3.4.6 -Os -S and gcc 3.4.6 -O2 -S
        .section        .rodata
        .align 2
.LC0:
        .string "abc"
        .string "def"
        .string "gef"
        .section        ".text"
        .align 2
        .globl test
        .type   test, @function
test:
        mr. 0,3
        lis 9,.LC0@ha
        la 3,.LC0@l(9)
        mtctr 0
        beqlr- 0
.L13:
        lbzu 0,1(3)
        cmpwi 7,0,0
        bne- 7,.L13
        bdnz .L13
        blr
        .size   test, .-test
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)"
*/



* [Bug rtl-optimization/46854] PowerPC optimization regression
From: joakim.tjernlund at transmode dot se @ 2010-12-09  9:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854

--- Comment #1 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 09:10:50 UTC ---
Somewhat related observation:

It would be nice if gcc could optimize
static inline const char *test(int i)
{
    const char *p = "abc\0def\0gef";
    for(; i; --i)
        while(*++p);
    return p;
}

const char * myfun(void)
{
    return test(2);
}

into 
const char * myfun(void)
{
    return "gef";
}
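
For reference, tracing the loop by hand suggests that test(2) stops on the
NUL terminator that follows "def" (offset 7 in the literal), one byte before
"gef", so the fully folded result would be the literal plus 7 rather than
"gef" itself. The sketch below makes that checkable on a host compiler. The
file-scope array lit and the helper test_offset() are names introduced here
for illustration and are not part of the report.

```c
#include <stddef.h>

/* Same bytes as the literal in the report, given a name (an assumption
   made here) so the returned pointer can be turned into an offset. */
const char lit[] = "abc\0def\0gef";

/* The loop from the report, unchanged apart from starting at lit. */
const char *test(int i)
{
    const char *p = lit;
    for (; i; --i)
        while (*++p)
            ;
    return p;
}

/* Hypothetical helper: offset of test(i) from the start of the bytes. */
ptrdiff_t test_offset(int i)
{
    return test(i) - lit;
}
```

Under this reading, test(2) + 1 is the address of "gef", and test(2) itself
is an empty string.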



* [Bug rtl-optimization/46854] PowerPC optimization regression
From: meissner at gcc dot gnu.org @ 2010-12-09 17:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854

Michael Meissner <meissner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2010.12.09 17:56:39
                 CC|                            |meissner at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #2 from Michael Meissner <meissner at gcc dot gnu.org> 2010-12-09 17:56:39 UTC ---
Note, -O2 generates mostly the code you want, except that it loads the address
of the string twice.

Here is the code generated with a 4.4.4 based compiler (the compiler happens to
be the IBM Advance Toolchain, version 3.0-1) using -O2 -m32 (-O1/-O3 generate
the same code):

test:
        mr. 0,3
        mtctr 0
        beq 0,.L10
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
        .p2align 4,,15
.L8:
        lbzu 0,1(3)
        cmpwi 7,0,0
        bne 7,.L8
        bdnz .L8
        blr
.L10:
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
        blr

The SLES 11 SP1 system compiler, which is based on GCC 4.3.4, generates the
same code.

However, the GCC 4.6 trunk seems to have regressed slightly with -O2 or -O3, in
that it does not track that the lbzu updates the pointer, but maintains its own
copy:

        mr. 0,3
        mtctr 0
        beq- 0,.L5
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
.L4:
        mr 9,3
.L3:
        lbzu 0,1(9)
        addi 3,3,1
        cmpwi 7,0,0
        bne+ 7,.L3
        bdnz .L4
        blr
.L5:
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
        blr
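
Read as C, the trunk -O2 listing above behaves roughly like skip_shadow()
below: the inner loop advances a scratch copy through the update-form load
and separately increments the pointer that is returned. This is a
hand-written rendering for illustration, not compiler output. The names
skip_single and skip_shadow and the explicit base-pointer parameter are
assumptions introduced here.

```c
/* Shape of the 4.4.x code: one pointer, advanced by the load itself. */
const char *skip_single(const char *p, int i)
{
    for (; i; --i)
        while (*++p)            /* lbzu 0,1(3) */
            ;
    return p;
}

/* Shape of the trunk code: a scratch copy q feeds the load while p is
   incremented on the side, so two registers track the same position. */
const char *skip_shadow(const char *p, int i)
{
    for (; i; --i) {
        const char *q = p;      /* mr 9,3 */
        char c;
        do {
            c = *++q;           /* lbzu 0,1(9) */
            ++p;                /* addi 3,3,1 */
        } while (c);            /* cmpwi 7,0,0 / bne+ 7,.L3 */
    }                           /* bdnz .L4 */
    return p;
}
```

Both functions return the same pointer; the second simply spends an extra
move and add per iteration.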

Trunk with -Os does generate the two comparisons:

        mr 9,3
        lis 3,.LANCHOR0@ha
        la 3,.LANCHOR0@l(3)
        b .L2
.L5:
        mr 11,3
        addi 3,3,1
        lbz 0,1(11)
        cmpwi 7,0,0
        bne+ 7,.L5
        addi 9,9,-1
.L2:
        cmpwi 7,9,0
        bne+ 7,.L5
        blr

So, there are two bugs here: -Os generates larger code than -O2, and there is
a code generation regression in GCC 4.6.



* [Bug rtl-optimization/46854] PowerPC optimization regression
From: joakim.tjernlund at transmode dot se @ 2010-12-09 18:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854

--- Comment #3 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 18:21:50 UTC ---
(In reply to comment #2)
> However, the GCC 4.6 trunk seems to have regressed slightly with -O2 or -O3, in
> that it does not track that the lbzu updates the pointer, but maintains its own

I have seen more similar mistakes such as not using lwzu/stwu at all. Will
add a copy of a mail I sent earlier about that.

> copy:
> So, there are two bugs in this.  One that -Os generates larger code than -O2,
> and the code regression for GCC 4.6.

And gcc 4.4.4/4.4.5. I suspect this started much earlier though.



* [Bug rtl-optimization/46854] PowerPC optimization regression
From: joakim.tjernlund at transmode dot se @ 2010-12-09 18:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854

--- Comment #4 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 18:23:59 UTC ---
Here is a copy of an earlier mail I sent to the list in November:

Using gcc 4.4.4 -Os on
loop(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}
I get
/* gcc 4.4.4 -Os
loop:
        addi 5,5,1
        li 9,0
        mtctr 5
        b .L2
.L3:
        lwzx 0,4,9
        stwx 0,3,9
.L2:
        addi 9,9,4
        bdnz .L3
        blr
 */

gcc 3.4.6 has:
/* gcc 3.4.6 -Os
loop:
        mr. 0,5
        mtctr 0
        beqlr- 0
.L8:
        lwzu 0,4(4)
        stwu 0,4(3)
        bdnz .L8
        blr
 */

It doesn't matter which cpu type I use. It seems impossible
to make newer gcc produce smaller/faster code.

Perhaps lwzx/stwx is faster on bigger Power cpus, but this
can't be true for all cpus, can it?
That shouldn't matter though, because I asked gcc to produce smaller
code with -Os.
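
As a rough illustration, the two listings correspond to the two C
formulations below: the 3.4.6 code bumps both pointers as part of each access
(lwzu/stwu), while the 4.4.4 code keeps the base pointers fixed and advances
a separate offset (lwzx/stwx plus addi). The function names loop_update and
loop_indexed are made up for this sketch, and the mapping to instructions is
a reading of the listings above, not a statement about how gcc chooses
between them.

```c
#include <stddef.h>

/* Update-form shape (gcc 3.4.6 listing): each access pre-increments
   its own pointer, as lwzu/stwu do. */
void loop_update(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}

/* Indexed shape (gcc 4.4.4 listing): base pointers stay fixed and a
   separate offset advances, as with lwzx/stwx plus addi. */
void loop_indexed(long *to, long *from, long len)
{
    size_t off = 1;             /* first element touched is index 1 */
    for (; len; --len, ++off)
        to[off] = from[off];
}
```

Both copy elements 1 through len and leave element 0 untouched, so the
difference is only in how the addresses are formed.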


