public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/45462]  New: Bad optimization in -O3 sometimes
@ 2010-08-31 11:51 yotambarnoy at gmail dot com
  2010-08-31 11:53 ` [Bug c++/45462] " yotambarnoy at gmail dot com
                   ` (14 more replies)
  0 siblings, 15 replies; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-08-31 11:51 UTC (permalink / raw)
  To: gcc-bugs

This bug was very hard to trace. I'm on the ScummVM dev team and I develop
specifically for the PSP(MIPS). I came across a crash when trying to start one
of our engines. The bug only occurred under very specific circumstances -- I
bisected it and adding class variables or adding some code made it go away, but
I'm not sure what the pattern is.

Here's the problematic code, packaged together easily:

void Logic::logicUp(uint32 new_script) {
       debug(5, "new pc = %d", new_script & 0xffff);

       // going up a level - and we'll keep going this cycle
       _curObjectHub.setLogicLevel(_curObjectHub.getLogicLevel() + 1);

       assert(_curObjectHub.getLogicLevel() < 3);      // Can be 0, 1, 2
       logicReplace(new_script);
}

void Logic::logicReplace(uint32 new_script) {
       uint32 level = _curObjectHub.getLogicLevel();

       _curObjectHub.setScriptId(level, new_script);
       _curObjectHub.setScriptPc(level, new_script & 0xffff);
}

        void setScriptId(int level, uint32 x) {
                WRITE_LE_UINT32(_addr + 20 + 4 * level, x);
        }

        uint32 getScriptId(int level) {
                return READ_LE_UINT32(_addr + 20 + 4 * level);
        }
        void setScriptPc(int level, uint32 x) {
                WRITE_LE_UINT32(_addr + 32 + 4 * level, x);
        }


G++ optimized these 2 functions into 1 and came up with this code:

8934bc0 <_ZN6Sword25Logic7logicUpEj>:
 8934bc0:       27bdfff0        addiu   sp,sp,-16
 8934bc4:       afb20008        sw      s2,8(sp)
 8934bc8:       afb10004        sw      s1,4(sp)
 8934bcc:       30b2ffff        andi    s2,a1,0xffff  # s2 = new_scip & 0xffff
 8934bd0:       00a08821        move    s1,a1 # s1 = new_scipt
 8934bd4:       3c0508aa        lui     a1,0x8aa # a1 = 0x8aa0000
 8934bd8:       afb00000        sw      s0,0(sp)
 8934bdc:       24a50d68        addiu   a1,a1,3432 # a1 = 0x8aa3432
 8934be0:       00808021        move    s0,a0 # s0 = this
 8934be4:       02403021        move    a2,s2 # a2 = new_script & 0xffff
 8934be8:       afbf000c        sw      ra,12(sp)
 8934bec:       0e286377        jal     8a18ddc <_Z5debugiPKcz>
 8934bf0:       24040005        li      a0,5 # a0 = 5 (jump)
 8934bf4:       8e0400d8        lw      a0,216(s0)  # a0 = *(this + 216)
 8934bf8:       88850007        lwl     a1,7(a0) # a1 =
 8934bfc:       98850004        lwr     a1,4(a0) # a1 = logicLevel
 8934c00:       24a20001        addiu   v0,a1,1 # v0 = logicLevel + 1
 8934c04:       a8820007        swl     v0,7(a0) # 7(a0) = 0.5 new logicLevel
 8934c08:       2ca30003        sltiu   v1,a1,3 # v1 = a1 < 3?
 8934c0c:       10600011        beqz    v1,8934c54
<_ZN6Sword25Logic7logicUpEj+0x94> # assert
 8934c10:       b8820004        swr     v0,4(a0) # 4(a0) = 0.5 new logicLevel
 8934c14:       24a20005        addiu   v0,a1,5 # v0 = logicLevel + 5 ???
 8934c18:       00021080        sll     v0,v0,0x2 # v0 = 4*logicLevel + 20
(scriptId offset)
 8934c1c:       00821021        addu    v0,a0,v0 # v0 = &scriptId
 8934c20:       24a30008        addiu   v1,a1,8 # v1 = logicLevel + 8
 8934c24:       a8510003        swl     s1,3(v0) # scriptId[3] = new_script
 8934c28:       00031880        sll     v1,v1,0x2 # v1 = logicLevel * 4 + 32
 8934c2c:       b8510000        swr     s1,0(v0) # scriptId[0] = new_script
 8934c30:       00831821        addu    v1,a0,v1 # v1 = *(this + 216)

Note the mistake in line 8934c00. v0 is used for incrementing the
logicLevel, and v0 is indeed saved into memory (ie.
_curObjectHub.setLogicLevel(_curObjectHub.getLogicLevel() + 1); )
But the optimization gcc made prevents it from realizing it needs to
use the newly incremented logicLevel value for the other calculations,
so instead it keeps using a1 in the rest of the function. This causes
it to write the new scriptId value in the wrong place, which causes
the VM to think its next scriptId is 0.

I'd like to attach the .ii files but I don't know how (can't find a button for
it). You can see the full code at
https://scummvm.svn.sourceforge.net/svnroot/scummvm/scummvm. The 'problem' code
is under the engines/sword2 directory in header.h, logic.cpp and function.cpp.
This bug does not happen on any other platform as far as I know, but there's a
huge random element involved regarding a particular memory layout.

Thanks
Yotam Barnoy


-- 
           Summary: Bad optimization in -O3 sometimes
           Product: gcc
           Version: 4.3.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: yotambarnoy at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
  2010-08-31 11:53 ` [Bug c++/45462] " yotambarnoy at gmail dot com
@ 2010-08-31 11:53 ` yotambarnoy at gmail dot com
  2010-08-31 14:17 ` rguenth at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-08-31 11:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from yotambarnoy at gmail dot com  2010-08-31 11:52 -------
Created an attachment (id=21602)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21602&action=view)
Logic.ii, where gcc makes the mistake

LogicUp() is the critical function


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
@ 2010-08-31 11:53 ` yotambarnoy at gmail dot com
  2010-08-31 11:53 ` yotambarnoy at gmail dot com
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-08-31 11:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from yotambarnoy at gmail dot com  2010-08-31 11:53 -------
Created an attachment (id=21603)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21603&action=view)
header.h, used by logic.cpp


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
  2010-08-31 11:53 ` [Bug c++/45462] " yotambarnoy at gmail dot com
  2010-08-31 11:53 ` yotambarnoy at gmail dot com
@ 2010-08-31 14:17 ` rguenth at gcc dot gnu dot org
  2010-08-31 15:24 ` yotambarnoy at gmail dot com
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-31 14:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2010-08-31 14:17 -------
 inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
{
  struct Unaligned32 { uint32 val; } __attribute__ ((__packed__));
  return ((const Unaligned32 *)ptr)->val;
 }

and similar look like they might violate C aliasing rules.  Try using
-fno-strict-aliasing.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (2 preceding siblings ...)
  2010-08-31 14:17 ` rguenth at gcc dot gnu dot org
@ 2010-08-31 15:24 ` yotambarnoy at gmail dot com
  2010-08-31 19:09   ` Andrew Pinski
  2010-08-31 19:09 ` pinskia at gmail dot com
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-08-31 15:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from yotambarnoy at gmail dot com  2010-08-31 15:24 -------
Good job picking up on that. 

There must be a better way of telling the compiler to generate lwr and lwl MIPS
instructions without breaking strict aliasing rules...?

Thanks a bunch!


-- 

yotambarnoy at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (3 preceding siblings ...)
  2010-08-31 15:24 ` yotambarnoy at gmail dot com
@ 2010-08-31 19:09 ` pinskia at gmail dot com
  2010-09-01  4:33 ` yotambarnoy at gmail dot com
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: pinskia at gmail dot com @ 2010-08-31 19:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gmail dot com  2010-08-31 19:09 -------
Subject: Re:  Bad optimization in -O3 sometimes



On Aug 31, 2010, at 8:24 AM, "yotambarnoy at gmail dot com"
<gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #4 from yotambarnoy at gmail dot com  2010-08-31  
> 15:24 -------
> Good job picking up on that.
>
> There must be a better way of telling the compiler to generate lwr  
> and lwl MIPS
> instructions without breaking strict aliasing rules...?

Have you tried using memcpy?

>
> Thanks a bunch!
>
>
> -- 
>
> yotambarnoy at gmail dot com changed:
>
>           What    |Removed                     |Added
> --- 
> --- 
> ----------------------------------------------------------------------
>             Status|UNCONFIRMED                 |RESOLVED
>         Resolution|                            |FIXED
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 15:24 ` yotambarnoy at gmail dot com
@ 2010-08-31 19:09   ` Andrew Pinski
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Pinski @ 2010-08-31 19:09 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs



On Aug 31, 2010, at 8:24 AM, "yotambarnoy at gmail dot com" <gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #4 from yotambarnoy at gmail dot com  2010-08-31  
> 15:24 -------
> Good job picking up on that.
>
> There must be a better way of telling the compiler to generate lwr  
> and lwl MIPS
> instructions without breaking strict aliasing rules...?

Have you tried using memcpy?

>
> Thanks a bunch!
>
>
> -- 
>
> yotambarnoy at gmail dot com changed:
>
>           What    |Removed                     |Added
> --- 
> --- 
> ----------------------------------------------------------------------
>             Status|UNCONFIRMED                 |RESOLVED
>         Resolution|                            |FIXED
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (4 preceding siblings ...)
  2010-08-31 19:09 ` pinskia at gmail dot com
@ 2010-09-01  4:33 ` yotambarnoy at gmail dot com
  2010-09-01  4:41   ` Andrew Pinski
  2010-09-01  4:42 ` pinskia at gmail dot com
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-09-01  4:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from yotambarnoy at gmail dot com  2010-09-01 04:32 -------
I recently implemented a custom memcpy for ScummVM. I didn't notice the
standard memcpy using lwl and lwr. In any case, how would memcpy do it any
better? Unless you're referring to the new memcpy inlining in newer versions of
gcc?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-09-01  4:33 ` yotambarnoy at gmail dot com
@ 2010-09-01  4:41   ` Andrew Pinski
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Pinski @ 2010-09-01  4:41 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs



On Aug 31, 2010, at 9:32 PM, "yotambarnoy at gmail dot com" <gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #6 from yotambarnoy at gmail dot com  2010-09-01  
> 04:32 -------
> I recently implemented a custom memcpy for ScummVM. I didn't notice  
> the
> standard memcpy using lwl and lwr. In any case, how would memcpy do  
> it any
> better? Unless you're referring to the new memcpy inlining in newer  
> versions of
> gcc?

I am referring to the standard builtin version of memcpy.  It is not  
just in newer versions; it has been there since 3.0. What is new is  
the more optimized version for x86 with either a large constant or a  
non constant. Can you try memcpy? If that does not work, please file a  
bug and cc me, I will see what I can do. I am working with MIPS lately.

>
>
> -- 
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (5 preceding siblings ...)
  2010-09-01  4:33 ` yotambarnoy at gmail dot com
@ 2010-09-01  4:42 ` pinskia at gmail dot com
  2010-09-01  5:03 ` yotambarnoy at gmail dot com
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: pinskia at gmail dot com @ 2010-09-01  4:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from pinskia at gmail dot com  2010-09-01 04:41 -------
Subject: Re:  Bad optimization in -O3 sometimes



On Aug 31, 2010, at 9:32 PM, "yotambarnoy at gmail dot com"
<gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #6 from yotambarnoy at gmail dot com  2010-09-01  
> 04:32 -------
> I recently implemented a custom memcpy for ScummVM. I didn't notice  
> the
> standard memcpy using lwl and lwr. In any case, how would memcpy do  
> it any
> better? Unless you're referring to the new memcpy inlining in newer  
> versions of
> gcc?

I am referring to the standard builtin version of memcpy.  It is not  
just in newer versions; it has been there since 3.0. What is new is  
the more optimized version for x86 with either a large constant or a  
non constant. Can you try memcpy? If that does not work, please file a  
bug and cc me, I will see what I can do. I am working with MIPS lately.

>
>
> -- 
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (6 preceding siblings ...)
  2010-09-01  4:42 ` pinskia at gmail dot com
@ 2010-09-01  5:03 ` yotambarnoy at gmail dot com
  2010-09-01  6:17   ` Andrew Pinski
  2010-09-01  6:17 ` pinskia at gmail dot com
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-09-01  5:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from yotambarnoy at gmail dot com  2010-09-01 05:03 -------
Unfortunately, a lib based solutions are difficult for me to implement. The
reason is that the current PSP SDK uses newlib. I can probably change my
personal toolchain with some work, but then it's a custom modification that
needs to be replicated to every other ScummVM dev as well as our buildbot. Not
impossible, but not work I'd like to get in to right now. 

In any case, it sounds like what you're saying is that memcpy has asm
instructions in the right place to use lwl and lwr. I can also do that in my
implementation.

My request was more general, as in gcc needs some kind of custom keyword to
tell it to allow unaligned pointers and to generate appropriate unaligned code,
so we don't have to trick the compiler into doing it in a way that ruins
optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr; 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (7 preceding siblings ...)
  2010-09-01  5:03 ` yotambarnoy at gmail dot com
@ 2010-09-01  6:17 ` pinskia at gmail dot com
  2010-09-01  9:45 ` rguenth at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: pinskia at gmail dot com @ 2010-09-01  6:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from pinskia at gmail dot com  2010-09-01 06:17 -------
Subject: Re:  Bad optimization in -O3 sometimes

I am not talking about a library solution at all. I am talking about a  
solution inside the compiler. Gcc will optimize memcpy; how much for  
MIPS is a good question. Try it out and see. Oh if you are using  
scei's gcc you really should be reporting issues to them.

On Aug 31, 2010, at 10:03 PM, "yotambarnoy at gmail dot com"
<gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #8 from yotambarnoy at gmail dot com  2010-09-01  
> 05:03 -------
> Unfortunately, a lib based solutions are difficult for me to  
> implement. The
> reason is that the current PSP SDK uses newlib. I can probably  
> change my
> personal toolchain with some work, but then it's a custom  
> modification that
> needs to be replicated to every other ScummVM dev as well as our  
> buildbot. Not
> impossible, but not work I'd like to get in to right now.
>
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do  
> that in my
> implementation.
>
> My request was more general, as in gcc needs some kind of custom  
> keyword to
> tell it to allow unaligned pointers and to generate appropriate  
> unaligned code,
> so we don't have to trick the compiler into doing it in a way that  
> ruins
> optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr;
>
>
> -- 
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-09-01  5:03 ` yotambarnoy at gmail dot com
@ 2010-09-01  6:17   ` Andrew Pinski
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Pinski @ 2010-09-01  6:17 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs

I am not talking about a library solution at all. I am talking about a  
solution inside the compiler. Gcc will optimize memcpy; how much for  
MIPS is a good question. Try it out and see. Oh if you are using  
scei's gcc you really should be reporting issues to them.

On Aug 31, 2010, at 10:03 PM, "yotambarnoy at gmail dot com" <gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #8 from yotambarnoy at gmail dot com  2010-09-01  
> 05:03 -------
> Unfortunately, a lib based solutions are difficult for me to  
> implement. The
> reason is that the current PSP SDK uses newlib. I can probably  
> change my
> personal toolchain with some work, but then it's a custom  
> modification that
> needs to be replicated to every other ScummVM dev as well as our  
> buildbot. Not
> impossible, but not work I'd like to get in to right now.
>
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do  
> that in my
> implementation.
>
> My request was more general, as in gcc needs some kind of custom  
> keyword to
> tell it to allow unaligned pointers and to generate appropriate  
> unaligned code,
> so we don't have to trick the compiler into doing it in a way that  
> ruins
> optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr;
>
>
> -- 
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (8 preceding siblings ...)
  2010-09-01  6:17 ` pinskia at gmail dot com
@ 2010-09-01  9:45 ` rguenth at gcc dot gnu dot org
  2010-09-01 18:26 ` pinskia at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-09-01  9:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rguenth at gcc dot gnu dot org  2010-09-01 09:45 -------
(In reply to comment #8)
> Unfortunately, a lib based solutions are difficult for me to implement. The
> reason is that the current PSP SDK uses newlib. I can probably change my
> personal toolchain with some work, but then it's a custom modification that
> needs to be replicated to every other ScummVM dev as well as our buildbot. Not
> impossible, but not work I'd like to get in to right now. 
> 
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do that in my
> implementation.
> 
> My request was more general, as in gcc needs some kind of custom keyword to
> tell it to allow unaligned pointers and to generate appropriate unaligned code,
> so we don't have to trick the compiler into doing it in a way that ruins
> optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr; 
> 

typedef my_unaligned_aliasing_uint32 uint32
__attribute__((aligned(1),may_alias));

inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
{
  return *(const my_unaligned_aliasing_uint32 *)ptr;
}

should do it and does not require -fno-strict-aliasing.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (9 preceding siblings ...)
  2010-09-01  9:45 ` rguenth at gcc dot gnu dot org
@ 2010-09-01 18:26 ` pinskia at gcc dot gnu dot org
  2010-09-01 18:36 ` yotambarnoy at gmail dot com
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-09-01 18:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from pinskia at gcc dot gnu dot org  2010-09-01 18:25 -------
(In reply to comment #10)
> typedef my_unaligned_aliasing_uint32 uint32
> __attribute__((aligned(1),may_alias));
> 
> inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> {
>   return *(const my_unaligned_aliasing_uint32 *)ptr;
> }

It does not:
READ_UINT32:
        j       $31
        lw      $2,0($4)

The aligned attribute is ignored there I think.  memcpy produces:
        lbu     $2,3($4)
        lbu     $6,0($4)
        lbu     $5,1($4)
        lbu     $3,2($4)
        addiu   $sp,$sp,-16
        sb      $6,0($sp)
        sb      $5,1($sp)
        sb      $3,2($sp)
        sb      $2,3($sp)
        lw      $2,0($sp)
        j       $31
        addiu   $sp,$sp,16

Which is bad and could be improved by using lwl/lwr.  I will file a bug about
that.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (10 preceding siblings ...)
  2010-09-01 18:26 ` pinskia at gcc dot gnu dot org
@ 2010-09-01 18:36 ` yotambarnoy at gmail dot com
  2010-09-02  9:08 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-09-01 18:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from yotambarnoy at gmail dot com  2010-09-01 18:35 -------
Right. Unfortunately 
> typedef my_unaligned_aliasing_uint32 uint32
> __attribute__((aligned(1),may_alias));
> 
> inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> {
>   return *(const my_unaligned_aliasing_uint32 *)ptr;
> }

doesn't work and doesn't align. I kept the struct method and added the
__may_alias__ attribute to fix the problem on my end. I'm glad to see gcc has
these attributes after all.

Regarding memcpy, I can't get gcc to optimize it for me at all, probably
because the PSP toolchain adds -fno-builtin to newlib. If I use
-Wl,--wrap,memcpy can I then create a __builtin_memcpy and have gcc optimize
using it?

Thanks for all your feedback guys. You've been a huge help.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (11 preceding siblings ...)
  2010-09-01 18:36 ` yotambarnoy at gmail dot com
@ 2010-09-02  9:08 ` rguenth at gcc dot gnu dot org
  2010-09-02 20:47 ` yotambarnoy at gmail dot com
  2010-09-04  3:08 ` hp at gcc dot gnu dot org
  14 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-09-02  9:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from rguenth at gcc dot gnu dot org  2010-09-02 09:07 -------
(In reply to comment #11)
> (In reply to comment #10)
> > typedef my_unaligned_aliasing_uint32 uint32
> > __attribute__((aligned(1),may_alias));
> > 
> > inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> > {
> >   return *(const my_unaligned_aliasing_uint32 *)ptr;
> > }
> 
> It does not:
> READ_UINT32:
>         j       $31
>         lw      $2,0($4)
> 
> The aligned attribute is ignored there I think.

It is if the target is STRICT_ALIGNMENT (which of course is a bug, but
well ... and I happen to have a fix as well)

>  memcpy produces:
>         lbu     $2,3($4)
>         lbu     $6,0($4)
>         lbu     $5,1($4)
>         lbu     $3,2($4)
>         addiu   $sp,$sp,-16
>         sb      $6,0($sp)
>         sb      $5,1($sp)
>         sb      $3,2($sp)
>         sb      $2,3($sp)
>         lw      $2,0($sp)
>         j       $31
>         addiu   $sp,$sp,16
> 
> Which is bad and could be improved by using lwl/lwr.  I will file a bug about
> that.
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (12 preceding siblings ...)
  2010-09-02  9:08 ` rguenth at gcc dot gnu dot org
@ 2010-09-02 20:47 ` yotambarnoy at gmail dot com
  2010-09-04  3:08 ` hp at gcc dot gnu dot org
  14 siblings, 0 replies; 19+ messages in thread
From: yotambarnoy at gmail dot com @ 2010-09-02 20:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from yotambarnoy at gmail dot com  2010-09-02 20:47 -------
Getting back to the original question, I did some reading online and I can't
figure out why this breaks the strict aliasing rules. 

Isn't void * some kind of special case? Shouldn't I be able to convert it to
whatever I need within the function without breaking aliasing? 

I think the problem is that gcc assumes that I want alignment (for the uint32 *
inside the struct) and doesn't realize I've used PACKED, so it decides that
it's undefined behavior. What do you guys think? This aliasing topic is so
confusing.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug c++/45462] Bad optimization in -O3 sometimes
  2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
                   ` (13 preceding siblings ...)
  2010-09-02 20:47 ` yotambarnoy at gmail dot com
@ 2010-09-04  3:08 ` hp at gcc dot gnu dot org
  14 siblings, 0 replies; 19+ messages in thread
From: hp at gcc dot gnu dot org @ 2010-09-04  3:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from hp at gcc dot gnu dot org  2010-09-04 03:08 -------
(In reply to comment #4)
> Good job picking up on that. 
> 
> There must be a better way of telling the compiler to generate lwr and lwl MIPS
> instructions without breaking strict aliasing rules...?

When requiring a specific insn you want an asm:

unsigned int result;
unsigned char *p;

/* Need the "m" (dummy) to mark memory as read. Need earlyclobber because gcc
using the same register would cause...problems.  Little endian assumed. */
asm ("lwr %0,0(%1)\n\tlwl %0,3(%1)" : "=&r" (result) : "r" (p), "m" (*p));


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-09-04  3:08 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-31 11:51 [Bug c++/45462] New: Bad optimization in -O3 sometimes yotambarnoy at gmail dot com
2010-08-31 11:53 ` [Bug c++/45462] " yotambarnoy at gmail dot com
2010-08-31 11:53 ` yotambarnoy at gmail dot com
2010-08-31 14:17 ` rguenth at gcc dot gnu dot org
2010-08-31 15:24 ` yotambarnoy at gmail dot com
2010-08-31 19:09   ` Andrew Pinski
2010-08-31 19:09 ` pinskia at gmail dot com
2010-09-01  4:33 ` yotambarnoy at gmail dot com
2010-09-01  4:41   ` Andrew Pinski
2010-09-01  4:42 ` pinskia at gmail dot com
2010-09-01  5:03 ` yotambarnoy at gmail dot com
2010-09-01  6:17   ` Andrew Pinski
2010-09-01  6:17 ` pinskia at gmail dot com
2010-09-01  9:45 ` rguenth at gcc dot gnu dot org
2010-09-01 18:26 ` pinskia at gcc dot gnu dot org
2010-09-01 18:36 ` yotambarnoy at gmail dot com
2010-09-02  9:08 ` rguenth at gcc dot gnu dot org
2010-09-02 20:47 ` yotambarnoy at gmail dot com
2010-09-04  3:08 ` hp at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).