public inbox for gcc-bugs@sourceware.org
* [Bug c/38453]  New: Output code optimisation excessive use of builtins
@ 2008-12-09 14:51 vince at simtec dot co dot uk
  2008-12-09 14:52 ` [Bug c/38453] " vince at simtec dot co dot uk
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: vince at simtec dot co dot uk @ 2008-12-09 14:51 UTC (permalink / raw)
  To: gcc-bugs

While compiling LZMA compression code for use with an embedded ARM target I
have discovered a regression from previous versions of GCC.

I have pared this down to a trivial example (attached) which boils down to an
application-specific modulus operation. Please note this is the *minimal* test
case; the original is obviously a bit more complex and is buried in the middle
of the compression system. The behavior exhibited is the same in both the
large and small systems.
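
The attachment itself is not reproduced in this message. As a rough sketch
only, a function of the following shape would be consistent with the
disassembly below and with the intermediate dump quoted later in the thread.
The struct name, field types and layout are assumptions; only `foo`,
`propsRes`, `propsData`, `prop0`, `pb`, `lc` and the constant 45 appear in the
thread.

```c
/* Hypothetical reconstruction; the real attachment may differ. */
struct props {
    int lc;   /* assumed at offset 0, matching "str r0, [r4]" */
    int pb;   /* assumed at offset 4, matching "str r3, [r4, #4]" */
};

void foo(struct props *propsRes, const unsigned char *propsData)
{
    unsigned char prop0 = propsData[0];

    /* Repeated subtraction: pb gains one for every 45 removed from prop0. */
    while (prop0 >= 45) {
        propsRes->pb++;
        prop0 -= 45;
    }

    /* What remains is the original byte modulo 45. */
    propsRes->lc = prop0;
}
```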

The simple test case is compiled with  
arm-unknown-linux-gnu-gcc -Os -o foo test.c

and the resulting objdump is:

000083fc <foo>:
    83fc:       e92d4010        push    {r4, lr}
    8400:       e5d11000        ldrb    r1, [r1]
    8404:       e1a04000        mov     r4, r0
    8408:       e1a02001        mov     r2, r1
    840c:       ea000002        b       841c <foo+0x20>
    8410:       e5943004        ldr     r3, [r4, #4]
    8414:       e2833001        add     r3, r3, #1      ; 0x1
    8418:       e5843004        str     r3, [r4, #4]
    841c:       e242302d        sub     r3, r2, #45     ; 0x2d
    8420:       e352002c        cmp     r2, #44 ; 0x2c
    8424:       e20320ff        and     r2, r3, #255    ; 0xff
    8428:       8afffff8        bhi     8410 <foo+0x14>
    842c:       e1a00001        mov     r0, r1
    8430:       e3a0102d        mov     r1, #45 ; 0x2d
    8434:       eb000003        bl      8448 <__umodsi3>
    8438:       e20000ff        and     r0, r0, #255    ; 0xff
    843c:       e5840000        str     r0, [r4]
    8440:       e8bd8010        pop     {r4, pc}

If a different optimisation level is used:

arm-unknown-linux-gnu-gcc -O2 -o foo test.c

000083fc <foo>:
    83fc:       e92d4070        push    {r4, r5, r6, lr}
    8400:       e5d14000        ldrb    r4, [r1]
    8404:       e354002c        cmp     r4, #44 ; 0x2c
    8408:       e1a06000        mov     r6, r0
    840c:       9a00000e        bls     844c <foo+0x50>
    8410:       e244402d        sub     r4, r4, #45     ; 0x2d
    8414:       e20440ff        and     r4, r4, #255    ; 0xff
    8418:       e5905004        ldr     r5, [r0, #4]
    841c:       e3a0102d        mov     r1, #45 ; 0x2d
    8420:       e1a00004        mov     r0, r4
    8424:       eb00004f        bl      8568 <__umodsi3>
    8428:       e3a0102d        mov     r1, #45 ; 0x2d
    842c:       e1a03000        mov     r3, r0
    8430:       e1a00004        mov     r0, r4
    8434:       e20340ff        and     r4, r3, #255    ; 0xff
    8438:       eb000006        bl      8458 <__aeabi_uidiv>
    843c:       e2855001        add     r5, r5, #1      ; 0x1
    8440:       e20000ff        and     r0, r0, #255    ; 0xff
    8444:       e0855000        add     r5, r5, r0
    8448:       e5865004        str     r5, [r6, #4]
    844c:       e5864000        str     r4, [r6]
    8450:       e8bd8070        pop     {r4, r5, r6, pc}

Several optimisation levels were tried and all produced similar output.

GCC 4.2.2 and 4.2.4 (which are our current compilers), invoked as
arm-unknown-linux-gnueabi-gcc -Os -o foo test.c
produce:

00008328 <foo>:
    8328:       e5d12000        ldrb    r2, [r1]
    832c:       ea000003        b       8340 <foo+0x18>
    8330:       e5903004        ldr     r3, [r0, #4]
    8334:       e20120ff        and     r2, r1, #255    ; 0xff
    8338:       e2833001        add     r3, r3, #1      ; 0x1
    833c:       e5803004        str     r3, [r0, #4]
    8340:       e352002c        cmp     r2, #44 ; 0x2c
    8344:       e242102d        sub     r1, r2, #45     ; 0x2d
    8348:       8afffff8        bhi     8330 <foo+0x8>
    834c:       e5802000        str     r2, [r0]
    8350:       e12fff1e        bx      lr



As can be seen, the trivial loop is performed and the quotient and remainder
are found, but then the __umodsi3 builtin is called to do the operation
*again*, and its return value is used for the assignment even though the
result is already available from the loop!
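
To make the redundancy concrete: the value left in the loop variable when the
loop exits already equals the original byte modulo 45, and the iteration count
equals the quotient. A small sketch that checks this for every byte value (the
helper names are hypothetical and not taken from the attached test):

```c
/* Remainder obtained purely by repeated subtraction, as the loop does. */
unsigned char rem_by_loop(unsigned char v, int *quot)
{
    *quot = 0;
    while (v >= 45) {
        v -= 45;
        (*quot)++;
    }
    return v;
}

/* Returns 1 if the loop result matches / and % for all 256 byte values. */
int loop_matches_division(void)
{
    for (int i = 0; i < 256; i++) {
        int q;
        unsigned char r = rem_by_loop((unsigned char)i, &q);
        if (r != i % 45 || q != i / 45)
            return 0;
    }
    return 1;
}
```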

This odd behavior is seen in cross-built (and native) GCC 4.3.2 but not in
4.2.4. It seems to be present in current development builds too; however, I
have issues building those reliably so cannot give definite results.

The behavior is especially noticeable as a large performance and code size
degradation in compression code on small embedded systems. The additional
need to link in the __umodsi3 implementation also costs more space.

This has also been observed in some circumstances within ARM kernels when using
modulus on powers of two! The obvious optimisation using shifts is performed
and then the value is recomputed using __modsi3.
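
For reference, the power-of-two case needs no library call at all: for
unsigned operands the remainder is a bit mask, and for signed operands the
mask only needs a sign correction because C's % truncates toward zero. A
minimal sketch with hypothetical helper names, assuming a two's complement
target:

```c
/* Unsigned remainder by 8: keep the low three bits. */
unsigned int umod8(unsigned int x)
{
    return x & 7u;
}

/* Signed remainder by 8, matching C's truncating %:
   a non-zero result takes the sign of x. */
int smod8(int x)
{
    int r = x & 7;          /* 0..7, i.e. the floor-style remainder */
    if (x < 0 && r != 0)
        r -= 8;             /* adjust to truncation toward zero */
    return r;
}
```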

Just for completeness, here is the GCC 4.3.2 compiler used for the tests (the
4.3.4 produces identical compiled output but has other undesirable behaviors
not relevant to this report):

arm-unknown-linux-gnu-gcc -v
Using built-in specs.
Target: arm-unknown-linux-gnu
Configured with: /opt/simtec/crosstool-ng/targets/src/gcc-4.3.2/configure
--build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu
--target=arm-unknown-linux-gnu --prefix=/opt/simtec/arm-unknown-linux-gnu
--with-sysroot=/opt/simtec/arm-unknown-linux-gnu/arm-unknown-linux-gnu/sys-root
--enable-languages=c,c++,fortran,java --disable-multilib --with-float=soft
--with-gmp=/opt/simtec/arm-unknown-linux-gnu
--with-mpfr=/opt/simtec/arm-unknown-linux-gnu
--with-pkgversion=crosstool-NG-1.3.0 --enable-__cxa_atexit
--with-local-prefix=/opt/simtec/arm-unknown-linux-gnu/arm-unknown-linux-gnu/sys-root
--disable-nls --enable-threads=posix --enable-symvers=gnu --enable-c99
--enable-long-long --enable-target-optspace
Thread model: posix
gcc version 4.3.2 (crosstool-NG-1.3.0)


-- 
           Summary: Output code optimisation excessive use of builtins
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: vince at simtec dot co dot uk
 GCC build triplet: x86_64-build_unknown-linux-gnu
  GCC host triplet: x86_64-build_unknown-linux-gnu
GCC target triplet: arm-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38453



* [Bug c/38453] Output code optimisation excessive use of builtins
  2008-12-09 14:51 [Bug c/38453] New: Output code optimisation excessive use of builtins vince at simtec dot co dot uk
@ 2008-12-09 14:52 ` vince at simtec dot co dot uk
  2008-12-10  0:27 ` pinskia at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: vince at simtec dot co dot uk @ 2008-12-09 14:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from vince at simtec dot co dot uk  2008-12-09 14:51 -------
Created an attachment (id=16854)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16854&action=view)
Trivial test code to show behaviour


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38453



* [Bug c/38453] Output code optimisation excessive use of builtins
  2008-12-09 14:51 [Bug c/38453] New: Output code optimisation excessive use of builtins vince at simtec dot co dot uk
  2008-12-09 14:52 ` [Bug c/38453] " vince at simtec dot co dot uk
@ 2008-12-10  0:27 ` pinskia at gcc dot gnu dot org
  2008-12-10 10:56 ` steven at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-10  0:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2008-12-10 00:25 -------
I don't see an issue here really, the code got optimized to just:
<bb 2>:
  prop0.24 = *propsData;
  prop0 = prop0.24;
  goto <bb 4>;

<bb 3>:
  propsRes->pb = [plus_expr] propsRes->pb + 1;
  prop0 = prop0 + 211;

<bb 4>:
  if (prop0 > 44)
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  propsRes->lc = (int) (int) (prop0.24 % 45);
  return;

But since ARM has no modulo/divide instruction (which is sad, by the way),
a call to __umodsi3/__aeabi_uidiv is used.
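
For readers unfamiliar with these helpers: on a core without a hardware
divide, the remainder has to be computed in software. The shift-and-subtract
routine below is only an illustrative sketch of the work such a helper has to
do. It is not libgcc's implementation of __umodsi3, and the name `soft_umod`
is made up for this example.

```c
#include <stdint.h>

/* Illustrative restoring division; returns n % d for d != 0. */
uint32_t soft_umod(uint32_t n, uint32_t d)
{
    uint32_t rem = 0;

    for (int bit = 31; bit >= 0; bit--) {
        /* Remember the bit that the shift below would push out. */
        uint32_t top = rem >> 31;

        /* Bring down the next bit of the dividend. */
        rem = (rem << 1) | ((n >> bit) & 1u);

        /* Subtract the divisor whenever it fits. */
        if (top || rem >= d)
            rem -= d;
    }
    return rem;
}
```

Even in this simple form the routine runs 32 iterations per call, which is why
an avoidable call shows up in both speed and size.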


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38453



* [Bug c/38453] Output code optimisation excessive use of builtins
  2008-12-09 14:51 [Bug c/38453] New: Output code optimisation excessive use of builtins vince at simtec dot co dot uk
  2008-12-09 14:52 ` [Bug c/38453] " vince at simtec dot co dot uk
  2008-12-10  0:27 ` pinskia at gcc dot gnu dot org
@ 2008-12-10 10:56 ` steven at gcc dot gnu dot org
  2008-12-10 11:20   ` Andrew Thomas Pinski
  2008-12-10 11:21 ` pinskia at gmail dot com
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-10 10:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from steven at gcc dot gnu dot org  2008-12-10 10:51 -------
Investigating.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |steven at gcc dot gnu dot
                   |dot org                     |org
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-10 10:51:37
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38453



* Re: [Bug c/38453] Output code optimisation excessive use of builtins
  2008-12-10 10:56 ` steven at gcc dot gnu dot org
@ 2008-12-10 11:20   ` Andrew Thomas Pinski
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Thomas Pinski @ 2008-12-10 11:20 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs



Sent from my iPhone

On Dec 10, 2008, at 2:51 AM, "steven at gcc dot gnu dot org"
<gcc-bugzilla@gcc.gnu.org> wrote:

> ------- Comment #3 from steven at gcc dot gnu dot org  2008-12-10 10:51 -------
> Investigating.
>
>
There is no reason to investigate. This change happened because the
heuristic in scev-cp was removed, so the transformation is now always
done. There is another bug about this with respect to the Linux
kernel too.
Thanks,
Andrew Pinski
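
As an aside for anyone hitting this on a target without a hardware divide:
one workaround sometimes used with GCC is an empty inline asm that makes the
loop variable opaque to the optimiser, so the loop's final value cannot be
rewritten as a modulus. The following is a sketch assuming GCC-style extended
asm; the struct and function names are hypothetical, and whether the trick is
appropriate depends on the code in question.

```c
/* Hypothetical layout, as assumed from the disassembly in the report. */
struct props {
    int lc;
    int pb;
};

void foo_no_mod(struct props *propsRes, const unsigned char *propsData)
{
    unsigned int prop0 = propsData[0];

    while (prop0 >= 45) {
        /* Empty asm: tells the compiler prop0 may change here, so the
           loop's final value cannot be computed in closed form. */
        __asm__ volatile("" : "+r"(prop0));
        propsRes->pb++;
        prop0 -= 45;
    }
    propsRes->lc = (int)prop0;
}
```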





* [Bug c/38453] Output code optimisation excessive use of builtins
  2008-12-09 14:51 [Bug c/38453] New: Output code optimisation excessive use of builtins vince at simtec dot co dot uk
                   ` (2 preceding siblings ...)
  2008-12-10 10:56 ` steven at gcc dot gnu dot org
@ 2008-12-10 11:21 ` pinskia at gmail dot com
  2008-12-10 11:26 ` steven at gcc dot gnu dot org
  2008-12-10 11:29 ` steven at gcc dot gnu dot org
  5 siblings, 0 replies; 8+ messages in thread
From: pinskia at gmail dot com @ 2008-12-10 11:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gmail dot com  2008-12-10 11:20 -------
Subject: Re:  Output code optimisation excessive use of builtins



Sent from my iPhone

On Dec 10, 2008, at 2:51 AM, "steven at gcc dot gnu dot org"
<gcc-bugzilla@gcc.gnu.org> wrote:

> ------- Comment #3 from steven at gcc dot gnu dot org  2008-12-10 10:51 -------
> Investigating.
>
There is no reason to investigate. This change happened because the
heuristic in scev-cp was removed, so the transformation is now always
done. There is another bug about this with respect to the Linux
kernel too.
Thanks,
Andrew Pinski




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38453



* [Bug c/38453] Output code optimisation excessive use of builtins
  2008-12-09 14:51 [Bug c/38453] New: Output code optimisation excessive use of builtins vince at simtec dot co dot uk
                   ` (3 preceding siblings ...)
  2008-12-10 11:21 ` pinskia at gmail dot com
@ 2008-12-10 11:26 ` steven at gcc dot gnu dot org
  2008-12-10 11:29 ` steven at gcc dot gnu dot org
  5 siblings, 0 replies; 8+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-10 11:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from steven at gcc dot gnu dot org  2008-12-10 11:24 -------
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32044#c5


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38453



* [Bug c/38453] Output code optimisation excessive use of builtins
  2008-12-09 14:51 [Bug c/38453] New: Output code optimisation excessive use of builtins vince at simtec dot co dot uk
                   ` (4 preceding siblings ...)
  2008-12-10 11:26 ` steven at gcc dot gnu dot org
@ 2008-12-10 11:29 ` steven at gcc dot gnu dot org
  5 siblings, 0 replies; 8+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-10 11:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from steven at gcc dot gnu dot org  2008-12-10 11:25 -------


*** This bug has been marked as a duplicate of 32044 ***


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38453



end of thread, other threads:[~2008-12-10 11:29 UTC | newest]

Thread overview: 8+ messages
2008-12-09 14:51 [Bug c/38453] New: Output code optimisation excessive use of builtins vince at simtec dot co dot uk
2008-12-09 14:52 ` [Bug c/38453] " vince at simtec dot co dot uk
2008-12-10  0:27 ` pinskia at gcc dot gnu dot org
2008-12-10 10:56 ` steven at gcc dot gnu dot org
2008-12-10 11:20   ` Andrew Thomas Pinski
2008-12-10 11:21 ` pinskia at gmail dot com
2008-12-10 11:26 ` steven at gcc dot gnu dot org
2008-12-10 11:29 ` steven at gcc dot gnu dot org
