public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: siarhei.siamashka at gmail dot com @ 2012-12-20  4:45 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294

Siarhei Siamashka <siarhei.siamashka at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei.siamashka at gmail
                   |                            |dot com

--- Comment #9 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 04:45:10 UTC ---
(In reply to comment #3)
> Actually this case should not be using post modify at all, depending on how
> many bits ARM has for an offset. I thought 16 bits, which means you don't need
> that at all and GCC should generate it without an increment.  Oh, and this is
> an RTL opt issue.

It seems that gcc 4.7.2 and 4.8.0 20121219 (experimental) already do this,
which hides the postincrement issue for the currently attached testcase.

However, postincrement is still a performance problem on ARM. The code I'm
having trouble with is the following:

/*******************************************/

typedef unsigned long long T;

/* Each iteration subtracts 16 from n and stores two 8-byte values (16 bytes). */
void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0)
    {
        *buf++ = v;
        *buf++ = v;
    }
}

/*******************************************/

$ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:    e2511010     subs    r1, r1, #16
   4:    412fff1e     bxmi    lr
   8:    e2511010     subs    r1, r1, #16
   c:    e1c020f0     strd    r2, [r0]
  10:    e1c020f8     strd    r2, [r0, #8]
  14:    e2800010     add    r0, r0, #16
  18:    5afffffa     bpl    8 <fill+0x8>
  1c:    e12fff1e     bx    lr


$ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:    e351000f     cmp    r1, #15
   4:    d12fff1e     bxle    lr
   8:    e2411010     sub    r1, r1, #16
   c:    e280c010     add    ip, r0, #16
  10:    e3c1100f     bic    r1, r1, #15
  14:    e08c1001     add    r1, ip, r1
  18:    e1c020f0     strd    r2, [r0]
  1c:    e2800010     add    r0, r0, #16
  20:    e14020f8     strd    r2, [r0, #-8]
  24:    e1500001     cmp    r0, r1
  28:    1afffffa     bne    18 <fill+0x18>
  2c:    e12fff1e     bx    lr
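
The 4.8.0 listing replaces the counter test with a precomputed end pointer. Below is a rough C rendering of that loop shape, hand-translated from the disassembly so it can be compared with the original testcase. The name `fill_end_pointer` is hypothetical, and GCC does not literally produce this source.

```c
#include <assert.h>

typedef unsigned long long T;

/* The testcase from the report: each iteration subtracts 16 from n
   and stores two 8-byte values. */
void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0)
    {
        *buf++ = v;
        *buf++ = v;
    }
}

/* Hypothetical hand translation of the 4.8.0 code: the loop exit
   compares the pointer with a precomputed end instead of testing n. */
void fill_end_pointer(T *buf, int n, T v)
{
    if (n <= 15)                          /* cmp r1, #15 ; bxle lr */
        return;

    /* sub r1, r1, #16 ; bic r1, r1, #15 */
    int extra = (n - 16) & ~15;
    assert(extra % 16 == 0);

    /* add ip, r0, #16 ; add r1, ip, r1 : every 16 of n is two elements */
    T *end = buf + 2 + 2 * (extra / 16);

    do {
        buf[0] = v;                       /* strd r2, [r0]      */
        buf += 2;                         /* add  r0, r0, #16   */
        buf[-1] = v;                      /* strd r2, [r0, #-8] */
    } while (buf != end);                 /* cmp r0, r1 ; bne   */
}
```

For non-negative n, both functions store floor(n/16) pairs. The `cmp r1, #15` guard together with the `bic` rounding computes exactly that count.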


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: siarhei.siamashka at gmail dot com @ 2012-12-20  5:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294

--- Comment #10 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 05:47:30 UTC ---
(In reply to comment #9)

And some performance measurements (with the buffer resident in L1 cache):

> $ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
> 
> 00000000 <fill>:
>    0:    e2511010     subs    r1, r1, #16
>    4:    412fff1e     bxmi    lr
>    8:    e2511010     subs    r1, r1, #16
>    c:    e1c020f0     strd    r2, [r0]
>   10:    e1c020f8     strd    r2, [r0, #8]
>   14:    e2800010     add    r0, r0, #16
>   18:    5afffffa     bpl    8 <fill+0x8>
>   1c:    e12fff1e     bx    lr

Cortex-A8  - 5   cycles per iteration
Cortex-A9  - 4.5 cycles per iteration
Cortex-A15 - 3   cycles per iteration

> $ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
> 
> 00000000 <fill>:
>    0:    e351000f     cmp    r1, #15
>    4:    d12fff1e     bxle    lr
>    8:    e2411010     sub    r1, r1, #16
>    c:    e280c010     add    ip, r0, #16
>   10:    e3c1100f     bic    r1, r1, #15
>   14:    e08c1001     add    r1, ip, r1
>   18:    e1c020f0     strd    r2, [r0]
>   1c:    e2800010     add    r0, r0, #16
>   20:    e14020f8     strd    r2, [r0, #-8]
>   24:    e1500001     cmp    r0, r1
>   28:    1afffffa     bne    18 <fill+0x18>
>   2c:    e12fff1e     bx    lr

Cortex-A8  - 6 cycles per iteration
Cortex-A9  - 4 cycles per iteration
Cortex-A15 - 3 cycles per iteration

By contrast, we could have expected something like the following code for the
inner loop:

1:      strd    V, [BUF], #8
        subs    N, N, #16
        strd    V, [BUF], #8
        bpl    1b

Cortex-A8  - 4 cycles per iteration
Cortex-A9  - 4 cycles per iteration
Cortex-A15 - 2.5 cycles per iteration
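
Each iteration stores 16 bytes (two 8-byte `strd`), so the cycle counts above convert directly into store throughput. The C sketch below does that arithmetic using only the figures reported in this comment. The struct and function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes written per loop iteration: two 8-byte strd stores. */
#define BYTES_PER_ITER 16.0

struct measurement {
    const char *core;
    double gcc472;   /* cycles/iteration, gcc 4.7.2 code        */
    double gcc480;   /* cycles/iteration, gcc 4.8.0 code        */
    double postinc;  /* cycles/iteration, hand-written postinc  */
};

/* Figures as reported above, buffer resident in L1 cache. */
static const struct measurement results[] = {
    { "Cortex-A8",  5.0, 6.0, 4.0 },
    { "Cortex-A9",  4.5, 4.0, 4.0 },
    { "Cortex-A15", 3.0, 3.0, 2.5 },
};

double bytes_per_cycle(double cycles_per_iter)
{
    assert(cycles_per_iter > 0.0);
    return BYTES_PER_ITER / cycles_per_iter;
}

void print_throughput(void)
{
    size_t count = sizeof results / sizeof results[0];
    for (size_t i = 0; i < count; i++)
        printf("%-10s  4.7.2: %.2f  4.8.0: %.2f  postinc: %.2f bytes/cycle\n",
               results[i].core,
               bytes_per_cycle(results[i].gcc472),
               bytes_per_cycle(results[i].gcc480),
               bytes_per_cycle(results[i].postinc));
}
```

On Cortex-A8, for example, this gives 3.2 bytes/cycle for the 4.7.2 code, about 2.67 for 4.8.0, and 4.0 for the postincrement loop.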



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: rearnsha at gcc dot gnu dot org @ 2009-12-14 23:22 UTC (permalink / raw)
  To: gcc-bugs



-- 

rearnsha at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: ramana at gcc dot gnu dot org @ 2009-04-21 14:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from ramana at gcc dot gnu dot org  2009-04-21 14:09 -------
Confirmed with trunk.


-- 

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-04-21 14:09:43
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: ramana dot radhakrishnan at codito dot com @ 2006-10-09 16:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from ramana dot radhakrishnan at codito dot com  2006-10-09 16:33 -------
(In reply to comment #5)

flow.c is responsible for generating POST_INCs and POST_MODIFYs in 3.4 / 4.0 /
4.1 / 4.2. I believe it is being replaced by the new data flow code on the
data flow branch, which might not be ready until 4.3. We have hit similar
issues in a private port that I maintain.

There are two options: fix flow.c, or use some of Joern's auto-increment
patches for 4.1 / 4.2 to fix this issue. This doesn't really take care of
POST_MODIFY, but I don't think that affects ARM that much.


> Here's what's going on in this case:
> 
> CSE changes an address if:
>    A) The cost of the address is lower
> or
>    B) The cost of the address is the same and the cost of the RTX would be 
>       higher outside of an address
> 
> So, CSE changes (R) to (R+4) because it is lower cost as specified by the 
> address_costs hook.
> 
> It doesn't change beyond (R+4) because (R+8) is the same cost as (R+4).
> 
> Once the address (R+4) gets in the RTL sequence, it never gets converted to 
> a postincrement form.
> 
> So by adding the cost of a simple REG RTX as being lower than (+ (REG) (CONST)) 
> in the addressing modes, CSE doesn't convert the address to base+offset, and
> we get the postincrement code back again in 4.x.
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-06 19:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from eplondke at gmail dot com  2006-10-06 19:07 -------
Changing the cost of (REG) to 1 fixes 4.1 but not 4.2, it seems.

In 4.2, the RTL optimization does not combine 

        ldr     r2, [r1, #0]
        ldr     r3, [r0, #0]
        add     r0, r0, #4
        add     r1, r1, #4

into

        ldr     r2, [r1], #4
        ldr     r3, [r0], #4


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-06 14:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from eplondke at gmail dot com  2006-10-06 14:55 -------
Here's what's going on in this case:

CSE changes an address if:
   A) The cost of the address is lower
or
   B) The cost of the address is the same and the cost of the RTX would be 
      higher outside of an address

So, CSE changes (R) to (R+4) because it is lower cost as specified by the 
address_costs hook.

It doesn't change beyond (R+4) because (R+8) is the same cost as (R+4).

Once the address (R+4) gets in the RTL sequence, it never gets converted to 
a postincrement form.

So by making the address cost of a simple REG lower than that of
(+ (REG) (CONST)), CSE no longer converts the address to base+offset form,
and we get the postincrement code back again in 4.x.
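
The selection rule described above can be modeled in a few lines of C. The cost numbers used with it are hypothetical placeholders, chosen only to reproduce the situation described: a bare REG costs more than REG+CONST. They are not taken from the ARM backend, and the function is a simplification of CSE's behavior rather than its actual code.

```c
#include <assert.h>
#include <stddef.h>

/* One equivalent form of an address that CSE could use (illustrative). */
struct addr_candidate {
    const char *form;  /* e.g. "(R)" or "(R+4)"                    */
    int addr_cost;     /* cost as reported by the address-cost hook */
    int rtx_cost;      /* cost of the expression outside an address */
};

/* Rule A: prefer a strictly cheaper address.
   Rule B: on equal address cost, prefer the form that would cost more
   to compute outside an address.
   Candidate 0 is the address currently in the insn; it is kept on a
   full tie. Returns the index of the chosen candidate. */
size_t cse_pick_address(const struct addr_candidate *c, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (c[i].addr_cost < c[best].addr_cost
            || (c[i].addr_cost == c[best].addr_cost
                && c[i].rtx_cost > c[best].rtx_cost))
            best = i;
    }
    return best;
}
```

With (R) priced above (R+4), the rule picks (R+4). Once the REG form is made the cheapest, (R) is kept, which leaves the post-increment opportunity intact.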


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-02 19:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from eplondke at gmail dot com  2006-10-02 19:16 -------
(In reply to comment #3)
> Actually this case should not be using post modify at all, depending on how
> many bits ARM has for an offset. I thought 16 bits, which means you don't need
> that at all and GCC should generate it without an increment.  Oh, and this is
> an RTL opt issue.
> 

ARM normal arithmetic operands support an 8-bit integer rotated right by an
even number of bits (0-30).  If you rotate at all, some microarchitectures may
cause stalls (XScale maybe?)...
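
The rotated 8-bit immediate rule can be checked mechanically. A 32-bit constant is encodable if rotating it left by some even amount in 0-30 leaves a value that fits in 8 bits. Here is a minimal C sketch; the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by r bits (0..31); r == 0 is special-cased to avoid an
   undefined 32-bit shift. */
static uint32_t rol32(uint32_t x, unsigned r)
{
    assert(r < 32);
    return r == 0 ? x : (x << r) | (x >> (32 - r));
}

/* ARM data-processing immediate: v == ror(imm8, rot) for an even rot,
   which is equivalent to rol(v, rot) fitting in 8 bits. */
bool arm_dp_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot <= 30; rot += 2) {
        if (rol32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}
```

For example, 0x100 and 0xF000000F are encodable, but 0x102 is not. Its set bits would fit in 8 bits only after an odd rotation.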

Load and store word (and unsigned byte) have a +/- 12-bit offset.
Load and store of other single values have a +/- 8-bit offset.
Load and store multiple may have no offset.

That's for ARM.  For THUMB, you get +5 bits.
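
The offset ranges just listed can be summarized as a range check. This sketch assumes the ARM-state immediate forms: 12-bit for LDR/STR word and unsigned byte, and 8-bit for halfword, signed-byte and doubleword accesses. For 16-bit Thumb LDR/STR it assumes a 5-bit unsigned immediate scaled by the access size. The enum and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative classification of load/store forms. */
enum access_kind {
    ARM_WORD_OR_UBYTE,  /* LDR/STR, LDRB/STRB: +/- 12-bit offset         */
    ARM_OTHER_SINGLE,   /* LDRH/STRH, LDRSB, LDRSH, LDRD/STRD: +/- 8-bit */
    ARM_MULTIPLE,       /* LDM/STM: no offset                            */
    THUMB_WORD,         /* 16-bit Thumb LDR/STR: imm5 * 4                */
    THUMB_HALF,         /* 16-bit Thumb LDRH/STRH: imm5 * 2              */
    THUMB_BYTE          /* 16-bit Thumb LDRB/STRB: imm5                  */
};

/* Non-negative, a multiple of the access size, and at most 31 units. */
static bool scaled_imm5(int offset, int scale)
{
    return offset >= 0 && offset % scale == 0 && offset / scale <= 31;
}

bool offset_encodable(enum access_kind kind, int offset)
{
    switch (kind) {
    case ARM_WORD_OR_UBYTE: return offset >= -4095 && offset <= 4095;
    case ARM_OTHER_SINGLE:  return offset >= -255 && offset <= 255;
    case ARM_MULTIPLE:      return offset == 0;
    case THUMB_WORD:        return scaled_imm5(offset, 4);
    case THUMB_HALF:        return scaled_imm5(offset, 2);
    case THUMB_BYTE:        return scaled_imm5(offset, 1);
    }
    assert(!"unknown access kind");
    return false;
}
```
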

Both ARM and THUMB modes have postincrement addressing.  ARM gets a decent
postmodify.  For THUMB you use the Load Multiple Increment After instruction
with a single register specified.
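
To make postincrement concrete: an ARM load can use a plain offset, a pre-indexed offset with writeback, or a post-indexed offset. Only the post-indexed form reads memory at the old base and then advances it. The C model below simplifies by counting offsets in array elements rather than bytes, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ldr rd, [rn, #off]  : load from rn+off, rn unchanged */
uint32_t ldr_offset(const uint32_t *rn, ptrdiff_t off)
{
    assert(rn != NULL);
    return rn[off];
}

/* ldr rd, [rn, #off]! : rn = rn+off, then load from the new rn */
uint32_t ldr_pre(const uint32_t **rn, ptrdiff_t off)
{
    assert(rn != NULL && *rn != NULL);
    *rn += off;
    return **rn;
}

/* ldr rd, [rn], #off  : load from rn, then rn = rn+off */
uint32_t ldr_post(const uint32_t **rn, ptrdiff_t off)
{
    assert(rn != NULL && *rn != NULL);
    uint32_t v = **rn;
    *rn += off;
    return v;
}
```

A loop of `ldr_post` calls walks an array with one operation per element. This is the pattern the GCC 3.4 output below uses, while 4.1/4.2 emit a separate `add`.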

It looks like CSE1 is the first pass in which

(set (reg) (mem (reg))) gets converted to
(set (reg) (mem (plus (reg) (4))))

Comparing GCC 4.x with GCC 3.x, I have noticed a tendency for postmodify not
to be used on several targets.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: pinskia at gcc dot gnu dot org @ 2006-09-29 23:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2006-09-29 23:43 -------
Actually this case should not be using post modify at all, depending on how
many bits ARM has for an offset. I thought 16 bits, which means you don't need
that at all and GCC should generate it without an increment.  Oh, and this is
an RTL opt issue.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294



* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-09-29 23:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from eplondke at gmail dot com  2006-09-29 23:42 -------
GCC 4.1/4.2 output looks like:

postinc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        ldr     ip, [r1, #0]
        ldr     r3, [r0, #0]
        stmfd   sp!, {r4, lr}
        mul     lr, r3, ip
        ldr     r4, [r1, #4]
        ldr     r2, [r0, #4]
        add     r1, r1, #4
        mla     ip, r2, r4, lr
        add     r0, r0, #4
        ldr     r2, [r1, #4]
        ldr     r3, [r0, #4]
        add     r1, r1, #4
        mla     lr, r3, r2, ip
        add     r0, r0, #4
        ldr     r2, [r1, #4]
        ldr     r3, [r0, #4]
        add     r1, r1, #4
        mla     ip, r3, r2, lr
        add     r0, r0, #4
        ....

GCC 3.4.2 output looks like:

postinc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        stmfd   sp!, {r4, r5, lr}
        ldr     r4, [r1], #4
        ldr     r5, [r0], #4
        ldr     r2, [r1], #4
        ldr     lr, [r0], #4
        mul     ip, r2, lr
        mla     r3, r4, r5, ip
        ldr     lr, [r0], #4
        ldr     r2, [r1], #4
        mla     r3, r2, lr, r3
        ldr     ip, [r0], #4
        ldr     r2, [r1], #4
        mla     r3, r2, ip, r3
        ....


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294




Thread overview: 10+ messages
     [not found] <bug-29294-4@http.gcc.gnu.org/bugzilla/>
2012-12-20  4:45 ` [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM siarhei.siamashka at gmail dot com
2012-12-20  5:47 ` siarhei.siamashka at gmail dot com
2006-09-29 23:37 [Bug tree-optimization/29294] New: " eplondke at gmail dot com
2006-09-29 23:42 ` [Bug rtl-optimization/29294] " eplondke at gmail dot com
2006-09-29 23:43 ` pinskia at gcc dot gnu dot org
2006-10-02 19:16 ` eplondke at gmail dot com
2006-10-06 14:56 ` eplondke at gmail dot com
2006-10-06 19:07 ` eplondke at gmail dot com
2006-10-09 16:33 ` ramana dot radhakrishnan at codito dot com
2009-04-21 14:09 ` ramana at gcc dot gnu dot org
2009-12-14 23:22 ` rearnsha at gcc dot gnu dot org
