public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/29294] New: 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
@ 2006-09-29 23:37 eplondke at gmail dot com
2006-09-29 23:38 ` [Bug tree-optimization/29294] " eplondke at gmail dot com
` (8 more replies)
0 siblings, 9 replies; 12+ messages in thread
From: eplondke at gmail dot com @ 2006-09-29 23:37 UTC (permalink / raw)
To: gcc-bugs
The attached file benefits greatly from the ARM postincrement address mode.
In 4.1.1 and 4.2 20060923, we no longer get the postincrement addressing mode,
but instead a load from (base+4) followed by base = base+4.
This leads to an increase in instruction count of 40%.
While the test is of course trivial, I see real code not benefiting from
postmodify quite often while using 4.1.1.
I'm not quite sure if it belongs in tree-optimization, but it comes out of the
expander as separate
(set (reg B) (plus (reg A) (4)))
(set (reg C) (plus (reg A) (4)))
which is how it ends up in the assembly....
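The attachment itself is not reproduced in this thread. As a rough illustration only, a function of the following shape is the kind of code that benefits from postincrement: each `*a++` / `*b++` is a load followed by a pointer bump, which ARM can express as one post-indexed load (`ldr rX, [rN], #4`). The name `postinc` is taken from the assembly label in the listings; the body is an assumption inferred from the ldr/mul/mla pattern, not the actual test case.

```c
/* Hypothetical stand-in for the attached test: an unrolled dot product.
   Every operand is fetched through a pointer that is then advanced by
   one element, the pattern a post-indexed load covers in a single
   instruction. */
int postinc(const int *a, const int *b)
{
    int sum;

    sum  = *a++ * *b++;
    sum += *a++ * *b++;
    sum += *a++ * *b++;
    sum += *a++ * *b++;
    return sum;
}
```

Called with `{1, 2, 3, 4}` and `{5, 6, 7, 8}`, this returns 70.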
--
Summary: 4.1, 4.2 (possibly 4.0?) not finding postmodify address
mode on ARM
Product: gcc
Version: 4.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: eplondke at gmail dot com
GCC host triplet: x86_64-suse-linux
GCC target triplet: arm-unknown-elf
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug tree-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-09-29 23:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from eplondke at gmail dot com 2006-09-29 23:38 -------
Created an attachment (id=12359)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12359&action=view)
Test for postmodify address mode
simple function that benefits from postmodify selection
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-09-29 23:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from eplondke at gmail dot com 2006-09-29 23:42 -------
GCC 4.1/4.2 output looks like:
postinc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
ldr ip, [r1, #0]
ldr r3, [r0, #0]
stmfd sp!, {r4, lr}
mul lr, r3, ip
ldr r4, [r1, #4]
ldr r2, [r0, #4]
add r1, r1, #4
mla ip, r2, r4, lr
add r0, r0, #4
ldr r2, [r1, #4]
ldr r3, [r0, #4]
add r1, r1, #4
mla lr, r3, r2, ip
add r0, r0, #4
ldr r2, [r1, #4]
ldr r3, [r0, #4]
add r1, r1, #4
mla ip, r3, r2, lr
add r0, r0, #4
....
GCC 3.4.2 output looks like:
postinc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
stmfd sp!, {r4, r5, lr}
ldr r4, [r1], #4
ldr r5, [r0], #4
ldr r2, [r1], #4
ldr lr, [r0], #4
mul ip, r2, lr
mla r3, r4, r5, ip
ldr lr, [r0], #4
ldr r2, [r1], #4
mla r3, r2, lr, r3
ldr ip, [r0], #4
ldr r2, [r1], #4
mla r3, r2, ip, r3
....
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: pinskia at gcc dot gnu dot org @ 2006-09-29 23:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from pinskia at gcc dot gnu dot org 2006-09-29 23:43 -------
Actually this case should not be using post modify at all except how many bits
does ARM have to use for an offset? I thought 16bits which means you don't need
that at all and GCC should generate it without an increment. Oh and this is a
RTL opt issue.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-02 19:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from eplondke at gmail dot com 2006-10-02 19:16 -------
(In reply to comment #3)
> Actually this case should not be using post modify at all except how many bits
> does ARM have to use for an offset? I thought 16bits which means you don't need
> that at all and GCC should generate it without an increment. Oh and this is a
> RTL opt issue.
>
ARM normal arithmetic operands support an 8-bit integer rotated right by an
even number of bits (0-30). Any nonzero rotation may cause stalls on some
microarchitectures (XScale maybe?)...
Load and store word (and unsigned byte) have a +/- 12-bit offset.
Load and store of other single values have a +/- 8-bit offset.
Load and store multiple cannot take an offset.
That's for ARM. For THUMB, you get + 5 bits.
Both ARM and THUMB mode have postincrement modes. ARM gets a decent
postmodify. For THUMB you use the Load Multiple Increment After instruction
with a single register specified.
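The ARM-mode encoding limits listed above can be written down as small predicates. This is a minimal sketch; the function names are hypothetical and it covers only the ARM-mode cases named in this comment, not THUMB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* True if v is an 8-bit value rotated right by an even amount
   (0, 2, ..., 30), i.e. usable as a data-processing immediate. */
bool arm_data_imm_ok(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; the result must fit in 8 bits. */
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* Word and unsigned-byte loads/stores: +/- 12-bit offset. */
bool arm_word_offset_ok(int32_t off)
{
    return off >= -4095 && off <= 4095;
}

/* Other single-value loads/stores: +/- 8-bit offset. */
bool arm_misc_offset_ok(int32_t off)
{
    return off >= -255 && off <= 255;
}
```

For instance, `0xFF000000` and `0x104` are encodable immediates, while `0x101` and `0x102` are not. The offsets of 4 and 8 used in the test case fit every form, which is why base+offset addressing is always available to the compiler here.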
Looks like CSE1 is the first time that
(set (reg) (mem (reg))) gets converted to
(set (reg) (mem (plus (reg) (4))))
I have noticed a propensity for postmodify to not be used in several targets
comparing GCC 4.X to GCC 3.X.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-06 14:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from eplondke at gmail dot com 2006-10-06 14:55 -------
Here's what's going on in this case:
CSE changes an address if:
A) The cost of the address is lower
or
B) The cost of the address is the same and the cost of the RTX would be
higher outside of an address
So, CSE changes (R) to (R+4) because it is lower cost as specified by the
address_costs hook.
It doesn't change beyond (R+4) because (R+8) is the same cost as (R+4).
Once the address (R+4) gets in the RTL sequence, it never gets converted to
a postincrement form.
So by adding the cost of a simple REG RTX as being lower than (+ (REG) (CONST))
in the addressing modes, CSE doesn't convert the address to base+offset, and
we get the postincrement code back again in 4.x.
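The selection rule described above can be modelled in a few lines. This is a sketch under stated assumptions: `struct costs` and `cse_prefers` are hypothetical names, not GCC internals, and any cost numbers plugged in are illustrative rather than the values the ARM backend actually returns.

```c
#include <stdbool.h>

/* Hypothetical cost pair for one candidate address expression. */
struct costs {
    int addr_cost;  /* cost when used as a memory address */
    int rtx_cost;   /* cost of the expression outside an address */
};

/* Returns true if, per rules A and B above, CSE would replace the
   current address form with the candidate. */
bool cse_prefers(struct costs cand, struct costs cur)
{
    if (cand.addr_cost < cur.addr_cost)
        return true;                          /* rule A: cheaper address */
    return cand.addr_cost == cur.addr_cost
        && cand.rtx_cost > cur.rtx_cost;      /* rule B: tie-break */
}
```

With illustrative costs of 6 for (R) and 2 for both (R+4) and (R+8), the model replaces (R) by (R+4) and then stops, since (R+8) ties on both costs. Dropping the address cost of (R) below that of (R+const) makes the first replacement fail as well, which matches the workaround described here.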
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-06 19:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from eplondke at gmail dot com 2006-10-06 19:07 -------
Changing the cost of (REG) to 1 fixes 4.1 but not 4.2, it seems.
In 4.2, the RTL optimization does not combine
ldr r2, [r1, #0]
ldr r3, [r0, #0]
add r0, r0, #4
add r1, r1, #4
into
ldr r2, [r1], #4
ldr r3, [r0], #4
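To make the missing transformation concrete, here is a toy model of it over a simplified instruction list. It is only a sketch: the `insn` representation and `fold_post_increment` are hypothetical and far simpler than what GCC does on RTL, and the model knows just three instruction forms.

```c
#include <stddef.h>

enum op { OP_LDR_OFF, OP_LDR_POST, OP_ADD_IMM };

struct insn {
    enum op op;
    int rd;   /* destination register */
    int rn;   /* base register (or source register for ADD) */
    int imm;  /* offset, post-increment, or addend */
};

/* True if the instruction reads or writes the given register. */
static int insn_touches(const struct insn *in, int reg)
{
    return in->rd == reg || in->rn == reg;
}

/* Folds "ldr rd, [rn, #0] ... add rn, rn, #k" into "ldr rd, [rn], #k"
   when nothing in between touches rn. Returns the new length. */
size_t fold_post_increment(struct insn *code, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Only zero-offset loads whose destination is not the base. */
        if (code[i].op != OP_LDR_OFF || code[i].imm != 0
            || code[i].rd == code[i].rn)
            continue;

        int base = code[i].rn;
        for (size_t j = i + 1; j < n; j++) {
            if (code[j].op == OP_ADD_IMM
                && code[j].rd == base && code[j].rn == base) {
                code[i].op = OP_LDR_POST;
                code[i].imm = code[j].imm;
                /* Delete the now-redundant add. */
                for (size_t k = j; k + 1 < n; k++)
                    code[k] = code[k + 1];
                n--;
                break;
            }
            if (insn_touches(&code[j], base))
                break;  /* base is used or redefined first: give up */
        }
    }
    return n;
}
```

Fed the four-instruction sequence above, the model returns two post-indexed loads, the same shape as the desired output.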
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: ramana dot radhakrishnan at codito dot com @ 2006-10-09 16:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from ramana dot radhakrishnan at codito dot com 2006-10-09 16:33 -------
(In reply to comment #5)
flow.c is responsible for generating POST_INCs and POST_MODIFYs in 3.4 / 4.0 /
4.1 / 4.2. I believe this is being replaced by the new data flow bits in the
data flow branch, which might not be ready until 4.3. We have hit similar
issues in a private port that I maintain.
The two options are either to fix flow.c or to use some of Joern's auto-increment
patches for 4.1 / 4.2 to fix this issue. This doesn't really take care of
POST_MODIFY, but I don't think that affects ARM that much.
> Here's what's going on in this case:
>
> CSE changes an address if:
> A) The cost of the address is lower
> or
> B) The cost of the address is the same and the cost of the RTX would be
> higher outside of an address
>
> So, CSE changes (R) to (R+4) because it is lower cost as specified by the
> address_costs hook.
>
> It doesn't change beyond (R+4) because (R+8) is the same cost as (R+4).
>
> Once the address (R+4) gets in the RTL sequence, it never gets converted to
> a postincrement form.
>
> So by adding the cost of a simple REG RTX as being lower than (+ (REG) (CONST))
> in the addressing modes, CSE doesn't convert the address to base+offset, and
> we get the postincrement code back again in 4.x.
>
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: ramana at gcc dot gnu dot org @ 2009-04-21 14:09 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from ramana at gcc dot gnu dot org 2009-04-21 14:09 -------
Confirmed with trunk.
--
ramana at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2009-04-21 14:09:43
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: rearnsha at gcc dot gnu dot org @ 2009-12-14 23:22 UTC (permalink / raw)
To: gcc-bugs
--
rearnsha at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: siarhei.siamashka at gmail dot com @ 2012-12-20 5:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
--- Comment #10 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 05:47:30 UTC ---
(In reply to comment #9)
And some performance measurements (for working with L1 cache):
> $ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
>
> 00000000 <fill>:
> 0: e2511010 subs r1, r1, #16
> 4: 412fff1e bxmi lr
> 8: e2511010 subs r1, r1, #16
> c: e1c020f0 strd r2, [r0]
> 10: e1c020f8 strd r2, [r0, #8]
> 14: e2800010 add r0, r0, #16
> 18: 5afffffa bpl 8 <fill+0x8>
> 1c: e12fff1e bx lr
Cortex-A8 - 5 cycles per iteration
Cortex-A9 - 4.5 cycles per iteration
Cortex-A15 - 3 cycles per iteration
> $ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
>
> 00000000 <fill>:
> 0: e351000f cmp r1, #15
> 4: d12fff1e bxle lr
> 8: e2411010 sub r1, r1, #16
> c: e280c010 add ip, r0, #16
> 10: e3c1100f bic r1, r1, #15
> 14: e08c1001 add r1, ip, r1
> 18: e1c020f0 strd r2, [r0]
> 1c: e2800010 add r0, r0, #16
> 20: e14020f8 strd r2, [r0, #-8]
> 24: e1500001 cmp r0, r1
> 28: 1afffffa bne 18 <fill+0x18>
> 2c: e12fff1e bx lr
Cortex-A8 - 6 cycles per iteration
Cortex-A9 - 4 cycles per iteration
Cortex-A15 - 3 cycles per iteration
While we could have expected something like the following code for the inner
loop:
1: strd V, [BUF], #8
subs N, N, #16
strd V, [BUF], #8
bpl 1b
Cortex-A8 - 4 cycles per iteration
Cortex-A9 - 4 cycles per iteration
Cortex-A15 - 2.5 cycles per iteration
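For scale, the cycle counts above convert to store throughput as follows. A minimal sketch: `bytes_per_cycle` is a hypothetical helper, and the 16 bytes per iteration assumes `T` is 8 bytes, consistent with the two `strd` instructions in each loop body.

```c
/* Each loop iteration stores two 8-byte values, so 16 bytes total. */
double bytes_per_cycle(double cycles_per_iteration)
{
    return 16.0 / cycles_per_iteration;
}
```

On Cortex-A8 this gives 3.2 bytes/cycle for the 4.7.2 loop (5 cycles), about 2.67 for the 4.8.0 loop (6 cycles), and 4.0 for the post-indexed loop (4 cycles), i.e. the expected code would be 1.5 times as fast as what 4.8.0 emits.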
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: siarhei.siamashka at gmail dot com @ 2012-12-20 4:45 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
Siarhei Siamashka <siarhei.siamashka at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |siarhei.siamashka at gmail
| |dot com
--- Comment #9 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 04:45:10 UTC ---
(In reply to comment #3)
> Actually this case should not be using post modify at all except how many bits
> does ARM have to use for an offset? I thought 16bits which means you don't need
> that at all and GCC should generate it without an increment. Oh and this is a
> RTL opt issue.
Seems like gcc 4.7.2 and 4.8.0 20121219 (experimental) are already doing this,
which hides the postincrement issue for the currently attached testcase.
However, postincrement is still a performance problem for ARM. The code I'm
having trouble with is the following:
/*******************************************/
typedef unsigned long long T;
void fill(T *buf, int n, T v)
{
while ((n -= 16) >= 0)
{
*buf++ = v;
*buf++ = v;
}
}
/*******************************************/
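Note that `n` steps down by 16 while each iteration stores two `T` values, so when `T` is 8 bytes (as the `strd` stores in the listings suggest) `n` is effectively a byte count. The sketch below repeats `fill` from above so it stands alone and adds a hypothetical helper, `count_filled`, to show how many elements a given `n` touches.

```c
typedef unsigned long long T;

/* Same function as in the report above. */
void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0)
    {
        *buf++ = v;
        *buf++ = v;
    }
}

/* Hypothetical helper: number of leading elements of buf equal to v. */
int count_filled(const T *buf, int len, T v)
{
    int i = 0;

    while (i < len && buf[i] == v)
        i++;
    return i;
}
```

On a zeroed 8-element buffer, `fill(buf, 40, 7)` writes 4 elements (two full iterations), `fill(buf, 64, 7)` writes all 8, and any `n` below 16 writes nothing.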
$ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o
00000000 <fill>:
0: e2511010 subs r1, r1, #16
4: 412fff1e bxmi lr
8: e2511010 subs r1, r1, #16
c: e1c020f0 strd r2, [r0]
10: e1c020f8 strd r2, [r0, #8]
14: e2800010 add r0, r0, #16
18: 5afffffa bpl 8 <fill+0x8>
1c: e12fff1e bx lr
$ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o
00000000 <fill>:
0: e351000f cmp r1, #15
4: d12fff1e bxle lr
8: e2411010 sub r1, r1, #16
c: e280c010 add ip, r0, #16
10: e3c1100f bic r1, r1, #15
14: e08c1001 add r1, ip, r1
18: e1c020f0 strd r2, [r0]
1c: e2800010 add r0, r0, #16
20: e14020f8 strd r2, [r0, #-8]
24: e1500001 cmp r0, r1
28: 1afffffa bne 18 <fill+0x18>
2c: e12fff1e bx lr
Thread overview: 12+ messages
2006-09-29 23:37 [Bug tree-optimization/29294] New: 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM eplondke at gmail dot com
2006-09-29 23:38 ` [Bug tree-optimization/29294] " eplondke at gmail dot com
2006-09-29 23:42 ` [Bug rtl-optimization/29294] " eplondke at gmail dot com
2006-09-29 23:43 ` pinskia at gcc dot gnu dot org
2006-10-02 19:16 ` eplondke at gmail dot com
2006-10-06 14:56 ` eplondke at gmail dot com
2006-10-06 19:07 ` eplondke at gmail dot com
2006-10-09 16:33 ` ramana dot radhakrishnan at codito dot com
2009-04-21 14:09 ` ramana at gcc dot gnu dot org
2009-12-14 23:22 ` rearnsha at gcc dot gnu dot org
[not found] <bug-29294-4@http.gcc.gnu.org/bugzilla/>
2012-12-20 4:45 ` siarhei.siamashka at gmail dot com
2012-12-20 5:47 ` siarhei.siamashka at gmail dot com