public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/29294] New: 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-09-29 23:37 UTC
To: gcc-bugs

The attached file benefits greatly from the ARM postincrement address mode.

In 4.1.1 and 4.2 20060923, we no longer get the postincrement addressing mode.
Instead we get a (base+4) access followed by base = base+4.

This leads to an increase in instruction count of 40%.

While the test is of course trivial, I quite often see real code not
benefiting from postmodify while using 4.1.1.

I'm not quite sure if it belongs in tree-optimization, but it comes out of the
expander as separate

  (set (reg B) (plus (reg A) (4)))
  (set (reg C) (plus (reg A) (4)))

which is how it ends up in the assembly....

-- 
           Summary: 4.1, 4.2 (possibly 4.0?) not finding postmodify address
                    mode on ARM
           Product: gcc
           Version: 4.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: eplondke at gmail dot com
  GCC host triplet: x86_64-suse-linux
GCC target triplet: arm-unknown-elf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug tree-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-09-29 23:38 UTC
To: gcc-bugs

------- Comment #1 from eplondke at gmail dot com 2006-09-29 23:38 -------
Created an attachment (id=12359)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12359&action=view)
Test for postmodify address mode

simple function that benefits from postmodify selection

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-09-29 23:42 UTC
To: gcc-bugs

------- Comment #2 from eplondke at gmail dot com 2006-09-29 23:42 -------
GCC 4.1/4.2 output looks like:

postinc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        ldr     ip, [r1, #0]
        ldr     r3, [r0, #0]
        stmfd   sp!, {r4, lr}
        mul     lr, r3, ip
        ldr     r4, [r1, #4]
        ldr     r2, [r0, #4]
        add     r1, r1, #4
        mla     ip, r2, r4, lr
        add     r0, r0, #4
        ldr     r2, [r1, #4]
        ldr     r3, [r0, #4]
        add     r1, r1, #4
        mla     lr, r3, r2, ip
        add     r0, r0, #4
        ldr     r2, [r1, #4]
        ldr     r3, [r0, #4]
        add     r1, r1, #4
        mla     ip, r3, r2, lr
        add     r0, r0, #4
        ....

GCC 3.4.2 output looks like:

postinc:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        stmfd   sp!, {r4, r5, lr}
        ldr     r4, [r1], #4
        ldr     r5, [r0], #4
        ldr     r2, [r1], #4
        ldr     lr, [r0], #4
        mul     ip, r2, lr
        mla     r3, r4, r5, ip
        ldr     lr, [r0], #4
        ldr     r2, [r1], #4
        mla     r3, r2, lr, r3
        ldr     ip, [r0], #4
        ldr     r2, [r1], #4
        mla     r3, r2, ip, r3
        ....

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
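The attachment itself is not reproduced in this thread. Judging only by the mul/mla chain in the listings above, a function of roughly the following shape would produce that kind of code. This is a hypothetical stand-in, not the reporter's test case; the name postinc_sketch and the unroll count of four are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for attachment 12359, which is not shown in
   this thread.  Every load is followed by a bump of the same pointer,
   which is the pattern a post-increment load such as
   "ldr rX, [rN], #4" can cover in a single instruction. */
int postinc_sketch(const int *a, const int *b)
{
    assert(a != NULL && b != NULL);

    int sum = 0;
    sum += *a++ * *b++;   /* load a[0], b[0], then advance both */
    sum += *a++ * *b++;
    sum += *a++ * *b++;
    sum += *a++ * *b++;
    return sum;
}
```

Compiling something like this for ARM at -O2 and comparing the output of different GCC releases is one way to check whether the post-increment form is still being selected.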
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: pinskia at gcc dot gnu dot org @ 2006-09-29 23:43 UTC
To: gcc-bugs

------- Comment #3 from pinskia at gcc dot gnu dot org 2006-09-29 23:43 -------
Actually this case should not be using post modify at all except how many bits
does ARM have to use for an offset? I thought 16bits which means you don't need
that at all and GCC should generate it without an increment. Oh and this is a
RTL opt issue.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-02 19:16 UTC
To: gcc-bugs

------- Comment #4 from eplondke at gmail dot com 2006-10-02 19:16 -------
(In reply to comment #3)
> Actually this case should not be using post modify at all except how many bits
> does ARM have to use for an offset? I thought 16bits which means you don't need
> that at all and GCC should generate it without an increment. Oh and this is a
> RTL opt issue.
>

ARM normal arithmetic operands support an 8-bit integer rotated right by an
even number of bits (0-30). If you rotate at all, some microarchitectures may
stall (XScale maybe?)...

Load and store word (and unsigned byte) have a +/- 12-bit offset.
Load and store of other single values have a +/- 8-bit offset.
Load and store multiple have no offset.

That's for ARM. For THUMB, you get + 5 bits.

Both ARM and THUMB mode have postincrement modes. ARM gets a decent
postmodify. For THUMB you use the Load Multiple Increment After instruction
with a single register specified.

Looks like CSE1 is the first time that

  (set (reg) (mem (reg)))

gets converted to

  (set (reg) (mem (plus (reg) (4))))

I have noticed a propensity for postmodify to not be used in several targets
when comparing GCC 4.X to GCC 3.X.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
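The ARM-mode encoding limits described above can be written down as small range checks. The following is only a sketch: the function names are made up for this illustration and are not GCC or assembler APIs, and the THUMB case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if v is an 8-bit value rotated right by an even amount (0..30),
   i.e. usable as an ARM-mode arithmetic immediate operand. */
bool arm_data_immediate_ok(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot with a rotate-left by rot. */
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* ARM-mode load/store word and unsigned byte: +/- 12-bit offset. */
bool arm_word_offset_ok(int32_t off)
{
    return off >= -4095 && off <= 4095;
}

/* ARM-mode load/store of other single values: +/- 8-bit offset. */
bool arm_misc_offset_ok(int32_t off)
{
    return off >= -255 && off <= 255;
}
```

For example, 0xFF000000 passes the immediate check (0xFF rotated right by 8), while 0x101 does not, because its set bits span nine positions.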
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-06 14:56 UTC
To: gcc-bugs

------- Comment #5 from eplondke at gmail dot com 2006-10-06 14:55 -------
Here's what's going on in this case:

CSE changes an address if:
  A) The cost of the address is lower
or
  B) The cost of the address is the same and the cost of the RTX would be
     higher outside of an address

So, CSE changes (R) to (R+4) because it is lower cost as specified by the
address_costs hook.

It doesn't change beyond (R+4) because (R+8) is the same cost as (R+4).

Once the address (R+4) gets in the RTL sequence, it never gets converted to
a postincrement form.

So by adding the cost of a simple REG RTX as being lower than (+ (REG) (CONST))
in the addressing modes, CSE doesn't convert the address to base+offset, and
we get the postincrement code back again in 4.x.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
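The two-part rule described above can be modelled as a small comparison function. This is a toy model of the rule as stated in the comment, not code from cse.c. The struct, the function name, and any cost numbers fed into it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy description of one candidate address form.  Any numbers used
   with it are hypothetical, not taken from the ARM backend. */
struct addr_candidate {
    int address_cost; /* cost when used as a memory address */
    int rtx_cost;     /* cost of the same RTX outside an address */
};

/* Rule as stated in the comment: switch to `repl` if
   (A) its address cost is lower, or
   (B) the address cost ties and its RTX cost is higher. */
bool cse_prefers_replacement(struct addr_candidate cur,
                             struct addr_candidate repl)
{
    assert(cur.address_cost >= 0 && repl.address_cost >= 0);

    if (repl.address_cost < cur.address_cost)
        return true;                              /* rule A */
    return repl.address_cost == cur.address_cost
        && repl.rtx_cost > cur.rtx_cost;          /* rule B */
}
```

With made-up costs where (R) is {2, 0} and (R+4) is {1, 4}, the model switches to (R+4). It then stays there, since (R+8) has the same costs as (R+4). If (R) is instead made the cheapest address, say {1, 0} against {2, 4}, the plain register form is kept, which is the effect the proposed cost change is after.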
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: eplondke at gmail dot com @ 2006-10-06 19:07 UTC
To: gcc-bugs

------- Comment #6 from eplondke at gmail dot com 2006-10-06 19:07 -------
Changing the cost of (REG) to 1 fixes 4.1 but not 4.2, it seems. In 4.2, the
RTL optimization does not combine

        ldr     r2, [r1, #0]
        ldr     r3, [r0, #0]
        add     r0, r0, #4
        add     r1, r1, #4

into

        ldr     r2, [r1], #4
        ldr     r3, [r0], #4

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
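For readers less familiar with the addressing modes, the two sequences above do the same work: both read the word at the base register and leave the base advanced by 4. A rough C model of one load from each sequence, with function names invented for this sketch and 32-bit words assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Models "ldr rd, [rn, #0]" followed by "add rn, rn, #4":
   two instructions. */
uint32_t load_then_add(const uint32_t **rn)
{
    assert(rn != 0 && *rn != 0);
    uint32_t rd = **rn; /* ldr rd, [rn, #0] */
    *rn += 1;           /* add rn, rn, #4 (one 32-bit word) */
    return rd;
}

/* Models "ldr rd, [rn], #4": the same effect in one instruction. */
uint32_t load_postinc(const uint32_t **rn)
{
    assert(rn != 0 && *rn != 0);
    return *(*rn)++;
}
```

Both functions return the same value and leave the pointer in the same place. The difference is only in how many instructions the compiler needs to express it.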
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: ramana dot radhakrishnan at codito dot com @ 2006-10-09 16:33 UTC
To: gcc-bugs

------- Comment #7 from ramana dot radhakrishnan at codito dot com 2006-10-09 16:33 -------
(In reply to comment #5)

flow.c is responsible for generating POST_INCs and POST_MODIFYs in 3.4 / 4.0 /
4.1 / 4.2. I believe this is being replaced by the new data flow bits in the
data flow branch. This might not be ready until 4.3. We have hit similar
issues in a private port that I maintain.

The two options are either to fix flow.c or to use some of Joern's auto
increment patches for 4.1 / 4.2 to fix this issue. This doesn't really take
care of POST_MODIFY, but I don't think that affects ARM that much.

> Here's what's going on in this case:
> 
> CSE changes an address if:
>   A) The cost of the address is lower
> or
>   B) The cost of the address is the same and the cost of the RTX would be
> higher outside of an address
> 
> So, CSE changes (R) to (R+4) because it is lower cost as specified by the
> address_costs hook.
> 
> It doesn't change beyond (R+4) because (R+8) is the same cost as (R+4).
> 
> Once the address (R+4) gets in the RTL sequence, it never gets converted to
> a postincrement form.
> 
> So by adding the cost of a simple REG RTX as being lower than (+ (REG) (CONST))
> in the addressing modes, CSE doesn't convert the address to base+offset, and
> we get the postincrement code back again in 4.x.
> 

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: ramana at gcc dot gnu dot org @ 2009-04-21 14:09 UTC
To: gcc-bugs

------- Comment #8 from ramana at gcc dot gnu dot org 2009-04-21 14:09 -------
Confirmed with trunk.

-- 

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-04-21 14:09:43
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: rearnsha at gcc dot gnu dot org @ 2009-12-14 23:22 UTC
To: gcc-bugs

-- 

rearnsha at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: siarhei.siamashka at gmail dot com @ 2012-12-20 4:45 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294

Siarhei Siamashka <siarhei.siamashka at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei.siamashka at gmail
                   |                            |dot com

--- Comment #9 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 04:45:10 UTC ---
(In reply to comment #3)
> Actually this case should not be using post modify at all except how many bits
> does ARM have to use for an offset? I thought 16bits which means you don't need
> that at all and GCC should generate it without an increment. Oh and this is a
> RTL opt issue.

Seems like gcc 4.7.2 and 4.8.0 20121219 (experimental) are already doing this,
which hides the postincrement issue for the currently attached testcase.
However, postincrement is still a performance problem for ARM.

The code I'm having trouble with is the following:

/*******************************************/
typedef unsigned long long T;

void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0)
    {
        *buf++ = v;
        *buf++ = v;
    }
}
/*******************************************/

$ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:   e2511010        subs    r1, r1, #16
   4:   412fff1e        bxmi    lr
   8:   e2511010        subs    r1, r1, #16
   c:   e1c020f0        strd    r2, [r0]
  10:   e1c020f8        strd    r2, [r0, #8]
  14:   e2800010        add     r0, r0, #16
  18:   5afffffa        bpl     8 <fill+0x8>
  1c:   e12fff1e        bx      lr

$ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:   e351000f        cmp     r1, #15
   4:   d12fff1e        bxle    lr
   8:   e2411010        sub     r1, r1, #16
   c:   e280c010        add     ip, r0, #16
  10:   e3c1100f        bic     r1, r1, #15
  14:   e08c1001        add     r1, ip, r1
  18:   e1c020f0        strd    r2, [r0]
  1c:   e2800010        add     r0, r0, #16
  20:   e14020f8        strd    r2, [r0, #-8]
  24:   e1500001        cmp     r0, r1
  28:   1afffffa        bne     18 <fill+0x18>
  2c:   e12fff1e        bx      lr
* [Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
From: siarhei.siamashka at gmail dot com @ 2012-12-20 5:47 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294

--- Comment #10 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 05:47:30 UTC ---
(In reply to comment #9)

And some performance measurements (for working with L1 cache):

> $ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
> 
> 00000000 <fill>:
>    0:   e2511010        subs    r1, r1, #16
>    4:   412fff1e        bxmi    lr
>    8:   e2511010        subs    r1, r1, #16
>    c:   e1c020f0        strd    r2, [r0]
>   10:   e1c020f8        strd    r2, [r0, #8]
>   14:   e2800010        add     r0, r0, #16
>   18:   5afffffa        bpl     8 <fill+0x8>
>   1c:   e12fff1e        bx      lr

Cortex-A8  - 5 cycles per iteration
Cortex-A9  - 4.5 cycles per iteration
Cortex-A15 - 3 cycles per iteration

> $ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
> 
> 00000000 <fill>:
>    0:   e351000f        cmp     r1, #15
>    4:   d12fff1e        bxle    lr
>    8:   e2411010        sub     r1, r1, #16
>    c:   e280c010        add     ip, r0, #16
>   10:   e3c1100f        bic     r1, r1, #15
>   14:   e08c1001        add     r1, ip, r1
>   18:   e1c020f0        strd    r2, [r0]
>   1c:   e2800010        add     r0, r0, #16
>   20:   e14020f8        strd    r2, [r0, #-8]
>   24:   e1500001        cmp     r0, r1
>   28:   1afffffa        bne     18 <fill+0x18>
>   2c:   e12fff1e        bx      lr

Cortex-A8  - 6 cycles per iteration
Cortex-A9  - 4 cycles per iteration
Cortex-A15 - 3 cycles per iteration

While we could have expected something like the following code for the inner
loop:

1:      strd    V, [BUF], #8
        subs    N, N, #16
        strd    V, [BUF], #8
        bpl     1b

Cortex-A8  - 4 cycles per iteration
Cortex-A9  - 4 cycles per iteration
Cortex-A15 - 2.5 cycles per iteration
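To put the cycle counts above on a common scale, they can be converted to bytes stored per cycle or to a relative speedup. The helpers below are only a convenience sketch. They assume that each iteration stores 16 bytes (two 8-byte stores, with unsigned long long being 8 bytes on this target), and the function names are invented here.

```c
#include <assert.h>

/* Each loop iteration performs two 8-byte stores (assumes an 8-byte
   unsigned long long, as on the ARM targets discussed). */
enum { BYTES_PER_ITERATION = 16 };

/* Store throughput for a loop that takes the given cycles per iteration. */
double bytes_per_cycle(double cycles_per_iteration)
{
    assert(cycles_per_iteration > 0.0);
    return BYTES_PER_ITERATION / cycles_per_iteration;
}

/* How many times faster `candidate` is than `baseline`, with both
   given in cycles per iteration. */
double speedup(double baseline_cycles, double candidate_cycles)
{
    assert(baseline_cycles > 0.0 && candidate_cycles > 0.0);
    return baseline_cycles / candidate_cycles;
}
```

On the Cortex-A8 figures, speedup(6.0, 4.0) gives 1.5 for the hand-written post-increment loop over the 4.8.0 output, and speedup(5.0, 4.0) gives 1.25 over the 4.7.2 output.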