* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction " siarhei dot siamashka at gmail dot com
@ 2009-10-29 15:21 ` siarhei dot siamashka at gmail dot com
2009-11-02 16:51 ` pinskia at gcc dot gnu dot org
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: siarhei dot siamashka at gmail dot com @ 2009-10-29 15:21 UTC
To: gcc-bugs
------- Comment #1 from siarhei dot siamashka at gmail dot com 2009-10-29 15:21 -------
-O2:
0000000000000010 <.x>:
10: 2c 23 00 00 cmpdi r3,0
14: 7c 08 02 a6 mflr r0
18: f8 01 00 10 std r0,16(r1)
1c: f8 21 ff 81 stdu r1,-128(r1)
20: 41 82 00 1c beq- 3c <.x+0x2c>
24: f8 61 00 70 std r3,112(r1)
28: 48 00 00 01 bl 28 <.x+0x18>
2c: e8 01 00 70 ld r0,112(r1)
30: 35 20 ff ff addic. r9,r0,-1
34: f9 21 00 70 std r9,112(r1)
38: 40 82 ff f0 bne+ 28 <.x+0x18>
3c: 38 21 00 80 addi r1,r1,128
40: e8 01 00 10 ld r0,16(r1)
44: 7c 08 03 a6 mtlr r0
48: 4e 80 00 20 blr
4c: 00 00 00 00 .long 0x0
50: 00 00 00 01 .long 0x1
54: 80 00 00 00 lwz r0,0(0)
-Os:
0000000000000010 <.x>:
10: fb e1 ff f8 std r31,-8(r1)
14: 7c 08 02 a6 mflr r0
18: f8 01 00 10 std r0,16(r1)
1c: 7c 7f 1b 78 mr r31,r3
20: f8 21 ff 81 stdu r1,-128(r1)
24: 48 00 00 08 b 2c <.x+0x1c>
28: 48 00 00 01 bl 28 <.x+0x18>
2c: 2f bf 00 00 cmpdi cr7,r31,0
30: 3b ff ff ff addi r31,r31,-1
34: 40 9e ff f4 bne+ cr7,28 <.x+0x18>
38: 38 21 00 80 addi r1,r1,128
3c: e8 01 00 10 ld r0,16(r1)
40: eb e1 ff f8 ld r31,-8(r1)
44: 7c 08 03 a6 mtlr r0
48: 4e 80 00 20 blr
4c: 00 00 00 00 .long 0x0
50: 00 00 00 01 .long 0x1
54: 80 01 00 00 lwz r0,0(r1)
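For reference, both listings are consistent with a counted loop around a call. The C below is a hypothetical reconstruction from the disassembly, not the reporter's actual test case. The body of `y` and the `calls` counter are stand-ins added so the sketch is self-contained; in the listings `y` is an external function reached through `bl`.

```c
#include <assert.h>

/* Stand-in for the external callee that appears as "bl" in the listings.
   The counter is hypothetical, added only to make the calls observable. */
static long calls = 0;

static void y(void)
{
    calls++;
}

/* Hypothetical reconstruction of x(): test the counter, decrement it,
   and call y() while the old value was nonzero. This is the shape the
   -Os listing shows with cmpdi / addi / bne. */
void x(long n)
{
    assert(n >= 0);   /* assumption: callers pass a non-negative count */
    while (n--)
        y();
}
```

Calling `x(3)` in this sketch runs `y` three times, and `x(0)` does not call it at all, which matches the early `beq-` in the -O2 listing.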
--
siarhei dot siamashka at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei dot siamashka at
                   |                            |gmail dot com
           Keywords|                            |missed-optimization
            Summary|cell microcode instruction  |cell microcode instruction
                   |is generated for a trivial  |(addic.) is generated for a
                   |loop with -O2 optimizations,|trivial loop with -O2
                   |hurting performance badly   |optimizations, hurting
                   |                            |performance badly
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868
From: pinskia at gcc dot gnu dot org @ 2009-11-02 16:51 UTC
To: gcc-bugs
------- Comment #2 from pinskia at gcc dot gnu dot org 2009-11-02 16:51 -------
Simple patch which I am testing right now:
Index: gcc/gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/gcc/config/rs6000/rs6000.md (revision 153680)
+++ gcc/gcc/config/rs6000/rs6000.md (working copy)
@@ -1627,7 +1627,7 @@ (define_insn "*add<mode>3_internal3"
(set_attr "length" "4,4,8,8")])
(define_split
- [(set (match_operand:CC 3 "cc_reg_not_cr0_operand" "")
+ [(set (match_operand:CC 3 "cc_reg_not_micro_cr0_operand" "")
(compare:CC (plus:P (match_operand:P 1 "gpc_reg_operand" "")
(match_operand:P 2 "reg_or_short_operand" ""))
--
pinskia at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |pinskia at gcc dot gnu dot
                   |dot org                     |org
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-11-02 16:51:40
               date|                            |
From: pinskia at gcc dot gnu dot org @ 2009-11-02 16:56 UTC
To: gcc-bugs
------- Comment #3 from pinskia at gcc dot gnu dot org 2009-11-02 16:56 -------
Actually, the warning is incorrect, at least according to the PPU book 4.
From: pinskia at gcc dot gnu dot org @ 2009-11-02 17:05 UTC
To: gcc-bugs
------- Comment #4 from pinskia at gcc dot gnu dot org 2009-11-02 17:05 -------
In fact, changing the addic. into addic/cmpwi does not improve the speed of
the code:
With the change:
[apinski@dhcp-10-98-10-216 local]$ time ./a.out
56.316u 0.084s 0:57.09 98.7% 0+0k 0+0io 0pf+0w
Without:
56.276u 0.088s 0:57.08 98.7% 0+0k 0+0io 0pf+0w
So only the warning is invalid.
With -Os on the trunk:
24.144u 0.032s 0:24.45 98.8% 0+0k 0+0io 0pf+0w
I don't know offhand why -Os is faster than -O2.
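To put the timings above side by side, here is a minimal sketch that computes the ratio of two user-CPU times. The helper name `slowdown` is made up for illustration; the inputs are the user times quoted in this comment.

```c
#include <assert.h>

/* Ratio of two user-CPU times given in seconds.
   Hypothetical helper, not part of any tool used in this thread. */
double slowdown(double slower_s, double faster_s)
{
    assert(faster_s > 0.0);   /* a zero or negative time would be a typo */
    return slower_s / faster_s;
}
```

With the figures above, `slowdown(56.276, 24.144)` comes out at roughly 2.33, so the -O2 binary takes a bit more than twice as long as the -Os one, while the patched and unpatched -O2 runs differ by only 0.04 s of user time.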
From: pinskia at gcc dot gnu dot org @ 2009-11-02 17:09 UTC
To: gcc-bugs
------- Comment #5 from pinskia at gcc dot gnu dot org 2009-11-02 17:08 -------
In fact, applying the following diff to the -Os assembly:
--- t5.Os.s 2009-11-02 23:18:52.000000000 +0900
+++ t5.Os.dot.s 2009-11-02 23:20:19.000000000 +0900
@@ -29,9 +29,9 @@ x:
.L4:
bl y
.L3:
- cmpwi 7,31,0
- addi 31,31,-1
- bne 7,.L4
+# cmpwi 7,31,0
+ addic. 31,31,-1
+ bne .L4
addi 11,1,16
b _restgpr_31_x
.size x,.-x
produces the same result as -Os on the trunk.
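Read literally, the two loop tails in the diff are not instruction-for-instruction equivalent: the original tests the counter before decrementing it, while addic. sets CR0 from the decremented value. The C model below is a sketch of that difference, assuming the loop is entered at .L3 with the count in r31; the function names and the call counter are hypothetical.

```c
#include <assert.h>

/* Hypothetical call counter standing in for the work done by y. */
static long calls_seen = 0;

static void y_model(void)
{
    calls_seen++;
}

/* Original -Os tail: cmpwi 7,31,0 ; addi 31,31,-1 ; bne 7,.L4
   The branch tests the value the counter had before the decrement. */
long run_cmp_addi(long n)
{
    assert(n >= 0);
    calls_seen = 0;
    for (;;) {
        int old_nonzero = (n != 0);  /* cmpwi 7,31,0            */
        n = n - 1;                   /* addi 31,31,-1           */
        if (!old_nonzero)            /* bne 7,.L4 is not taken  */
            break;
        y_model();                   /* .L4: bl y               */
    }
    return calls_seen;
}

/* Edited tail: addic. 31,31,-1 ; bne .L4
   The record form compares the decremented value with zero. */
long run_addic_dot(long n)
{
    assert(n >= 1);  /* n == 0 would wrap around and loop for a very long time */
    calls_seen = 0;
    for (;;) {
        n = n - 1;                   /* addic. 31,31,-1         */
        if (n == 0)                  /* bne .L4 is not taken    */
            break;
        y_model();                   /* .L4: bl y               */
    }
    return calls_seen;
}
```

In this model `run_cmp_addi(5)` makes five calls and `run_addic_dot(5)` makes four. One iteration out of a long-running loop should not matter for a timing comparison, which is what the edit was used for here.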
From: pinskia at gcc dot gnu dot org @ 2009-11-02 17:10 UTC
To: gcc-bugs
------- Comment #6 from pinskia at gcc dot gnu dot org 2009-11-02 17:10 -------
So in conclusion, addic. is not microcoded and the warning is incorrect, but
-Os is still faster than -O2.
From: siarhei dot siamashka at gmail dot com @ 2009-11-03 20:09 UTC
To: gcc-bugs
------- Comment #7 from siarhei dot siamashka at gmail dot com 2009-11-03 20:09 -------
Thanks a lot for checking this. And sorry about the confusion caused by
attributing the slowness of the testcase to the microcoded stuff (which turned
out not to be the case) without properly checking this first.
So should this bug be split into two? One about the incorrect warning, and
another about the suboptimal code generated at -O2 (extra load and store
operations, which are probably penalized by something like a RAW hazard in such
a short loop)?
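The extra load and store can be pictured in C. The sketch below mirrors the -O2 listing from comment #1, where the counter is written to 112(r1) and reloaded on every iteration, next to the register-resident form that -Os produces. `volatile` is used only as a modelling device to force the memory round trip; all names are hypothetical, and whether a RAW hazard actually explains the timing gap remains a guess.

```c
#include <assert.h>

/* Hypothetical call counter and callee stub. */
static long calls_made = 0;

static void y_stub(void)
{
    calls_made++;
}

/* Counter kept in a memory slot, mimicking ld / addic. / std at -O2. */
long loop_counter_in_memory(long n)
{
    volatile long slot;       /* stands in for the stack slot 112(r1) */

    assert(n >= 0);
    calls_made = 0;
    if (n == 0)               /* cmpdi r3,0 ; beq-                  */
        return calls_made;
    slot = n;                 /* std r3,112(r1)                     */
    for (;;) {
        y_stub();             /* bl                                 */
        long t = slot - 1;    /* ld r0,112(r1) ; addic. r9,r0,-1    */
        slot = t;             /* std r9,112(r1)                     */
        if (t == 0)           /* bne+ is not taken                  */
            break;
    }
    return calls_made;
}

/* Counter kept in a local, mimicking r31 in the -Os listing. */
long loop_counter_in_register(long n)
{
    assert(n >= 0);
    calls_made = 0;
    while (n--)
        y_stub();
    return calls_made;
}
```

Both functions make the same number of calls for any non-negative `n`; the only difference is that the first one stores and reloads the counter around every call.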