public inbox for gcc-bugs@sourceware.org
* [Bug target/41868]  New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly
@ 2009-10-29 15:16 siarhei dot siamashka at gmail dot com
  2009-10-29 15:21 ` [Bug target/41868] cell microcode instruction (addic.) " siarhei dot siamashka at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: siarhei dot siamashka at gmail dot com @ 2009-10-29 15:16 UTC (permalink / raw)
  To: gcc-bugs


/***************************************/
void __attribute__((noinline)) y()
{
    asm volatile ("# nop\n");
}

void __attribute__((noinline)) x(long c)
{
    while (c--)
        y();
}

int main()
{
    /* Run total 3.2G iterations */
    x(1600000000);
    x(1600000000);
    return 0;
}
/***************************************/

$ gcc -O2 -mcpu=cell -mtune=cell -mwarn-cell-microcode -o test-O2 test.c
test.c: In function ‘x’:
test.c:9: warning: emitting microcode insn {ai.|addic.} %0,%1,%2 [*adddi3_internal3] #38

$ time ./test-O2
real    0m56.385s
user    0m56.232s
sys     0m0.138s

$ gcc -Os -mcpu=cell -mtune=cell -mwarn-cell-microcode -o test-Os test.c
$ time ./test-Os

real    0m24.149s
user    0m24.086s
sys     0m0.060s
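
To put the two timings on a per-iteration footing, here is a minimal sketch of the arithmetic. The helper names are hypothetical, and the clock frequency is a parameter because the report does not state the CPU clock. With the `real` times above and 3.2G iterations, it gives roughly 17.6 ns per iteration for -O2 and roughly 7.5 ns for -Os, a ratio of about 2.3.

```c
/* Hypothetical helpers, not part of the original test case. */

/* Wall-clock seconds and total iteration count -> nanoseconds per iteration. */
double ns_per_iteration(double seconds, double iterations)
{
    return seconds * 1e9 / iterations;
}

/* Cycles per iteration for a given clock. clock_hz is an assumption supplied
   by the caller; the report does not say what frequency the machine runs at. */
double cycles_per_iteration(double seconds, double iterations, double clock_hz)
{
    return seconds * clock_hz / iterations;
}
```

For example, `ns_per_iteration(56.385, 3.2e9)` covers the -O2 run and `ns_per_iteration(24.149, 3.2e9)` the -Os run. If the clock were 3.2 GHz (an assumption only), the elapsed seconds would map one-to-one onto cycles per iteration.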


-- 
           Summary: cell microcode instruction is generated for a trivial
                    loop with -O2 optimizations, hurting performance badly
           Product: gcc
           Version: 4.4.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: siarhei dot siamashka at gmail dot com
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
  2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly siarhei dot siamashka at gmail dot com
@ 2009-10-29 15:21 ` siarhei dot siamashka at gmail dot com
  2009-11-02 16:51 ` pinskia at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: siarhei dot siamashka at gmail dot com @ 2009-10-29 15:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from siarhei dot siamashka at gmail dot com  2009-10-29 15:21 -------
-O2:

0000000000000010 <.x>:
  10:   2c 23 00 00     cmpdi   r3,0
  14:   7c 08 02 a6     mflr    r0
  18:   f8 01 00 10     std     r0,16(r1)
  1c:   f8 21 ff 81     stdu    r1,-128(r1)
  20:   41 82 00 1c     beq-    3c <.x+0x2c>
  24:   f8 61 00 70     std     r3,112(r1)
  28:   48 00 00 01     bl      28 <.x+0x18>
  2c:   e8 01 00 70     ld      r0,112(r1)
  30:   35 20 ff ff     addic.  r9,r0,-1
  34:   f9 21 00 70     std     r9,112(r1)
  38:   40 82 ff f0     bne+    28 <.x+0x18>
  3c:   38 21 00 80     addi    r1,r1,128
  40:   e8 01 00 10     ld      r0,16(r1)
  44:   7c 08 03 a6     mtlr    r0
  48:   4e 80 00 20     blr
  4c:   00 00 00 00     .long 0x0
  50:   00 00 00 01     .long 0x1
  54:   80 00 00 00     lwz     r0,0(0)


-Os:

0000000000000010 <.x>:
  10:   fb e1 ff f8     std     r31,-8(r1)
  14:   7c 08 02 a6     mflr    r0
  18:   f8 01 00 10     std     r0,16(r1)
  1c:   7c 7f 1b 78     mr      r31,r3
  20:   f8 21 ff 81     stdu    r1,-128(r1)
  24:   48 00 00 08     b       2c <.x+0x1c>
  28:   48 00 00 01     bl      28 <.x+0x18>
  2c:   2f bf 00 00     cmpdi   cr7,r31,0
  30:   3b ff ff ff     addi    r31,r31,-1
  34:   40 9e ff f4     bne+    cr7,28 <.x+0x18>
  38:   38 21 00 80     addi    r1,r1,128
  3c:   e8 01 00 10     ld      r0,16(r1)
  40:   eb e1 ff f8     ld      r31,-8(r1)
  44:   7c 08 03 a6     mtlr    r0
  48:   4e 80 00 20     blr
  4c:   00 00 00 00     .long 0x0
  50:   00 00 00 01     .long 0x1
  54:   80 01 00 00     lwz     r0,0(r1)


-- 

siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei dot siamashka at
                   |                            |gmail dot com
           Keywords|                            |missed-optimization
            Summary|cell microcode instruction  |cell microcode instruction
                   |is generated for a trivial  |(addic.) is generated for a
                   |loop with -O2 optimizations,|trivial loop with -O2
                   |hurting performance badly   |optimizations, hurting
                   |                            |performance badly


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
  2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly siarhei dot siamashka at gmail dot com
  2009-10-29 15:21 ` [Bug target/41868] cell microcode instruction (addic.) " siarhei dot siamashka at gmail dot com
@ 2009-11-02 16:51 ` pinskia at gcc dot gnu dot org
  2009-11-02 16:56 ` pinskia at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-11-02 16:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2009-11-02 16:51 -------
Simple patch which I am testing right now:
Index: gcc/gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/gcc/config/rs6000/rs6000.md     (revision 153680)
+++ gcc/gcc/config/rs6000/rs6000.md     (working copy)
@@ -1627,7 +1627,7 @@ (define_insn "*add<mode>3_internal3"
    (set_attr "length" "4,4,8,8")])

 (define_split
-  [(set (match_operand:CC 3 "cc_reg_not_cr0_operand" "")
+  [(set (match_operand:CC 3 "cc_reg_not_micro_cr0_operand" "")
        (compare:CC (plus:P (match_operand:P 1 "gpc_reg_operand" "")
                            (match_operand:P 2 "reg_or_short_operand" ""))


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |pinskia at gcc dot gnu dot
                   |dot org                     |org
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-11-02 16:51:40
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
  2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly siarhei dot siamashka at gmail dot com
  2009-10-29 15:21 ` [Bug target/41868] cell microcode instruction (addic.) " siarhei dot siamashka at gmail dot com
  2009-11-02 16:51 ` pinskia at gcc dot gnu dot org
@ 2009-11-02 16:56 ` pinskia at gcc dot gnu dot org
  2009-11-02 17:05 ` pinskia at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-11-02 16:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2009-11-02 16:56 -------
Actually the warning is incorrect at least according to the PPU book 4.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
  2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly siarhei dot siamashka at gmail dot com
                   ` (2 preceding siblings ...)
  2009-11-02 16:56 ` pinskia at gcc dot gnu dot org
@ 2009-11-02 17:05 ` pinskia at gcc dot gnu dot org
  2009-11-02 17:09 ` pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-11-02 17:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gcc dot gnu dot org  2009-11-02 17:05 -------
In fact, changing the addic. into addic/cmpwi does not improve the speed of
the code:


With the change:
[apinski@dhcp-10-98-10-216 local]$ time ./a.out
56.316u 0.084s 0:57.09 98.7%    0+0k 0+0io 0pf+0w

Without:
56.276u 0.088s 0:57.08 98.7%    0+0k 0+0io 0pf+0w


So only the warning is invalid.

With -Os on the trunk:
24.144u 0.032s 0:24.45 98.8%    0+0k 0+0io 0pf+0w


Offhand, I don't know why -Os is faster than -O2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
  2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly siarhei dot siamashka at gmail dot com
                   ` (3 preceding siblings ...)
  2009-11-02 17:05 ` pinskia at gcc dot gnu dot org
@ 2009-11-02 17:09 ` pinskia at gcc dot gnu dot org
  2009-11-02 17:10 ` pinskia at gcc dot gnu dot org
  2009-11-03 20:09 ` siarhei dot siamashka at gmail dot com
  6 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-11-02 17:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2009-11-02 17:08 -------
In fact, applying the following diff to the -Os assembly:
--- t5.Os.s     2009-11-02 23:18:52.000000000 +0900
+++ t5.Os.dot.s 2009-11-02 23:20:19.000000000 +0900
@@ -29,9 +29,9 @@ x:
 .L4:
        bl y
 .L3:
-       cmpwi 7,31,0
-       addi 31,31,-1
-       bne 7,.L4
+#      cmpwi 7,31,0
+       addic. 31,31,-1
+       bne .L4
        addi 11,1,16
        b _restgpr_31_x
        .size   x,.-x

produces the same result as -Os on the trunk.
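
As a reading aid for the diff above, here is a minimal sketch in C of the loop shapes involved, assuming the listings are read as follows: the -Os code tests the old counter value and then decrements it, the hand-edited code decrements first and tests the new value, and the -O2 code guards the loop once and then decrements and tests at the bottom. The function names are hypothetical, and each function only counts how often the call to y() would happen.

```c
/* Hypothetical models of the loop shapes; each returns how many times
   the loop body (the call to y) would execute for a starting count c. */

/* -Os shape: cmpwi / addi / bne, i.e. test the old value, then decrement. */
long count_test_then_decrement(long c)
{
    long calls = 0;
    while (c-- != 0)
        calls++;
    return calls;
}

/* Hand-edited shape: addic. / bne, i.e. decrement, then test the new value.
   Only meaningful for c >= 1 in this sketch. */
long count_decrement_then_test(long c)
{
    long calls = 0;
    while (--c != 0)
        calls++;
    return calls;
}

/* -O2 shape: one guard before the loop, then decrement and test at the bottom. */
long count_guarded_do_while(long c)
{
    long calls = 0;
    if (c != 0) {
        do {
            calls++;
        } while (--c != 0);
    }
    return calls;
}
```

Under this reading, the hand-edited variant runs the body one time fewer than the other two for the same starting count. That difference is negligible for a timing comparison over 1.6G iterations per call, but it means the edit is a timing experiment and not a drop-in replacement.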


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
  2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly siarhei dot siamashka at gmail dot com
                   ` (4 preceding siblings ...)
  2009-11-02 17:09 ` pinskia at gcc dot gnu dot org
@ 2009-11-02 17:10 ` pinskia at gcc dot gnu dot org
  2009-11-03 20:09 ` siarhei dot siamashka at gmail dot com
  6 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-11-02 17:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from pinskia at gcc dot gnu dot org  2009-11-02 17:10 -------
So, in conclusion, addic. is not microcoded and the warning is incorrect, but
-Os is still faster than -O2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
  2009-10-29 15:16 [Bug target/41868] New: cell microcode instruction is generated for a trivial loop with -O2 optimizations, hurting performance badly siarhei dot siamashka at gmail dot com
                   ` (5 preceding siblings ...)
  2009-11-02 17:10 ` pinskia at gcc dot gnu dot org
@ 2009-11-03 20:09 ` siarhei dot siamashka at gmail dot com
  6 siblings, 0 replies; 10+ messages in thread
From: siarhei dot siamashka at gmail dot com @ 2009-11-03 20:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from siarhei dot siamashka at gmail dot com  2009-11-03 20:09 -------
Thanks a lot for checking this. And sorry about the confusion caused by
attributing the slowness of the testcase to the microcoded instruction (which
turned out not to be the case) without properly checking this first.

So should this bug be split into two? One about the incorrect warning, and
another about the suboptimal code generated at -O2 (extra load and store
operations, which are probably penalized by something like a RAW hazard in such
a short loop)?
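
To make the difference concrete, here is a rough C sketch of the two code shapes from comment #1. It is only an illustration under stated assumptions: `volatile` is used purely as a stand-in to force the load and store of the counter on every iteration, as the -O2 listing does with its stack slot, and the function names are hypothetical. It makes no claim about reproducing the measured slowdown on any particular machine.

```c
/* Same empty callee as in the original test case. */
void __attribute__((noinline)) y(void)
{
    __asm__ volatile ("# nop\n");
}

/* -O2-like shape: the counter lives in a stack slot and makes a memory
   round trip on every iteration. Returns the number of calls to y(). */
long run_counter_in_memory(long c)
{
    volatile long slot;          /* stand-in for the slot at 112(r1) */
    long calls = 0;

    if (c == 0)
        return 0;
    slot = c;                    /* std  r3,112(r1)  */
    for (;;) {
        long t;
        y();                     /* bl   y           */
        calls++;
        t = slot;                /* ld   r0,112(r1)  */
        t -= 1;                  /* addic. r9,r0,-1  */
        slot = t;                /* std  r9,112(r1)  */
        if (t == 0)              /* bne+ loop        */
            break;
    }
    return calls;
}

/* -Os-like shape: the counter stays in a register (r31 in the listing),
   so the loop has no load or store of its own. */
long run_counter_in_register(long c)
{
    long calls = 0;
    while (c-- != 0) {
        y();
        calls++;
    }
    return calls;
}
```

Both functions call y() the same number of times for a given `c`; the only intended difference is whether the counter is stored and reloaded around each call.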


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
       [not found] <bug-41868-4@http.gcc.gnu.org/bugzilla/>
  2011-11-29 23:21 ` pinskia at gcc dot gnu.org
@ 2011-11-29 23:28 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-11-29 23:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-29 23:18:32 UTC ---
No longer working on this.



* [Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly
       [not found] <bug-41868-4@http.gcc.gnu.org/bugzilla/>
@ 2011-11-29 23:21 ` pinskia at gcc dot gnu.org
  2011-11-29 23:28 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-11-29 23:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
         AssignedTo|pinskia at gcc dot gnu.org  |unassigned at gcc dot
                   |                            |gnu.org

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-29 23:18:46 UTC ---
No longer working on this.



