public inbox for gcc-prs@sourceware.org
* Re: optimization/10080: Loop unroller nearly useless
@ 2003-03-15  0:06 rakdver
From: rakdver @ 2003-03-15  0:06 UTC
  To: nobody; +Cc: gcc-prs

The following reply was made to PR optimization/10080; it has been noted by GNATS.

From: rakdver@atrey.karlin.mff.cuni.cz
To: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, nobody@gcc.gnu.org,
	gcc-prs@gcc.gnu.org, falk.hueffner@student.uni-tuebingen.de
Cc:  
Subject: Re: optimization/10080: Loop unroller nearly useless
Date: Sat, 15 Mar 2003 00:59:36 +0100 (CET)

 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10080
 
 The problem is that ++i is translated into
 
 (insn 28 26 29 1 0x40135c40 (set (reg:DI 79)
         (plus:DI (reg/v:DI 72 [ i ])
             (const_int 1 [0x1]))) -1 (nil)
     (nil))
 (insn 29 28 75 1 0x40135c40 (set (reg/v:DI 72 [ i ])
         (sign_extend:DI (subreg:SI (reg:DI 79) 0))) -1 (nil)
     (nil))
 
 but my overly simplistic analysis does not recognize this pattern.
 I am working on a fix.
 
 Zdenek
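
 As a reading aid, the two insns quoted above can be modeled in plain C.
 This is only a sketch of what the RTL computes, not code from GCC: the
 names model_insn28, model_insn29 and increment_i are made up for
 illustration, and it assumes that (subreg:SI (reg:DI 79) 0) selects the low
 32 bits, as it would on a little-endian target such as Alpha. The point is
 that the new value of i is not a plain "i + 1" but a sign-extended
 truncation of it, which a pattern-based analysis may fail to see as a
 simple increment.

```c
#include <assert.h>
#include <stdint.h>

/* Model of insn 28: reg 79 = i + 1, computed in 64 bits (DImode).
   Unsigned arithmetic is used so the model itself cannot overflow. */
static uint64_t model_insn28(int64_t i)
{
    return (uint64_t)i + 1u;
}

/* Model of insn 29: i = sign_extend:DI (subreg:SI reg79 0).
   Assumption: subreg offset 0 means the low 32 bits. */
static int64_t model_insn29(uint64_t reg79)
{
    uint32_t low = (uint32_t)reg79;   /* keep the low 32 bits */
    /* Portable sign extension from 32 to 64 bits. */
    int64_t r = (int64_t)(low ^ 0x80000000u) - INT64_C(0x80000000);
    assert(r >= INT32_MIN && r <= INT32_MAX);
    return r;
}

/* The two insns together: what "++i" becomes for a 32-bit int that is
   kept in a 64-bit register. */
int64_t increment_i(int64_t i)
{
    return model_insn29(model_insn28(i));
}
```

 For values that stay within the range of int, as in the loops of this
 report, increment_i(i) is simply i + 1; the sign extension only matters at
 the wrap-around point.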



* Re: optimization/10080: Loop unroller nearly useless
@ 2003-03-15  5:36 bangerth
From: bangerth @ 2003-03-15  5:36 UTC
  To: falk.hueffner, gcc-bugs, gcc-prs, nobody

Synopsis: Loop unroller nearly useless

State-Changed-From-To: open->analyzed
State-Changed-By: bangerth
State-Changed-When: Sat Mar 15 05:36:27 2003
State-Changed-Why:
    Zdenek analyzed this.

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10080



* Re: optimization/10080: Loop unroller nearly useless
@ 2003-03-14 23:56 rakdver
From: rakdver @ 2003-03-14 23:56 UTC
  To: nobody; +Cc: gcc-prs

The following reply was made to PR optimization/10080; it has been noted by GNATS.

From: rakdver@atrey.karlin.mff.cuni.cz
To: gcc-gnats@gcc.gnu.org,gcc-bugs@gcc.gnu.org,nobody@gcc.gnu.org,gcc-prs@gcc.gnu.org,falk.hueffner@student.uni-tuebingen.de
Cc:  
Subject: Re: optimization/10080: Loop unroller nearly useless
Date: Sat, 15 Mar 2003 00:36:04 +0100

 Hello,
 
 The problem is that ++i is translated into
 
 (insn 28 26 29 1 0x40135c40 (set (reg:DI 79)
         (plus:DI (reg/v:DI 72 [ i ])
             (const_int 1 [0x1]))) -1 (nil)
     (nil))
 (insn 29 28 75 1 0x40135c40 (set (reg/v:DI 72 [ i ])
         (sign_extend:DI (subreg:SI (reg:DI 79) 0))) -1 (nil)
     (nil))
 
 but my analysis is overly simplistic and does not recognize this. I am working
 on a fix.
 
 Zdenek



* optimization/10080: Loop unroller nearly useless
@ 2003-03-14 11:56 Falk Hueffner
From: Falk Hueffner @ 2003-03-14 11:56 UTC
  To: gcc-gnats


>Number:         10080
>Category:       optimization
>Synopsis:       Loop unroller nearly useless
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          pessimizes-code
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 14 11:56:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Falk Hueffner
>Release:        3.4 20030310 (experimental)
>Organization:
>Environment:
System: Linux juist 2.5.59 #4 Sat Jan 18 12:46:41 CET 2003 alpha unknown unknown GNU/Linux
Architecture: alpha

	
host: alphaev68-unknown-linux-gnu
build: alphaev68-unknown-linux-gnu
target: alphaev68-unknown-linux-gnu
configured with: ../configure --enable-languages=c++
>Description:
Loops with known count:

int f (int *p) {
    int r = 0, i;
    for (i = 0; i < 4; ++i)
	r += p[i];
    return r;
}

% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o    
0000000000000000 <f>:
   0:   00 04 ff 47     clr     v0
   4:   03 04 ff 47     clr     t2
   8:   00 00 30 a0     ldl     t0,0(a0)
   c:   02 30 60 40     addl    t2,0x1,t1
  10:   04 00 10 22     lda     a0,4(a0)
  14:   a4 7d 40 40     cmple   t1,0x3,t3
  18:   03 30 40 40     addl    t1,0x1,t2
  1c:   00 00 01 40     addl    v0,t0,v0
  20:   13 00 80 e4     beq     t3,70 <f+0x70>
  24:   00 00 b0 a0     ldl     t4,0(a0)
  28:   a4 7d 60 40     cmple   t2,0x3,t3
  2c:   04 00 10 22     lda     a0,4(a0)
  30:   03 30 60 40     addl    t2,0x1,t2
  34:   00 00 05 40     addl    v0,t4,v0
  38:   0d 00 80 e4     beq     t3,70 <f+0x70>
  3c:   00 00 f0 a0     ldl     t6,0(a0)
  40:   a6 7d 60 40     cmple   t2,0x3,t5
  44:   04 00 10 22     lda     a0,4(a0)
  48:   03 30 60 40     addl    t2,0x1,t2
  4c:   00 00 07 40     addl    v0,t6,v0
  50:   07 00 c0 e4     beq     t5,70 <f+0x70>
  54:   00 00 30 a2     ldl     a1,0(a0)
  58:   a8 7d 60 40     cmple   t2,0x3,t7
  5c:   04 00 10 22     lda     a0,4(a0)
  60:   00 00 11 40     addl    v0,a1,v0
  64:   e8 ff 1f f5     bne     t7,8 <f+0x8>
  68:   1f 04 ff 47     nop
  6c:   00 00 fe 2f     unop
  70:   01 80 fa 6b     ret
  74:   00 00 fe 2f     unop
  78:   1f 04 ff 47     nop
  7c:   00 00 fe 2f     unop

gcc 3.2 generates this:

0000000000000000 <f>:
   0:   04 00 10 a0     ldl     v0,4(a0)
   4:   00 00 50 a0     ldl     t1,0(a0)
   8:   08 00 b0 a0     ldl     t4,8(a0)
   c:   0c 00 90 a0     ldl     t3,12(a0)
  10:   03 00 40 40     addl    t1,v0,t2
  14:   01 00 65 40     addl    t2,t4,t0
  18:   00 00 24 40     addl    t0,t3,v0
  1c:   01 80 fa 6b     ret
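
For comparison, the gcc 3.2 listing above corresponds roughly to the
following source-level transformation. This is a hand-written sketch for
illustration, not compiler output; the name f_unrolled is hypothetical. With
the trip count known to be 4, the loop counter, the compare and the branch
all disappear, leaving four loads and three adds.

```c
#include <assert.h>
#include <stddef.h>

/* The loop from the report, with a compile-time-constant trip count. */
int f(const int *p)
{
    int r = 0, i;
    for (i = 0; i < 4; ++i)
        r += p[i];
    return r;
}

/* Fully unrolled equivalent: no counter, no compare, no branch.
   The additions are grouped the way the gcc 3.2 listing performs them. */
int f_unrolled(const int *p)
{
    assert(p != NULL);
    return ((p[0] + p[1]) + p[2]) + p[3];
}
```

Both functions return the same value for any array of at least four elements
whose sum does not overflow int.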


Loops with unknown count:

int g (int *p, int n) {
    int r = 0, i;
    for (i = 0; i < n; ++i)
	r += p[i];
    return r;
}

% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o    
0000000000000080 <g>:
  80:   00 04 ff 47     clr     v0
  84:   03 04 ff 47     clr     t2
  88:   19 00 20 ee     ble     a1,f0 <g+0x70>
  8c:   00 00 30 a0     ldl     t0,0(a0)
  90:   02 30 60 40     addl    t2,0x1,t1
  94:   04 00 10 22     lda     a0,4(a0)
  98:   a4 09 51 40     cmplt   t1,a1,t3
  9c:   03 30 40 40     addl    t1,0x1,t2
  a0:   00 00 01 40     addl    v0,t0,v0
  a4:   12 00 80 e4     beq     t3,f0 <g+0x70>
  a8:   00 00 b0 a0     ldl     t4,0(a0)
  ac:   a4 09 71 40     cmplt   t2,a1,t3
  b0:   04 00 10 22     lda     a0,4(a0)
  b4:   03 30 60 40     addl    t2,0x1,t2
  b8:   00 00 05 40     addl    v0,t4,v0
  bc:   0c 00 80 e4     beq     t3,f0 <g+0x70>
  c0:   00 00 f0 a0     ldl     t6,0(a0)
  c4:   a6 09 71 40     cmplt   t2,a1,t5
  c8:   04 00 10 22     lda     a0,4(a0)
  cc:   03 30 60 40     addl    t2,0x1,t2
  d0:   00 00 07 40     addl    v0,t6,v0
  d4:   06 00 c0 e4     beq     t5,f0 <g+0x70>
  d8:   00 00 50 a2     ldl     a2,0(a0)
  dc:   a8 09 71 40     cmplt   t2,a1,t7
  e0:   04 00 10 22     lda     a0,4(a0)
  e4:   00 00 12 40     addl    v0,a2,v0
  e8:   e8 ff 1f f5     bne     t7,8c <g+0xc>
  ec:   00 00 fe 2f     unop
  f0:   01 80 fa 6b     ret
  f4:   00 00 fe 2f     unop
  f8:   1f 04 ff 47     nop
  fc:   00 00 fe 2f     unop

Well, that gains exactly nothing over not unrolling. Ideally, it
should look more like (Compaq compiler output):

0000000000000030 <g>:
  30:   00 04 ff 47     clr     v0
  34:   28 00 20 ee     ble     a1,d8 <g+0xa8>
  38:   23 d1 20 42     subl    a1,0x6,t2
  3c:   02 04 ff 47     clr     t1
  40:   a4 0d 71 40     cmple   t2,a1,t3
  44:   a5 1d 60 40     cmple   t2,0,t4
  48:   04 01 85 44     andnot  t3,t4,t3
  4c:   00 00 fe 2f     unop
  50:   1b 00 80 e0     blbc    t3,c0 <g+0x90>
  54:   00 00 fe 2f     unop
  58:   00 00 fe 2f     unop
  5c:   00 00 fe 2f     unop
  60:   00 02 f0 a3     ldl     zero,512(a0)
  64:   00 00 d0 a0     ldl     t5,0(a0)
  68:   02 f0 40 40     addl    t1,0x7,t1
  6c:   1c 00 10 22     lda     a0,28(a0)
  70:   e8 ff f0 a0     ldl     t6,-24(a0)
  74:   ec ff 10 a1     ldl     t7,-20(a0)
  78:   b7 09 43 40     cmplt   t1,t2,t9
  7c:   f0 ff 50 a2     ldl     a2,-16(a0)
  80:   f4 ff 70 a2     ldl     a3,-12(a0)
  84:   f8 ff 90 a2     ldl     a4,-8(a0)
  88:   fc ff b0 a2     ldl     a5,-4(a0)
  8c:   06 00 c7 40     addl    t5,t6,t5
  90:   08 00 12 41     addl    t7,a2,t7
  94:   13 00 74 42     addl    a3,a4,a3
  98:   06 00 06 41     addl    t7,t5,t5
  9c:   13 00 b3 42     addl    a5,a3,a3
  a0:   06 00 d3 40     addl    t5,a3,t5
  a4:   00 00 06 40     addl    v0,t5,v0
  a8:   ed ff ff f6     bne     t9,60 <g+0x30>
  ac:   b8 09 51 40     cmplt   t1,a1,t10
  b0:   09 00 00 e7     beq     t10,d8 <g+0xa8>
  b4:   00 00 fe 2f     unop
  b8:   00 00 fe 2f     unop
  bc:   00 00 fe 2f     unop
  c0:   00 00 30 a3     ldl     t11,0(a0)
  c4:   02 30 40 40     addl    t1,0x1,t1
  c8:   04 00 10 22     lda     a0,4(a0)
  cc:   bb 09 51 40     cmplt   t1,a1,t12
  d0:   00 00 19 40     addl    v0,t11,v0
  d4:   fa ff 7f f7     bne     t12,c0 <g+0x90>
  d8:   01 80 fa 6b     ret
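
The structure of the Compaq output can be sketched in C as a main loop that
handles several elements per iteration, followed by a remainder loop. This is
an illustrative reconstruction, not the compiler's actual algorithm: the name
g_unrolled is hypothetical, and the factor of seven is read off the listing
above (addl t1,0x7 and lda a0,28(a0), i.e. seven 4-byte elements per pass).
The prefetch (ldl zero,512(a0)) and the tree-shaped summation of the original
are left out for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* The loop from the report, with a run-time trip count. */
int g(const int *p, int n)
{
    int r = 0, i;
    for (i = 0; i < n; ++i)
        r += p[i];
    return r;
}

/* Unrolled by seven with a remainder loop. */
int g_unrolled(const int *p, int n)
{
    int r = 0, i = 0;

    assert(n <= 0 || p != NULL);

    if (n > 6) {
        int limit = n - 6;          /* cannot overflow because n > 6 */
        /* Main loop: one compare and branch per seven elements. */
        while (i < limit) {
            r += p[i];
            r += p[i + 1];
            r += p[i + 2];
            r += p[i + 3];
            r += p[i + 4];
            r += p[i + 5];
            r += p[i + 6];
            i += 7;
        }
    }

    /* Remainder loop: at most six leftover elements. */
    for (; i < n; ++i)
        r += p[i];

    return r;
}
```

The gain over the 3.4 output shown earlier is that the exit test runs once
per seven elements rather than once per element.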

>How-To-Repeat:
	
>Fix:
	
>Release-Note:
>Audit-Trail:
>Unformatted:

