public inbox for gcc-prs@sourceware.org
From: Falk Hueffner <falk.hueffner@student.uni-tuebingen.de>
To: gcc-gnats@gcc.gnu.org
Subject: optimization/10080: Loop unroller nearly useless
Date: Fri, 14 Mar 2003 11:56:00 -0000
Message-ID: <E18tnj5-0002uS-00@juist>


>Number:         10080
>Category:       optimization
>Synopsis:       Loop unroller nearly useless
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          pessimizes-code
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 14 11:56:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Falk Hueffner
>Release:        3.4 20030310 (experimental)
>Organization:
>Environment:
System: Linux juist 2.5.59 #4 Sat Jan 18 12:46:41 CET 2003 alpha unknown unknown GNU/Linux
Architecture: alpha

	
host: alphaev68-unknown-linux-gnu
build: alphaev68-unknown-linux-gnu
target: alphaev68-unknown-linux-gnu
configured with: ../configure --enable-languages=c++
>Description:
Loops with known count:

int f (int *p) {
    int r = 0, i;
    for (i = 0; i < 4; ++i)
	r += p[i];
    return r;
}

% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o    
0000000000000000 <f>:
   0:   00 04 ff 47     clr     v0
   4:   03 04 ff 47     clr     t2
   8:   00 00 30 a0     ldl     t0,0(a0)
   c:   02 30 60 40     addl    t2,0x1,t1
  10:   04 00 10 22     lda     a0,4(a0)
  14:   a4 7d 40 40     cmple   t1,0x3,t3
  18:   03 30 40 40     addl    t1,0x1,t2
  1c:   00 00 01 40     addl    v0,t0,v0
  20:   13 00 80 e4     beq     t3,70 <f+0x70>
  24:   00 00 b0 a0     ldl     t4,0(a0)
  28:   a4 7d 60 40     cmple   t2,0x3,t3
  2c:   04 00 10 22     lda     a0,4(a0)
  30:   03 30 60 40     addl    t2,0x1,t2
  34:   00 00 05 40     addl    v0,t4,v0
  38:   0d 00 80 e4     beq     t3,70 <f+0x70>
  3c:   00 00 f0 a0     ldl     t6,0(a0)
  40:   a6 7d 60 40     cmple   t2,0x3,t5
  44:   04 00 10 22     lda     a0,4(a0)
  48:   03 30 60 40     addl    t2,0x1,t2
  4c:   00 00 07 40     addl    v0,t6,v0
  50:   07 00 c0 e4     beq     t5,70 <f+0x70>
  54:   00 00 30 a2     ldl     a1,0(a0)
  58:   a8 7d 60 40     cmple   t2,0x3,t7
  5c:   04 00 10 22     lda     a0,4(a0)
  60:   00 00 11 40     addl    v0,a1,v0
  64:   e8 ff 1f f5     bne     t7,8 <f+0x8>
  68:   1f 04 ff 47     nop
  6c:   00 00 fe 2f     unop
  70:   01 80 fa 6b     ret
  74:   00 00 fe 2f     unop
  78:   1f 04 ff 47     nop
  7c:   00 00 fe 2f     unop

gcc 3.2 generates this:

0000000000000000 <f>:
   0:   04 00 10 a0     ldl     v0,4(a0)
   4:   00 00 50 a0     ldl     t1,0(a0)
   8:   08 00 b0 a0     ldl     t4,8(a0)
   c:   0c 00 90 a0     ldl     t3,12(a0)
  10:   03 00 40 40     addl    t1,v0,t2
  14:   01 00 65 40     addl    t2,t4,t0
  18:   00 00 24 40     addl    t0,t3,v0
  1c:   01 80 fa 6b     ret
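
At the source level, the gcc 3.2 listing above amounts to roughly the
following. This is only a sketch: the name f_unrolled is made up here,
and the additions are written in the same order as in the listing.

```c
/* Hypothetical hand-written equivalent of the fully unrolled f().
   No counter, no compare, no branch: four loads and three adds. */
int f_unrolled (int *p) {
    int r;
    r = p[0] + p[1];   /* addl t1,v0,t2 */
    r = r + p[2];      /* addl t2,t4,t0 */
    r = r + p[3];      /* addl t0,t3,v0 */
    return r;
}
```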


Loops with unknown count:

int g (int *p, int n) {
    int r = 0, i;
    for (i = 0; i < n; ++i)
	r += p[i];
    return r;
}

% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o    
0000000000000080 <g>:
  80:   00 04 ff 47     clr     v0
  84:   03 04 ff 47     clr     t2
  88:   19 00 20 ee     ble     a1,f0 <g+0x70>
  8c:   00 00 30 a0     ldl     t0,0(a0)
  90:   02 30 60 40     addl    t2,0x1,t1
  94:   04 00 10 22     lda     a0,4(a0)
  98:   a4 09 51 40     cmplt   t1,a1,t3
  9c:   03 30 40 40     addl    t1,0x1,t2
  a0:   00 00 01 40     addl    v0,t0,v0
  a4:   12 00 80 e4     beq     t3,f0 <g+0x70>
  a8:   00 00 b0 a0     ldl     t4,0(a0)
  ac:   a4 09 71 40     cmplt   t2,a1,t3
  b0:   04 00 10 22     lda     a0,4(a0)
  b4:   03 30 60 40     addl    t2,0x1,t2
  b8:   00 00 05 40     addl    v0,t4,v0
  bc:   0c 00 80 e4     beq     t3,f0 <g+0x70>
  c0:   00 00 f0 a0     ldl     t6,0(a0)
  c4:   a6 09 71 40     cmplt   t2,a1,t5
  c8:   04 00 10 22     lda     a0,4(a0)
  cc:   03 30 60 40     addl    t2,0x1,t2
  d0:   00 00 07 40     addl    v0,t6,v0
  d4:   06 00 c0 e4     beq     t5,f0 <g+0x70>
  d8:   00 00 50 a2     ldl     a2,0(a0)
  dc:   a8 09 71 40     cmplt   t2,a1,t7
  e0:   04 00 10 22     lda     a0,4(a0)
  e4:   00 00 12 40     addl    v0,a2,v0
  e8:   e8 ff 1f f5     bne     t7,8c <g+0xc>
  ec:   00 00 fe 2f     unop
  f0:   01 80 fa 6b     ret
  f4:   00 00 fe 2f     unop
  f8:   1f 04 ff 47     nop
  fc:   00 00 fe 2f     unop

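Well, that gains exactly nothing over not unrolling: every copy of the
loop body still bumps the counter and the pointer and repeats the exit
test. Ideally, it should look more like this (Compaq compiler output):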

0000000000000030 <g>:
  30:   00 04 ff 47     clr     v0
  34:   28 00 20 ee     ble     a1,d8 <g+0xa8>
  38:   23 d1 20 42     subl    a1,0x6,t2
  3c:   02 04 ff 47     clr     t1
  40:   a4 0d 71 40     cmple   t2,a1,t3
  44:   a5 1d 60 40     cmple   t2,0,t4
  48:   04 01 85 44     andnot  t3,t4,t3
  4c:   00 00 fe 2f     unop
  50:   1b 00 80 e0     blbc    t3,c0 <g+0x90>
  54:   00 00 fe 2f     unop
  58:   00 00 fe 2f     unop
  5c:   00 00 fe 2f     unop
  60:   00 02 f0 a3     ldl     zero,512(a0)
  64:   00 00 d0 a0     ldl     t5,0(a0)
  68:   02 f0 40 40     addl    t1,0x7,t1
  6c:   1c 00 10 22     lda     a0,28(a0)
  70:   e8 ff f0 a0     ldl     t6,-24(a0)
  74:   ec ff 10 a1     ldl     t7,-20(a0)
  78:   b7 09 43 40     cmplt   t1,t2,t9
  7c:   f0 ff 50 a2     ldl     a2,-16(a0)
  80:   f4 ff 70 a2     ldl     a3,-12(a0)
  84:   f8 ff 90 a2     ldl     a4,-8(a0)
  88:   fc ff b0 a2     ldl     a5,-4(a0)
  8c:   06 00 c7 40     addl    t5,t6,t5
  90:   08 00 12 41     addl    t7,a2,t7
  94:   13 00 74 42     addl    a3,a4,a3
  98:   06 00 06 41     addl    t7,t5,t5
  9c:   13 00 b3 42     addl    a5,a3,a3
  a0:   06 00 d3 40     addl    t5,a3,t5
  a4:   00 00 06 40     addl    v0,t5,v0
  a8:   ed ff ff f6     bne     t9,60 <g+0x30>
  ac:   b8 09 51 40     cmplt   t1,a1,t10
  b0:   09 00 00 e7     beq     t10,d8 <g+0xa8>
  b4:   00 00 fe 2f     unop
  b8:   00 00 fe 2f     unop
  bc:   00 00 fe 2f     unop
  c0:   00 00 30 a3     ldl     t11,0(a0)
  c4:   02 30 40 40     addl    t1,0x1,t1
  c8:   04 00 10 22     lda     a0,4(a0)
  cc:   bb 09 51 40     cmplt   t1,a1,t12
  d0:   00 00 19 40     addl    v0,t11,v0
  d4:   fa ff 7f f7     bne     t12,c0 <g+0x90>
  d8:   01 80 fa 6b     ret
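
In C terms, the Compaq output has roughly the shape below: a main loop
that handles seven elements per exit test, followed by a one-at-a-time
loop for the leftovers. This is a sketch under assumptions, not a
decompilation: the name g_unrolled is hypothetical, and the prefetch
(ldl zero,512(a0)) and the tree-shaped summation of the real code are
left out.

```c
/* Hypothetical hand-unrolled equivalent of g(). */
int g_unrolled (int *p, int n) {
    int r = 0, i = 0;

    /* Early exit, as in "ble a1"; this also keeps n - 6 below
       from overflowing. */
    if (n <= 0)
        return 0;

    /* Main loop: seven elements per iteration, one exit test. */
    for (; i < n - 6; i += 7) {
        r += p[i];
        r += p[i + 1];
        r += p[i + 2];
        r += p[i + 3];
        r += p[i + 4];
        r += p[i + 5];
        r += p[i + 6];
    }

    /* Remainder loop: at most six leftover elements. */
    for (; i < n; ++i)
        r += p[i];

    return r;
}
```

The point of the transformation is that the counter update, the compare
and the branch are paid once per seven loads instead of once per load.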

>How-To-Repeat:
	
>Fix:
	
>Release-Note:
>Audit-Trail:
>Unformatted:


