public inbox for gcc-prs@sourceware.org
* Re: optimization/10080: Loop unroller nearly useless
@ 2003-03-15 0:06 rakdver
From: rakdver @ 2003-03-15 0:06 UTC
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/10080; it has been noted by GNATS.
From: rakdver@atrey.karlin.mff.cuni.cz
To: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, nobody@gcc.gnu.org,
gcc-prs@gcc.gnu.org, falk.hueffner@student.uni-tuebingen.de
Cc:
Subject: Re: optimization/10080: Loop unroller nearly useless
Date: Sat, 15 Mar 2003 00:59:36 +0100 (CET)
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10080
the problem is that ++i is translated into
(insn 28 26 29 1 0x40135c40 (set (reg:DI 79)
(plus:DI (reg/v:DI 72 [ i ])
(const_int 1 [0x1]))) -1 (nil)
(nil))
(insn 29 28 75 1 0x40135c40 (set (reg/v:DI 72 [ i ])
(sign_extend:DI (subreg:SI (reg:DI 79) 0))) -1 (nil)
(nil))
but my overly simplistic analysis does not recognize it.
I am working on a fix.
Zdenek
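To make the two insns above concrete, here is a minimal C model of what they compute, assuming a target where int is 32 bits wide but is kept in a 64-bit (DImode) register, as on alpha. The names insn28, insn29 and step_i are made up for illustration and are not GCC internals.

```c
#include <stdint.h>

/* Hypothetical model of insn 28:
   (set (reg:DI 79) (plus:DI (reg/v:DI 72 [ i ]) (const_int 1))).
   reg 72 is assumed to always hold a sign-extended 32-bit value,
   so the 64-bit add cannot overflow. */
static int64_t insn28(int64_t reg72)
{
    return reg72 + 1;
}

/* Hypothetical model of insn 29:
   (set (reg/v:DI 72 [ i ]) (sign_extend:DI (subreg:SI (reg:DI 79) 0))).
   Take the low 32 bits, then sign-extend them back to 64 bits. */
static int64_t insn29(int64_t reg79)
{
    uint32_t low = (uint32_t)reg79;   /* low 32 bits of the register */
    /* Sign extension written without implementation-defined casts. */
    return (low & 0x80000000u) ? (int64_t)low - 4294967296LL
                               : (int64_t)low;
}

/* One execution of ++i as the two insns perform it. */
static int64_t step_i(int64_t i)
{
    return insn29(insn28(i));
}
```

The net effect is still a plain increment of a 32-bit int, but it is split across a temporary register and a sign extension. An analysis that only looks for a single set of the form i = i + constant would presumably miss it, which matches the description above.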
* Re: optimization/10080: Loop unroller nearly useless
@ 2003-03-15 5:36 bangerth
From: bangerth @ 2003-03-15 5:36 UTC
To: falk.hueffner, gcc-bugs, gcc-prs, nobody
Synopsis: Loop unroller nearly useless
State-Changed-From-To: open->analyzed
State-Changed-By: bangerth
State-Changed-When: Sat Mar 15 05:36:27 2003
State-Changed-Why:
Zdenek analyzed this.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10080
* Re: optimization/10080: Loop unroller nearly useless
@ 2003-03-14 23:56 rakdver
From: rakdver @ 2003-03-14 23:56 UTC
To: nobody; +Cc: gcc-prs
The following reply was made to PR optimization/10080; it has been noted by GNATS.
From: rakdver@atrey.karlin.mff.cuni.cz
To: gcc-gnats@gcc.gnu.org,gcc-bugs@gcc.gnu.org,nobody@gcc.gnu.org,gcc-prs@gcc.gnu.org,falk.hueffner@student.uni-tuebingen.de
Cc:
Subject: Re: optimization/10080: Loop unroller nearly useless
Date: Sat, 15 Mar 2003 00:36:04 +0100
Hello,
the problem is that ++i is translated into
(insn 28 26 29 1 0x40135c40 (set (reg:DI 79)
(plus:DI (reg/v:DI 72 [ i ])
(const_int 1 [0x1]))) -1 (nil)
(nil))
(insn 29 28 75 1 0x40135c40 (set (reg/v:DI 72 [ i ])
(sign_extend:DI (subreg:SI (reg:DI 79) 0))) -1 (nil)
(nil))
but my analysis is overly simplistic and does not recognize this. I am working
on a fix.
Zdenek
* optimization/10080: Loop unroller nearly useless
@ 2003-03-14 11:56 Falk Hueffner
From: Falk Hueffner @ 2003-03-14 11:56 UTC
To: gcc-gnats
>Number: 10080
>Category: optimization
>Synopsis: Loop unroller nearly useless
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: pessimizes-code
>Submitter-Id: net
>Arrival-Date: Fri Mar 14 11:56:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Falk Hueffner
>Release: 3.4 20030310 (experimental)
>Organization:
>Environment:
System: Linux juist 2.5.59 #4 Sat Jan 18 12:46:41 CET 2003 alpha unknown unknown GNU/Linux
Architecture: alpha
host: alphaev68-unknown-linux-gnu
build: alphaev68-unknown-linux-gnu
target: alphaev68-unknown-linux-gnu
configured with: ../configure --enable-languages=c++
>Description:
Loops with known count:
int f (int *p) {
    int r = 0, i;
    for (i = 0; i < 4; ++i)
        r += p[i];
    return r;
}
% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o
0000000000000000 <f>:
0: 00 04 ff 47 clr v0
4: 03 04 ff 47 clr t2
8: 00 00 30 a0 ldl t0,0(a0)
c: 02 30 60 40 addl t2,0x1,t1
10: 04 00 10 22 lda a0,4(a0)
14: a4 7d 40 40 cmple t1,0x3,t3
18: 03 30 40 40 addl t1,0x1,t2
1c: 00 00 01 40 addl v0,t0,v0
20: 13 00 80 e4 beq t3,70 <f+0x70>
24: 00 00 b0 a0 ldl t4,0(a0)
28: a4 7d 60 40 cmple t2,0x3,t3
2c: 04 00 10 22 lda a0,4(a0)
30: 03 30 60 40 addl t2,0x1,t2
34: 00 00 05 40 addl v0,t4,v0
38: 0d 00 80 e4 beq t3,70 <f+0x70>
3c: 00 00 f0 a0 ldl t6,0(a0)
40: a6 7d 60 40 cmple t2,0x3,t5
44: 04 00 10 22 lda a0,4(a0)
48: 03 30 60 40 addl t2,0x1,t2
4c: 00 00 07 40 addl v0,t6,v0
50: 07 00 c0 e4 beq t5,70 <f+0x70>
54: 00 00 30 a2 ldl a1,0(a0)
58: a8 7d 60 40 cmple t2,0x3,t7
5c: 04 00 10 22 lda a0,4(a0)
60: 00 00 11 40 addl v0,a1,v0
64: e8 ff 1f f5 bne t7,8 <f+0x8>
68: 1f 04 ff 47 nop
6c: 00 00 fe 2f unop
70: 01 80 fa 6b ret
74: 00 00 fe 2f unop
78: 1f 04 ff 47 nop
7c: 00 00 fe 2f unop
gcc 3.2 generates this:
0000000000000000 <f>:
0: 04 00 10 a0 ldl v0,4(a0)
4: 00 00 50 a0 ldl t1,0(a0)
8: 08 00 b0 a0 ldl t4,8(a0)
c: 0c 00 90 a0 ldl t3,12(a0)
10: 03 00 40 40 addl t1,v0,t2
14: 01 00 65 40 addl t2,t4,t0
18: 00 00 24 40 addl t0,t3,v0
1c: 01 80 fa 6b ret
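Read back into C, the gcc 3.2 listing above amounts to roughly the following. This is only a sketch: f_rolled and f_unrolled are names made up here for comparison, and the grouping of the additions follows the addl sequence in the listing.

```c
/* The loop as written in the test case (trip count known to be 4). */
int f_rolled(const int *p)
{
    int r = 0, i;
    for (i = 0; i < 4; ++i)
        r += p[i];
    return r;
}

/* Fully unrolled form: four loads, three adds, no compare or branch.
   Assumes none of the partial sums overflows. */
int f_unrolled(const int *p)
{
    return ((p[0] + p[1]) + p[2]) + p[3];
}
```

With the trip count known at compile time, the counter i and every exit test can disappear entirely, which is what the 3.4 output fails to do.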
Loops with unknown count:
int g (int *p, int n) {
    int r = 0, i;
    for (i = 0; i < n; ++i)
        r += p[i];
    return r;
}
% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o
0000000000000080 <g>:
80: 00 04 ff 47 clr v0
84: 03 04 ff 47 clr t2
88: 19 00 20 ee ble a1,f0 <g+0x70>
8c: 00 00 30 a0 ldl t0,0(a0)
90: 02 30 60 40 addl t2,0x1,t1
94: 04 00 10 22 lda a0,4(a0)
98: a4 09 51 40 cmplt t1,a1,t3
9c: 03 30 40 40 addl t1,0x1,t2
a0: 00 00 01 40 addl v0,t0,v0
a4: 12 00 80 e4 beq t3,f0 <g+0x70>
a8: 00 00 b0 a0 ldl t4,0(a0)
ac: a4 09 71 40 cmplt t2,a1,t3
b0: 04 00 10 22 lda a0,4(a0)
b4: 03 30 60 40 addl t2,0x1,t2
b8: 00 00 05 40 addl v0,t4,v0
bc: 0c 00 80 e4 beq t3,f0 <g+0x70>
c0: 00 00 f0 a0 ldl t6,0(a0)
c4: a6 09 71 40 cmplt t2,a1,t5
c8: 04 00 10 22 lda a0,4(a0)
cc: 03 30 60 40 addl t2,0x1,t2
d0: 00 00 07 40 addl v0,t6,v0
d4: 06 00 c0 e4 beq t5,f0 <g+0x70>
d8: 00 00 50 a2 ldl a2,0(a0)
dc: a8 09 71 40 cmplt t2,a1,t7
e0: 04 00 10 22 lda a0,4(a0)
e4: 00 00 12 40 addl v0,a2,v0
e8: e8 ff 1f f5 bne t7,8c <g+0xc>
ec: 00 00 fe 2f unop
f0: 01 80 fa 6b ret
f4: 00 00 fe 2f unop
f8: 1f 04 ff 47 nop
fc: 00 00 fe 2f unop
Well, that gains exactly nothing over not unrolling. Ideally, it
should look more like (Compaq compiler output):
0000000000000030 <g>:
30: 00 04 ff 47 clr v0
34: 28 00 20 ee ble a1,d8 <g+0xa8>
38: 23 d1 20 42 subl a1,0x6,t2
3c: 02 04 ff 47 clr t1
40: a4 0d 71 40 cmple t2,a1,t3
44: a5 1d 60 40 cmple t2,0,t4
48: 04 01 85 44 andnot t3,t4,t3
4c: 00 00 fe 2f unop
50: 1b 00 80 e0 blbc t3,c0 <g+0x90>
54: 00 00 fe 2f unop
58: 00 00 fe 2f unop
5c: 00 00 fe 2f unop
60: 00 02 f0 a3 ldl zero,512(a0)
64: 00 00 d0 a0 ldl t5,0(a0)
68: 02 f0 40 40 addl t1,0x7,t1
6c: 1c 00 10 22 lda a0,28(a0)
70: e8 ff f0 a0 ldl t6,-24(a0)
74: ec ff 10 a1 ldl t7,-20(a0)
78: b7 09 43 40 cmplt t1,t2,t9
7c: f0 ff 50 a2 ldl a2,-16(a0)
80: f4 ff 70 a2 ldl a3,-12(a0)
84: f8 ff 90 a2 ldl a4,-8(a0)
88: fc ff b0 a2 ldl a5,-4(a0)
8c: 06 00 c7 40 addl t5,t6,t5
90: 08 00 12 41 addl t7,a2,t7
94: 13 00 74 42 addl a3,a4,a3
98: 06 00 06 41 addl t7,t5,t5
9c: 13 00 b3 42 addl a5,a3,a3
a0: 06 00 d3 40 addl t5,a3,t5
a4: 00 00 06 40 addl v0,t5,v0
a8: ed ff ff f6 bne t9,60 <g+0x30>
ac: b8 09 51 40 cmplt t1,a1,t10
b0: 09 00 00 e7 beq t10,d8 <g+0xa8>
b4: 00 00 fe 2f unop
b8: 00 00 fe 2f unop
bc: 00 00 fe 2f unop
c0: 00 00 30 a3 ldl t11,0(a0)
c4: 02 30 40 40 addl t1,0x1,t1
c8: 04 00 10 22 lda a0,4(a0)
cc: bb 09 51 40 cmplt t1,a1,t12
d0: 00 00 19 40 addl v0,t11,v0
d4: fa ff 7f f7 bne t12,c0 <g+0x90>
d8: 01 80 fa 6b ret
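The shape of the Compaq code above can be sketched in C as a main loop that handles seven elements per iteration with a single exit test, followed by a scalar loop for the leftovers. This is a hand-written approximation, not compiler output: g_ref and g_unrolled are illustrative names, the prefetch (ldl zero,512(a0)) is left out, and the reassociated partial sums assume no intermediate overflow.

```c
/* Reference: the loop as written in the test case. */
int g_ref(const int *p, int n)
{
    int r = 0, i;
    for (i = 0; i < n; ++i)
        r += p[i];
    return r;
}

/* Unrolled by 7 with a remainder loop, loosely following the listing. */
int g_unrolled(const int *p, int n)
{
    int r = 0, i = 0;

    /* Main loop: runs only while at least 7 elements remain, so one
       compare and branch covers 7 loads. Writing the test as n - i >= 7
       avoids computing i + 7, which could overflow for huge n. */
    while (n - i >= 7) {
        int t0 = p[i]     + p[i + 1];
        int t1 = p[i + 2] + p[i + 3];
        int t2 = p[i + 4] + p[i + 5];
        r += (t0 + t1) + (t2 + p[i + 6]);
        i += 7;
    }

    /* Remainder loop: at most 6 iterations, one element each. */
    for (; i < n; ++i)
        r += p[i];

    return r;
}
```

The point of the transformation is that the loop overhead (counter update, compare, branch) is paid once per seven elements instead of once per element, whereas the 3.4 output keeps a compare and branch after every load.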
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: