From: Falk Hueffner
To: gcc-gnats@gcc.gnu.org
Date: Fri, 14 Mar 2003 11:56:00 -0000
Subject: optimization/10080: Loop unroller nearly useless

>Number: 10080
>Category: optimization
>Synopsis: Loop unroller nearly useless
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: pessimizes-code
>Submitter-Id: net
>Arrival-Date: Fri Mar 14 11:56:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Falk Hueffner
>Release: 3.4 20030310 (experimental)
>Organization:
>Environment:
System: Linux juist 2.5.59 #4 Sat Jan 18 12:46:41 CET 2003 alpha unknown unknown GNU/Linux
Architecture: alpha

host: alphaev68-unknown-linux-gnu
build: alphaev68-unknown-linux-gnu
target: alphaev68-unknown-linux-gnu
configured with: ../configure --enable-languages=c++
>Description:
Loops with a known iteration count:

int f (int *p)
{
  int r = 0, i;
  for (i = 0; i < 4; ++i)
    r += p[i];
  return r;
}

% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o

0000000000000000 :
   0:   00 04 ff 47   clr     v0
   4:   03 04 ff 47   clr     t2
   8:   00 00 30 a0   ldl     t0,0(a0)
   c:   02 30 60 40   addl    t2,0x1,t1
  10:   04 00 10 22   lda     a0,4(a0)
  14:   a4 7d 40 40   cmple   t1,0x3,t3
  18:   03 30 40 40   addl    t1,0x1,t2
  1c:   00 00 01 40   addl    v0,t0,v0
  20:   13 00 80 e4   beq     t3,70
  24:   00 00 b0 a0   ldl     t4,0(a0)
  28:   a4 7d 60 40   cmple   t2,0x3,t3
  2c:   04 00 10 22   lda     a0,4(a0)
  30:   03 30 60 40   addl    t2,0x1,t2
  34:   00 00 05 40   addl    v0,t4,v0
  38:   0d 00 80 e4   beq     t3,70
  3c:   00 00 f0 a0   ldl     t6,0(a0)
  40:   a6 7d 60 40   cmple   t2,0x3,t5
  44:   04 00 10 22   lda     a0,4(a0)
  48:   03 30 60 40   addl    t2,0x1,t2
  4c:   00 00 07 40   addl    v0,t6,v0
  50:   07 00 c0 e4   beq     t5,70
  54:   00 00 30 a2   ldl     a1,0(a0)
  58:   a8 7d 60 40   cmple   t2,0x3,t7
  5c:   04 00 10 22   lda     a0,4(a0)
  60:   00 00 11 40   addl    v0,a1,v0
  64:   e8 ff 1f f5   bne     t7,8
  68:   1f 04 ff 47   nop
  6c:   00 00 fe 2f   unop
  70:   01 80 fa 6b   ret
  74:   00 00 fe 2f   unop
  78:   1f 04 ff 47   nop
  7c:   00 00 fe 2f   unop

Although the trip count is a constant 4, every copy of the body still keeps its own counter increment, compare and branch. gcc 3.2 unrolls the loop completely and generates this:

0000000000000000 :
   0:   04 00 10 a0   ldl     v0,4(a0)
   4:   00 00 50 a0   ldl     t1,0(a0)
   8:   08 00 b0 a0   ldl     t4,8(a0)
   c:   0c 00 90 a0   ldl     t3,12(a0)
  10:   03 00 40 40   addl    t1,v0,t2
  14:   01 00 65 40   addl    t2,t4,t0
  18:   00 00 24 40   addl    t0,t3,v0
  1c:   01 80 fa 6b   ret

Loops with an unknown iteration count:

int g (int *p, int n)
{
  int r = 0, i;
  for (i = 0; i < n; ++i)
    r += p[i];
  return r;
}

% gcc -funroll-all-loops -c -O3 test.c && objdump -d test.o

0000000000000080 :
  80:   00 04 ff 47   clr     v0
  84:   03 04 ff 47   clr     t2
  88:   19 00 20 ee   ble     a1,f0
  8c:   00 00 30 a0   ldl     t0,0(a0)
  90:   02 30 60 40   addl    t2,0x1,t1
  94:   04 00 10 22   lda     a0,4(a0)
  98:   a4 09 51 40   cmplt   t1,a1,t3
  9c:   03 30 40 40   addl    t1,0x1,t2
  a0:   00 00 01 40   addl    v0,t0,v0
  a4:   12 00 80 e4   beq     t3,f0
  a8:   00 00 b0 a0   ldl     t4,0(a0)
  ac:   a4 09 71 40   cmplt   t2,a1,t3
  b0:   04 00 10 22   lda     a0,4(a0)
  b4:   03 30 60 40   addl    t2,0x1,t2
  b8:   00 00 05 40   addl    v0,t4,v0
  bc:   0c 00 80 e4   beq     t3,f0
  c0:   00 00 f0 a0   ldl     t6,0(a0)
  c4:   a6 09 71 40   cmplt   t2,a1,t5
  c8:   04 00 10 22   lda     a0,4(a0)
  cc:   03 30 60 40   addl    t2,0x1,t2
  d0:   00 00 07 40   addl    v0,t6,v0
  d4:   06 00 c0 e4   beq     t5,f0
  d8:   00 00 50 a2   ldl     a2,0(a0)
  dc:   a8 09 71 40   cmplt   t2,a1,t7
  e0:   04 00 10 22   lda     a0,4(a0)
  e4:   00 00 12 40   addl    v0,a2,v0
  e8:   e8 ff 1f f5   bne     t7,8c
  ec:   00 00 fe 2f   unop
  f0:   01 80 fa 6b   ret
  f4:   00 00 fe 2f   unop
  f8:   1f 04 ff 47   nop
  fc:   00 00 fe 2f   unop

This gains nothing over not unrolling at all: each of the four copies of the body still increments the counter, compares it against n and branches, so the per-element loop overhead is the same as in the rolled loop.

Ideally, it should look more like the following Compaq compiler output, which unrolls the body seven times with a single counter update and exit test per trip and handles the leftover iterations in a separate loop:

0000000000000030 :
  30:   00 04 ff 47   clr     v0
  34:   28 00 20 ee   ble     a1,d8
  38:   23 d1 20 42   subl    a1,0x6,t2
  3c:   02 04 ff 47   clr     t1
  40:   a4 0d 71 40   cmple   t2,a1,t3
  44:   a5 1d 60 40   cmple   t2,0,t4
  48:   04 01 85 44   andnot  t3,t4,t3
  4c:   00 00 fe 2f   unop
  50:   1b 00 80 e0   blbc    t3,c0
  54:   00 00 fe 2f   unop
  58:   00 00 fe 2f   unop
  5c:   00 00 fe 2f   unop
  60:   00 02 f0 a3   ldl     zero,512(a0)
  64:   00 00 d0 a0   ldl     t5,0(a0)
  68:   02 f0 40 40   addl    t1,0x7,t1
  6c:   1c 00 10 22   lda     a0,28(a0)
  70:   e8 ff f0 a0   ldl     t6,-24(a0)
  74:   ec ff 10 a1   ldl     t7,-20(a0)
  78:   b7 09 43 40   cmplt   t1,t2,t9
  7c:   f0 ff 50 a2   ldl     a2,-16(a0)
  80:   f4 ff 70 a2   ldl     a3,-12(a0)
  84:   f8 ff 90 a2   ldl     a4,-8(a0)
  88:   fc ff b0 a2   ldl     a5,-4(a0)
  8c:   06 00 c7 40   addl    t5,t6,t5
  90:   08 00 12 41   addl    t7,a2,t7
  94:   13 00 74 42   addl    a3,a4,a3
  98:   06 00 06 41   addl    t7,t5,t5
  9c:   13 00 b3 42   addl    a5,a3,a3
  a0:   06 00 d3 40   addl    t5,a3,t5
  a4:   00 00 06 40   addl    v0,t5,v0
  a8:   ed ff ff f6   bne     t9,60
  ac:   b8 09 51 40   cmplt   t1,a1,t10
  b0:   09 00 00 e7   beq     t10,d8
  b4:   00 00 fe 2f   unop
  b8:   00 00 fe 2f   unop
  bc:   00 00 fe 2f   unop
  c0:   00 00 30 a3   ldl     t11,0(a0)
  c4:   02 30 40 40   addl    t1,0x1,t1
  c8:   04 00 10 22   lda     a0,4(a0)
  cc:   bb 09 51 40   cmplt   t1,a1,t12
  d0:   00 00 19 40   addl    v0,t11,v0
  d4:   fa ff 7f f7   bne     t12,c0
  d8:   01 80 fa 6b   ret

>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
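For illustration only, the two unrolling shapes discussed in this report can be sketched at the source level in C. This is hand-written, not compiler output, and the names g_ref, f_unrolled and g_unrolled7 are hypothetical. f_unrolled corresponds to the gcc 3.2 code for f (no loop left at all); g_unrolled7 follows the structure of the Compaq code for g: seven loads per trip, one counter update and one exit test per trip, and a separate remainder loop.

```c
/* Reference: the loop g() from the report, unchanged. */
int g_ref(const int *p, int n)
{
    int r = 0, i;
    for (i = 0; i < n; ++i)
        r += p[i];
    return r;
}

/* Known trip count of 4, fully unrolled: no counter, no branch.
   Same shape as the gcc 3.2 code for f(): ((p0 + p1) + p2) + p3. */
int f_unrolled(const int *p)
{
    return p[0] + p[1] + p[2] + p[3];
}

/* Hypothetical hand-unrolled g(), unroll factor 7 as in the Compaq
   code. The main loop does one counter update and one exit test per
   seven elements; the remainder loop handles the leftover elements
   (n % 7 of them when n > 0).
   The test "n - i >= 7" is equivalent to "i < n - 6" but cannot
   overflow, because i is only advanced while i + 7 <= n. */
int g_unrolled7(const int *p, int n)
{
    int r = 0, i = 0;

    for (; n - i >= 7; i += 7) {
        /* Same left-to-right order as g(), so the additions match. */
        r += p[i];
        r += p[i + 1];
        r += p[i + 2];
        r += p[i + 3];
        r += p[i + 4];
        r += p[i + 5];
        r += p[i + 6];
    }

    /* Remainder loop: at most six iterations. */
    for (; i < n; ++i)
        r += p[i];

    return r;
}
```

The Compaq code additionally adds the seven loaded values pairwise to shorten the dependency chain; the sketch keeps g()'s summation order so that it performs exactly the same sequence of signed additions.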