From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kamil Iskra To: egcs@cygnus.com Subject: Loop optimizer misses simple optimisation? Date: Tue, 28 Apr 1998 07:23:00 -0000 Message-id: X-SW-Source: 1998-04/msg01099.html Hi, In a debate on comp.sys.amiga.programmer, a debate that I would sum up as "Does GCC really optimize so well as some people claim?" I came across a relatively simple source where GCC indeed does perform rather poorly: #define N (1UL<<22) #define eN N unsigned char a[N + 1]; main () { unsigned int i; for (i = 2; i <= (1UL << 11); i++) if (a[i]) { unsigned int j; for (j = 2; j <= eN / i; j++) a[i * j] = 0; } } The author used GCC 2.6.3, I checked with EGCS 1.0.2 and the problem is still there. Please have a look at what I get under m68k-linux, when compiling with -O2: .file "te2.c" .version "01.01" | GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release) (m68k-unknown-linux-gnu) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release). | options passed: -O2 | options enabled: -fdefer-pop -fcse-follow-jumps -fcse-skip-blocks | -fexpensive-optimizations -fthread-jumps -fstrength-reduce -fpeephole | -fforce-mem -ffunction-cse -finline -fkeep-static-consts -fcaller-saves | -freg-struct-return -frerun-cse-after-loop -frerun-loop-opt -fcommon | -fverbose-asm -fgnu-linker -fregmove -falias-check -fargument-alias | -m68020 -mc68020 -mbitfield -m68881 -m68030 -m68332 -mcpu32 gcc2_compiled.: .text .align 2 .globl main .type main,@function main: link.w %a6,#0 movm.l #0x3820,-(%sp) moveq.l #2,%d2 | 9 movsi+1/2 move.l #a,%d4 | 105 movsi+1/1 lea a+2,%a2 | 109 movsi+1/1 moveq.l #4,%d3 | 114 movsi+1/2 .align 2 .L5: tst.b (%a2)+ | 23 tstqi+1 jbeq .L4 | 24 beq move.w #2,%a1 | 29 movsi+1/2 moveq.l #64,%d0 | 117 movsi+1/1 swap %d0 divu.l %d2,%d0 | 79 udivmodsi4 cmp.l %a1,%d0 | 80 cmpsi+1/1 jbcs .L4 | 81 bltu move.l %d3,%a0 | 124 movsi+1/1 add.l %d4,%a0 | 96 *addsi3_internal/2 .align 2 .L10: clr.b (%a0) | 43 movqi+1/3 add.l %d2,%a0 | 92 *addsi3_internal/2 addq.l #1,%a1 | 47 *addsi3_internal/2 moveq.l #64,%d0 | 120 movsi+1/1 swap %d0 divu.l %d2,%d0 | 32 udivmodsi4 cmp.l %a1,%d0 | 34 cmpsi+1/1 jbcc .L10 | 35 bgeu .L4: addq.l #2,%d3 | 111 *addsi3_internal/4 addq.l #1,%d2 | 61 *addsi3_internal/4 cmp.l #2048,%d2 | 13 cmpsi+1/2 jbls .L5 | 14 bleu movm.l (%sp)+,#0x41c unlk %a6 rts .Lfe1: .size main,.Lfe1-main .comm a,4194305,1 .ident "GCC: (GNU) egcs-2.90.27 980315 (egcs-1.0.2 release)" Have a look at insn 32: the division eN/i was not moved out of the innermost loop, even though it clearly can be done. I checked it on i486-linux and the problem is there, as well, unless I can't grok i486 assembler (which happens to be almost true, actually :-). Funny thing is that, on m68k, when I use -m68000, making udivmodsi4 unavailable, the optimisation is performed: the libcall is generated just once. Is this some sort of optimizer bug, or is this case more difficult than it looks? / Kamil Iskra AmigaOS Linux/i386 Linux/m68k \ | GeekGadgets GCC maintainer UNIX system administrator | | iskra@student.uci.agh.edu.pl kiskra@ernie.icslab.agh.edu.pl | \ kamil@dwd.interkom.pl http://student.uci.agh.edu.pl/~iskra /