From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kamil Iskra <kiskra@ernie.icslab.agh.edu.pl>
To: egcs@cygnus.com
Subject: Loop optimizer misses simple optimisation?
Date: Tue, 28 Apr 1998 07:23:00 -0000
Message-id: <Pine.GSO.3.95.980428154521.25424H-100000@ernie>
X-SW-Source: 1998-04/msg01099.html

Hi,

In a debate on comp.sys.amiga.programmer, a debate that I would sum up as
"Does GCC really optimize so well as some people claim?" I came across a
relatively simple source where GCC indeed does perform rather poorly:

#define N (1UL<<22)
#define eN N

unsigned char a[N + 1];

main ()
{
  unsigned int i;

  for (i = 2; i <= (1UL << 11); i++)
      if (a[i])
      {
	  unsigned int j;
	  for (j = 2; j <= eN / i; j++)
	      a[i * j] = 0;
      }
}

The author used GCC 2.6.3, I checked with EGCS 1.0.2 and the problem is
still there. Please have a look at what I get under m68k-linux, when
compiling with -O2:

	.file	"te2.c"
	.version	"01.01"
| GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release) (m68k-unknown-linux-gnu) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release).
| options passed:  -O2
| options enabled:  -fdefer-pop -fcse-follow-jumps -fcse-skip-blocks
| -fexpensive-optimizations -fthread-jumps -fstrength-reduce -fpeephole
| -fforce-mem -ffunction-cse -finline -fkeep-static-consts -fcaller-saves
| -freg-struct-return -frerun-cse-after-loop -frerun-loop-opt -fcommon
| -fverbose-asm -fgnu-linker -fregmove -falias-check -fargument-alias
| -m68020 -mc68020 -mbitfield -m68881 -m68030 -m68332 -mcpu32

gcc2_compiled.:
.text
	.align 	2
.globl main
	.type	 main,@function
main:
	link.w %a6,#0
	movm.l #0x3820,-(%sp)
	moveq.l #2,%d2 | 9 movsi+1/2
	move.l #a,%d4 | 105 movsi+1/1
	lea a+2,%a2 | 109 movsi+1/1
	moveq.l #4,%d3 | 114 movsi+1/2
	.align 	2
.L5:
	tst.b (%a2)+ | 23 tstqi+1
	jbeq .L4 | 24 beq
	move.w #2,%a1 | 29 movsi+1/2
	moveq.l #64,%d0 | 117 movsi+1/1
	swap %d0
	divu.l %d2,%d0 | 79 udivmodsi4
	cmp.l %a1,%d0 | 80 cmpsi+1/1
	jbcs .L4 | 81 bltu
	move.l %d3,%a0 | 124 movsi+1/1
	add.l %d4,%a0 | 96 *addsi3_internal/2
	.align 	2
.L10:
	clr.b (%a0) | 43 movqi+1/3
	add.l %d2,%a0 | 92 *addsi3_internal/2
	addq.l #1,%a1 | 47 *addsi3_internal/2
	moveq.l #64,%d0 | 120 movsi+1/1
	swap %d0
	divu.l %d2,%d0 | 32 udivmodsi4
	cmp.l %a1,%d0 | 34 cmpsi+1/1
	jbcc .L10 | 35 bgeu
.L4:
	addq.l #2,%d3 | 111 *addsi3_internal/4
	addq.l #1,%d2 | 61 *addsi3_internal/4
	cmp.l #2048,%d2 | 13 cmpsi+1/2
	jbls .L5 | 14 bleu
	movm.l (%sp)+,#0x41c
	unlk %a6
	rts
.Lfe1:
	.size	 main,.Lfe1-main
	.comm	a,4194305,1
	.ident	"GCC: (GNU) egcs-2.90.27 980315 (egcs-1.0.2 release)"

Have a look at insn 32: the division eN/i was not moved out of the
innermost loop, even though it clearly can be done.

I checked it on i486-linux and the problem is there, as well, unless I
can't grok i486 assembler (which happens to be almost true, actually :-).

Funny thing is that, on m68k, when I use -m68000, making udivmodsi4
unavailable, the optimisation is performed: the libcall is generated just
once.

Is this some sort of optimizer bug, or is this case more difficult than it
looks?

/ Kamil Iskra    AmigaOS  Linux/i386  Linux/m68k               \
| GeekGadgets GCC maintainer   UNIX system administrator       |
| iskra@student.uci.agh.edu.pl  kiskra@ernie.icslab.agh.edu.pl |
\ kamil@dwd.interkom.pl   http://student.uci.agh.edu.pl/~iskra /