Loop optimizer misses simple optimisation?

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Loop optimizer misses simple optimisation?
@ 1998-04-28  7:23 Kamil Iskra
  1998-04-28 19:14 ` Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Kamil Iskra @ 1998-04-28  7:23 UTC (permalink / raw)
  To: egcs

Hi,

In a debate on comp.sys.amiga.programmer, a debate that I would sum up as
"Does GCC really optimize so well as some people claim?" I came across a
relatively simple source where GCC indeed does perform rather poorly:

#define N (1UL<<22)
#define eN N

unsigned char a[N + 1];

main ()
{
  unsigned int i;

  for (i = 2; i <= (1UL << 11); i++)
      if (a[i])
      {
	  unsigned int j;
	  for (j = 2; j <= eN / i; j++)
	      a[i * j] = 0;
      }
}

The author used GCC 2.6.3, I checked with EGCS 1.0.2 and the problem is
still there. Please have a look at what I get under m68k-linux, when
compiling with -O2:

	.file	"te2.c"
	.version	"01.01"
| GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release) (m68k-unknown-linux-gnu) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release).
| options passed:  -O2
| options enabled:  -fdefer-pop -fcse-follow-jumps -fcse-skip-blocks
| -fexpensive-optimizations -fthread-jumps -fstrength-reduce -fpeephole
| -fforce-mem -ffunction-cse -finline -fkeep-static-consts -fcaller-saves
| -freg-struct-return -frerun-cse-after-loop -frerun-loop-opt -fcommon
| -fverbose-asm -fgnu-linker -fregmove -falias-check -fargument-alias
| -m68020 -mc68020 -mbitfield -m68881 -m68030 -m68332 -mcpu32

gcc2_compiled.:
.text
	.align 	2
.globl main
	.type	 main,@function
main:
	link.w %a6,#0
	movm.l #0x3820,-(%sp)
	moveq.l #2,%d2 | 9 movsi+1/2
	move.l #a,%d4 | 105 movsi+1/1
	lea a+2,%a2 | 109 movsi+1/1
	moveq.l #4,%d3 | 114 movsi+1/2
	.align 	2
.L5:
	tst.b (%a2)+ | 23 tstqi+1
	jbeq .L4 | 24 beq
	move.w #2,%a1 | 29 movsi+1/2
	moveq.l #64,%d0 | 117 movsi+1/1
	swap %d0
	divu.l %d2,%d0 | 79 udivmodsi4
	cmp.l %a1,%d0 | 80 cmpsi+1/1
	jbcs .L4 | 81 bltu
	move.l %d3,%a0 | 124 movsi+1/1
	add.l %d4,%a0 | 96 *addsi3_internal/2
	.align 	2
.L10:
	clr.b (%a0) | 43 movqi+1/3
	add.l %d2,%a0 | 92 *addsi3_internal/2
	addq.l #1,%a1 | 47 *addsi3_internal/2
	moveq.l #64,%d0 | 120 movsi+1/1
	swap %d0
	divu.l %d2,%d0 | 32 udivmodsi4
	cmp.l %a1,%d0 | 34 cmpsi+1/1
	jbcc .L10 | 35 bgeu
.L4:
	addq.l #2,%d3 | 111 *addsi3_internal/4
	addq.l #1,%d2 | 61 *addsi3_internal/4
	cmp.l #2048,%d2 | 13 cmpsi+1/2
	jbls .L5 | 14 bleu
	movm.l (%sp)+,#0x41c
	unlk %a6
	rts
.Lfe1:
	.size	 main,.Lfe1-main
	.comm	a,4194305,1
	.ident	"GCC: (GNU) egcs-2.90.27 980315 (egcs-1.0.2 release)"

Have a look at insn 32: the division eN/i was not moved out of the
innermost loop, even though it clearly can be done.

I checked it on i486-linux and the problem is there, as well, unless I
can't grok i486 assembler (which happens to be almost true, actually :-).

Funny thing is that, on m68k, when I use -m68000, making udivmodsi4
unavailable, the optimisation is performed: the libcall is generated just
once.

Is this some sort of optimizer bug, or is this case more difficult than it
looks?

/ Kamil Iskra    AmigaOS  Linux/i386  Linux/m68k               \
| GeekGadgets GCC maintainer   UNIX system administrator       |
| iskra@student.uci.agh.edu.pl  kiskra@ernie.icslab.agh.edu.pl |
\ kamil@dwd.interkom.pl   http://student.uci.agh.edu.pl/~iskra /


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-04-28  7:23 Loop optimizer misses simple optimisation? Kamil Iskra
@ 1998-04-28 19:14 ` Richard Henderson
  1998-04-29 14:48   ` PÃ¥l-Kristian Engstad
  1998-04-28 19:49 ` Michael Meissner
  1998-04-29 15:21 ` Jim Wilson
  2 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 1998-04-28 19:14 UTC (permalink / raw)
  To: Kamil Iskra; +Cc: egcs

On Tue, Apr 28, 1998 at 03:53:55PM +0200, Kamil Iskra wrote:
> In a debate on comp.sys.amiga.programmer, a debate that I would sum up as
> "Does GCC really optimize so well as some people claim?" I came across a
> relatively simple source where GCC indeed does perform rather poorly:

I havn't checked, but that is the kind of thing I was trying to address
with some loop optimizations back in January.  You might try

 http://www.cygnus.com/ml/egcs/1998-Jan/0796.html

and see if it improves things.


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-04-28 19:14 ` Richard Henderson
@ 1998-04-29 14:48   ` PÃ¥l-Kristian Engstad
  1998-04-29 16:08     ` Richard Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: PÃƒÂ¥l-Kristian Engstad @ 1998-04-29 14:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Kamil Iskra, egcs

Richard Henderson wrote:
> I havn't checked, but that is the kind of thing I was trying to address
> with some loop optimizations back in January.  You might try
> 
>  http://www.cygnus.com/ml/egcs/1998-Jan/0796.html
> 
> and see if it improves things.

What happened to this? Did it go into EGCS?

PKE.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-04-29 14:48   ` PÃ¥l-Kristian Engstad
@ 1998-04-29 16:08     ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 1998-04-29 16:08 UTC (permalink / raw)
  To: engstad; +Cc: Richard Henderson, Kamil Iskra, egcs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 364 bytes --]

On Wed, Apr 29, 1998 at 12:51:22PM -0700, PÃ¥l-Kristian Engstad wrote:
> What happened to this? Did it go into EGCS?

What happened?  I got distracted and wound up working on other things.

It never went in, though versions through #6 might be ready for public
use.  Version #7 and later are definitely flawed.

Perhaps I'll get back to it sometime soon...


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Loop optimizer misses simple optimisation?
  1998-04-28  7:23 Loop optimizer misses simple optimisation? Kamil Iskra
  1998-04-28 19:14 ` Richard Henderson
@ 1998-04-28 19:49 ` Michael Meissner
  1998-04-29 15:21 ` Jim Wilson
  2 siblings, 0 replies; 14+ messages in thread
From: Michael Meissner @ 1998-04-28 19:49 UTC (permalink / raw)
  To: kiskra; +Cc: egcs

Kamil Iskra writes:
| 
| Hi,
| 
| In a debate on comp.sys.amiga.programmer, a debate that I would sum up as
| "Does GCC really optimize so well as some people claim?" I came across a
| relatively simple source where GCC indeed does perform rather poorly:

It depends on the machine.  I believe that GCC doesn't do as good of a
job with udivmod operations where it produces two outputs, as on
machines where it has a more normal division instruction (with
possibly a remainder instruction).

| #define N (1UL<<22)
| #define eN N
| 
| unsigned char a[N + 1];
| 
| main ()
| {
|   unsigned int i;
| 
|   for (i = 2; i <= (1UL << 11); i++)
|       if (a[i])
|       {
| 	  unsigned int j;
| 	  for (j = 2; j <= eN / i; j++)
| 	      a[i * j] = 0;
|       }
| }

For example, the powerpc-eabi-gcc compiler generates:

	.file	"foo.c"

 # rs6000/powerpc options: -msdata=data -G 8
 # GNU C version egcs-2.91.24 19980418 (gcc2 ss-980401 experimental) (powerpc-eabi) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release).
 # options passed:  -O2 -fverbose-asm
 # options enabled:  -fdefer-pop -fomit-frame-pointer -fcse-follow-jumps
 # -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps
 # -fstrength-reduce -fpeephole -fforce-mem -ffunction-cse -finline
 # -fkeep-static-consts -fcaller-saves -fpcc-struct-return
 # -frerun-cse-after-loop -frerun-loop-opt -fschedule-insns
 # -fschedule-insns2 -fsched-interblock -fsched-spec -fsjlj-exceptions
 # -fcommon -fverbose-asm -fgnu-linker -fregmove -fargument-alias -mpowerpc
 # -mnew-mnemonics -meabi -mcall-sysv -msdata=data

gcc2_compiled.:
	.section ".text"
	.align 2
	.globl main
	.type	 main,@function
main:
	stwu 1,-8(1)
	mflr 0
	stw 0,12(1)
	bl __eabi
	addis 9,0,a@ha
	addi 9,9,a@l
	li 8,2
	lis 3,0x40
	li 5,0
	add 6,9,8
	mr 4,3
	li 7,4
.L5:
	lbz 0,0(6)
	addi 6,6,1
	cmpwi 0,0,0
	bc 12,2,.L4
	divwu 0,3,8
	li 10,2
	cmplw 0,10,0
	bc 12,1,.L4
	divwu 0,4,8
	add 11,7,9
.L10:
	addi 10,10,1
	cmplw 0,10,0
	stb 5,0(11)
	add 11,11,8
	bc 4,1,.L10
.L4:
	addi 8,8,1
	cmplwi 0,8,2048
	addi 7,7,2
	bc 4,1,.L5
	lwz 0,12(1)
	mtlr 0
	addi 1,1,8
	blr
.Lfe1:
	.size	 main,.Lfe1-main
	.comm	a,4194305,1
	.ident	"GCC: (GNU) egcs-2.91.24 19980418 (gcc2 ss-980401 experimental)"

-- 
Michael Meissner, Cygnus Solutions (Massachusetts office)
4th floor, 955 Massachusetts Avenue, Cambridge, MA 02139, USA
meissner@cygnus.com,	617-354-5416 (office),	617-354-7161 (fax)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-04-28  7:23 Loop optimizer misses simple optimisation? Kamil Iskra
  1998-04-28 19:14 ` Richard Henderson
  1998-04-28 19:49 ` Michael Meissner
@ 1998-04-29 15:21 ` Jim Wilson
  1998-05-02  6:13   ` Kamil Iskra
  2 siblings, 1 reply; 14+ messages in thread
From: Jim Wilson @ 1998-04-29 15:21 UTC (permalink / raw)
  To: Kamil Iskra; +Cc: egcs

I checked, and Mike Meissner's comment is correct.  The problem here is that
the m68k divide instruction computes two values (quotient and remainder),
giving us:
(insn 83 33 84 (parallel[
            (set (reg:SI 34)
                (udiv:SI (const_int 4194304)
                    (reg/v:SI 29)))
            (set (reg:SI 35)
                (umod:SI (const_int 4194304)
                    (reg/v:SI 29)))
        ] ) -1 (nil)
    (nil))

In scan_loop in loop.c, the code that recognizes loop invariant instructions
does this:

      if (GET_CODE (p) == INSN
          && (set = single_set (p))
          && GET_CODE (SET_DEST (set)) == REG
          && ! may_not_optimize[REGNO (SET_DEST (set))])

The single_set checks means that insns that compute two values are never
considered for loop-invariant-code-motion, hence we won't move a m68k
divide out of a loop.

We would need to generalize the LICM code to handle insns with multiple
sets to fix this.

Jim

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-04-29 15:21 ` Jim Wilson
@ 1998-05-02  6:13   ` Kamil Iskra
  1998-05-03 14:15     ` Jeffrey A Law
  0 siblings, 1 reply; 14+ messages in thread
From: Kamil Iskra @ 1998-05-02  6:13 UTC (permalink / raw)
  To: Jim Wilson; +Cc: egcs

On Wed, 29 Apr 1998, Jim Wilson wrote:

> The single_set checks means that insns that compute two values are never
> considered for loop-invariant-code-motion, hence we won't move a m68k
> divide out of a loop.

I see. But please have a look at udivmodsi4 in m68k.md:

(define_insn "udivmodsi4"
  [(set (match_operand:SI 0 "general_operand" "=d")
	(udiv:SI (match_operand:SI 1 "general_operand" "0")
		 (match_operand:SI 2 "general_operand" "dmsK")))
   (set (match_operand:SI 3 "general_operand" "=d")
	(umod:SI (match_dup 1) (match_dup 2)))]
  "TARGET_68020 && !TARGET_5200"
  "*
{
  if (find_reg_note (insn, REG_UNUSED, operands[3]))
    return \"divu%.l %2,%0\";
  else
    return \"divul%.l %2,%3:%0\";
}")

I don't know enough about division instructions available in m68020+, but
it looks to me from the above that there is an instruction that doesn't
generate the reminder, and that it is actually generated when the reminder
is not needed. Wouldn't it be better to move it to a separate, new udivsi3
pattern, instead of relying on REG_UNUSED, which is only available when
optimising, I suppose? This would allow the LICM code to do a better job,
wouldn't it?

/ Kamil Iskra    AmigaOS  Linux/i386  Linux/m68k               \
| GeekGadgets GCC maintainer   UNIX system administrator       |
| iskra@student.uci.agh.edu.pl  kiskra@ernie.icslab.agh.edu.pl |
\ kamil@dwd.interkom.pl   http://student.uci.agh.edu.pl/~iskra /

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-05-02  6:13   ` Kamil Iskra
@ 1998-05-03 14:15     ` Jeffrey A Law
  1998-05-03 15:32       ` Jim Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Jeffrey A Law @ 1998-05-03 14:15 UTC (permalink / raw)
  To: Kamil Iskra; +Cc: Jim Wilson, egcs

  In message < Pine.LNX.3.96.980502143213.581F-100000@jinks.home >you write:
  > I don't know enough about division instructions available in m68020+, but
  > it looks to me from the above that there is an instruction that doesn't
  > generate the reminder, and that it is actually generated when the reminder
  > is not needed. Wouldn't it be better to move it to a separate, new udivsi3
  > pattern, instead of relying on REG_UNUSED, which is only available when
  > optimising, I suppose? This would allow the LICM code to do a better job,
  > wouldn't it?
In general, GCC does a poor job of certain optimizations for insns
that have multiple outputs (including LICM, contant propagation, etc).

It might be worth an experiment to split up the divmod insns into
div, mod and a unnamed divmod insn (for the combiner).  Though my
experience with other ports tends to make me believe the best
overall code is produced with the divmod insn.

However, a div or mod operation can not generally be moved out of
a loop unless you know it won't trap (divide by zero).

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-05-03 14:15     ` Jeffrey A Law
@ 1998-05-03 15:32       ` Jim Wilson
  1998-05-03 20:10         ` Jeffrey A Law
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Wilson @ 1998-05-03 15:32 UTC (permalink / raw)
  To: law; +Cc: Kamil Iskra, egcs

	However, a div or mod operation can not generally be moved out of
	a loop unless you know it won't trap (divide by zero).

If an instruction might trap, then it will be moved out of a loop if we
can prove that it will be executed at least once.

Jim

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Loop optimizer misses simple optimisation?
  1998-05-03 15:32       ` Jim Wilson
@ 1998-05-03 20:10         ` Jeffrey A Law
  1998-05-06  7:42           ` Was: " John Vickers
  0 siblings, 1 reply; 14+ messages in thread
From: Jeffrey A Law @ 1998-05-03 20:10 UTC (permalink / raw)
  To: Jim Wilson; +Cc: Kamil Iskra, egcs

  In message < 199805032232.PAA08246@rtl.cygnus.com >you write:
  > 	However, a div or mod operation can not generally be moved out of
  > 	a loop unless you know it won't trap (divide by zero).
  > 
  > If an instruction might trap, then it will be moved out of a loop if we
  > can prove that it will be executed at least once.
Ah, yes, you're right.  I forgot about that tidbit.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Was: Re: Loop optimizer misses simple optimisation?
  1998-05-03 20:10         ` Jeffrey A Law
@ 1998-05-06  7:42           ` John Vickers
  1998-05-06  9:19             ` Jeffrey A Law
  1998-05-06 11:36             ` Joern Rennecke
  0 siblings, 2 replies; 14+ messages in thread
From: John Vickers @ 1998-05-06  7:42 UTC (permalink / raw)
  To: egcs

On machines which do not have a divmod insn for a particular mode,
could we still recognise divmod, but generate a multiply & subtract
for the mod ?

So:

int a, b, c, d;
...
  c = b / c;
  d = b % c;

becomes:

  c = a / b;
  d = a - c * b;  

This arises on number of targets in SImode, and almost
all targets in DImode.  As I read the C docs, the behaviour
of both operations is undefined if the divisor is zero.

John Vickers.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Was: Re: Loop optimizer misses simple optimisation?
  1998-05-06  7:42           ` Was: " John Vickers
@ 1998-05-06  9:19             ` Jeffrey A Law
  1998-05-06 11:36             ` Joern Rennecke
  1 sibling, 0 replies; 14+ messages in thread
From: Jeffrey A Law @ 1998-05-06  9:19 UTC (permalink / raw)
  To: John Vickers; +Cc: egcs

  In message < Marcel-1.26-0506144150-345LJLo@ether120.acorn.co.uk >you write:
  > 
  > On machines which do not have a divmod insn for a particular mode,
  > could we still recognise divmod, but generate a multiply & subtract
  > for the mod ?
Yes, gcc already does this in some cases.  See expmed.c::expand_divmod

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Was: Re: Loop optimizer misses simple optimisation?
  1998-05-06  7:42           ` Was: " John Vickers
  1998-05-06  9:19             ` Jeffrey A Law
@ 1998-05-06 11:36             ` Joern Rennecke
  1998-05-07  5:15               ` John Vickers
  1 sibling, 1 reply; 14+ messages in thread
From: Joern Rennecke @ 1998-05-06 11:36 UTC (permalink / raw)
  To: John Vickers; +Cc: egcs

> int a, b, c, d;
> ...
>   c = b / c;
>   d = b % c;
>   
> becomes:
> 
>   c = a / b;
>   d = a - c * b;  

This would have to depend on the cost of the modulo operation - and if b
is constant or can be made constant by constant-propagation, this also
ight make a difference to the cost.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Was: Re: Loop optimizer misses simple optimisation?
  1998-05-06 11:36             ` Joern Rennecke
@ 1998-05-07  5:15               ` John Vickers
  0 siblings, 0 replies; 14+ messages in thread
From: John Vickers @ 1998-05-07  5:15 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: egcs

On Wed 06 May, Joern Rennecke wrote:
> > int a, b, c, d;
> > ...
> >   c = b / c;
> >   d = b % c;
> >   
> > becomes:
> > 
> >   c = a / b;
> >   d = a - c * b;  

Ho hum.  Perhaps the transformation above is not necessarily
to the advantage of the user. 

> This would have to depend on the cost of the modulo operation - and if b
> is constant or can be made constant by constant-propagation, this also
> ight make a difference to the cost.

Yes indeed.  We have a lot of ways of skinning the cat,
and we should check the costs, assuming sensible costs
are available ;-).

I should say that I haven't investigated what egcs is doing
in more than a rather cursory way.

I think I was over-generalising without checking my facts -
the targets I'm interested in don't have either SImode divide or divmod.
If the div needs a libcall, it looks like expmed.c::expand_divmod
gives up on trying to optimise the mod.

This is what I was really getting at.

By the way, it seems like there are a few odd tricks which expand_divmod
misses e.g. divisor so big that repeated subtraction is efficient,
or mod when the divisor is (2^n +/-1)*2^m.

John.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~1998-05-07  5:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-04-28  7:23 Loop optimizer misses simple optimisation? Kamil Iskra
1998-04-28 19:14 ` Richard Henderson
1998-04-29 14:48   ` PÃ¥l-Kristian Engstad
1998-04-29 16:08     ` Richard Henderson
1998-04-28 19:49 ` Michael Meissner
1998-04-29 15:21 ` Jim Wilson
1998-05-02  6:13   ` Kamil Iskra
1998-05-03 14:15     ` Jeffrey A Law
1998-05-03 15:32       ` Jim Wilson
1998-05-03 20:10         ` Jeffrey A Law
1998-05-06  7:42           ` Was: " John Vickers
1998-05-06  9:19             ` Jeffrey A Law
1998-05-06 11:36             ` Joern Rennecke
1998-05-07  5:15               ` John Vickers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).