* Loop optimizer misses simple optimisation?
@ 1998-04-28 7:23 Kamil Iskra
1998-04-28 19:14 ` Richard Henderson
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Kamil Iskra @ 1998-04-28 7:23 UTC (permalink / raw)
To: egcs
Hi,
In a debate on comp.sys.amiga.programmer, a debate that I would sum up as
"Does GCC really optimize so well as some people claim?" I came across a
relatively simple source where GCC indeed does perform rather poorly:
#define N (1UL<<22)
#define eN N
unsigned char a[N + 1];
main ()
{
unsigned int i;
for (i = 2; i <= (1UL << 11); i++)
if (a[i])
{
unsigned int j;
for (j = 2; j <= eN / i; j++)
a[i * j] = 0;
}
}
The author used GCC 2.6.3, I checked with EGCS 1.0.2 and the problem is
still there. Please have a look at what I get under m68k-linux, when
compiling with -O2:
.file "te2.c"
.version "01.01"
| GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release) (m68k-unknown-linux-gnu) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release).
| options passed: -O2
| options enabled: -fdefer-pop -fcse-follow-jumps -fcse-skip-blocks
| -fexpensive-optimizations -fthread-jumps -fstrength-reduce -fpeephole
| -fforce-mem -ffunction-cse -finline -fkeep-static-consts -fcaller-saves
| -freg-struct-return -frerun-cse-after-loop -frerun-loop-opt -fcommon
| -fverbose-asm -fgnu-linker -fregmove -falias-check -fargument-alias
| -m68020 -mc68020 -mbitfield -m68881 -m68030 -m68332 -mcpu32
gcc2_compiled.:
.text
.align 2
.globl main
.type main,@function
main:
link.w %a6,#0
movm.l #0x3820,-(%sp)
moveq.l #2,%d2 | 9 movsi+1/2
move.l #a,%d4 | 105 movsi+1/1
lea a+2,%a2 | 109 movsi+1/1
moveq.l #4,%d3 | 114 movsi+1/2
.align 2
.L5:
tst.b (%a2)+ | 23 tstqi+1
jbeq .L4 | 24 beq
move.w #2,%a1 | 29 movsi+1/2
moveq.l #64,%d0 | 117 movsi+1/1
swap %d0
divu.l %d2,%d0 | 79 udivmodsi4
cmp.l %a1,%d0 | 80 cmpsi+1/1
jbcs .L4 | 81 bltu
move.l %d3,%a0 | 124 movsi+1/1
add.l %d4,%a0 | 96 *addsi3_internal/2
.align 2
.L10:
clr.b (%a0) | 43 movqi+1/3
add.l %d2,%a0 | 92 *addsi3_internal/2
addq.l #1,%a1 | 47 *addsi3_internal/2
moveq.l #64,%d0 | 120 movsi+1/1
swap %d0
divu.l %d2,%d0 | 32 udivmodsi4
cmp.l %a1,%d0 | 34 cmpsi+1/1
jbcc .L10 | 35 bgeu
.L4:
addq.l #2,%d3 | 111 *addsi3_internal/4
addq.l #1,%d2 | 61 *addsi3_internal/4
cmp.l #2048,%d2 | 13 cmpsi+1/2
jbls .L5 | 14 bleu
movm.l (%sp)+,#0x41c
unlk %a6
rts
.Lfe1:
.size main,.Lfe1-main
.comm a,4194305,1
.ident "GCC: (GNU) egcs-2.90.27 980315 (egcs-1.0.2 release)"
Have a look at insn 32: the division eN/i was not moved out of the
innermost loop, even though it clearly can be done.
I checked it on i486-linux and the problem is there, as well, unless I
can't grok i486 assembler (which happens to be almost true, actually :-).
Funny thing is that, on m68k, when I use -m68000, making udivmodsi4
unavailable, the optimisation is performed: the libcall is generated just
once.
Is this some sort of optimizer bug, or is this case more difficult than it
looks?
/ Kamil Iskra AmigaOS Linux/i386 Linux/m68k \
| GeekGadgets GCC maintainer UNIX system administrator |
| iskra@student.uci.agh.edu.pl kiskra@ernie.icslab.agh.edu.pl |
\ kamil@dwd.interkom.pl http://student.uci.agh.edu.pl/~iskra /
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra
@ 1998-04-28 19:14 ` Richard Henderson
1998-04-29 14:48 ` PÃ¥l-Kristian Engstad
1998-04-28 19:49 ` Michael Meissner
1998-04-29 15:21 ` Jim Wilson
2 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 1998-04-28 19:14 UTC (permalink / raw)
To: Kamil Iskra; +Cc: egcs
On Tue, Apr 28, 1998 at 03:53:55PM +0200, Kamil Iskra wrote:
> In a debate on comp.sys.amiga.programmer, a debate that I would sum up as
> "Does GCC really optimize so well as some people claim?" I came across a
> relatively simple source where GCC indeed does perform rather poorly:
I havn't checked, but that is the kind of thing I was trying to address
with some loop optimizations back in January. You might try
http://www.cygnus.com/ml/egcs/1998-Jan/0796.html
and see if it improves things.
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Loop optimizer misses simple optimisation?
1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra
1998-04-28 19:14 ` Richard Henderson
@ 1998-04-28 19:49 ` Michael Meissner
1998-04-29 15:21 ` Jim Wilson
2 siblings, 0 replies; 14+ messages in thread
From: Michael Meissner @ 1998-04-28 19:49 UTC (permalink / raw)
To: kiskra; +Cc: egcs
Kamil Iskra writes:
|
| Hi,
|
| In a debate on comp.sys.amiga.programmer, a debate that I would sum up as
| "Does GCC really optimize so well as some people claim?" I came across a
| relatively simple source where GCC indeed does perform rather poorly:
It depends on the machine. I believe that GCC doesn't do as good of a
job with udivmod operations where it produces two outputs, as on
machines where it has a more normal division instruction (with
possibly a remainder instruction).
| #define N (1UL<<22)
| #define eN N
|
| unsigned char a[N + 1];
|
| main ()
| {
| unsigned int i;
|
| for (i = 2; i <= (1UL << 11); i++)
| if (a[i])
| {
| unsigned int j;
| for (j = 2; j <= eN / i; j++)
| a[i * j] = 0;
| }
| }
For example, the powerpc-eabi-gcc compiler generates:
.file "foo.c"
# rs6000/powerpc options: -msdata=data -G 8
# GNU C version egcs-2.91.24 19980418 (gcc2 ss-980401 experimental) (powerpc-eabi) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release).
# options passed: -O2 -fverbose-asm
# options enabled: -fdefer-pop -fomit-frame-pointer -fcse-follow-jumps
# -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps
# -fstrength-reduce -fpeephole -fforce-mem -ffunction-cse -finline
# -fkeep-static-consts -fcaller-saves -fpcc-struct-return
# -frerun-cse-after-loop -frerun-loop-opt -fschedule-insns
# -fschedule-insns2 -fsched-interblock -fsched-spec -fsjlj-exceptions
# -fcommon -fverbose-asm -fgnu-linker -fregmove -fargument-alias -mpowerpc
# -mnew-mnemonics -meabi -mcall-sysv -msdata=data
gcc2_compiled.:
.section ".text"
.align 2
.globl main
.type main,@function
main:
stwu 1,-8(1)
mflr 0
stw 0,12(1)
bl __eabi
addis 9,0,a@ha
addi 9,9,a@l
li 8,2
lis 3,0x40
li 5,0
add 6,9,8
mr 4,3
li 7,4
.L5:
lbz 0,0(6)
addi 6,6,1
cmpwi 0,0,0
bc 12,2,.L4
divwu 0,3,8
li 10,2
cmplw 0,10,0
bc 12,1,.L4
divwu 0,4,8
add 11,7,9
.L10:
addi 10,10,1
cmplw 0,10,0
stb 5,0(11)
add 11,11,8
bc 4,1,.L10
.L4:
addi 8,8,1
cmplwi 0,8,2048
addi 7,7,2
bc 4,1,.L5
lwz 0,12(1)
mtlr 0
addi 1,1,8
blr
.Lfe1:
.size main,.Lfe1-main
.comm a,4194305,1
.ident "GCC: (GNU) egcs-2.91.24 19980418 (gcc2 ss-980401 experimental)"
--
Michael Meissner, Cygnus Solutions (Massachusetts office)
4th floor, 955 Massachusetts Avenue, Cambridge, MA 02139, USA
meissner@cygnus.com, 617-354-5416 (office), 617-354-7161 (fax)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-04-28 19:14 ` Richard Henderson
@ 1998-04-29 14:48 ` PÃ¥l-Kristian Engstad
1998-04-29 16:08 ` Richard Henderson
0 siblings, 1 reply; 14+ messages in thread
From: PÃÂ¥l-Kristian Engstad @ 1998-04-29 14:48 UTC (permalink / raw)
To: Richard Henderson; +Cc: Kamil Iskra, egcs
Richard Henderson wrote:
> I havn't checked, but that is the kind of thing I was trying to address
> with some loop optimizations back in January. You might try
>
> http://www.cygnus.com/ml/egcs/1998-Jan/0796.html
>
> and see if it improves things.
What happened to this? Did it go into EGCS?
PKE.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra
1998-04-28 19:14 ` Richard Henderson
1998-04-28 19:49 ` Michael Meissner
@ 1998-04-29 15:21 ` Jim Wilson
1998-05-02 6:13 ` Kamil Iskra
2 siblings, 1 reply; 14+ messages in thread
From: Jim Wilson @ 1998-04-29 15:21 UTC (permalink / raw)
To: Kamil Iskra; +Cc: egcs
I checked, and Mike Meissner's comment is correct. The problem here is that
the m68k divide instruction computes two values (quotient and remainder),
giving us:
(insn 83 33 84 (parallel[
(set (reg:SI 34)
(udiv:SI (const_int 4194304)
(reg/v:SI 29)))
(set (reg:SI 35)
(umod:SI (const_int 4194304)
(reg/v:SI 29)))
] ) -1 (nil)
(nil))
In scan_loop in loop.c, the code that recognizes loop invariant instructions
does this:
if (GET_CODE (p) == INSN
&& (set = single_set (p))
&& GET_CODE (SET_DEST (set)) == REG
&& ! may_not_optimize[REGNO (SET_DEST (set))])
The single_set checks means that insns that compute two values are never
considered for loop-invariant-code-motion, hence we won't move a m68k
divide out of a loop.
We would need to generalize the LICM code to handle insns with multiple
sets to fix this.
Jim
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-04-29 14:48 ` PÃ¥l-Kristian Engstad
@ 1998-04-29 16:08 ` Richard Henderson
0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 1998-04-29 16:08 UTC (permalink / raw)
To: engstad; +Cc: Richard Henderson, Kamil Iskra, egcs
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 364 bytes --]
On Wed, Apr 29, 1998 at 12:51:22PM -0700, PÃ¥l-Kristian Engstad wrote:
> What happened to this? Did it go into EGCS?
What happened? I got distracted and wound up working on other things.
It never went in, though versions through #6 might be ready for public
use. Version #7 and later are definitely flawed.
Perhaps I'll get back to it sometime soon...
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-04-29 15:21 ` Jim Wilson
@ 1998-05-02 6:13 ` Kamil Iskra
1998-05-03 14:15 ` Jeffrey A Law
0 siblings, 1 reply; 14+ messages in thread
From: Kamil Iskra @ 1998-05-02 6:13 UTC (permalink / raw)
To: Jim Wilson; +Cc: egcs
On Wed, 29 Apr 1998, Jim Wilson wrote:
> The single_set checks means that insns that compute two values are never
> considered for loop-invariant-code-motion, hence we won't move a m68k
> divide out of a loop.
I see. But please have a look at udivmodsi4 in m68k.md:
(define_insn "udivmodsi4"
[(set (match_operand:SI 0 "general_operand" "=d")
(udiv:SI (match_operand:SI 1 "general_operand" "0")
(match_operand:SI 2 "general_operand" "dmsK")))
(set (match_operand:SI 3 "general_operand" "=d")
(umod:SI (match_dup 1) (match_dup 2)))]
"TARGET_68020 && !TARGET_5200"
"*
{
if (find_reg_note (insn, REG_UNUSED, operands[3]))
return \"divu%.l %2,%0\";
else
return \"divul%.l %2,%3:%0\";
}")
I don't know enough about division instructions available in m68020+, but
it looks to me from the above that there is an instruction that doesn't
generate the reminder, and that it is actually generated when the reminder
is not needed. Wouldn't it be better to move it to a separate, new udivsi3
pattern, instead of relying on REG_UNUSED, which is only available when
optimising, I suppose? This would allow the LICM code to do a better job,
wouldn't it?
/ Kamil Iskra AmigaOS Linux/i386 Linux/m68k \
| GeekGadgets GCC maintainer UNIX system administrator |
| iskra@student.uci.agh.edu.pl kiskra@ernie.icslab.agh.edu.pl |
\ kamil@dwd.interkom.pl http://student.uci.agh.edu.pl/~iskra /
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-05-02 6:13 ` Kamil Iskra
@ 1998-05-03 14:15 ` Jeffrey A Law
1998-05-03 15:32 ` Jim Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Jeffrey A Law @ 1998-05-03 14:15 UTC (permalink / raw)
To: Kamil Iskra; +Cc: Jim Wilson, egcs
In message < Pine.LNX.3.96.980502143213.581F-100000@jinks.home >you write:
> I don't know enough about division instructions available in m68020+, but
> it looks to me from the above that there is an instruction that doesn't
> generate the reminder, and that it is actually generated when the reminder
> is not needed. Wouldn't it be better to move it to a separate, new udivsi3
> pattern, instead of relying on REG_UNUSED, which is only available when
> optimising, I suppose? This would allow the LICM code to do a better job,
> wouldn't it?
In general, GCC does a poor job of certain optimizations for insns
that have multiple outputs (including LICM, contant propagation, etc).
It might be worth an experiment to split up the divmod insns into
div, mod and a unnamed divmod insn (for the combiner). Though my
experience with other ports tends to make me believe the best
overall code is produced with the divmod insn.
However, a div or mod operation can not generally be moved out of
a loop unless you know it won't trap (divide by zero).
jeff
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-05-03 14:15 ` Jeffrey A Law
@ 1998-05-03 15:32 ` Jim Wilson
1998-05-03 20:10 ` Jeffrey A Law
0 siblings, 1 reply; 14+ messages in thread
From: Jim Wilson @ 1998-05-03 15:32 UTC (permalink / raw)
To: law; +Cc: Kamil Iskra, egcs
However, a div or mod operation can not generally be moved out of
a loop unless you know it won't trap (divide by zero).
If an instruction might trap, then it will be moved out of a loop if we
can prove that it will be executed at least once.
Jim
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation?
1998-05-03 15:32 ` Jim Wilson
@ 1998-05-03 20:10 ` Jeffrey A Law
1998-05-06 7:42 ` Was: " John Vickers
0 siblings, 1 reply; 14+ messages in thread
From: Jeffrey A Law @ 1998-05-03 20:10 UTC (permalink / raw)
To: Jim Wilson; +Cc: Kamil Iskra, egcs
In message < 199805032232.PAA08246@rtl.cygnus.com >you write:
> However, a div or mod operation can not generally be moved out of
> a loop unless you know it won't trap (divide by zero).
>
> If an instruction might trap, then it will be moved out of a loop if we
> can prove that it will be executed at least once.
Ah, yes, you're right. I forgot about that tidbit.
jeff
^ permalink raw reply [flat|nested] 14+ messages in thread
* Was: Re: Loop optimizer misses simple optimisation?
1998-05-03 20:10 ` Jeffrey A Law
@ 1998-05-06 7:42 ` John Vickers
1998-05-06 9:19 ` Jeffrey A Law
1998-05-06 11:36 ` Joern Rennecke
0 siblings, 2 replies; 14+ messages in thread
From: John Vickers @ 1998-05-06 7:42 UTC (permalink / raw)
To: egcs
On machines which do not have a divmod insn for a particular mode,
could we still recognise divmod, but generate a multiply & subtract
for the mod ?
So:
int a, b, c, d;
...
c = b / c;
d = b % c;
becomes:
c = a / b;
d = a - c * b;
This arises on number of targets in SImode, and almost
all targets in DImode. As I read the C docs, the behaviour
of both operations is undefined if the divisor is zero.
John Vickers.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Was: Re: Loop optimizer misses simple optimisation?
1998-05-06 7:42 ` Was: " John Vickers
@ 1998-05-06 9:19 ` Jeffrey A Law
1998-05-06 11:36 ` Joern Rennecke
1 sibling, 0 replies; 14+ messages in thread
From: Jeffrey A Law @ 1998-05-06 9:19 UTC (permalink / raw)
To: John Vickers; +Cc: egcs
In message < Marcel-1.26-0506144150-345LJLo@ether120.acorn.co.uk >you write:
>
> On machines which do not have a divmod insn for a particular mode,
> could we still recognise divmod, but generate a multiply & subtract
> for the mod ?
Yes, gcc already does this in some cases. See expmed.c::expand_divmod
jeff
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Was: Re: Loop optimizer misses simple optimisation?
1998-05-06 7:42 ` Was: " John Vickers
1998-05-06 9:19 ` Jeffrey A Law
@ 1998-05-06 11:36 ` Joern Rennecke
1998-05-07 5:15 ` John Vickers
1 sibling, 1 reply; 14+ messages in thread
From: Joern Rennecke @ 1998-05-06 11:36 UTC (permalink / raw)
To: John Vickers; +Cc: egcs
> int a, b, c, d;
> ...
> c = b / c;
> d = b % c;
>
> becomes:
>
> c = a / b;
> d = a - c * b;
This would have to depend on the cost of the modulo operation - and if b
is constant or can be made constant by constant-propagation, this also
ight make a difference to the cost.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Was: Re: Loop optimizer misses simple optimisation?
1998-05-06 11:36 ` Joern Rennecke
@ 1998-05-07 5:15 ` John Vickers
0 siblings, 0 replies; 14+ messages in thread
From: John Vickers @ 1998-05-07 5:15 UTC (permalink / raw)
To: Joern Rennecke; +Cc: egcs
On Wed 06 May, Joern Rennecke wrote:
> > int a, b, c, d;
> > ...
> > c = b / c;
> > d = b % c;
> >
> > becomes:
> >
> > c = a / b;
> > d = a - c * b;
Ho hum. Perhaps the transformation above is not necessarily
to the advantage of the user.
> This would have to depend on the cost of the modulo operation - and if b
> is constant or can be made constant by constant-propagation, this also
> ight make a difference to the cost.
Yes indeed. We have a lot of ways of skinning the cat,
and we should check the costs, assuming sensible costs
are available ;-).
I should say that I haven't investigated what egcs is doing
in more than a rather cursory way.
I think I was over-generalising without checking my facts -
the targets I'm interested in don't have either SImode divide or divmod.
If the div needs a libcall, it looks like expmed.c::expand_divmod
gives up on trying to optimise the mod.
This is what I was really getting at.
By the way, it seems like there are a few odd tricks which expand_divmod
misses e.g. divisor so big that repeated subtraction is efficient,
or mod when the divisor is (2^n +/-1)*2^m.
John.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~1998-05-07 5:15 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra
1998-04-28 19:14 ` Richard Henderson
1998-04-29 14:48 ` PÃ¥l-Kristian Engstad
1998-04-29 16:08 ` Richard Henderson
1998-04-28 19:49 ` Michael Meissner
1998-04-29 15:21 ` Jim Wilson
1998-05-02 6:13 ` Kamil Iskra
1998-05-03 14:15 ` Jeffrey A Law
1998-05-03 15:32 ` Jim Wilson
1998-05-03 20:10 ` Jeffrey A Law
1998-05-06 7:42 ` Was: " John Vickers
1998-05-06 9:19 ` Jeffrey A Law
1998-05-06 11:36 ` Joern Rennecke
1998-05-07 5:15 ` John Vickers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).