* Loop optimizer misses simple optimisation? @ 1998-04-28 7:23 Kamil Iskra 1998-04-28 19:14 ` Richard Henderson ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Kamil Iskra @ 1998-04-28 7:23 UTC (permalink / raw) To: egcs Hi, In a debate on comp.sys.amiga.programmer, a debate that I would sum up as "Does GCC really optimize so well as some people claim?" I came across a relatively simple source where GCC indeed does perform rather poorly: #define N (1UL<<22) #define eN N unsigned char a[N + 1]; main () { unsigned int i; for (i = 2; i <= (1UL << 11); i++) if (a[i]) { unsigned int j; for (j = 2; j <= eN / i; j++) a[i * j] = 0; } } The author used GCC 2.6.3, I checked with EGCS 1.0.2 and the problem is still there. Please have a look at what I get under m68k-linux, when compiling with -O2: .file "te2.c" .version "01.01" | GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release) (m68k-unknown-linux-gnu) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release). | options passed: -O2 | options enabled: -fdefer-pop -fcse-follow-jumps -fcse-skip-blocks | -fexpensive-optimizations -fthread-jumps -fstrength-reduce -fpeephole | -fforce-mem -ffunction-cse -finline -fkeep-static-consts -fcaller-saves | -freg-struct-return -frerun-cse-after-loop -frerun-loop-opt -fcommon | -fverbose-asm -fgnu-linker -fregmove -falias-check -fargument-alias | -m68020 -mc68020 -mbitfield -m68881 -m68030 -m68332 -mcpu32 gcc2_compiled.: .text .align 2 .globl main .type main,@function main: link.w %a6,#0 movm.l #0x3820,-(%sp) moveq.l #2,%d2 | 9 movsi+1/2 move.l #a,%d4 | 105 movsi+1/1 lea a+2,%a2 | 109 movsi+1/1 moveq.l #4,%d3 | 114 movsi+1/2 .align 2 .L5: tst.b (%a2)+ | 23 tstqi+1 jbeq .L4 | 24 beq move.w #2,%a1 | 29 movsi+1/2 moveq.l #64,%d0 | 117 movsi+1/1 swap %d0 divu.l %d2,%d0 | 79 udivmodsi4 cmp.l %a1,%d0 | 80 cmpsi+1/1 jbcs .L4 | 81 bltu move.l %d3,%a0 | 124 movsi+1/1 add.l %d4,%a0 | 96 *addsi3_internal/2 .align 2 .L10: clr.b (%a0) | 43 movqi+1/3 add.l %d2,%a0 | 92 *addsi3_internal/2 addq.l #1,%a1 | 47 *addsi3_internal/2 moveq.l #64,%d0 | 120 movsi+1/1 swap %d0 divu.l %d2,%d0 | 32 udivmodsi4 cmp.l %a1,%d0 | 34 cmpsi+1/1 jbcc .L10 | 35 bgeu .L4: addq.l #2,%d3 | 111 *addsi3_internal/4 addq.l #1,%d2 | 61 *addsi3_internal/4 cmp.l #2048,%d2 | 13 cmpsi+1/2 jbls .L5 | 14 bleu movm.l (%sp)+,#0x41c unlk %a6 rts .Lfe1: .size main,.Lfe1-main .comm a,4194305,1 .ident "GCC: (GNU) egcs-2.90.27 980315 (egcs-1.0.2 release)" Have a look at insn 32: the division eN/i was not moved out of the innermost loop, even though it clearly can be done. I checked it on i486-linux and the problem is there, as well, unless I can't grok i486 assembler (which happens to be almost true, actually :-). Funny thing is that, on m68k, when I use -m68000, making udivmodsi4 unavailable, the optimisation is performed: the libcall is generated just once. Is this some sort of optimizer bug, or is this case more difficult than it looks? / Kamil Iskra AmigaOS Linux/i386 Linux/m68k \ | GeekGadgets GCC maintainer UNIX system administrator | | iskra@student.uci.agh.edu.pl kiskra@ernie.icslab.agh.edu.pl | \ kamil@dwd.interkom.pl http://student.uci.agh.edu.pl/~iskra / ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra @ 1998-04-28 19:14 ` Richard Henderson 1998-04-29 14:48 ` PÃ¥l-Kristian Engstad 1998-04-28 19:49 ` Michael Meissner 1998-04-29 15:21 ` Jim Wilson 2 siblings, 1 reply; 14+ messages in thread From: Richard Henderson @ 1998-04-28 19:14 UTC (permalink / raw) To: Kamil Iskra; +Cc: egcs On Tue, Apr 28, 1998 at 03:53:55PM +0200, Kamil Iskra wrote: > In a debate on comp.sys.amiga.programmer, a debate that I would sum up as > "Does GCC really optimize so well as some people claim?" I came across a > relatively simple source where GCC indeed does perform rather poorly: I havn't checked, but that is the kind of thing I was trying to address with some loop optimizations back in January. You might try http://www.cygnus.com/ml/egcs/1998-Jan/0796.html and see if it improves things. r~ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-04-28 19:14 ` Richard Henderson @ 1998-04-29 14:48 ` PÃ¥l-Kristian Engstad 1998-04-29 16:08 ` Richard Henderson 0 siblings, 1 reply; 14+ messages in thread From: PÃÂ¥l-Kristian Engstad @ 1998-04-29 14:48 UTC (permalink / raw) To: Richard Henderson; +Cc: Kamil Iskra, egcs Richard Henderson wrote: > I havn't checked, but that is the kind of thing I was trying to address > with some loop optimizations back in January. You might try > > http://www.cygnus.com/ml/egcs/1998-Jan/0796.html > > and see if it improves things. What happened to this? Did it go into EGCS? PKE. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-04-29 14:48 ` PÃ¥l-Kristian Engstad @ 1998-04-29 16:08 ` Richard Henderson 0 siblings, 0 replies; 14+ messages in thread From: Richard Henderson @ 1998-04-29 16:08 UTC (permalink / raw) To: engstad; +Cc: Richard Henderson, Kamil Iskra, egcs [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 364 bytes --] On Wed, Apr 29, 1998 at 12:51:22PM -0700, PÃ¥l-Kristian Engstad wrote: > What happened to this? Did it go into EGCS? What happened? I got distracted and wound up working on other things. It never went in, though versions through #6 might be ready for public use. Version #7 and later are definitely flawed. Perhaps I'll get back to it sometime soon... r~ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Loop optimizer misses simple optimisation? 1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra 1998-04-28 19:14 ` Richard Henderson @ 1998-04-28 19:49 ` Michael Meissner 1998-04-29 15:21 ` Jim Wilson 2 siblings, 0 replies; 14+ messages in thread From: Michael Meissner @ 1998-04-28 19:49 UTC (permalink / raw) To: kiskra; +Cc: egcs Kamil Iskra writes: | | Hi, | | In a debate on comp.sys.amiga.programmer, a debate that I would sum up as | "Does GCC really optimize so well as some people claim?" I came across a | relatively simple source where GCC indeed does perform rather poorly: It depends on the machine. I believe that GCC doesn't do as good of a job with udivmod operations where it produces two outputs, as on machines where it has a more normal division instruction (with possibly a remainder instruction). | #define N (1UL<<22) | #define eN N | | unsigned char a[N + 1]; | | main () | { | unsigned int i; | | for (i = 2; i <= (1UL << 11); i++) | if (a[i]) | { | unsigned int j; | for (j = 2; j <= eN / i; j++) | a[i * j] = 0; | } | } For example, the powerpc-eabi-gcc compiler generates: .file "foo.c" # rs6000/powerpc options: -msdata=data -G 8 # GNU C version egcs-2.91.24 19980418 (gcc2 ss-980401 experimental) (powerpc-eabi) compiled by GNU C version egcs-2.90.27 980315 (egcs-1.0.2 release). # options passed: -O2 -fverbose-asm # options enabled: -fdefer-pop -fomit-frame-pointer -fcse-follow-jumps # -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps # -fstrength-reduce -fpeephole -fforce-mem -ffunction-cse -finline # -fkeep-static-consts -fcaller-saves -fpcc-struct-return # -frerun-cse-after-loop -frerun-loop-opt -fschedule-insns # -fschedule-insns2 -fsched-interblock -fsched-spec -fsjlj-exceptions # -fcommon -fverbose-asm -fgnu-linker -fregmove -fargument-alias -mpowerpc # -mnew-mnemonics -meabi -mcall-sysv -msdata=data gcc2_compiled.: .section ".text" .align 2 .globl main .type main,@function main: stwu 1,-8(1) mflr 0 stw 0,12(1) bl __eabi addis 9,0,a@ha addi 9,9,a@l li 8,2 lis 3,0x40 li 5,0 add 6,9,8 mr 4,3 li 7,4 .L5: lbz 0,0(6) addi 6,6,1 cmpwi 0,0,0 bc 12,2,.L4 divwu 0,3,8 li 10,2 cmplw 0,10,0 bc 12,1,.L4 divwu 0,4,8 add 11,7,9 .L10: addi 10,10,1 cmplw 0,10,0 stb 5,0(11) add 11,11,8 bc 4,1,.L10 .L4: addi 8,8,1 cmplwi 0,8,2048 addi 7,7,2 bc 4,1,.L5 lwz 0,12(1) mtlr 0 addi 1,1,8 blr .Lfe1: .size main,.Lfe1-main .comm a,4194305,1 .ident "GCC: (GNU) egcs-2.91.24 19980418 (gcc2 ss-980401 experimental)" -- Michael Meissner, Cygnus Solutions (Massachusetts office) 4th floor, 955 Massachusetts Avenue, Cambridge, MA 02139, USA meissner@cygnus.com, 617-354-5416 (office), 617-354-7161 (fax) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra 1998-04-28 19:14 ` Richard Henderson 1998-04-28 19:49 ` Michael Meissner @ 1998-04-29 15:21 ` Jim Wilson 1998-05-02 6:13 ` Kamil Iskra 2 siblings, 1 reply; 14+ messages in thread From: Jim Wilson @ 1998-04-29 15:21 UTC (permalink / raw) To: Kamil Iskra; +Cc: egcs I checked, and Mike Meissner's comment is correct. The problem here is that the m68k divide instruction computes two values (quotient and remainder), giving us: (insn 83 33 84 (parallel[ (set (reg:SI 34) (udiv:SI (const_int 4194304) (reg/v:SI 29))) (set (reg:SI 35) (umod:SI (const_int 4194304) (reg/v:SI 29))) ] ) -1 (nil) (nil)) In scan_loop in loop.c, the code that recognizes loop invariant instructions does this: if (GET_CODE (p) == INSN && (set = single_set (p)) && GET_CODE (SET_DEST (set)) == REG && ! may_not_optimize[REGNO (SET_DEST (set))]) The single_set checks means that insns that compute two values are never considered for loop-invariant-code-motion, hence we won't move a m68k divide out of a loop. We would need to generalize the LICM code to handle insns with multiple sets to fix this. Jim ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-04-29 15:21 ` Jim Wilson @ 1998-05-02 6:13 ` Kamil Iskra 1998-05-03 14:15 ` Jeffrey A Law 0 siblings, 1 reply; 14+ messages in thread From: Kamil Iskra @ 1998-05-02 6:13 UTC (permalink / raw) To: Jim Wilson; +Cc: egcs On Wed, 29 Apr 1998, Jim Wilson wrote: > The single_set checks means that insns that compute two values are never > considered for loop-invariant-code-motion, hence we won't move a m68k > divide out of a loop. I see. But please have a look at udivmodsi4 in m68k.md: (define_insn "udivmodsi4" [(set (match_operand:SI 0 "general_operand" "=d") (udiv:SI (match_operand:SI 1 "general_operand" "0") (match_operand:SI 2 "general_operand" "dmsK"))) (set (match_operand:SI 3 "general_operand" "=d") (umod:SI (match_dup 1) (match_dup 2)))] "TARGET_68020 && !TARGET_5200" "* { if (find_reg_note (insn, REG_UNUSED, operands[3])) return \"divu%.l %2,%0\"; else return \"divul%.l %2,%3:%0\"; }") I don't know enough about division instructions available in m68020+, but it looks to me from the above that there is an instruction that doesn't generate the reminder, and that it is actually generated when the reminder is not needed. Wouldn't it be better to move it to a separate, new udivsi3 pattern, instead of relying on REG_UNUSED, which is only available when optimising, I suppose? This would allow the LICM code to do a better job, wouldn't it? / Kamil Iskra AmigaOS Linux/i386 Linux/m68k \ | GeekGadgets GCC maintainer UNIX system administrator | | iskra@student.uci.agh.edu.pl kiskra@ernie.icslab.agh.edu.pl | \ kamil@dwd.interkom.pl http://student.uci.agh.edu.pl/~iskra / ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-05-02 6:13 ` Kamil Iskra @ 1998-05-03 14:15 ` Jeffrey A Law 1998-05-03 15:32 ` Jim Wilson 0 siblings, 1 reply; 14+ messages in thread From: Jeffrey A Law @ 1998-05-03 14:15 UTC (permalink / raw) To: Kamil Iskra; +Cc: Jim Wilson, egcs In message < Pine.LNX.3.96.980502143213.581F-100000@jinks.home >you write: > I don't know enough about division instructions available in m68020+, but > it looks to me from the above that there is an instruction that doesn't > generate the reminder, and that it is actually generated when the reminder > is not needed. Wouldn't it be better to move it to a separate, new udivsi3 > pattern, instead of relying on REG_UNUSED, which is only available when > optimising, I suppose? This would allow the LICM code to do a better job, > wouldn't it? In general, GCC does a poor job of certain optimizations for insns that have multiple outputs (including LICM, contant propagation, etc). It might be worth an experiment to split up the divmod insns into div, mod and a unnamed divmod insn (for the combiner). Though my experience with other ports tends to make me believe the best overall code is produced with the divmod insn. However, a div or mod operation can not generally be moved out of a loop unless you know it won't trap (divide by zero). jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-05-03 14:15 ` Jeffrey A Law @ 1998-05-03 15:32 ` Jim Wilson 1998-05-03 20:10 ` Jeffrey A Law 0 siblings, 1 reply; 14+ messages in thread From: Jim Wilson @ 1998-05-03 15:32 UTC (permalink / raw) To: law; +Cc: Kamil Iskra, egcs However, a div or mod operation can not generally be moved out of a loop unless you know it won't trap (divide by zero). If an instruction might trap, then it will be moved out of a loop if we can prove that it will be executed at least once. Jim ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Loop optimizer misses simple optimisation? 1998-05-03 15:32 ` Jim Wilson @ 1998-05-03 20:10 ` Jeffrey A Law 1998-05-06 7:42 ` Was: " John Vickers 0 siblings, 1 reply; 14+ messages in thread From: Jeffrey A Law @ 1998-05-03 20:10 UTC (permalink / raw) To: Jim Wilson; +Cc: Kamil Iskra, egcs In message < 199805032232.PAA08246@rtl.cygnus.com >you write: > However, a div or mod operation can not generally be moved out of > a loop unless you know it won't trap (divide by zero). > > If an instruction might trap, then it will be moved out of a loop if we > can prove that it will be executed at least once. Ah, yes, you're right. I forgot about that tidbit. jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Was: Re: Loop optimizer misses simple optimisation? 1998-05-03 20:10 ` Jeffrey A Law @ 1998-05-06 7:42 ` John Vickers 1998-05-06 9:19 ` Jeffrey A Law 1998-05-06 11:36 ` Joern Rennecke 0 siblings, 2 replies; 14+ messages in thread From: John Vickers @ 1998-05-06 7:42 UTC (permalink / raw) To: egcs On machines which do not have a divmod insn for a particular mode, could we still recognise divmod, but generate a multiply & subtract for the mod ? So: int a, b, c, d; ... c = b / c; d = b % c; becomes: c = a / b; d = a - c * b; This arises on number of targets in SImode, and almost all targets in DImode. As I read the C docs, the behaviour of both operations is undefined if the divisor is zero. John Vickers. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Was: Re: Loop optimizer misses simple optimisation? 1998-05-06 7:42 ` Was: " John Vickers @ 1998-05-06 9:19 ` Jeffrey A Law 1998-05-06 11:36 ` Joern Rennecke 1 sibling, 0 replies; 14+ messages in thread From: Jeffrey A Law @ 1998-05-06 9:19 UTC (permalink / raw) To: John Vickers; +Cc: egcs In message < Marcel-1.26-0506144150-345LJLo@ether120.acorn.co.uk >you write: > > On machines which do not have a divmod insn for a particular mode, > could we still recognise divmod, but generate a multiply & subtract > for the mod ? Yes, gcc already does this in some cases. See expmed.c::expand_divmod jeff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Was: Re: Loop optimizer misses simple optimisation? 1998-05-06 7:42 ` Was: " John Vickers 1998-05-06 9:19 ` Jeffrey A Law @ 1998-05-06 11:36 ` Joern Rennecke 1998-05-07 5:15 ` John Vickers 1 sibling, 1 reply; 14+ messages in thread From: Joern Rennecke @ 1998-05-06 11:36 UTC (permalink / raw) To: John Vickers; +Cc: egcs > int a, b, c, d; > ... > c = b / c; > d = b % c; > > becomes: > > c = a / b; > d = a - c * b; This would have to depend on the cost of the modulo operation - and if b is constant or can be made constant by constant-propagation, this also ight make a difference to the cost. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Was: Re: Loop optimizer misses simple optimisation? 1998-05-06 11:36 ` Joern Rennecke @ 1998-05-07 5:15 ` John Vickers 0 siblings, 0 replies; 14+ messages in thread From: John Vickers @ 1998-05-07 5:15 UTC (permalink / raw) To: Joern Rennecke; +Cc: egcs On Wed 06 May, Joern Rennecke wrote: > > int a, b, c, d; > > ... > > c = b / c; > > d = b % c; > > > > becomes: > > > > c = a / b; > > d = a - c * b; Ho hum. Perhaps the transformation above is not necessarily to the advantage of the user. > This would have to depend on the cost of the modulo operation - and if b > is constant or can be made constant by constant-propagation, this also > ight make a difference to the cost. Yes indeed. We have a lot of ways of skinning the cat, and we should check the costs, assuming sensible costs are available ;-). I should say that I haven't investigated what egcs is doing in more than a rather cursory way. I think I was over-generalising without checking my facts - the targets I'm interested in don't have either SImode divide or divmod. If the div needs a libcall, it looks like expmed.c::expand_divmod gives up on trying to optimise the mod. This is what I was really getting at. By the way, it seems like there are a few odd tricks which expand_divmod misses e.g. divisor so big that repeated subtraction is efficient, or mod when the divisor is (2^n +/-1)*2^m. John. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~1998-05-07 5:15 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 1998-04-28 7:23 Loop optimizer misses simple optimisation? Kamil Iskra 1998-04-28 19:14 ` Richard Henderson 1998-04-29 14:48 ` PÃ¥l-Kristian Engstad 1998-04-29 16:08 ` Richard Henderson 1998-04-28 19:49 ` Michael Meissner 1998-04-29 15:21 ` Jim Wilson 1998-05-02 6:13 ` Kamil Iskra 1998-05-03 14:15 ` Jeffrey A Law 1998-05-03 15:32 ` Jim Wilson 1998-05-03 20:10 ` Jeffrey A Law 1998-05-06 7:42 ` Was: " John Vickers 1998-05-06 9:19 ` Jeffrey A Law 1998-05-06 11:36 ` Joern Rennecke 1998-05-07 5:15 ` John Vickers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).