* switch index optimization @ 1998-01-25 22:34 Richard Henderson 1998-01-25 23:02 ` Joe Buck 0 siblings, 1 reply; 11+ messages in thread From: Richard Henderson @ 1998-01-25 22:34 UTC (permalink / raw) To: egcs The talk on linux-kernel about the bogus non-void function returns without a value warning you get from int foo(int x, int y) { switch (x & 3) { case 0: ...; return y; case 1: ...; return y; case 2: ...; return y; case 3: ...; return y; } } got me thinking. The following patch won't actually fix this case, since this is below the threshold for a tablejump, but it points the way, and does do some good. Or should this go somewhere else like jump? r~ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-25 22:34 switch index optimization Richard Henderson @ 1998-01-25 23:02 ` Joe Buck 1998-01-25 23:14 ` Richard Henderson 0 siblings, 1 reply; 11+ messages in thread From: Joe Buck @ 1998-01-25 23:02 UTC (permalink / raw) To: rth; +Cc: egcs > The talk on linux-kernel about the bogus non-void function returns > without a value warning you get from > > int foo(int x, int y) { > switch (x & 3) { > case 0: ...; return y; > case 1: ...; return y; > case 2: ...; return y; > case 3: ...; return y; > } > } > > got me thinking. The following patch won't actually fix this case, > since this is below the threshold for a tablejump, but it points the > way, and does do some good. [ patch deleted ] Not a bad idea, but kind of an unclean implementation. If you're going to start putting in recognizers for expressions that have restricted ranges, I think it would be better to put them in a separate function, and write it in a clean way so you can cover more cases. The function would return the range (or an over-approximation of the range) of the expression. You could then cleanly handle bitwise and, modulo, branching on a char, etc. But to do really well, range propagation needs to be done with dataflow analysis (though in this case you can do it without). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-25 23:02 ` Joe Buck @ 1998-01-25 23:14 ` Richard Henderson 1998-01-25 23:27 ` Jeffrey A Law 0 siblings, 1 reply; 11+ messages in thread From: Richard Henderson @ 1998-01-25 23:14 UTC (permalink / raw) To: Joe Buck; +Cc: rth, egcs On Sun, Jan 25, 1998 at 11:01:48PM -0800, Joe Buck wrote: > Not a bad idea, but kind of an unclean implementation. Guilty. > But to do really well, range propagation needs to be done with dataflow > analysis (though in this case you can do it without). Thus my comment about maybe doing it in jump. It does some of that sort of thing, and I am half-remembering Jeff doing some work recently in that area for <censored>, such that if (x < 7) { if (x == 9) { } } and similar get optimized properly. Ah well, perhaps I'll clean up what I was doing, and if someone does do the Right Thing elsewhere later, we can strip mine back out. r~ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-25 23:14 ` Richard Henderson @ 1998-01-25 23:27 ` Jeffrey A Law 1998-01-27 7:35 ` Lassi A. Tuura 0 siblings, 1 reply; 11+ messages in thread From: Jeffrey A Law @ 1998-01-25 23:27 UTC (permalink / raw) To: Richard Henderson; +Cc: Joe Buck, egcs In message < 19980125231458.46648@dot.cygnus.com >you write: > > But to do really well, range propagation needs to be done with dataflow > > analysis (though in this case you can do it without). > > Thus my comment about maybe doing it in jump. It does some of > that sort of thing, and I am half-remembering Jeff doing some > work recently in that area for <censored>, such that > > if (x < 7) { > if (x == 9) { } > } > > and similar get optimized properly. Yup. However, it's become pretty clear to me that the direction I was heading needs some serious work -- like full blown range checking code as found in fold_const.c, except on RTL instead of trees (yuk). It's also not clear how useful that would be in practice; my hacked up version of cse.c only caught a small number of cases where it could improve real code instead of contrived example code. In your case I think you've still got the tree hanging around, so you might be able to add appropriate code to the range testing code in fold-const.c to restrict the index instead of doing it in the middle of the switch code. If you look at x & 3 as creating a range 0, 1, 2 then this makes some sense. jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-25 23:27 ` Jeffrey A Law @ 1998-01-27 7:35 ` Lassi A. Tuura 1998-01-27 9:36 ` Jeffrey A Law 1998-01-27 13:09 ` Andi Kleen 0 siblings, 2 replies; 11+ messages in thread From: Lassi A. Tuura @ 1998-01-27 7:35 UTC (permalink / raw) To: Jeffrey A Law; +Cc: Richard Henderson, Joe Buck, egcs On Mon, 26 Jan 1998, Jeffrey A Law wrote: > > if (x < 7) { if (x == 9) { } } > Yup. However, it's become pretty clear to me that the direction I > was heading needs some serious work -- like full blown range checking > code as found in fold_const.c, except on RTL instead of trees (yuk). > > It's also not clear how useful that would be in practice; my hacked > up version of cse.c only caught a small number of cases where it could > improve real code instead of contrived example code. I would think this kind of an optimisation would pay off for C++ code where there are multitudes of small inlined methods. The reason is that the methods cannot always tell what state the object is in and hence the body must be guarded with checks like this: if (! m_data) { // initialise `m_data' to something } // real code here As the methods get inlined, these checks can repeat several times even if it may be possible to prove that none of them (or only the first one) needs to be evaluated. This may, again, result in significant dead code elimination and opportunities for further optimisation. It may well be that in C this optimisation does not produce any significant benefits because of the way code gets written: small inlines just aren't used all that much. Cheers, //lat -- Lassi.Tuura@cern.ch There's no sunrise without a night ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-27 7:35 ` Lassi A. Tuura @ 1998-01-27 9:36 ` Jeffrey A Law 1998-01-27 13:09 ` Andi Kleen 1 sibling, 0 replies; 11+ messages in thread From: Jeffrey A Law @ 1998-01-27 9:36 UTC (permalink / raw) To: Lassi A. Tuura; +Cc: Richard Henderson, Joe Buck, egcs In message < Pine.HPP.3.95a.980127162532.25774B-100000@hpatl20.cern.ch >you wri te: > On Mon, 26 Jan 1998, Jeffrey A Law wrote: > > It's also not clear how useful that would be in practice; my hacked > > up version of cse.c only caught a small number of cases where it could > > improve real code instead of contrived example code. > > I would think this kind of an optimisation would pay off for C++ code > where there are multitudes of small inlined methods. The reason is that > the methods cannot always tell what state the object is in and hence > the body must be guarded with checks like this: Good point. I'll run it through some C++ code to see if it kicks out anything useful. I'm about ready to drop the work on the floor if I can't find some compelling example code that shows it's worth the (significant) time to make the code work correctly. jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-27 7:35 ` Lassi A. Tuura 1998-01-27 9:36 ` Jeffrey A Law @ 1998-01-27 13:09 ` Andi Kleen 1998-01-27 9:59 ` Jeffrey A Law 1 sibling, 1 reply; 11+ messages in thread From: Andi Kleen @ 1998-01-27 13:09 UTC (permalink / raw) To: Lassi A. Tuura; +Cc: Jeffrey A Law, Richard Henderson, Joe Buck, egcs "Lassi A. Tuura" <Lassi.Tuura@cern.ch> writes: > On Mon, 26 Jan 1998, Jeffrey A Law wrote: > > > if (x < 7) { if (x == 9) { } } > > Yup. However, it's become pretty clear to me that the direction I > > was heading needs some serious work -- like full blown range checking > > code as found in fold_const.c, except on RTL instead of trees (yuk). > > > > It's also not clear how useful that would be in practice; my hacked > > up version of cse.c only caught a small number of cases where it could > > improve real code instead of contrived example code. > > I would think this kind of an optimisation would pay off for C++ code > where there are multitudes of small inlined methods. The reason is that > the methods cannot always tell what state the object is in and hence > the body must be guarded with checks like this: > > if (! m_data) { > // initialise `m_data' to something > } > // real code here > > As the methods get inlined, these checks can repeat several times even if > it may be possible to prove that none of them (or only the first one) > needs to be evaluated. This may, again, result in significant dead code > elimination and opportunities for further optimisation. > > It may well be that in C this optimisation does not produce any > significant benefits because of the way code gets written: small inlines > just aren't used all that much. Another example where it could benefit: gotoless exits from nested loops: int flag = 0; while (X > Y) { while (W > Z) { if (SOMETHING) { flag = 1; break; } .... } if (flag) break; } That happens often in real-world code. -Andi ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-27 13:09 ` Andi Kleen @ 1998-01-27 9:59 ` Jeffrey A Law 1998-01-27 10:20 ` Joe Buck 0 siblings, 1 reply; 11+ messages in thread From: Jeffrey A Law @ 1998-01-27 9:59 UTC (permalink / raw) To: Andi Kleen; +Cc: Lassi A. Tuura, Richard Henderson, Joe Buck, egcs > Another example where it could benefit: gotoless exits from nested loops: > > int flag = 0; > while (X > Y) { > while (W > Z) { > if (SOMETHING) { > flag = 1; > break; > } > .... > } > if (flag) > break; > } > > That happens often in real-world code. The stuff we're talking about probably won't help that case. This needs global constant/copy propagation into if/else statements. It's on the list of things to do. jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-27 9:59 ` Jeffrey A Law @ 1998-01-27 10:20 ` Joe Buck 1998-01-27 13:09 ` Jeffrey A Law 0 siblings, 1 reply; 11+ messages in thread From: Joe Buck @ 1998-01-27 10:20 UTC (permalink / raw) To: law; +Cc: ak, Lassi.Tuura, rth, jbuck, egcs > > Another example where it could benefit: gotoless exits from nested loops: > > ... Jeff writes: > The stuff we're talking about probably won't help that case. This needs > global constant/copy propagation into if/else statements. It's on the > list of things to do. If you're going to do that, it seems you could go one step further and propagate ranges (min and max values of expressions). Constant propagation is then just a special case (min == max). With this framework one could then implement efficient array bounds checking (omitting checks for expressions guaranteed to be within range). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: switch index optimization 1998-01-27 10:20 ` Joe Buck @ 1998-01-27 13:09 ` Jeffrey A Law 0 siblings, 0 replies; 11+ messages in thread From: Jeffrey A Law @ 1998-01-27 13:09 UTC (permalink / raw) To: Joe Buck; +Cc: ak, Lassi.Tuura, rth, egcs In message < 199801271819.KAA01586@atrus.synopsys.com >you write: > > The stuff we're talking about probably won't help that case. This needs > > global constant/copy propagation into if/else statements. It's on the > > list of things to do. > > If you're going to do that, it seems you could go one step further and > propagate ranges (min and max values of expressions). Constant > propagation is then just a special case (min == max). With this framework > one could then implement efficient array bounds checking (omitting checks > for expressions guaranteed to be within range). That's the problem -- generalizing from min == max to a range test on RTL isn't trivial :-) That's the problem I'm looking into solving on a local level in CSE. The fact that it's a hard problem shouldn't be a suprise to anyone that had the joy of debugging the tree based range checking code last year -- it took 4-8 weeks to stabilize once it went into the gcc snapshots. There's also the problem of finding cc0 on machines that use it and a host of other (purely technical) issues. Odds are we'll have an incremental approach to solving the problem. * First we'll contribute gcse/pre, which includes global constant/copy propagation (which is done and will be submitted once we've received final payment from the customer who funded the work). * Getting constant/copy propagation into if/else conditionals on non-cc0 machines would probably follow next. This is happening under another contract, so we'll submit this code when we get paid on for its contract :-) * Then handle cc0 machines. * Then deal with true range checking. jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <egcs.199801271819.KAA01586@atrus.synopsys.com>]
* Re: switch index optimization [not found] <egcs.199801271819.KAA01586@atrus.synopsys.com> @ 1998-01-28 1:10 ` Todd P. Whitesel 0 siblings, 0 replies; 11+ messages in thread From: Todd P. Whitesel @ 1998-01-28 1:10 UTC (permalink / raw) To: mlist-egcs Joe Buck <jbuck@synopsys.com> writes: >If you're going to do that, it seems you could go one step further and >propagate ranges (min and max values of expressions). Constant >propagation is then just a special case (min == max). With this framework >one could then implement efficient array bounds checking (omitting checks >for expressions guaranteed to be within range). But wait, there's more where that came from... Given the min,max,signedness you can compute how many "information" bits there are. This lets you draw conclusions about how the upper bits will behave in certain circumstances (all examples 32 bit centric): (short) x if x's value (x here means any expression) can be represented in a short, nuke the cast. If your chip's register argument convention requires yucky casts to pass char/short types correctly then this might be valuable. A somewhat more complicated example: ((x * 256) / 16) easily becomes ((x << 8) / 16) but you want to do better than (x + (int) ( ((unsigned) (x >> 31)) >> 28)) >> 4 when x is signed or will be promoted to signed. If you can show that (x >= 0) and (x <= 0x00FFFFFF) then you have (a) the divide collapses to unsigned >> ((x << 8) >> 4) and (b) the top bits in the danger zone are already zero. (x << 4) I ran into this while tuning a customer benchmark once. It was plain C, but with heavy macro use. Fairly quickly I realized I wanted a more generic pass, but it never happened (at least not while I was there). BTW make sure CSE doesn't yank (x*256) into a temp until after we've checked for this trick. More fun: typedef long LONGEST; func(char *source1, char *source2, char *destination, int width) { int i; LONGEST *p = (LONGEST *) source1; LONGEST *q = (LONGEST *) source2; LONGEST *r = (LONGEST *) destination; width /= sizeof(LONGEST); width *= sizeof(LONGEST); for (i = width; i >= 0; i -= sizeof(LONGEST)) { r[i/sizeof(LONGEST)] = p[i/sizeof(LONGEST)] & q[i/sizeof(LONGEST)]; } } This is a slightly misguided attempt to get good bitmap code on machines like sparc with reg+reg addressing. Here's what I got with a close-at-hand gcc: func: !#PROLOGUE# 0 !#PROLOGUE# 1 andcc %o3,-4,%o3 bl .LL3 mov %o0,%o4 .LL5: and %o3,-4,%g2 # _we_ know this is a virtual NOP. ld [%o4+%g2],%g3 ld [%o1+%g2],%o0 addcc %o3,-4,%o3 and %g3,%o0,%g3 bpos .LL5 st %g3,[%o2+%g2] .LL3: retl nop .LLfe1: .size func,.LLfe1-func .ident "GCC: (GNU) cygnus-2.7.2-960126" Now this suggests tracking the low bits of an expression somehow (maybe we don't care so much about making this go fast, but perhaps there are others like it, especially the inline-C++ scenario that's been mentioned). Hmm: value = (stride * "N") + offset Here "N" represents the integers in a number-theory sense, so we should require (stride > 0) and (offset >= 0) and (stride > offset). In the example above, we could get to the virtual NOP instruction and see that it is clearing the bits; but when we get here stride=4,offset=0 so we know those bits are already clear. Seems like this might be a neat way to take analysis from the loop optimizer and make it useful to other optimizations. By default an expression or lvalue has stride=1,offset=0 and as you munge it we update as appropriate. When things get too complicated we punt and drop back to the default. Unfortunately I have a vague feeling of dread that this will turn out to be too-expensive / not-enough-boost to make it attractive except as a "compile all night, we're about to go to QA" O-level. Todd Whitesel toddpw @ ugcs.caltech.edu ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~1998-01-28 1:10 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 1998-01-25 22:34 switch index optimization Richard Henderson 1998-01-25 23:02 ` Joe Buck 1998-01-25 23:14 ` Richard Henderson 1998-01-25 23:27 ` Jeffrey A Law 1998-01-27 7:35 ` Lassi A. Tuura 1998-01-27 9:36 ` Jeffrey A Law 1998-01-27 13:09 ` Andi Kleen 1998-01-27 9:59 ` Jeffrey A Law 1998-01-27 10:20 ` Joe Buck 1998-01-27 13:09 ` Jeffrey A Law [not found] <egcs.199801271819.KAA01586@atrus.synopsys.com> 1998-01-28 1:10 ` Todd P. Whitesel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).