switch index optimization

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* switch index optimization
@ 1998-01-25 22:34 Richard Henderson
  1998-01-25 23:02 ` Joe Buck
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 1998-01-25 22:34 UTC (permalink / raw)
  To: egcs

The talk on linux-kernel about the bogus non-void function returns
without a value warning you get from

  int foo(int x, int y) {
    switch (x & 3) { 
      case 0: ...; return y;
      case 1: ...; return y;
      case 2: ...; return y;
      case 3: ...; return y;
    }
  }

got me thinking.  The following patch won't actually fix this case,
since this is below the threshold for a tablejump, but it points the
way, and does do some good.  

Or should this go somewhere else like jump?


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-25 22:34 switch index optimization Richard Henderson
@ 1998-01-25 23:02 ` Joe Buck
  1998-01-25 23:14   ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Buck @ 1998-01-25 23:02 UTC (permalink / raw)
  To: rth; +Cc: egcs

> The talk on linux-kernel about the bogus non-void function returns
> without a value warning you get from
> 
>   int foo(int x, int y) {
>     switch (x & 3) { 
>       case 0: ...; return y;
>       case 1: ...; return y;
>       case 2: ...; return y;
>       case 3: ...; return y;
>     }
>   }
> 
> got me thinking.  The following patch won't actually fix this case,
> since this is below the threshold for a tablejump, but it points the
> way, and does do some good.  

[ patch deleted ]

Not a bad idea, but kind of an unclean implementation.

If you're going to start putting in recognizers for expressions that
have restricted ranges, I think it would be better to put them
in a separate function, and write it in a clean way so you can cover
more cases.  The function would return the range (or an over-approximation
of the range) of the expression.  You could then cleanly handle bitwise
and, modulo, branching on a char, etc.  

But to do really well, range propagation needs to be done with dataflow
analysis (though in this case you can do it without).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-25 23:02 ` Joe Buck
@ 1998-01-25 23:14   ` Richard Henderson
  1998-01-25 23:27     ` Jeffrey A Law
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 1998-01-25 23:14 UTC (permalink / raw)
  To: Joe Buck; +Cc: rth, egcs

On Sun, Jan 25, 1998 at 11:01:48PM -0800, Joe Buck wrote:
> Not a bad idea, but kind of an unclean implementation.

Guilty.

> But to do really well, range propagation needs to be done with dataflow
> analysis (though in this case you can do it without).

Thus my comment about maybe doing it in jump.  It does some of
that sort of thing, and I am half-remembering Jeff doing some
work recently in that area for <censored>, such that

	if (x < 7) {
		if (x == 9) { }
	}

and similar get optimized properly.

Ah well, perhaps I'll clean up what I was doing, and if someone does
do the Right Thing elsewhere later, we can strip mine back out.

r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-25 23:14   ` Richard Henderson
@ 1998-01-25 23:27     ` Jeffrey A Law
  1998-01-27  7:35       ` Lassi A. Tuura
  0 siblings, 1 reply; 11+ messages in thread
From: Jeffrey A Law @ 1998-01-25 23:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Joe Buck, egcs

  In message < 19980125231458.46648@dot.cygnus.com >you write:
  > > But to do really well, range propagation needs to be done with dataflow
  > > analysis (though in this case you can do it without).
  > 
  > Thus my comment about maybe doing it in jump.  It does some of
  > that sort of thing, and I am half-remembering Jeff doing some
  > work recently in that area for <censored>, such that
  > 
  > 	if (x < 7) {
  > 		if (x == 9) { }
  > 	}
  > 
  > and similar get optimized properly.
Yup.  However, it's become pretty clear to me that the direction I
was heading needs some serious work -- like full blown range checking
code as found in fold_const.c, except on RTL instead of trees (yuk).

It's also not clear how useful that would be in practice; my hacked
up version of cse.c only caught a small number of cases where it could
improve real code instead of contrived example code.

In your case I think you've still got the tree hanging around, so you
might be able to add appropriate code to the range testing code in
fold-const.c to restrict the index instead of doing it in the middle
of the switch code.  If you look at x & 3 as creating a range 0, 1, 2 
then this makes some sense.

jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-25 23:27     ` Jeffrey A Law
@ 1998-01-27  7:35       ` Lassi A. Tuura
  1998-01-27  9:36         ` Jeffrey A Law
  1998-01-27 13:09         ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: Lassi A. Tuura @ 1998-01-27  7:35 UTC (permalink / raw)
  To: Jeffrey A Law; +Cc: Richard Henderson, Joe Buck, egcs

On Mon, 26 Jan 1998, Jeffrey A Law wrote:
>   > 	if (x < 7) { if (x == 9) { } }
> Yup.  However, it's become pretty clear to me that the direction I
> was heading needs some serious work -- like full blown range checking
> code as found in fold_const.c, except on RTL instead of trees (yuk).
> 
> It's also not clear how useful that would be in practice; my hacked
> up version of cse.c only caught a small number of cases where it could
> improve real code instead of contrived example code.

I would think this kind of an optimisation would pay off for C++ code
where there are multitudes of small inlined methods.  The reason is that
the methods cannot always tell what state the object is in and hence
the body must be guarded with checks like this:

  if (! m_data) {
     // initialise `m_data' to something
  }
  // real code here

As the methods get inlined, these checks can repeat several times even if
it may be possible to prove that none of them (or only the first one)
needs to be evaluated.  This may, again, result in significant dead code
elimination and opportunities for further optimisation. 

It may well be that in C this optimisation does not produce any
significant benefits because of the way code gets written: small inlines
just aren't used all that much. 

Cheers,
//lat
--
Lassi.Tuura@cern.ch          There's no sunrise without a night

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-27  7:35       ` Lassi A. Tuura
@ 1998-01-27  9:36         ` Jeffrey A Law
  1998-01-27 13:09         ` Andi Kleen
  1 sibling, 0 replies; 11+ messages in thread
From: Jeffrey A Law @ 1998-01-27  9:36 UTC (permalink / raw)
  To: Lassi A. Tuura; +Cc: Richard Henderson, Joe Buck, egcs

  In message < Pine.HPP.3.95a.980127162532.25774B-100000@hpatl20.cern.ch >you wri
te:
  > On Mon, 26 Jan 1998, Jeffrey A Law wrote:
  > > It's also not clear how useful that would be in practice; my hacked
  > > up version of cse.c only caught a small number of cases where it could
  > > improve real code instead of contrived example code.
  > 
  > I would think this kind of an optimisation would pay off for C++ code
  > where there are multitudes of small inlined methods.  The reason is that
  > the methods cannot always tell what state the object is in and hence
  > the body must be guarded with checks like this:
Good point.  I'll run it through some C++ code to see if it kicks out anything
useful.  I'm about ready to drop the work on the floor if I can't find some
compelling example code that shows it's worth the (significant) time to make
the code work correctly.
jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-27 13:09         ` Andi Kleen
@ 1998-01-27  9:59           ` Jeffrey A Law
  1998-01-27 10:20             ` Joe Buck
  0 siblings, 1 reply; 11+ messages in thread
From: Jeffrey A Law @ 1998-01-27  9:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Lassi A. Tuura, Richard Henderson, Joe Buck, egcs

  > Another example where it could benefit: gotoless exits from nested loops:
  > 
  > int flag = 0;
  > while (X > Y) {
  > 	while (W > Z) {
  > 		if (SOMETHING) {
  > 			flag = 1;
  > 			break;
  > 		}
  > 		....
  > 	}
  > 	if (flag)
  > 		break;
  > }
  > 
  > That happens often in real-world code.
The stuff we're talking about probably won't help that case.  This needs global
constant/copy propagation into if/else statements.  It's on the list of things
to do.

jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-27  9:59           ` Jeffrey A Law
@ 1998-01-27 10:20             ` Joe Buck
  1998-01-27 13:09               ` Jeffrey A Law
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Buck @ 1998-01-27 10:20 UTC (permalink / raw)
  To: law; +Cc: ak, Lassi.Tuura, rth, jbuck, egcs

>   > Another example where it could benefit: gotoless exits from nested loops:
>   > ...

Jeff writes:

> The stuff we're talking about probably won't help that case.  This needs
> global constant/copy propagation into if/else statements.  It's on the
> list of things to do.

If you're going to do that, it seems you could go one step further and
propagate ranges (min and max values of expressions).  Constant
propagation is then just a special case (min == max).  With this framework
one could then implement efficient array bounds checking (omitting checks
for expressions guaranteed to be within range).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-27 10:20             ` Joe Buck
@ 1998-01-27 13:09               ` Jeffrey A Law
  0 siblings, 0 replies; 11+ messages in thread
From: Jeffrey A Law @ 1998-01-27 13:09 UTC (permalink / raw)
  To: Joe Buck; +Cc: ak, Lassi.Tuura, rth, egcs

  In message < 199801271819.KAA01586@atrus.synopsys.com >you write:
  > > The stuff we're talking about probably won't help that case.  This needs
  > > global constant/copy propagation into if/else statements.  It's on the
  > > list of things to do.
  > 
  > If you're going to do that, it seems you could go one step further and
  > propagate ranges (min and max values of expressions).  Constant
  > propagation is then just a special case (min == max).  With this framework
  > one could then implement efficient array bounds checking (omitting checks
  > for expressions guaranteed to be within range).
That's the problem -- generalizing from min == max to a range test on RTL isn't
trivial :-)  That's the problem I'm looking into solving on a local level
in CSE.  The fact that it's a hard problem shouldn't be a suprise to anyone
that had the joy of debugging the tree based range checking code last year --
it took 4-8 weeks to stabilize once it went into the gcc snapshots.

There's also the problem of finding cc0 on machines that use it and a host
of other (purely technical) issues.

Odds are we'll have an incremental approach to solving the problem.

  * First we'll contribute gcse/pre, which includes global constant/copy propagation
  (which is done and will be submitted once we've received final payment from
  the customer who funded the work).

  * Getting constant/copy propagation into if/else conditionals on non-cc0
  machines would probably follow next.  This is happening under another
  contract, so we'll submit this code when we get paid on for its contract :-)

  * Then handle cc0 machines.

  * Then deal with true range checking.

jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
  1998-01-27  7:35       ` Lassi A. Tuura
  1998-01-27  9:36         ` Jeffrey A Law
@ 1998-01-27 13:09         ` Andi Kleen
  1998-01-27  9:59           ` Jeffrey A Law
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 1998-01-27 13:09 UTC (permalink / raw)
  To: Lassi A. Tuura; +Cc: Jeffrey A Law, Richard Henderson, Joe Buck, egcs

"Lassi A. Tuura" <Lassi.Tuura@cern.ch> writes:

> On Mon, 26 Jan 1998, Jeffrey A Law wrote:
> >   > 	if (x < 7) { if (x == 9) { } }
> > Yup.  However, it's become pretty clear to me that the direction I
> > was heading needs some serious work -- like full blown range checking
> > code as found in fold_const.c, except on RTL instead of trees (yuk).
> > 
> > It's also not clear how useful that would be in practice; my hacked
> > up version of cse.c only caught a small number of cases where it could
> > improve real code instead of contrived example code.
> 
> I would think this kind of an optimisation would pay off for C++ code
> where there are multitudes of small inlined methods.  The reason is that
> the methods cannot always tell what state the object is in and hence
> the body must be guarded with checks like this:
> 
>   if (! m_data) {
>      // initialise `m_data' to something
>   }
>   // real code here
> 
> As the methods get inlined, these checks can repeat several times even if
> it may be possible to prove that none of them (or only the first one)
> needs to be evaluated.  This may, again, result in significant dead code
> elimination and opportunities for further optimisation. 
> 
> It may well be that in C this optimisation does not produce any
> significant benefits because of the way code gets written: small inlines
> just aren't used all that much. 

Another example where it could benefit: gotoless exits from nested loops:

int flag = 0;
while (X > Y) {
	while (W > Z) {
		if (SOMETHING) {
			flag = 1;
			break;
		}
		....
	}
	if (flag)
		break;
}

That happens often in real-world code.

-Andi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: switch index optimization
       [not found] <egcs.199801271819.KAA01586@atrus.synopsys.com>
@ 1998-01-28  1:10 ` Todd P. Whitesel
  0 siblings, 0 replies; 11+ messages in thread
From: Todd P. Whitesel @ 1998-01-28  1:10 UTC (permalink / raw)
  To: mlist-egcs

Joe Buck <jbuck@synopsys.com> writes:

>If you're going to do that, it seems you could go one step further and
>propagate ranges (min and max values of expressions).  Constant
>propagation is then just a special case (min == max).  With this framework
>one could then implement efficient array bounds checking (omitting checks
>for expressions guaranteed to be within range).

But wait, there's more where that came from...

Given the min,max,signedness you can compute how many "information" bits
there are. This lets you draw conclusions about how the upper bits will
behave in certain circumstances (all examples 32 bit centric):

	(short) x
if x's value (x here means any expression) can be represented in a short,
nuke the cast. If your chip's register argument convention requires yucky
casts to pass char/short types correctly then this might be valuable.

A somewhat more complicated example:

	((x * 256) / 16)
easily becomes
	((x << 8) / 16)
but you want to do better than (x + (int) ( ((unsigned) (x >> 31)) >> 28)) >> 4
when x is signed or will be promoted to signed. If you can show that (x >= 0)
and (x <= 0x00FFFFFF) then you have (a) the divide collapses to unsigned >>
	((x << 8) >> 4)
and (b) the top bits in the danger zone are already zero.
	(x << 4)

I ran into this while tuning a customer benchmark once. It was plain C, but
with heavy macro use. Fairly quickly I realized I wanted a more generic pass,
but it never happened (at least not while I was there). BTW make sure CSE
doesn't yank (x*256) into a temp until after we've checked for this trick.
More fun:

typedef long LONGEST;
func(char *source1, char *source2, char *destination, int width)
{
    int i;
    LONGEST *p = (LONGEST *) source1;
    LONGEST *q = (LONGEST *) source2;
    LONGEST *r = (LONGEST *) destination;
    width /= sizeof(LONGEST);
    width *= sizeof(LONGEST);
    for (i = width; i >= 0; i -= sizeof(LONGEST)) {
        r[i/sizeof(LONGEST)] = p[i/sizeof(LONGEST)] & q[i/sizeof(LONGEST)];
    }
}

This is a slightly misguided attempt to get good bitmap code on machines like
sparc with reg+reg addressing. Here's what I got with a close-at-hand gcc:

func:
        !#PROLOGUE# 0
        !#PROLOGUE# 1
        andcc %o3,-4,%o3
        bl .LL3
        mov %o0,%o4
.LL5:
        and %o3,-4,%g2			# _we_ know this is a virtual NOP.
        ld [%o4+%g2],%g3
        ld [%o1+%g2],%o0
        addcc %o3,-4,%o3
        and %g3,%o0,%g3
        bpos .LL5
        st %g3,[%o2+%g2]
.LL3:
        retl
        nop
.LLfe1:
        .size    func,.LLfe1-func
        .ident  "GCC: (GNU) cygnus-2.7.2-960126"

Now this suggests tracking the low bits of an expression somehow (maybe we
don't care so much about making this go fast, but perhaps there are others
like it, especially the inline-C++ scenario that's been mentioned). Hmm:

	value = (stride * "N") + offset

Here "N" represents the integers in a number-theory sense, so we should
require (stride > 0) and (offset >= 0) and (stride > offset). In the
example above, we could get to the virtual NOP instruction and see that
it is clearing the bits; but when we get here stride=4,offset=0 so we
know those bits are already clear.

Seems like this might be a neat way to take analysis from the loop optimizer
and make it useful to other optimizations. By default an expression or lvalue
has stride=1,offset=0 and as you munge it we update as appropriate. When
things get too complicated we punt and drop back to the default.

Unfortunately I have a vague feeling of dread that this will turn out to be
too-expensive / not-enough-boost to make it attractive except as a "compile
all night, we're about to go to QA" O-level.

Todd Whitesel
toddpw @ ugcs.caltech.edu

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~1998-01-28  1:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-01-25 22:34 switch index optimization Richard Henderson
1998-01-25 23:02 ` Joe Buck
1998-01-25 23:14   ` Richard Henderson
1998-01-25 23:27     ` Jeffrey A Law
1998-01-27  7:35       ` Lassi A. Tuura
1998-01-27  9:36         ` Jeffrey A Law
1998-01-27 13:09         ` Andi Kleen
1998-01-27  9:59           ` Jeffrey A Law
1998-01-27 10:20             ` Joe Buck
1998-01-27 13:09               ` Jeffrey A Law
     [not found] <egcs.199801271819.KAA01586@atrus.synopsys.com>
1998-01-28  1:10 ` Todd P. Whitesel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).