Switch statement optimization

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Switch statement optimization
@ 2022-04-18 14:41 Paul Koning
  2022-04-19 14:33 ` Martin Liška
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Koning @ 2022-04-18 14:41 UTC (permalink / raw)
  To: GCC Development

In switch statements with dense case values, the typical result is a jump table, which is fast.  If the values are sparse, a tree of compares is generated instead.

What if nearly all cases are dense but there are a few outliers?  An example appears in the NFS protocol parser, where you get a switch statement with cases for each of the opcode values.  All but one are small integers assigned in sequence, but one is 10044.  So the "sparse" case kicks in and a compare tree is generated for everything.

I can avoid this by putting in special case code for the 10044 case, then all the rest ends up being a jump table.  That brings up the question if GCC should recognize such scenarios and break up the switch statement into "dense parts" handled by a jump table, leaving the sorting between those as a compare tree.

	paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Switch statement optimization
  2022-04-18 14:41 Switch statement optimization Paul Koning
@ 2022-04-19 14:33 ` Martin Liška
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Liška @ 2022-04-19 14:33 UTC (permalink / raw)
  To: Paul Koning, GCC Development

On 4/18/22 16:41, Paul Koning via Gcc wrote:
> In switch statements with dense case values, the typical result is a jump table, which is fast.  If the values are sparse, a tree of compares is generated instead.
> 
> What if nearly all cases are dense but there are a few outliers?  An example appears in the NFS protocol parser, where you get a switch statement with cases for each of the opcode values.  All but one are small integers assigned in sequence, but one is 10044.  So the "sparse" case kicks in and a compare tree is generated for everything.
> 
> I can avoid this by putting in special case code for the 10044 case, then all the rest ends up being a jump table.  That brings up the question if GCC should recognize such scenarios and break up the switch statement into "dense parts" handled by a jump table, leaving the sorting between those as a compare tree.

Hello.

We currently support identification of such dense/interesting groups of case values (we name them clusters).
See e.g. gcc/testsuite/gcc.dg/tree-ssa/switch-1.c where you can have a decision tree that contains
mix of individual cases, jump-tables (JT) and bit-tests (BT).

Can you please share your test-case so that I can analyze it?
You can investigate with -fdump-tree-switchlower1.

Cheers.
Martin

> 
> 	paul
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Switch statement optimization
  2006-07-10 14:48 Joern RENNECKE
@ 2006-07-11 10:03 ` Christian Hildner
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Hildner @ 2006-07-11 10:03 UTC (permalink / raw)
  To: Joern RENNECKE; +Cc: Ben Elliston, gcc mailing list

Joern RENNECKE schrieb:

> In http://gcc.gnu.org/ml/gcc/2006-07/msg00156.html, your wrote:
>
> We could use a cheap hash and base start compare / branch trees in 
> every hash bin.
> Say we have a 16 k range, 200 nodes, and want to keep the hash 
> bin/node ratio between 2 and 4.
> Let i be the switch argument.  Then we can calculate a 9 bit hash as
> (i ^ (x << n)) & 0x3fffffff, where n is a value between 5 and 9.  We 
> can pick the one which produces the flattest
> average search trees.
> Note that we no longer need to check that i is in range, as for 
> ordinary switch dispatch tables.  

What I suggest is to implement a cost-estimation based decision. This 
should include an architecture dependent modelling that could deliver an 
exact cost (in terms of size) for size-optimization and as well a good 
approximation for speed. For example in my case of 1500 labels inside 
the 16K range I would perfer a branch table instead of a binary compare 
tree. The size of the table is 16K * 4/8 bytes and this is really ok for 
a huge program. The frequently used entries are likely to be in the 
cache, so I expect the whole thing to perform well. However if I would 
optimize it for my case that may not help in general? So all 
modifications should be verified with a common standard benchmark like 
SPEC. Is there someone on the list who wants to support me with these 
tests or may deliver models for common architectures?

Christian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Switch statement optimization
@ 2006-07-10 14:48 Joern RENNECKE
  2006-07-11 10:03 ` Christian Hildner
  0 siblings, 1 reply; 6+ messages in thread
From: Joern RENNECKE @ 2006-07-10 14:48 UTC (permalink / raw)
  To: Ben Elliston; +Cc: Christian Hildner, gcc mailing list

In http://gcc.gnu.org/ml/gcc/2006-07/msg00156.html, your wrote:

> A paper at this year's GCC Summit talked about this:
>   http://www.gccsummit.org/2006/view_abstract.php?content_key=18
> You might like to follow up with Edmar (the author of the paper).

But that was about optimizing the trees for an uneven probability distribution which could be found by profiling.
This does not address the issue what to do when the distribution is mostly uniform or unknown.

We could use a cheap hash and base start compare / branch trees in every hash bin.
Say we have a 16 k range, 200 nodes, and want to keep the hash bin/node ratio between 2 and 4.
Let i be the switch argument.  Then we can calculate a 9 bit hash as
(i ^ (x << n)) & 0x3fffffff, where n is a value between 5 and 9.  We can pick the one which produces the flattest
average search trees.
Note that we no longer need to check that i is in range, as for ordinary switch dispatch tables.

Moreover, on a target that can do effectively multi-way compares like the x86, in order
to increase ILP, we can put into the table entry for each hash bin, in addition to the
jump address, a value to compare i against before the dispatch jump takes place.
If a cheap hash can be chosen so that there is no more than one entry per hash bin, we
can even do a single compare in the dispatch code, go to the default case for non-match,
and dispatch right to the handling code if we have a match.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Switch statement optimization
  2006-07-10  9:39 Christian Hildner
@ 2006-07-10 11:06 ` Ben Elliston
  0 siblings, 0 replies; 6+ messages in thread
From: Ben Elliston @ 2006-07-10 11:06 UTC (permalink / raw)
  To: Christian Hildner; +Cc: gcc

A paper at this year's GCC Summit talked about this:

  http://www.gccsummit.org/2006/view_abstract.php?content_key=18

You might like to follow up with Edmar (the author of the paper).

Cheers, Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Switch statement optimization
@ 2006-07-10  9:39 Christian Hildner
  2006-07-10 11:06 ` Ben Elliston
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Hildner @ 2006-07-10  9:39 UTC (permalink / raw)
  To: gcc

Hi,

while dealing with autogenerated code I found that GCC often outputs compare trees instead of branch tables for switch statements. I found that GCC uses the density of case labels within their spanned range as the key criterion. If the density of labels is smaller than 10% (33% for size optimization) then GCC uses compare instruction rather than a branch table. What I am wondering is why no real cost estimation is done here. In my case there are around 200 to 1500 labels within the 16K range, so they are completely resolved to compares. In contrast the Intel compiler outputs branch tables more likely.

The question now is whether the factors 10 (3 for size opt) have been found using any standard benchmark. And shouldn't at least the factor be optimized for a standard benchmark (like the upcoming SPEC CPU 2006 INT) if this simple linear method is used for modeling a logarithmic behavior? Or should a cost estimation be introduced?

For size optimization there could be a quite accurate architecture dependent size estimation.

Opinions, Comments?

Christian

PS: Please CC me.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-19 14:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-18 14:41 Switch statement optimization Paul Koning
2022-04-19 14:33 ` Martin Liška
  -- strict thread matches above, loose matches on Subject: below --
2006-07-10 14:48 Joern RENNECKE
2006-07-11 10:03 ` Christian Hildner
2006-07-10  9:39 Christian Hildner
2006-07-10 11:06 ` Ben Elliston

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).