public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/63464] New: compare one character to many: faster
@ 2014-10-05 21:28 glisse at gcc dot gnu.org
  2014-10-06 11:27 ` [Bug tree-optimization/63464] " jakub at gcc dot gnu.org
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-10-05 21:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63464

            Bug ID: 63464
           Summary: compare one character to many: faster
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org

This is inspired by reading this post:

http://stackoverflow.com/questions/26124620/why-does-msvc-emit-a-useless-movsx-before-performing-this-bit-test

which shows that bit testing can provide a very efficient lookup table for
small sizes, and in particular when we want to test if a character belongs to a
predefined set.

char*f1(char*s){
  while (*s == ' ' || *s == ',' || *s == '\r' || *s == '\n')
    ++s;
  return s;
}
char*f2(char*s){
  long n = (1L << ' ') | (1L << ',') | (1L << '\r') | (1L << '\n');
  int m = max(max(' ',','),max('\r','\n'));
  while(*s <= m && (n & (1L << *s)))
    ++s;
  return s;
}

On x86_64, the first one compiles to a bunch of cmpb+sete, combined with orl.
The second one has one cmpb+jg but uses btq+jc for the main job. A simple
benchmark on a string full of ',' shows running times of 158 vs 32, a very
large win for f2 (suspicious actually, but if I only test ' ', ',' and '\r', I
get the less surprising 50 for f1, which still shows f2 far ahead).

As is, it only works with characters less than 64, but more general versions
are possible with a bit more overhead (like applying the same to *s-64 after
testing its sign, or looking up the (*s/64)th element in a regular lookup table
and bit testing against that, etc).

The running time of 32 is exactly the same I get with a larger lookup table:
char issep[256]={0,0,...};
while(issep[*s])...

In this particular case, vectorization might also be an option, either the loop
kind if the string is likely to be long, but we don't even try because we can't
compute the number of iterations, or the slp kind by broadcasting *s, comparing
to all chars at once and reducing.

This PR is a bit vague and we cannot implement every weird optimization, but I
expect this type of comparison might be common enough that it would be worth
looking at. If you disagree, feel free to close, I never write code like that
myself ;-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-01-16 17:28 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-05 21:28 [Bug tree-optimization/63464] New: compare one character to many: faster glisse at gcc dot gnu.org
2014-10-06 11:27 ` [Bug tree-optimization/63464] " jakub at gcc dot gnu.org
2014-10-06 11:52 ` rguenther at suse dot de
2014-10-06 16:59 ` jakub at gcc dot gnu.org
2014-10-07 11:11 ` jakub at gcc dot gnu.org
2014-10-07 11:15 ` rguenther at suse dot de
2014-10-07 20:15 ` glisse at gcc dot gnu.org
2014-10-07 21:01 ` glisse at gcc dot gnu.org
2014-10-10 12:16 ` jakub at gcc dot gnu.org
2014-10-13 15:51 ` jakub at gcc dot gnu.org
2014-10-13 15:53 ` jakub at gcc dot gnu.org
2014-10-15 14:19 ` jakub at gcc dot gnu.org
2014-10-17 10:55 ` jakub at gcc dot gnu.org
2014-10-17 11:20 ` jakub at gcc dot gnu.org
2014-10-17 11:49 ` ubizjak at gmail dot com
2015-01-16 17:14 ` jakub at gcc dot gnu.org
2015-01-16 17:28 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).