public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/53243] New: Use vector comparisons for if cascades
@ 2012-05-05  4:06 drepper.fsp at gmail dot com
  2012-05-07  9:33 ` [Bug tree-optimization/53243] " rguenth at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-05-05  4:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53243

             Bug #: 53243
           Summary: Use vector comparisons for if cascades
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: drepper.fsp@gmail.com
            Target: x86_64-linux


Created attachment 27312
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27312
Test program (compile with and without -DOLD)

The vector units can compare multiple comparisons concurrently but this is not
used automatically in gcc in situations where it can lead to better
performance.  Assume a function like this:

void
f(float a)
{
  if (a < 1.0)
    cb(1);
  else if (a < 2.0)
    cb(2);
  else if (a < 3.0)
    cb(3);
  else if (a < 4.0)
    cb(4);
  else if (a < 5.0)
    cb(5);
  else if (a < 6.0)
    cb(6);
  else if (a < 7.0)
    cb(7);
  else if (a < 8.0)
    cb(8);
  else
    ++o;
}

In this case the first or second if is not marked with __builtin_expect as
likely, otherwise the following *might* not apply.

The routine can be rewritten for AVX machines like this:

void
f(float a)
{
  const __m256 fv = _mm256_set_ps(8.0,7.0,6.0,5.0,4.0,3.0,2.0,1.0);
  __m256 r = _mm256_cmp_ps(fv, _mm256_set1_ps(a), _CMP_LT_OS);
  int i = _mm256_movemask_ps(r);
  asm goto ("bsr %0, %0; jz %l[less1]; .pushsection .rodata; 1: .quad %l2, %l3,
%l4, %l5, %l6, %l7, %l8, %l9; .popsection; jmp *1b(,%0,8)" : : "r" (i) : :
less1, less2, less3, less4, less5, less6, less7, less8, gt8);
  __builtin_unreachable ();
 less1:
  cb(1);
  return;
 less2:
  cb(2);
  return;
 less3:
  cb(3);
  return;
 less4:
  cb(4);
  return;
 less5:
  cb(5);
  return;
 less6:
  cb(6);
  return;
 less7:
  cb(7);
  return;
 less8:
  cb(8);
  return;
 gt8:
  ++o;
}

This might not generate the absolute best code but it runs for the test program
which I attach 20% faster.

The same technique can be applied to integer comparisons.  More complex if
cascades can also be simplified a lot by masking the integer bsr result
accordingly.  This should still be faster.


^ permalink raw reply	[flat|nested] 2+ messages in thread

* [Bug tree-optimization/53243] Use vector comparisons for if cascades
  2012-05-05  4:06 [Bug tree-optimization/53243] New: Use vector comparisons for if cascades drepper.fsp at gmail dot com
@ 2012-05-07  9:33 ` rguenth at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-05-07  9:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53243

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-05-07
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-07 09:31:59 UTC ---
How does performance look like when you do not have random distribution of
comparison results but branch pedictors can do a good job?  Thus, are
branch pedictors not disabled by this transform, only retaining branch
target buffering?

Your code can probably be optimized further by recognizing that only the
parameter to cb is conditional if we don't take the else branch, thus do

   if (!(a < 8.0))
     ++o;
   else
     do vector compare, compute cb argument from result and call cb

doing no branches at all.

I don't see where your or the above transform would best fit into.
It's straight-line code as written, still we'd need to perform
some if-conversion (difficult for calls, so only pairs with the above
idea) and then vectorization.


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2012-05-07  9:32 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-05  4:06 [Bug tree-optimization/53243] New: Use vector comparisons for if cascades drepper.fsp at gmail dot com
2012-05-07  9:33 ` [Bug tree-optimization/53243] " rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).