Subject: [gcc r12-3164] Improved handling of shifts/rotates in bit CCP.
From: Roger Sayle @ 2021-08-26 17:58 UTC (permalink / raw)
To: gcc-cvs

https://gcc.gnu.org/g:b2ef23239f245871e9b35b902391f2e94a041627

commit r12-3164-gb2ef23239f245871e9b35b902391f2e94a041627
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Aug 26 18:57:00 2021 +0100

    Improved handling of shifts/rotates in bit CCP.
    
    This patch is the next in a series improving the bit bounds in
    tree-ssa's bit CCP pass, this time bounds for shifts and rotates
    by unknown amounts.  This allows us to optimize expressions such
    as ((x&15)<<(y&24))&64.  In this case, the expression (y&24)
    contains only two unknown bits, and can therefore take only four
    possible values: 0, 8, 16 and 24.  Hence (x&15)<<(y&24) has
    nonzero bits only within 0x0f0f0f0f, and therefore
    ((x&15)<<(y&24))&64 must always be zero.
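
    As an illustration (this function is my sketch, not part of the
    patch; the patch's testcase below exercises a related pattern),
    code of this shape now folds to a constant:

        int foo (int x, int y)
        {
          /* (y & 24) is 0, 8, 16 or 24, so the shifted value has
             nonzero bits only within 0x0f0f0f0f; bit 6 (64) is never
             set, so the function reduces to "return 0".  */
          return ((x & 15) << (y & 24)) & 64;
        }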
    
    One clever bit of computer science in this patch is the use of
    XOR to efficiently enumerate bit patterns in Gray code order.  As
    the order in which we generate values is not significant, it's
    faster and more convenient to enumerate values by flipping one
    bit at a time, rather than in numerical order (which would
    require carry bits and additional logic).
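
    A standalone sketch of that enumeration (my illustration, not the
    patch's code; the patch works on widest_int values and caps the
    unknown-bit count at four):

        #include <stdio.h>

        /* Print VAL | (every subset of MASK), flipping exactly one bit
           per step.  The bit that changes between Gray codes j-1 and j
           is bit ctz (j), which is what the patch's gray_code_bit_flips
           table precomputes.  Assumes fewer than 32 set bits in MASK.  */
        static void
        enumerate_gray (unsigned val, unsigned mask)
        {
          unsigned bits[32], n = 0, x = val;
          while (mask)
            {
              bits[n++] = mask & -mask;       /* isolate lowest set bit */
              mask &= mask - 1;               /* clear that bit */
            }
          printf ("%#x\n", x);
          for (unsigned j = 1; j < (1u << n); j++)
            {
              x ^= bits[__builtin_ctz (j)];   /* one bit flip per value */
              printf ("%#x\n", x);
            }
        }

    Calling enumerate_gray (0, 24) prints 0, 0x8, 0x18 and 0x10: the
    four possible values of (y & 24), each reached from its
    predecessor by a single bit flip.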
    
    There's a pre-existing ??? comment in tree-ssa-ccp.c that we should
    eventually be able to optimize (x<<(y|8))&255, but this patch takes the
    conservatively paranoid approach of only optimizing cases where the
    shift/rotate is guaranteed to be less than the target precision, and
    therefore avoids changing any cases that potentially might invoke
    undefined behavior.  This patch does optimize (x<<((y&31)|8))&255.
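
    The "guaranteed to be less than the target precision" check falls
    directly out of CCP's value/mask representation: MASK marks the
    unknown bits and VAL the known ones, so VAL | MASK is the largest
    value the shift count can take.  A minimal sketch of the test
    (the function name is mine; the patch itself uses
    wi::ltu_p (r2val | r2mask, width)):

        /* True iff every possible shift count is below the type's
           precision WIDTH.  For the count ((y & 31) | 8): val = 8,
           mask = 23, and val | mask = 31 < 32, so the shift is
           provably in range; and since the count is at least 8,
           (x << count) & 255 must be zero.  */
        static int
        shift_count_in_range (unsigned val, unsigned mask, unsigned width)
        {
          return (val | mask) < width;
        }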
    
    2021-08-26  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog
            * tree-ssa-ccp.c (get_individual_bits): Helper function to
            extract the individual bits from a widest_int constant (mask).
        (gray_code_bit_flips): New read-only table for efficiently
            enumerating permutations/combinations of bits.
            (bit_value_binop) [LROTATE_EXPR, RROTATE_EXPR]: Handle rotates
            by unknown counts that are guaranteed less than the target
            precision and four or fewer unknown bits by enumeration.
            [LSHIFT_EXPR, RSHIFT_EXPR]: Likewise, also handle shifts by
            enumeration under the same conditions.  Handle remaining
            shifts as a mask based upon the minimum possible shift value.
    
    gcc/testsuite/ChangeLog
            * gcc.dg/tree-ssa/ssa-ccp-41.c: New test case.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-41.c |  11 ++
 gcc/tree-ssa-ccp.c                         | 160 +++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-41.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-41.c
new file mode 100644
index 00000000000..d2b054e9355
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-41.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(int x)
+{
+    int p = x & 24;
+    int r = 1 << p; 
+    return r & (1<<17);
+}
+
+/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 1a94aebb0e4..f4a99aca4b4 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -1448,6 +1448,34 @@ bit_value_mult_const (signop sgn, int width,
   *mask = wi::ext (sum_mask, width, sgn);
 }
 
+/* Fill up to MAX values in the BITS array with values representing
+   each of the non-zero bits in the value X.  Returns the number of
+   bits in X (capped at the maximum value MAX).  For example, an X
+   value of 11 places 1, 2 and 8 in BITS and returns the value 3.  */
+
+unsigned int
+get_individual_bits (widest_int *bits, widest_int x, unsigned int max)
+{
+  unsigned int count = 0;
+  while (count < max && x != 0)
+    {
+      int bitpos = wi::ctz (x);
+      bits[count] = wi::lshift (1, bitpos);
+      x ^= bits[count];
+      count++;
+    }
+  return count;
+}
+
+/* Array of 2^N - 1 values representing the bits flipped between
+   consecutive Gray codes.  This is used to efficiently enumerate
+   all combinations of N bits using XOR.  */
+static const unsigned char gray_code_bit_flips[63] = {
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0
+};
 
 /* Apply the operation CODE in type TYPE to the value, mask pairs
    R1VAL, R1MASK and R2VAL, R2MASK representing a values of type R1TYPE
@@ -1525,6 +1553,48 @@ bit_value_binop (enum tree_code code, signop sgn, int width,
 		}
 	    }
 	}
+      else if (wi::ltu_p (r2val | r2mask, width)
+	       && wi::popcount (r2mask) <= 4)
+	{
+	  widest_int bits[4];
+	  widest_int res_val, res_mask;
+	  widest_int tmp_val, tmp_mask;
+	  widest_int shift = wi::bit_and_not (r2val, r2mask);
+	  unsigned int bit_count = get_individual_bits (bits, r2mask, 4);
+	  unsigned int count = (1 << bit_count) - 1;
+
+	  /* Initialize result to rotate by smallest value of shift.  */
+	  if (code == RROTATE_EXPR)
+	    {
+	      res_mask = wi::rrotate (r1mask, shift, width);
+	      res_val = wi::rrotate (r1val, shift, width);
+	    }
+	  else
+	    {
+	      res_mask = wi::lrotate (r1mask, shift, width);
+	      res_val = wi::lrotate (r1val, shift, width);
+	    }
+
+	  /* Iterate through the remaining values of shift.  */
+	  for (unsigned int i = 0; i < count; i++)
+	    {
+	      shift ^= bits[gray_code_bit_flips[i]];
+	      if (code == RROTATE_EXPR)
+		{
+		  tmp_mask = wi::rrotate (r1mask, shift, width);
+		  tmp_val = wi::rrotate (r1val, shift, width);
+		}
+	      else
+		{
+		  tmp_mask = wi::lrotate (r1mask, shift, width);
+		  tmp_val = wi::lrotate (r1val, shift, width);
+		}
+	      /* Accumulate the result.  */
+	      res_mask |= tmp_mask | (res_val ^ tmp_val);
+	    }
+	  *val = wi::bit_and_not (res_val, res_mask);
+	  *mask = res_mask;
+	}
       break;
 
     case LSHIFT_EXPR:
@@ -1556,6 +1626,96 @@ bit_value_binop (enum tree_code code, signop sgn, int width,
 		}
 	    }
 	}
+      else if (wi::ltu_p (r2val | r2mask, width))
+	{
+	  if (wi::popcount (r2mask) <= 4)
+	    {
+	      widest_int bits[4];
+	      widest_int arg_val, arg_mask;
+	      widest_int res_val, res_mask;
+	      widest_int tmp_val, tmp_mask;
+	      widest_int shift = wi::bit_and_not (r2val, r2mask);
+	      unsigned int bit_count = get_individual_bits (bits, r2mask, 4);
+	      unsigned int count = (1 << bit_count) - 1;
+
+	      /* Initialize result to shift by smallest value of shift.  */
+	      if (code == RSHIFT_EXPR)
+		{
+		  arg_mask = wi::ext (r1mask, width, sgn);
+		  arg_val = wi::ext (r1val, width, sgn);
+		  res_mask = wi::rshift (arg_mask, shift, sgn);
+		  res_val = wi::rshift (arg_val, shift, sgn);
+		}
+	      else
+		{
+		  arg_mask = r1mask;
+		  arg_val = r1val;
+		  res_mask = arg_mask << shift;
+		  res_val = arg_val << shift;
+		}
+
+	      /* Iterate through the remaining values of shift.  */
+	      for (unsigned int i = 0; i < count; i++)
+		{
+		  shift ^= bits[gray_code_bit_flips[i]];
+		  if (code == RSHIFT_EXPR)
+		    {
+		      tmp_mask = wi::rshift (arg_mask, shift, sgn);
+		      tmp_val = wi::rshift (arg_val, shift, sgn);
+		    }
+		  else
+		    {
+		      tmp_mask = arg_mask << shift;
+		      tmp_val = arg_val << shift;
+		    }
+		  /* Accumulate the result.  */
+		  res_mask |= tmp_mask | (res_val ^ tmp_val);
+		}
+	      res_mask = wi::ext (res_mask, width, sgn);
+	      res_val = wi::ext (res_val, width, sgn);
+	      *val = wi::bit_and_not (res_val, res_mask);
+	      *mask = res_mask;
+	    }
+	  else if ((r1val | r1mask) == 0)
+	    {
+	      /* Handle shifts of zero to avoid undefined wi::ctz below.  */
+	      *mask = 0;
+	      *val = 0;
+	    }
+	  else if (code == LSHIFT_EXPR)
+	    {
+	      widest_int tmp = wi::mask <widest_int> (width, false);
+	      tmp <<= wi::ctz (r1val | r1mask);
+	      tmp <<= wi::bit_and_not (r2val, r2mask);
+	      *mask = wi::ext (tmp, width, sgn);
+	      *val = 0;
+	    }
+	  else if (!wi::neg_p (r1val | r1mask, sgn))
+	    {
+	      /* Logical right shift, or zero sign bit.  */
+	      widest_int arg = r1val | r1mask;
+	      int lzcount = wi::clz (arg);
+	      lzcount -= wi::get_precision (arg) - width;
+	      widest_int tmp = wi::mask <widest_int> (width, false);
+	      tmp = wi::lrshift (tmp, lzcount);
+	      tmp = wi::lrshift (tmp, wi::bit_and_not (r2val, r2mask));
+	      *mask = wi::ext (tmp, width, sgn);
+	      *val = 0;
+	    }
+	  else if (!wi::neg_p (r1mask))
+	    {
+	      /* Arithmetic right shift with set sign bit.  */
+	      widest_int arg = wi::bit_and_not (r1val, r1mask);
+	      int sbcount = wi::clrsb (arg);
+	      sbcount -= wi::get_precision (arg) - width;
+	      widest_int tmp = wi::mask <widest_int> (width, false);
+	      tmp = wi::lrshift (tmp, sbcount);
+	      tmp = wi::lrshift (tmp, wi::bit_and_not (r2val, r2mask));
+	      *mask = wi::sext (tmp, width);
+	      tmp = wi::bit_not (tmp);
+	      *val = wi::sext (tmp, width);
+	    }
+	}
       break;
 
     case PLUS_EXPR:

