public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
@ 2020-10-08  9:58 Aldy Hernandez
  2020-10-08 10:22 ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Aldy Hernandez @ 2020-10-08  9:58 UTC (permalink / raw)
  To: gcc-patches

I am quoting my analysis from the PR.  Could an aarch64 expert 
pontificate here?

This test is checking the final assembly for a specific sequence.  I 
don't speak aarch64 assembly, but the IL is different coming out of evrp.

The first culprit is this difference in the mergephi1 dump:

    _9 = .CTZ (x_6(D));
-  _10 = _9 & 31;
+  _10 = _9;

These are unsigned ints, so assuming they are 32 bits on aarch64, 
__builtin_ctz is always less than 32.  This is because a CTZ of 0 is 
undefined according to the GCC manual:

[[
Built-in Function: int __builtin_ctz (unsigned int x)

     Returns the number of trailing 0-bits in x, starting at the least 
significant bit position. If x is 0, the result is undefined.
]]

So a bitwise AND of anything less than 32 with 0x1f (31) is a no-op.

Here are the full IL differences:

--- legacy-evrp/pr90838.c.038t.mergephi1        2020-10-07 
08:44:12.152358885 -0400
+++ ranger/pr90838.c.038t.mergephi1     2020-10-07 08:39:12.339296502 -0400
@@ -1,41 +1,41 @@

  ;; Function ctz1 (ctz1, funcdef_no=0, decl_uid=3587, cgraph_uid=1, 
symbol_order=0)

  ctz1 (unsigned int x)
  {
    static const char table[32] = 
"\x00\x01\x1c\x02\x1d\x0e\x18\x03\x1e\x16\x14\x0f\x19\x11\x04\b\x1f\x1b\r\x17\x15\x13\x10\x07\x1a\f\x12\x06\v\x05\n\t";
    unsigned int _1;
    unsigned int _2;
    unsigned int _3;
    unsigned int _4;
    char _5;
    int _9;
    int _10;

    <bb 2> :
    _1 = -x_6(D);
    _2 = _1 & x_6(D);
    _3 = _2 * 125613361;
    _4 = _3 >> 27;
    _9 = .CTZ (x_6(D));
-  _10 = _9 & 31;
+  _10 = _9;
    _5 = (char) _10;
    return _10;

  }



  ;; Function ctz2 (ctz2, funcdef_no=1, decl_uid=3591, cgraph_uid=2, 
symbol_order=1)

  ctz2 (unsigned int x)
  {
    static short int table[64] = {32, 0, 1, 12, 2, 6, 0, 13, 3, 0, 7, 0, 
0, 0, 0, 14, 10, 4, 0, 0, 8, 0, 0, 25, 0, 0, 0, 0, 0, 21, 27, 15, 31, 
11, 5, 0, 0, 0, 0, 0, 9, 0, 0,
24, 0, 0, 20, 26, 30, 0, 0, 0, 0, 23, 0, 19, 29, 0, 22, 18, 28, 17, 16, 0};
    unsigned int _1;
    unsigned int _2;
    unsigned int _3;
    short int _4;
    int _8;

    <bb 2> :
    _1 = -x_5(D);
@@ -87,27 +87,27 @@


  ;; Function ctz4 (ctz4, funcdef_no=3, decl_uid=3601, cgraph_uid=4, 
symbol_order=5)

  ctz4 (long unsigned int x)
  {
    long unsigned int lsb;
    long unsigned int _1;
    long long unsigned int _2;
    long long unsigned int _3;
    char _4;
    int _9;
    int _10;

    <bb 2> :
    _1 = -x_5(D);
    lsb_6 = _1 & x_5(D);
    _2 = lsb_6 * 283881067100198605;
    _3 = _2 >> 58;
    _9 = .CTZ (x_5(D));
-  _10 = _9 & 63;
+  _10 = _9;
    _4 = (char) _10;
    return _10;

  }

The difference in assembly matches.  We have 2 less AND's in the final 
output:

$ diff -u legacy.s ranger.s
--- legacy.s    2020-10-07 09:06:13.420446783 -0400
+++ ranger.s    2020-10-07 09:06:42.646646949 -0400
@@ -8,7 +8,6 @@
  ctz1:
         rbit    w0, w0
         clz     w0, w0
-       and     w0, w0, 31
         ret
         .size   ctz1, .-ctz1
         .align  2
@@ -36,7 +35,6 @@
  ctz4:
         rbit    x0, x0
         clz     x0, x0
-       and     w0, w0, 63
         ret
         .size   ctz4, .-ctz4

If my analysis is correct, we could just remove the line checking for 
"and", or perhaps check that we don't have any and's.

OK for trunk?
Aldy

     gcc/testsuite/ChangeLog:

             PR target/97312
             * gcc.target/aarch64/pr90838.c: Remove scan for AND.

diff --git a/gcc/testsuite/gcc.target/aarch64/pr90838.c 
b/gcc/testsuite/gcc.target/aarch64/pr90838.c
index e1e19ac6a61..76cd5e18d2e 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr90838.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr90838.c
@@ -60,5 +60,4 @@ int ctz4 (unsigned long x)
  }

  /* { dg-final { scan-assembler-times "clz\t" 4 } } */
-/* { dg-final { scan-assembler-times "and\t" 2 } } */
  /* { dg-final { scan-assembler-not "cmp\t.*0" } } */


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08  9:58 [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c Aldy Hernandez
@ 2020-10-08 10:22 ` Jakub Jelinek
  2020-10-08 10:27   ` Jakub Jelinek
  2020-10-08 13:54   ` [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801] Jakub Jelinek
  0 siblings, 2 replies; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 10:22 UTC (permalink / raw)
  To: Aldy Hernandez, Richard Biener; +Cc: gcc-patches

On Thu, Oct 08, 2020 at 11:58:21AM +0200, Aldy Hernandez via Gcc-patches wrote:
> I am quoting my analysis from the PR.  Could an aarch64 expert pontificate
> here?
> 
> This test is checking the final assembly for a specific sequence.  I don't
> speak aarch64 assembly, but the IL is different coming out of evrp.
> 
> The first culprit is this difference in the mergephi1 dump:
> 
>    _9 = .CTZ (x_6(D));
> -  _10 = _9 & 31;
> +  _10 = _9;
> 
> These are unsigned ints, so assuming they are 32 bits on aarch64,
> __builtin_ctz is always less than 32.  This is because a CTZ of 0 is
> undefined according to the GCC manual:
> 
> [[
> Built-in Function: int __builtin_ctz (unsigned int x)
> 
>     Returns the number of trailing 0-bits in x, starting at the least
> significant bit position. If x is 0, the result is undefined.

In user __builtin_ctz yes, but it isn't that easy.

Aarch64 defines CTZ_DEFINED_VALUE_AT_ZERO to 2 and sets value to bitsize
of the mode (i.e. 32 in this case), 

And documentation says:
 -- Macro: CLZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
 -- Macro: CTZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
     A C expression that indicates whether the architecture defines a
     value for 'clz' or 'ctz' with a zero operand.  A result of '0'
     indicates the value is undefined.  If the value is defined for only
     the RTL expression, the macro should evaluate to '1'; if the value
     applies also to the corresponding optab entry (which is normally
     the case if it expands directly into the corresponding RTL), then
     the macro should evaluate to '2'.  In the cases where the value is
     defined, VALUE should be set to this value.

At least for *_DEFINED_VALUE_AT_ZERO (*) == 2, I'm afraid VRP can't
assume it is undefined at zero, but needs to use a range that covers
both the values for non-zero operand and the VALUE from
CTZ_DEFINED_VALUE_AT_ZERO, so [0, 32] range in this case, because
earlier GIMPLE passes could have already optimized away e.g. a != 0 check.
See simplify_count_trailing_zeroes in forwprop.

Perhaps another way out of this would be document and enforce that
__builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2, and then
we would need to make sure that e.g. in simplify_count_trailing_zeroes
we emit a .CTZ or __builtin_ctz call depending on whether it is undefined
there or not; or give .C[TL]Z an additional argument (0/1) which would tell
if it is defined at zero or not.

We have several enhancement reports in bugzilla from Gabriel Ravier on this
topic I think.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 10:22 ` Jakub Jelinek
@ 2020-10-08 10:27   ` Jakub Jelinek
  2020-10-08 13:54   ` [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801] Jakub Jelinek
  1 sibling, 0 replies; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 10:27 UTC (permalink / raw)
  To: Aldy Hernandez, Richard Biener, gcc-patches

On Thu, Oct 08, 2020 at 12:22:11PM +0200, Jakub Jelinek via Gcc-patches wrote:
> We have several enhancement reports in bugzilla from Gabriel Ravier on this
> topic I think.

See e.g. PR94801, PR94793, PR95863 on the topic.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801]
  2020-10-08 10:22 ` Jakub Jelinek
  2020-10-08 10:27   ` Jakub Jelinek
@ 2020-10-08 13:54   ` Jakub Jelinek
  2020-10-08 14:28     ` Aldy Hernandez
  1 sibling, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 13:54 UTC (permalink / raw)
  To: Aldy Hernandez, Richard Biener; +Cc: gcc-patches

On Thu, Oct 08, 2020 at 12:22:11PM +0200, Jakub Jelinek via Gcc-patches wrote:
> Perhaps another way out of this would be document and enforce that
> __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
> calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2

The following patch implements that, i.e. __builtin_c?z* now take full
advantage of them being UB at zero, while the ifns are well defined at zero
if *_DEFINED_VALUE_AT_ZERO (*) == 2.  That is what fixes PR94801.

Furthermore, to fix PR97312, if it is well defined at zero and the value at
zero is prec, we don't lower the maximum unless the argument is known to be
non-zero.
For gimple-range.cc I guess we could improve it if needed e.g. by returning
a [0,7][32,32] range for .CTZ of e.g. [0,137], but for now it (roughly)
matches what vr-values.c does.

Ok for trunk if it passes bootstrap/regtest?

2020-10-08  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/94801
	PR target/97312
	* vr-values.c (vr_values::extract_range_basic) <CASE_CFN_CLZ,
	CASE_CFN_CTZ>: When stmt is not an internal-fn call or
	C?Z_DEFINED_VALUE_AT_ZERO is not 2, assume argument is not zero
	and thus use [0, prec-1] range unless it can be further improved.
	For CTZ, don't update maxi from upper bound if it was previously prec.
	* gimple-range.cc (gimple_ranger::range_of_builtin_call) <CASE_CFN_CLZ,
	CASE_CFN_CTZ>: Likewise.

	* gcc.dg/tree-ssa/pr94801.c: New test.

--- gcc/vr-values.c.jj	2020-10-07 10:47:47.065983121 +0200
+++ gcc/vr-values.c	2020-10-08 15:23:56.042631592 +0200
@@ -1208,34 +1208,42 @@ vr_values::extract_range_basic (value_ra
 	  mini = 0;
 	  maxi = 1;
 	  goto bitop_builtin;
-	  /* __builtin_c[lt]z* return [0, prec-1], except for
+	  /* __builtin_clz* return [0, prec-1], except for
 	     when the argument is 0, but that is undefined behavior.
-	     On many targets where the CLZ RTL or optab value is defined
-	     for 0 the value is prec, so include that in the range
-	     by default.  */
+	     Always handle __builtin_clz* which can be only written
+	     by user as UB on 0 and so [0, prec-1] range, and the internal-fn
+	     calls depending on how CLZ_DEFINED_VALUE_AT_ZERO is defined.  */
 	CASE_CFN_CLZ:
 	  arg = gimple_call_arg (stmt, 0);
 	  prec = TYPE_PRECISION (TREE_TYPE (arg));
 	  mini = 0;
-	  maxi = prec;
+	  maxi = prec - 1;
 	  mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-	  if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
-	      && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov)
-	      /* Handle only the single common value.  */
-	      && zerov != prec)
-	    /* Magic value to give up, unless vr0 proves
-	       arg is non-zero.  */
-	    mini = -2;
+	  if (gimple_call_internal_p (stmt))
+	    {
+	      if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
+		  && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+		{
+		  /* Handle only the single common value.  */
+		  if (zerov == prec)
+		    maxi = prec;
+		  /* Magic value to give up, unless vr0 proves
+		     arg is non-zero.  */
+		  else
+		    mini = -2;
+		}
+	    }
 	  if (TREE_CODE (arg) == SSA_NAME)
 	    {
 	      const value_range_equiv *vr0 = get_value_range (arg);
 	      /* From clz of VR_RANGE minimum we can compute
 		 result maximum.  */
 	      if (vr0->kind () == VR_RANGE
-		  && TREE_CODE (vr0->min ()) == INTEGER_CST)
+		  && TREE_CODE (vr0->min ()) == INTEGER_CST
+		  && integer_nonzerop (vr0->min ()))
 		{
 		  maxi = prec - 1 - tree_floor_log2 (vr0->min ());
-		  if (maxi != prec)
+		  if (mini == -2)
 		    mini = 0;
 		}
 	      else if (vr0->kind () == VR_ANTI_RANGE
@@ -1251,9 +1259,14 @@ vr_values::extract_range_basic (value_ra
 	      if (vr0->kind () == VR_RANGE
 		  && TREE_CODE (vr0->max ()) == INTEGER_CST)
 		{
-		  mini = prec - 1 - tree_floor_log2 (vr0->max ());
-		  if (mini == prec)
-		    break;
+		  int newmini = prec - 1 - tree_floor_log2 (vr0->max ());
+		  if (newmini == prec)
+		    {
+		      if (maxi == prec)
+			mini = prec;
+		    }
+		  else
+		    mini = newmini;
 		}
 	    }
 	  if (mini == -2)
@@ -1261,27 +1274,30 @@ vr_values::extract_range_basic (value_ra
 	  goto bitop_builtin;
 	  /* __builtin_ctz* return [0, prec-1], except for
 	     when the argument is 0, but that is undefined behavior.
-	     If there is a ctz optab for this mode and
-	     CTZ_DEFINED_VALUE_AT_ZERO, include that in the range,
-	     otherwise just assume 0 won't be seen.  */
+	     Always handle __builtin_ctz* which can be only written
+	     by user as UB on 0 and so [0, prec-1] range, and the internal-fn
+	     calls depending on how CTZ_DEFINED_VALUE_AT_ZERO is defined.  */
 	CASE_CFN_CTZ:
 	  arg = gimple_call_arg (stmt, 0);
 	  prec = TYPE_PRECISION (TREE_TYPE (arg));
 	  mini = 0;
 	  maxi = prec - 1;
 	  mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-	  if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
-	      && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov))
+	  if (gimple_call_internal_p (stmt))
 	    {
-	      /* Handle only the two common values.  */
-	      if (zerov == -1)
-		mini = -1;
-	      else if (zerov == prec)
-		maxi = prec;
-	      else
-		/* Magic value to give up, unless vr0 proves
-		   arg is non-zero.  */
-		mini = -2;
+	      if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
+		  && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+		{
+		  /* Handle only the two common values.  */
+		  if (zerov == -1)
+		    mini = -1;
+		  else if (zerov == prec)
+		    maxi = prec;
+		  else
+		    /* Magic value to give up, unless vr0 proves
+		       arg is non-zero.  */
+		    mini = -2;
+		}
 	    }
 	  if (TREE_CODE (arg) == SSA_NAME)
 	    {
@@ -1300,10 +1316,16 @@ vr_values::extract_range_basic (value_ra
 	      if (vr0->kind () == VR_RANGE
 		  && TREE_CODE (vr0->max ()) == INTEGER_CST)
 		{
-		  maxi = tree_floor_log2 (vr0->max ());
-		  /* For vr0 [0, 0] give up.  */
-		  if (maxi == -1)
-		    break;
+		  int newmaxi = tree_floor_log2 (vr0->max ());
+		  if (newmaxi == -1)
+		    {
+		      if (mini == -1)
+			maxi = -1;
+		      else if (maxi == prec)
+			mini = prec;
+		    }
+		  else if (maxi != prec)
+		    maxi = newmaxi;
 		}
 	    }
 	  if (mini == -2)
--- gcc/gimple-range.cc.jj	2020-10-08 11:55:25.498313173 +0200
+++ gcc/gimple-range.cc	2020-10-08 15:36:14.926945183 +0200
@@ -636,28 +636,37 @@ gimple_ranger::range_of_builtin_call (ir
       // __builtin_c[lt]z* return [0, prec-1], except when the
       // argument is 0, but that is undefined behavior.
       //
-      // On many targets where the CLZ RTL or optab value is defined
-      // for 0, the value is prec, so include that in the range by
-      // default.
+      // For __builtin_c[lt]z* consider argument of 0 always undefined
+      // behavior, for internal fns depending on C?Z_DEFINED_VALUE_AT_ZERO.
       arg = gimple_call_arg (call, 0);
       prec = TYPE_PRECISION (TREE_TYPE (arg));
       mini = 0;
-      maxi = prec;
+      maxi = prec - 1;
       mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-      if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
-	  && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov)
-	  // Only handle the single common value.
-	  && zerov != prec)
-	// Magic value to give up, unless we can prove arg is non-zero.
-	mini = -2;
+      if (gimple_call_internal_p (call))
+	{
+	  if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
+	      && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+	    {
+	      // Only handle the single common value.
+	      if (zerov == prec)
+		maxi = prec;
+	      else
+		// Magic value to give up, unless we can prove arg is non-zero.
+		mini = -2;
+	    }
+	}
 
       gcc_assert (range_of_expr (r, arg, call));
       // From clz of minimum we can compute result maximum.
       if (r.constant_p ())
 	{
-	  maxi = prec - 1 - wi::floor_log2 (r.lower_bound ());
-	  if (maxi != prec)
-	    mini = 0;
+	  int newmaxi = prec - 1 - wi::floor_log2 (r.lower_bound ());
+	  if (newmaxi != prec)
+	    {
+	      mini = 0;
+	      maxi = newmaxi;
+	    }
 	}
       else if (!range_includes_zero_p (&r))
 	{
@@ -669,9 +678,14 @@ gimple_ranger::range_of_builtin_call (ir
       // From clz of maximum we can compute result minimum.
       if (r.constant_p ())
 	{
-	  mini = prec - 1 - wi::floor_log2 (r.upper_bound ());
-	  if (mini == prec)
-	    break;
+	  int newmini = prec - 1 - wi::floor_log2 (r.upper_bound ());
+	  if (newmini == prec)
+	    {
+	      if (maxi == prec)
+		mini = prec;
+	    }
+	  else
+	    mini = newmini;
 	}
       if (mini == -2)
 	break;
@@ -682,25 +696,27 @@ gimple_ranger::range_of_builtin_call (ir
       // __builtin_ctz* return [0, prec-1], except for when the
       // argument is 0, but that is undefined behavior.
       //
-      // If there is a ctz optab for this mode and
-      // CTZ_DEFINED_VALUE_AT_ZERO, include that in the range,
-      // otherwise just assume 0 won't be seen.
+      // For __builtin_ctz* consider argument of 0 always undefined
+      // behavior, for internal fns depending on CTZ_DEFINED_VALUE_AT_ZERO.
       arg = gimple_call_arg (call, 0);
       prec = TYPE_PRECISION (TREE_TYPE (arg));
       mini = 0;
       maxi = prec - 1;
       mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-      if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
-	  && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov))
+      if (gimple_call_internal_p (call))
 	{
-	  // Handle only the two common values.
-	  if (zerov == -1)
-	    mini = -1;
-	  else if (zerov == prec)
-	    maxi = prec;
-	  else
-	    // Magic value to give up, unless we can prove arg is non-zero.
-	    mini = -2;
+	  if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
+	      && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+	    {
+	      // Handle only the two common values.
+	      if (zerov == -1)
+		mini = -1;
+	      else if (zerov == prec)
+		maxi = prec;
+	      else
+		// Magic value to give up, unless we can prove arg is non-zero.
+		mini = -2;
+	    }
 	}
       gcc_assert (range_of_expr (r, arg, call));
       if (!r.undefined_p ())
@@ -714,8 +730,14 @@ gimple_ranger::range_of_builtin_call (ir
 	  // the maximum.
 	  wide_int max = r.upper_bound ();
 	  if (max == 0)
-	    break;
-	  maxi = wi::floor_log2 (max);
+	    {
+	      if (mini == -1)
+		maxi = -1;
+	      else if (maxi == prec)
+		mini = prec;
+	    }
+	  else if (maxi != prec)
+	    maxi = wi::floor_log2 (max);
 	}
       if (mini == -2)
 	break;
--- gcc/testsuite/gcc.dg/tree-ssa/pr94801.c.jj	2020-10-08 15:35:01.837005736 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr94801.c	2020-10-08 15:34:49.252188342 +0200
@@ -0,0 +1,16 @@
+/* PR tree-optimization/94801 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 0;" 2 "optimized" } } */
+
+int
+foo (int a)
+{
+  return __builtin_clz (a) >> 5;
+}
+
+int
+bar (int a)
+{
+  return __builtin_ctz (a) >> 5;
+}


	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801]
  2020-10-08 13:54   ` [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801] Jakub Jelinek
@ 2020-10-08 14:28     ` Aldy Hernandez
  2020-10-08 14:39       ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Aldy Hernandez @ 2020-10-08 14:28 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener; +Cc: gcc-patches



On 10/8/20 3:54 PM, Jakub Jelinek wrote:
> On Thu, Oct 08, 2020 at 12:22:11PM +0200, Jakub Jelinek via Gcc-patches wrote:
>> Perhaps another way out of this would be document and enforce that
>> __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
>> calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2

Huh, that magic 2 is not obvious.  I guess we should document the values 
of this macro in the source somewhere:

defaults.h:
/* Indicate that CLZ and CTZ are undefined at zero.  */
#ifndef CLZ_DEFINED_VALUE_AT_ZERO
#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE)  0
#endif
#ifndef CTZ_DEFINED_VALUE_AT_ZERO
#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE)  0
#endif

> 
> The following patch implements that, i.e. __builtin_c?z* now take full
> advantage of them being UB at zero, while the ifns are well defined at zero
> if *_DEFINED_VALUE_AT_ZERO (*) == 2.  That is what fixes PR94801.
> 
> Furthermore, to fix PR97312, if it is well defined at zero and the value at
> zero is prec, we don't lower the maximum unless the argument is known to be
> non-zero.

Heh.  I was just fixing the gimple-range.cc version.  Thanks.

BTW.  There's no reason why the vr-values can't just call the 
gimple_ranger::range_of_builtin_call.  In the original implementation we 
had vr-values call the ranger version and trap if they differed.  I'm 
pretty sure you can have vr-values call range_of_builtin_call with a 
value_range, and things will get squashed down appropriately.  We should 
really only have one version of this.  I'm not suggesting you do it, but 
I wouldn't object to it ;-).

> --- gcc/gimple-range.cc.jj	2020-10-08 11:55:25.498313173 +0200
> +++ gcc/gimple-range.cc	2020-10-08 15:36:14.926945183 +0200

> @@ -714,8 +730,14 @@ gimple_ranger::range_of_builtin_call (ir
>   	  // the maximum.
>   	  wide_int max = r.upper_bound ();
>   	  if (max == 0)
> -	    break;
> -	  maxi = wi::floor_log2 (max);
> +	    {
> +	      if (mini == -1)
> +		maxi = -1;
> +	      else if (maxi == prec)
> +		mini = prec;
> +	    }
> +	  else if (maxi != prec)
> +	    maxi = wi::floor_log2 (max);

Hmmm, if max == 0, that means the range is [0, 0], because if the 
highest bound of r is 0, there's nothing left on the bottom but another 
0 since R is unsigned.  Is that what you meant?

I think there was a bug in translation here:  It looks like the original 
code did:

		  maxi = tree_floor_log2 (vr0->max ());
		  /* For vr0 [0, 0] give up.  */
		  if (maxi == -1)
		    break;

so perhaps the above (prior to your change) should have been:

		wide_int max = r.upper_bound ();
		maxi = wi::floor_log2 (max);
		// For 0 give up.
		if (maxi == -1)
		  break;

You may want to adjust.

Again, thanks for working on this.
Aldy	


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801]
  2020-10-08 14:28     ` Aldy Hernandez
@ 2020-10-08 14:39       ` Jakub Jelinek
  2020-10-08 14:55         ` Aldy Hernandez
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 14:39 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Richard Biener, gcc-patches

On Thu, Oct 08, 2020 at 04:28:37PM +0200, Aldy Hernandez wrote:
> On 10/8/20 3:54 PM, Jakub Jelinek wrote:
> > On Thu, Oct 08, 2020 at 12:22:11PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > Perhaps another way out of this would be document and enforce that
> > > __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
> > > calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2
> 
> Huh, that magic 2 is not obvious.  I guess we should document the values of
> this macro in the source somewhere:

The 2 is documented in gccint documentation.

> BTW.  There's no reason why the vr-values can't just call the
> gimple_ranger::range_of_builtin_call.  In the original implementation we had
> vr-values call the ranger version and trap if they differed.  I'm pretty
> sure you can have vr-values call range_of_builtin_call with a value_range,
> and things will get squashed down appropriately.  We should really only have
> one version of this.  I'm not suggesting you do it, but I wouldn't object to
> it ;-).

Will defer that to you or Andrew ;).

> > --- gcc/gimple-range.cc.jj	2020-10-08 11:55:25.498313173 +0200
> > +++ gcc/gimple-range.cc	2020-10-08 15:36:14.926945183 +0200
> 
> > @@ -714,8 +730,14 @@ gimple_ranger::range_of_builtin_call (ir
> >   	  // the maximum.
> >   	  wide_int max = r.upper_bound ();
> >   	  if (max == 0)
> > -	    break;
> > -	  maxi = wi::floor_log2 (max);
> > +	    {
> > +	      if (mini == -1)
> > +		maxi = -1;
> > +	      else if (maxi == prec)
> > +		mini = prec;
> > +	    }
> > +	  else if (maxi != prec)
> > +	    maxi = wi::floor_log2 (max);
> 
> Hmmm, if max == 0, that means the range is [0, 0], because if the highest
> bound of r is 0, there's nothing left on the bottom but another 0 since R is
> unsigned.  Is that what you meant?

Yes, for max == 0 aka [0, 0] I wanted:
1) if mini == -1, i.e. the DEFINED_VALUE_AT_ZERO == 2 VALUE is -1, return [-1, -1]
2) if maxi == prec, i.e. DEFINED_VALUE_AT_ZERO == 2 VALUE is prec, return [prec, prec]
3) otherwise it is an UB case, ignore the argument range, so either [0, prec-1] or
   VARYING (the latter for the mini == -2 case)
The 1) and 2) cases would be well defined, and for 3) I'm worried that e.g.
during VRP iteration if at one point we see range of argument say [0, 1] and
determine for that [0, prec-1] range, then in another iteration the argument
range is narrowed to just [0, 0] and all of sudden the result would become
VARYING, I'd be afraid that would be against the rules.  Perhaps the right
thing is to set range to UNDEFINED in the 3) case.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801]
  2020-10-08 14:39       ` Jakub Jelinek
@ 2020-10-08 14:55         ` Aldy Hernandez
  2020-10-08 15:08           ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Aldy Hernandez @ 2020-10-08 14:55 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches



On 10/8/20 4:39 PM, Jakub Jelinek wrote:
> On Thu, Oct 08, 2020 at 04:28:37PM +0200, Aldy Hernandez wrote:
>> On 10/8/20 3:54 PM, Jakub Jelinek wrote:
>>> On Thu, Oct 08, 2020 at 12:22:11PM +0200, Jakub Jelinek via Gcc-patches wrote:
>>>> Perhaps another way out of this would be document and enforce that
>>>> __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
>>>> calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2
>>
>> Huh, that magic 2 is not obvious.  I guess we should document the values of
>> this macro in the source somewhere:
> 
> The 2 is documented in gccint documentation.
> 
>> BTW.  There's no reason why the vr-values can't just call the
>> gimple_ranger::range_of_builtin_call.  In the original implementation we had
>> vr-values call the ranger version and trap if they differed.  I'm pretty
>> sure you can have vr-values call range_of_builtin_call with a value_range,
>> and things will get squashed down appropriately.  We should really only have
>> one version of this.  I'm not suggesting you do it, but I wouldn't object to
>> it ;-).
> 
> Will defer that to you or Andrew ;).

I can do it once the dust settles.

> 
>>> --- gcc/gimple-range.cc.jj	2020-10-08 11:55:25.498313173 +0200
>>> +++ gcc/gimple-range.cc	2020-10-08 15:36:14.926945183 +0200
>>
>>> @@ -714,8 +730,14 @@ gimple_ranger::range_of_builtin_call (ir
>>>    	  // the maximum.
>>>    	  wide_int max = r.upper_bound ();
>>>    	  if (max == 0)
>>> -	    break;
>>> -	  maxi = wi::floor_log2 (max);
>>> +	    {
>>> +	      if (mini == -1)
>>> +		maxi = -1;
>>> +	      else if (maxi == prec)
>>> +		mini = prec;
>>> +	    }
>>> +	  else if (maxi != prec)
>>> +	    maxi = wi::floor_log2 (max);
>>
>> Hmmm, if max == 0, that means the range is [0, 0], because if the highest
>> bound of r is 0, there's nothing left on the bottom but another 0 since R is
>> unsigned.  Is that what you meant?
> 
> Yes, for max == 0 aka [0, 0] I wanted:
> 1) if mini == -1, i.e. the DEFINED_VALUE_AT_ZERO == 2 VALUE is -1, return [-1, -1]
> 2) if maxi == prec, i.e. DEFINED_VALUE_AT_ZERO == 2 VALUE is prec, return [prec, prec]

Ah, I see.  Do you mind commenting that?  Or perhaps you could spell it 
out obviously like:

if (max == 0) {
	...
	if (DEFINED_VALUE_AT_ZERO)
		// do special things
	...
}

But whatever is fine.  I hope to never look at these bits ever again :).

Aldy


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801]
  2020-10-08 14:55         ` Aldy Hernandez
@ 2020-10-08 15:08           ` Jakub Jelinek
  2020-10-08 15:09             ` Aldy Hernandez
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 15:08 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches, Richard Biener

On Thu, Oct 08, 2020 at 04:55:07PM +0200, Aldy Hernandez via Gcc-patches wrote:
> > Yes, for max == 0 aka [0, 0] I wanted:
> > 1) if mini == -1, i.e. the DEFINED_VALUE_AT_ZERO == 2 VALUE is -1, return [-1, -1]
> > 2) if maxi == prec, i.e. DEFINED_VALUE_AT_ZERO == 2 VALUE is prec, return [prec, prec]
> 
> Ah, I see.  Do you mind commenting that?  Or perhaps you could spell it out
> obviously like:
> 
> if (max == 0) {
> 	...
> 	if (DEFINED_VALUE_AT_ZERO)
> 		// do special things
> 	...
> }
> 
> But whatever is fine.  I hope to never look at these bits ever again :).

Added several comments now (but just in gimple-range.cc, I assume
vr-values.c code is what you want to kill eventually).

2020-10-08  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/94801
	PR target/97312
	* vr-values.c (vr_values::extract_range_basic) <CASE_CFN_CLZ,
	CASE_CFN_CTZ>: When stmt is not an internal-fn call or
	C?Z_DEFINED_VALUE_AT_ZERO is not 2, assume argument is not zero
	and thus use [0, prec-1] range unless it can be further improved.
	For CTZ, don't update maxi from upper bound if it was previously prec.
	* gimple-range.cc (gimple_ranger::range_of_builtin_call) <CASE_CFN_CLZ,
	CASE_CFN_CTZ>: Likewise.

	* gcc.dg/tree-ssa/pr94801.c: New test.

--- gcc/vr-values.c.jj	2020-10-07 10:47:47.065983121 +0200
+++ gcc/vr-values.c	2020-10-08 15:23:56.042631592 +0200
@@ -1208,34 +1208,42 @@ vr_values::extract_range_basic (value_ra
 	  mini = 0;
 	  maxi = 1;
 	  goto bitop_builtin;
-	  /* __builtin_c[lt]z* return [0, prec-1], except for
+	  /* __builtin_clz* return [0, prec-1], except for
 	     when the argument is 0, but that is undefined behavior.
-	     On many targets where the CLZ RTL or optab value is defined
-	     for 0 the value is prec, so include that in the range
-	     by default.  */
+	     Always handle __builtin_clz* which can be only written
+	     by user as UB on 0 and so [0, prec-1] range, and the internal-fn
+	     calls depending on how CLZ_DEFINED_VALUE_AT_ZERO is defined.  */
 	CASE_CFN_CLZ:
 	  arg = gimple_call_arg (stmt, 0);
 	  prec = TYPE_PRECISION (TREE_TYPE (arg));
 	  mini = 0;
-	  maxi = prec;
+	  maxi = prec - 1;
 	  mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-	  if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
-	      && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov)
-	      /* Handle only the single common value.  */
-	      && zerov != prec)
-	    /* Magic value to give up, unless vr0 proves
-	       arg is non-zero.  */
-	    mini = -2;
+	  if (gimple_call_internal_p (stmt))
+	    {
+	      if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
+		  && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+		{
+		  /* Handle only the single common value.  */
+		  if (zerov == prec)
+		    maxi = prec;
+		  /* Magic value to give up, unless vr0 proves
+		     arg is non-zero.  */
+		  else
+		    mini = -2;
+		}
+	    }
 	  if (TREE_CODE (arg) == SSA_NAME)
 	    {
 	      const value_range_equiv *vr0 = get_value_range (arg);
 	      /* From clz of VR_RANGE minimum we can compute
 		 result maximum.  */
 	      if (vr0->kind () == VR_RANGE
-		  && TREE_CODE (vr0->min ()) == INTEGER_CST)
+		  && TREE_CODE (vr0->min ()) == INTEGER_CST
+		  && integer_nonzerop (vr0->min ()))
 		{
 		  maxi = prec - 1 - tree_floor_log2 (vr0->min ());
-		  if (maxi != prec)
+		  if (mini == -2)
 		    mini = 0;
 		}
 	      else if (vr0->kind () == VR_ANTI_RANGE
@@ -1251,9 +1259,14 @@ vr_values::extract_range_basic (value_ra
 	      if (vr0->kind () == VR_RANGE
 		  && TREE_CODE (vr0->max ()) == INTEGER_CST)
 		{
-		  mini = prec - 1 - tree_floor_log2 (vr0->max ());
-		  if (mini == prec)
-		    break;
+		  int newmini = prec - 1 - tree_floor_log2 (vr0->max ());
+		  if (newmini == prec)
+		    {
+		      if (maxi == prec)
+			mini = prec;
+		    }
+		  else
+		    mini = newmini;
 		}
 	    }
 	  if (mini == -2)
@@ -1261,27 +1274,30 @@ vr_values::extract_range_basic (value_ra
 	  goto bitop_builtin;
 	  /* __builtin_ctz* return [0, prec-1], except for
 	     when the argument is 0, but that is undefined behavior.
-	     If there is a ctz optab for this mode and
-	     CTZ_DEFINED_VALUE_AT_ZERO, include that in the range,
-	     otherwise just assume 0 won't be seen.  */
+	     Always handle __builtin_ctz* which can be only written
+	     by user as UB on 0 and so [0, prec-1] range, and the internal-fn
+	     calls depending on how CTZ_DEFINED_VALUE_AT_ZERO is defined.  */
 	CASE_CFN_CTZ:
 	  arg = gimple_call_arg (stmt, 0);
 	  prec = TYPE_PRECISION (TREE_TYPE (arg));
 	  mini = 0;
 	  maxi = prec - 1;
 	  mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-	  if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
-	      && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov))
+	  if (gimple_call_internal_p (stmt))
 	    {
-	      /* Handle only the two common values.  */
-	      if (zerov == -1)
-		mini = -1;
-	      else if (zerov == prec)
-		maxi = prec;
-	      else
-		/* Magic value to give up, unless vr0 proves
-		   arg is non-zero.  */
-		mini = -2;
+	      if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
+		  && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+		{
+		  /* Handle only the two common values.  */
+		  if (zerov == -1)
+		    mini = -1;
+		  else if (zerov == prec)
+		    maxi = prec;
+		  else
+		    /* Magic value to give up, unless vr0 proves
+		       arg is non-zero.  */
+		    mini = -2;
+		}
 	    }
 	  if (TREE_CODE (arg) == SSA_NAME)
 	    {
@@ -1300,10 +1316,16 @@ vr_values::extract_range_basic (value_ra
 	      if (vr0->kind () == VR_RANGE
 		  && TREE_CODE (vr0->max ()) == INTEGER_CST)
 		{
-		  maxi = tree_floor_log2 (vr0->max ());
-		  /* For vr0 [0, 0] give up.  */
-		  if (maxi == -1)
-		    break;
+		  int newmaxi = tree_floor_log2 (vr0->max ());
+		  if (newmaxi == -1)
+		    {
+		      if (mini == -1)
+			maxi = -1;
+		      else if (maxi == prec)
+			mini = prec;
+		    }
+		  else if (maxi != prec)
+		    maxi = newmaxi;
 		}
 	    }
 	  if (mini == -2)
--- gcc/gimple-range.cc.jj	2020-10-08 11:55:25.498313173 +0200
+++ gcc/gimple-range.cc	2020-10-08 15:36:14.926945183 +0200
@@ -636,28 +636,38 @@ gimple_ranger::range_of_builtin_call (ir
       // __builtin_c[lt]z* return [0, prec-1], except when the
       // argument is 0, but that is undefined behavior.
       //
-      // On many targets where the CLZ RTL or optab value is defined
-      // for 0, the value is prec, so include that in the range by
-      // default.
+      // For __builtin_c[lt]z* consider argument of 0 always undefined
+      // behavior, for internal fns depending on C?Z_DEFINED_VALUE_AT_ZERO.
       arg = gimple_call_arg (call, 0);
       prec = TYPE_PRECISION (TREE_TYPE (arg));
       mini = 0;
-      maxi = prec;
+      maxi = prec - 1;
       mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-      if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
-	  && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov)
-	  // Only handle the single common value.
-	  && zerov != prec)
-	// Magic value to give up, unless we can prove arg is non-zero.
-	mini = -2;
+      if (gimple_call_internal_p (call))
+	{
+	  if (optab_handler (clz_optab, mode) != CODE_FOR_nothing
+	      && CLZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+	    {
+	      // Only handle the single common value.
+	      if (zerov == prec)
+		maxi = prec;
+	      else
+		// Magic value to give up, unless we can prove arg is non-zero.
+		mini = -2;
+	    }
+	}
 
       gcc_assert (range_of_expr (r, arg, call));
       // From clz of minimum we can compute result maximum.
       if (r.constant_p ())
 	{
-	  maxi = prec - 1 - wi::floor_log2 (r.lower_bound ());
-	  if (maxi != prec)
-	    mini = 0;
+	  int newmaxi = prec - 1 - wi::floor_log2 (r.lower_bound ());
+	  // Argument is unsigned, so do nothing if it is [0, ...] range.
+	  if (newmaxi != prec)
+	    {
+	      mini = 0;
+	      maxi = newmaxi;
+	    }
 	}
       else if (!range_includes_zero_p (&r))
 	{
@@ -669,9 +679,17 @@ gimple_ranger::range_of_builtin_call (ir
       // From clz of maximum we can compute result minimum.
       if (r.constant_p ())
 	{
-	  mini = prec - 1 - wi::floor_log2 (r.upper_bound ());
-	  if (mini == prec)
-	    break;
+	  int newmini = prec - 1 - wi::floor_log2 (r.upper_bound ());
+	  if (newmini == prec)
+	    {
+	      // Argument range is [0, 0].  If CLZ_DEFINED_VALUE_AT_ZERO
+	      // is 2 with VALUE of prec, return [prec, prec], otherwise
+	      // ignore the range.
+	      if (maxi == prec)
+		mini = prec;
+	    }
+	  else
+	    mini = newmini;
 	}
       if (mini == -2)
 	break;
@@ -682,25 +700,27 @@ gimple_ranger::range_of_builtin_call (ir
       // __builtin_ctz* return [0, prec-1], except for when the
       // argument is 0, but that is undefined behavior.
       //
-      // If there is a ctz optab for this mode and
-      // CTZ_DEFINED_VALUE_AT_ZERO, include that in the range,
-      // otherwise just assume 0 won't be seen.
+      // For __builtin_ctz* consider argument of 0 always undefined
+      // behavior, for internal fns depending on CTZ_DEFINED_VALUE_AT_ZERO.
       arg = gimple_call_arg (call, 0);
       prec = TYPE_PRECISION (TREE_TYPE (arg));
       mini = 0;
       maxi = prec - 1;
       mode = SCALAR_INT_TYPE_MODE (TREE_TYPE (arg));
-      if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
-	  && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov))
+      if (gimple_call_internal_p (call))
 	{
-	  // Handle only the two common values.
-	  if (zerov == -1)
-	    mini = -1;
-	  else if (zerov == prec)
-	    maxi = prec;
-	  else
-	    // Magic value to give up, unless we can prove arg is non-zero.
-	    mini = -2;
+	  if (optab_handler (ctz_optab, mode) != CODE_FOR_nothing
+	      && CTZ_DEFINED_VALUE_AT_ZERO (mode, zerov) == 2)
+	    {
+	      // Handle only the two common values.
+	      if (zerov == -1)
+		mini = -1;
+	      else if (zerov == prec)
+		maxi = prec;
+	      else
+		// Magic value to give up, unless we can prove arg is non-zero.
+		mini = -2;
+	    }
 	}
       gcc_assert (range_of_expr (r, arg, call));
       if (!r.undefined_p ())
@@ -714,8 +734,20 @@ gimple_ranger::range_of_builtin_call (ir
 	  // the maximum.
 	  wide_int max = r.upper_bound ();
 	  if (max == 0)
-	    break;
-	  maxi = wi::floor_log2 (max);
+	    {
+	      // Argument is [0, 0].  If CTZ_DEFINED_VALUE_AT_ZERO
+	      // is 2 with value -1 or prec, return [-1, -1] or [prec, prec].
+	      // Otherwise ignore the range.
+	      if (mini == -1)
+		maxi = -1;
+	      else if (maxi == prec)
+		mini = prec;
+	    }
+	  // If value at zero is prec and 0 is in the range, we can't lower
+	  // the upper bound.  We could create two separate ranges though,
+	  // [0,floor_log2(max)][prec,prec] though.
+	  else if (maxi != prec)
+	    maxi = wi::floor_log2 (max);
 	}
       if (mini == -2)
 	break;
--- gcc/testsuite/gcc.dg/tree-ssa/pr94801.c.jj	2020-10-08 15:35:01.837005736 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr94801.c	2020-10-08 15:34:49.252188342 +0200
@@ -0,0 +1,16 @@
+/* PR tree-optimization/94801 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 0;" 2 "optimized" } } */
+
+int
+foo (int a)
+{
+  return __builtin_clz (a) >> 5;
+}
+
+int
+bar (int a)
+{
+  return __builtin_ctz (a) >> 5;
+}


	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801]
  2020-10-08 15:08           ` Jakub Jelinek
@ 2020-10-08 15:09             ` Aldy Hernandez
  0 siblings, 0 replies; 18+ messages in thread
From: Aldy Hernandez @ 2020-10-08 15:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Richard Biener



On 10/8/20 5:08 PM, Jakub Jelinek wrote:
> On Thu, Oct 08, 2020 at 04:55:07PM +0200, Aldy Hernandez via Gcc-patches wrote:
>>> Yes, for max == 0 aka [0, 0] I wanted:
>>> 1) if mini == -1, i.e. the DEFINED_VALUE_AT_ZERO == 2 VALUE is -1, return [-1, -1]
>>> 2) if maxi == prec, i.e. DEFINED_VALUE_AT_ZERO == 2 VALUE is prec, return [prec, prec]
>>
>> Ah, I see.  Do you mind commenting that?  Or perhaps you could spell it out
>> obviously like:
>>
>> if (max == 0) {
>> 	...
>> 	if (DEFINED_VALUE_AT_ZERO)
>> 		// do special things
>> 	...
>> }
>>
>> But whatever is fine.  I hope to never look at these bits ever again :).
> 
> Added several comments now (but just in gimple-range.cc, I assume
> vr-values.c code is what you want to kill eventually).

Thanks, and yes... vr-values is on life support :).

Aldy


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 11:40         ` Jakub Jelinek
@ 2020-10-08 12:57           ` Wilco Dijkstra
  0 siblings, 0 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2020-10-08 12:57 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Hi Jakub,

> On Thu, Oct 08, 2020 at 11:37:24AM +0000, Wilco Dijkstra via Gcc-patches wrote:
>> Which optimizations does it enable that aren't possible if the value is defined?
>
> See bugzilla.  Note other compilers heavily optimize on those builtins
> undefined at value zero.

You mean the PR94801, PR94793, PR95863 you mentioned before? The first doesn't
seem to be a useful optimization (would anyone ever write that?), the other 2 would
benefit from clz(0) being well defined. In particular, x86 without BMI would greatly
benefit from setting CTZ_DEFINED_VALUE_AT_ZERO to 2.

So I fail to see any "heavy" optimizations here that show a benefit of keeping the value
undefined at zero.

>> > We just should make sure that we optimize code like x ? __builtin_c[lt]z (x) : 32;
>> > etc. properly (and I believe we do).
>> 
>> I think we do, but both the external and internal documentation are not clear
>> enough that most targets actually do define a value and will optimize for it.
>> Otherwise we wouldn't have this bug now...
>
> The documentation is very clear that the builtins are undefined at zero,
> that is all that matters for users.

If we don't change the undefinedness, at least we should try to explain the above
idiom as a way to get a well-defined range that still results in a single instruction
on most targets.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 11:37       ` Wilco Dijkstra
@ 2020-10-08 11:40         ` Jakub Jelinek
  2020-10-08 12:57           ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 11:40 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches

On Thu, Oct 08, 2020 at 11:37:24AM +0000, Wilco Dijkstra via Gcc-patches wrote:
> Which optimizations does it enable that aren't possible if the value is defined?

See bugzilla.  Note other compilers heavily optimize on those builtins
undefined at value zero.

> > We just should make sure that we optimize code like x ? __builtin_c[lt]z (x) : 32;
> > etc. properly (and I believe we do).
> 
> I think we do, but both the external and internal documentation are not clear
> enough that most targets actually do define a value and will optimize for it.
> Otherwise we wouldn't have this bug now...

The documentation is very clear that the builtins are undefined at zero,
that is all that matters for users.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 11:25     ` Jakub Jelinek
@ 2020-10-08 11:37       ` Wilco Dijkstra
  2020-10-08 11:40         ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2020-10-08 11:37 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, aldyh

Hi Jakub, 

> Having it undefined allows optimizations, and has been that way for years.

Which optimizations does it enable that aren't possible if the value is defined?

> We just should make sure that we optimize code like x ? __builtin_c[lt]z (x) : 32;
> etc. properly (and I believe we do).

I think we do, but both the external and internal documentation are not clear
enough that most targets actually do define a value and will optimize for it.
Otherwise we wouldn't have this bug now...

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 11:09 ` Jakub Jelinek
  2020-10-08 11:22   ` Wilco Dijkstra
@ 2020-10-08 11:30   ` Wilco Dijkstra
  1 sibling, 0 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2020-10-08 11:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, aldyh


Btw for PowerPC is 0..32:

https://www.ibm.com/support/knowledgecenter/ssw_aix_72/assembler/idalangref_cntlzw_instrs.html

Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 11:22   ` Wilco Dijkstra
@ 2020-10-08 11:25     ` Jakub Jelinek
  2020-10-08 11:37       ` Wilco Dijkstra
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 11:25 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, aldyh

On Thu, Oct 08, 2020 at 11:22:34AM +0000, Wilco Dijkstra wrote:
> >> I think a better way forward would be to make the builtin_clz/ctz more defined.
> >> Having undefined values is a source of unnecessary bugs given practically all
> >> modern targets return the number of bits for the zero input - it is relatively
> >> easy to ensure this on the few targets that don't.
> >
> > Well, e.g. i?86/x86_64 in most commonly used CPU flags is really undefined
> > (the register is unchanged).  And -1 is also quite commonly used value,
> > e.g. powerpc, gcn, xtensa.
> 
> So wouldn't it be easy to initialize the register before you do the bsr to get
> the same result as with BMI? I don't think an extra mov can affect performance
> in actual code (and GCC could still optimize the zero case if the input range
> doesn't include zero).
> 
> -1 is more complex, if these targets don't want to add extra instructions to fix
> it up, we could define the zero result either -1 or #bits depending on the target
> (still better than completely undefined).

Having it undefined allows optimizations, and has been that way for years.
We just should make sure that we optimize code like x ? __builtin_c[lt]z (x) : 32;
etc. properly (and I believe we do).

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 11:09 ` Jakub Jelinek
@ 2020-10-08 11:22   ` Wilco Dijkstra
  2020-10-08 11:25     ` Jakub Jelinek
  2020-10-08 11:30   ` Wilco Dijkstra
  1 sibling, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2020-10-08 11:22 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, aldyh

Hi Jakub,

>> I think a better way forward would be to make the builtin_clz/ctz more defined.
>> Having undefined values is a source of unnecessary bugs given practically all
>> modern targets return the number of bits for the zero input - it is relatively
>> easy to ensure this on the few targets that don't.
>
> Well, e.g. i?86/x86_64 in most commonly used CPU flags is really undefined
> (the register is unchanged).  And -1 is also quite commonly used value,
> e.g. powerpc, gcn, xtensa.

So wouldn't it be easy to initialize the register before you do the bsr to get
the same result as with BMI? I don't think an extra mov can affect performance
in actual code (and GCC could still optimize the zero case if the input range
doesn't include zero).

-1 is more complex, if these targets don't want to add extra instructions to fix
it up, we could define the zero result either -1 or #bits depending on the target
(still better than completely undefined).

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
  2020-10-08 11:04 Wilco Dijkstra
@ 2020-10-08 11:09 ` Jakub Jelinek
  2020-10-08 11:22   ` Wilco Dijkstra
  2020-10-08 11:30   ` Wilco Dijkstra
  0 siblings, 2 replies; 18+ messages in thread
From: Jakub Jelinek @ 2020-10-08 11:09 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, aldyh

On Thu, Oct 08, 2020 at 11:04:01AM +0000, Wilco Dijkstra wrote:
> > Perhaps another way out of this would be document and enforce that
> > __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
> > calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2, and then
> > we would need to make sure that e.g. in simplify_count_trailing_zeroes
> > we emit a .CTZ or __builtin_ctz call depending on whether it is undefined
> > there or not; or give .C[TL]Z an additional argument (0/1) which would tell
> > if it is defined at zero or not.
> 
> I think a better way forward would be to make the builtin_clz/ctz more defined.
> Having undefined values is a source of unnecessary bugs given practically all
> modern targets return the number of bits for the zero input - it is relatively
> easy to ensure this on the few targets that don't.

Well, e.g. i?86/x86_64 in most commonly used CPU flags is really undefined
(the register is unchanged).  And -1 is also quite commonly used value,
e.g. powerpc, gcn, xtensa.

	Jakub


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
@ 2020-10-08 11:04 Wilco Dijkstra
  2020-10-08 11:09 ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Wilco Dijkstra @ 2020-10-08 11:04 UTC (permalink / raw)
  To: jakub; +Cc: GCC Patches, aldyh

Hi Jakub,

> Perhaps another way out of this would be document and enforce that
> __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
> calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2, and then
> we would need to make sure that e.g. in simplify_count_trailing_zeroes
> we emit a .CTZ or __builtin_ctz call depending on whether it is undefined
> there or not; or give .C[TL]Z an additional argument (0/1) which would tell
> if it is defined at zero or not.

I think a better way forward would be to make the builtin_clz/ctz more defined.
Having undefined values is a source of unnecessary bugs given practically all
modern targets return the number of bits for the zero input - it is relatively
easy to ensure this on the few targets that don't.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c
@ 2020-10-08 10:33 Wilco Dijkstra
  0 siblings, 0 replies; 18+ messages in thread
From: Wilco Dijkstra @ 2020-10-08 10:33 UTC (permalink / raw)
  To: aldyh; +Cc: GCC Patches

Hi,

> I am quoting my analysis from the PR.  Could an aarch64 expert 
> pontificate here?
>
> This test is checking the final assembly for a specific sequence.  I 
> don't speak aarch64 assembly, but the IL is different coming out of evrp.

The code currently generated is incorrect - you really need to preserve the AND.

The issue seems that evrp doesn't seem to take CLZ_DEFINED_VALUE_AT_ZERO
macros into account. When it is 2, the value of CLZ/CTZ is defined even in the
mid-end and you need to include it in the range (so if the CLZ/CTZ value is the
number of bits you should use ranges 0..32 and 0..64).

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-10-08 15:09 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-08  9:58 [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c Aldy Hernandez
2020-10-08 10:22 ` Jakub Jelinek
2020-10-08 10:27   ` Jakub Jelinek
2020-10-08 13:54   ` [PATCH] vrp: Fix up gcc.target/aarch64/pr90838.c [PR97312, PR94801] Jakub Jelinek
2020-10-08 14:28     ` Aldy Hernandez
2020-10-08 14:39       ` Jakub Jelinek
2020-10-08 14:55         ` Aldy Hernandez
2020-10-08 15:08           ` Jakub Jelinek
2020-10-08 15:09             ` Aldy Hernandez
2020-10-08 10:33 [PATCH] PR target/97312: Tweak gcc.target/aarch64/pr90838.c Wilco Dijkstra
2020-10-08 11:04 Wilco Dijkstra
2020-10-08 11:09 ` Jakub Jelinek
2020-10-08 11:22   ` Wilco Dijkstra
2020-10-08 11:25     ` Jakub Jelinek
2020-10-08 11:37       ` Wilco Dijkstra
2020-10-08 11:40         ` Jakub Jelinek
2020-10-08 12:57           ` Wilco Dijkstra
2020-10-08 11:30   ` Wilco Dijkstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).