public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] [RFC] New early __builtin_unreachable processing.
@ 2023-09-15 14:44 Andrew MacLeod
  2023-09-18  6:53 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew MacLeod @ 2023-09-15 14:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: hernandez, aldy, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 3179 bytes --]

Ive been looking at __builtin_unreachable () regressions.  The 
fundamental problem seems to be  a lack of consistent expectation for 
when we remove it earlier than the final pass of VRP.    After looking 
through them, I think this provides a sensible approach.

Ranger is pretty good at providing ranges in blocks dominated by the 
__builtin_unreachable  branch, so removing it isn't quite a critical as 
it once was.  Its also pretty good at identifying what in the block can 
be affected by the branch.

This patch provide an alternate removal algorithm for earlier passes.  
it looks at *all* the exports from the block, and if the branch 
dominates every use of all the exports, AND none of those values access 
memory, VRP will remove the unreachable call, rewrite the branch, update 
all the values globally, and finally perform the simple DCE on the 
branch's ssa-name.   This is kind of what it did before, but it wasn't 
as stringent on the requirements.

The memory access check is required because there are a couple of test 
cases for PRE in which there is a series of instruction leading to an 
unreachable call, and none of those ssa names are ever used in the IL 
again. The whole chunk is dead, and we update globals, however 
pointlessly.  However, one of ssa_names loads from memory, and a later 
passes commons this value with a later load, and then  the unreachable 
call provides additional information about the later load.    This is 
evident in tree-ssa/ssa-pre-34.c.   The only way I see to avoid this 
situation is to not remove the unreachable if there is a load feeding it.

What this does is a more sophisticated version of what DOM does in 
all_uses_feed_or_dominated_by_stmt.  THe feeding instructions dont have 
to be single use, but they do have to be dominated by the branch or be 
single use within the branches block..

If there are multiple uses in the same block as the branch, this does 
not remove the unreachable call.  If we could be sure there are no 
intervening calls or side effects, it would be allowable, but this a 
more expensive checking operation.  Ranger gets the ranges right anyway, 
so with more passes using ranger, Im not sure we'd see much benefit from 
the additional analysis.   It could always be added later.

This fixes at least 110249 and 110080 (and probably others).  The only 
regression is 93917 for which I changed the testcase to adjust 
expectations:

// PR 93917
void f1(int n)
{
   if(n<0)
     __builtin_unreachable();
   f3(n);
}

void f2(int*n)
{
   if(*n<0)
     __builtin_unreachable();
   f3 (*n);
}

We were removing both unreachable calls in VRP1, but only updating the 
global values in the first case, meaning we lose information.   With the 
change in semantics, we only update the global in the first case, but we 
leave the unreachable call in the second case now (due to the load from 
memory).  Ranger still calculates the contextual range correctly as [0, 
+INF] in the second case, it just doesn't set the global value until 
VRP2 when it is removed.

Does this seem reasonable?

Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK?

Andrew



[-- Attachment #2: 0003-New-early-__builtin_unreachable-processing.patch --]
[-- Type: text/x-patch, Size: 14587 bytes --]

From 87072ebfcd4f51276fc6ed1fb0557257d51ec446 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Wed, 13 Sep 2023 11:52:15 -0400
Subject: [PATCH 3/3] New early __builtin_unreachable processing.

in VRP passes before __builtin_unreachable MUST be removed, only remove it
if all exports affected by the unreachable can have global values updated, and
do not involve loads from memory.

	PR tree-optimization/110080
	PR tree-optimization/110249
	gcc/
	* tree-vrp.cc (remove_unreachable::final_p): New.
	(remove_unreachable::maybe_register): Rename from
	maybe_register_block and call early or final routine.
	(fully_replaceable): New.
	(remove_unreachable::handle_early): New.
	(remove_unreachable::remove_and_update_globals): Remove
	non-final processing.
	(rvrp_folder::rvrp_folder): Add final flag to constructor.
	(rvrp_folder::post_fold_bb): Remove unreachable registration.
	(rvrp_folder::pre_fold_stmt): Move unreachable processing to here.
	(execute_ranger_vrp): Adjust some call parameters.

	gcc/testsuite/
	* g++.dg/pr110249.C: New.
	* gcc.dg/pr110080.c: New.
	* gcc.dg/pr93917.c: Adjust.

Tweak vuse case

Adjusted testcase 93917
---
 gcc/testsuite/g++.dg/pr110249.C |  16 +++
 gcc/testsuite/gcc.dg/pr110080.c |  27 +++++
 gcc/testsuite/gcc.dg/pr93917.c  |   7 +-
 gcc/tree-vrp.cc                 | 203 ++++++++++++++++++++++++++------
 4 files changed, 214 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr110249.C
 create mode 100644 gcc/testsuite/gcc.dg/pr110080.c

diff --git a/gcc/testsuite/g++.dg/pr110249.C b/gcc/testsuite/g++.dg/pr110249.C
new file mode 100644
index 00000000000..2b737618bdb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr110249.C
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1-alias" } */
+
+#include <stdint.h>
+#include <string.h>
+
+uint64_t read64r(const uint64_t &x) {
+    if ((uint64_t) &x % 8 ) {
+        __builtin_unreachable();
+    }
+     uint64_t value;
+     memcpy( &value, &x, sizeof(uint64_t) );
+     return value;
+}
+
+/* { dg-final { scan-tree-dump "fff8" "vrp1" } } */
diff --git a/gcc/testsuite/gcc.dg/pr110080.c b/gcc/testsuite/gcc.dg/pr110080.c
new file mode 100644
index 00000000000..c10afe07b1e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110080.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void foo(void);
+static unsigned char a = 131;
+static int *b;
+static int **c = &b;
+static void d(int e, unsigned f) {
+    int *g;
+    if (f != 131) {
+        __builtin_unreachable();
+    }
+    if (!e){
+        for (; a; ++a)
+            for (e = 0; 0;)
+                ;
+        g = &e;
+        int **h = &g;
+        if (**h) {
+            foo();
+        }
+    }
+    *c = &e;
+}
+int main() { d(4 & a, a); }
+
+/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr93917.c b/gcc/testsuite/gcc.dg/pr93917.c
index 41d27fb9a8f..f09e1c41ae8 100644
--- a/gcc/testsuite/gcc.dg/pr93917.c
+++ b/gcc/testsuite/gcc.dg/pr93917.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-vrp2" } */
 
 void f3(int n);
 
@@ -17,4 +17,7 @@ void f2(int*n)
   f3 (*n);
 }
 
-/* { dg-final { scan-tree-dump-times "Global Exported" 2 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Global Export.*0, \\+INF" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_unreachable" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Global Export.*0, \\+INF" 1 "vrp2" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_unreachable" 0 "vrp2" } } */
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 2d9f6273280..d7b194f5904 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -56,11 +56,17 @@ along with GCC; see the file COPYING3.  If not see
 // This class is utilized by VRP and ranger to remove __builtin_unreachable
 // calls, and reflect any resulting global ranges.
 //
-// maybe_register_block () is called on basic blocks, and if that block
-// matches the pattern of one branch being a builtin_unreachable, register
-// the resulting executable edge in a list.
+// maybe_register() is called on condition statements , and if that
+// matches the pattern of one branch being a builtin_unreachable, either check
+// for early removal or register the resulting executable edge in a list.
 //
-// After all blocks have been processed, remove_and_update_globals() will
+// During early/non-final processing, we check to see if ALL exports from the
+// block can be safely updated with a new global value.  If they can, then
+// we rewrite the condition and update those values immediately.  Otherwise
+// the unreachable condition is left in the IL until the final pass.
+//
+// During final processing, after all blocks have been registered,
+// remove_and_update_globals() will
 // - check all exports from registered blocks
 // - ensure the cache entry of each export is set with the appropriate range
 // - rewrite the conditions to take the executable edge
@@ -71,23 +77,25 @@ along with GCC; see the file COPYING3.  If not see
 
 class remove_unreachable {
 public:
-  remove_unreachable (gimple_ranger &r) : m_ranger (r) { m_list.create (30); }
+  remove_unreachable (gimple_ranger &r, bool all) : m_ranger (r), final_p (all)
+    { m_list.create (30); }
   ~remove_unreachable () { m_list.release (); }
-  void maybe_register_block (basic_block bb);
-  bool remove_and_update_globals (bool final_p);
+  void handle_early (gimple *s, edge e);
+  void maybe_register (gimple *s);
+  bool remove_and_update_globals ();
   vec<std::pair<int, int> > m_list;
   gimple_ranger &m_ranger;
+  bool final_p;
 };
 
 // Check if block BB has a __builtin_unreachable () call on one arm, and
 // register the executable edge if so.
 
 void
-remove_unreachable::maybe_register_block (basic_block bb)
+remove_unreachable::maybe_register (gimple *s)
 {
-  gimple *s = gimple_outgoing_range_stmt_p (bb);
-  if (!s || gimple_code (s) != GIMPLE_COND)
-    return;
+  gcc_checking_assert  (gimple_code (s) == GIMPLE_COND);
+  basic_block bb = gimple_bb (s);
 
   edge e0 = EDGE_SUCC (bb, 0);
   basic_block bb0 = e0->dest;
@@ -102,21 +110,150 @@ remove_unreachable::maybe_register_block (basic_block bb)
   if (un0 == un1)
     return;
 
-  if (un0)
-    m_list.safe_push (std::make_pair (e1->src->index, e1->dest->index));
+  // Constant expressions are ignored.
+  if (TREE_CODE (gimple_cond_lhs (s)) != SSA_NAME
+      && TREE_CODE (gimple_cond_rhs (s)) != SSA_NAME)
+    return;
+
+  edge e = un0 ? e1 : e0;
+
+  if (!final_p)
+    handle_early (s, e);
   else
-    m_list.safe_push (std::make_pair (e0->src->index, e0->dest->index));
+    m_list.safe_push (std::make_pair (e->src->index, e->dest->index));
 }
 
+// Return true if all uses of NAME are dominated by by block BB.  1 use
+// is allowed in block BB, This is one we hope to remove.
+// ie
+//  _2 = _1 & 7;
+//  if (_2 != 0)
+//    goto <bb 3>; [0.00%]
+//  Any additional use of _1 or _2 in this block invalidates early replacement.
+
+static bool
+fully_replaceable (tree name, basic_block bb)
+{
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  bool saw_in_bb = false;
+
+  // If a name loads from memory, we may lose information used in
+  // commoning opportunities later.  See tree-ssa/ssa-pre-34.c.
+  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+  if (gimple_vuse (def_stmt))
+    return false;
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, name)
+    {
+      gimple *use_stmt = USE_STMT (use_p);
+      // Ignore debug stmts and the branch.
+      if (is_gimple_debug (use_stmt))
+	continue;
+      basic_block use_bb = gimple_bb (use_stmt);
+      // Only one use in the block allowed to avoid complicated cases.
+      if (use_bb == bb)
+	{
+	  if (saw_in_bb)
+	    return false;
+	  else
+	    saw_in_bb = true;
+	}
+      else if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
+	return false;
+    }
+  return true;
+}
+
+// This routine is called to check builtin_unreachable calls during any
+// time before final removal.  The only way we can be sure it does not
+// provide any additional information is to expect that we can update the
+// global values of all exports from a block.   This means the branch
+// to the unreachable call must dominate all uses of those ssa-names, with
+// the exception that there can be a single use in the block containing
+// the branch. IF the name used in the branch is defined in the block, it may
+// contain the name of something else that will be an export.  And likewise
+// that may also use another name that is an export etc.
+//
+// As long as there is only a single use, we can be sure that there are no other
+// side effects (like being passed to a call, or stored to a global, etc.
+// This means we will miss cases where there are 2 or more uses that have
+// no interveneing statements that may had side effects, but it catches most
+// of the caes we care about, and prevents expensive in depth analysis.
+//
+// Ranger will still reflect the proper ranges at other places in these missed
+// cases, we simply will not remove/set globals early.
+
+void
+remove_unreachable::handle_early (gimple *s, edge e)
+{
+  bool lhs_p = TREE_CODE (gimple_cond_lhs (s)) == SSA_NAME;
+  bool rhs_p = TREE_CODE (gimple_cond_rhs (s)) == SSA_NAME;
+  // Do not remove __builtin_unreachable if it confers a relation, or
+  // that relation may be lost in subsequent passes.
+  if (lhs_p && rhs_p)
+    return;
+  // Do not remove addresses early. ie if (x == &y)
+  if (lhs_p && TREE_CODE (gimple_cond_rhs (s)) == ADDR_EXPR)
+    return;
+
+  gcc_checking_assert (gimple_outgoing_range_stmt_p (e->src) == s);
+  gcc_checking_assert (!final_p);
+
+  // Check if every export use is dominated by this branch.
+  tree name;
+  FOR_EACH_GORI_EXPORT_NAME (m_ranger.gori (), e->src, name)
+    {
+      if (!fully_replaceable (name, e->src))
+	return;
+    }
+
+  // Set the global value for each.
+  FOR_EACH_GORI_EXPORT_NAME (m_ranger.gori (), e->src, name)
+    {
+      Value_Range r (TREE_TYPE (name));
+      m_ranger.range_on_entry (r, e->dest, name);
+      // Nothing at this late stage we can do if the write fails.
+      if (!set_range_info (name, r))
+	continue;
+      if (dump_file)
+	{
+	  fprintf (dump_file, "Global Exported (via early unreachable): ");
+	  print_generic_expr (dump_file, name, TDF_SLIM);
+	  fprintf (dump_file, " = ");
+	  gimple_range_global (r, name);
+	  r.dump (dump_file);
+	  fputc ('\n', dump_file);
+	}
+    }
+
+  tree ssa = lhs_p ? gimple_cond_lhs (s) : gimple_cond_rhs (s);
+
+  // Rewrite the condition.
+  if (e->flags & EDGE_TRUE_VALUE)
+    gimple_cond_make_true (as_a<gcond *> (s));
+  else
+    gimple_cond_make_false (as_a<gcond *> (s));
+  update_stmt (s);
+
+  // If the name on S is defined in this block, see if there is DCE work to do.
+  if (gimple_bb (SSA_NAME_DEF_STMT (ssa)) == e->src)
+    {
+      auto_bitmap dce;
+      bitmap_set_bit (dce, SSA_NAME_VERSION (ssa));
+      simple_dce_from_worklist (dce);
+    }
+}
+
+
 // Process the edges in the list, change the conditions and removing any
 // dead code feeding those conditions.  Calculate the range of any
 // names that may have been exported from those blocks, and determine if
 // there is any updates to their global ranges..
-// FINAL_P indicates all builtin_unreachable calls should be removed.
 // Return true if any builtin_unreachables/globals eliminated/updated.
 
 bool
-remove_unreachable::remove_and_update_globals (bool final_p)
+remove_unreachable::remove_and_update_globals ()
 {
   if (m_list.length () == 0)
     return false;
@@ -140,19 +277,7 @@ remove_unreachable::remove_and_update_globals (bool final_p)
       edge e = find_edge (src, dest);
       gimple *s = gimple_outgoing_range_stmt_p (e->src);
       gcc_checking_assert (gimple_code (s) == GIMPLE_COND);
-      bool lhs_p = TREE_CODE (gimple_cond_lhs (s)) == SSA_NAME;
-      bool rhs_p = TREE_CODE (gimple_cond_rhs (s)) == SSA_NAME;
-      // Do not remove __builtin_unreachable if it confers a relation, or
-      // that relation will be lost in subsequent passes.  Unless its the
-      // final pass.
-      if (!final_p && lhs_p && rhs_p)
-	continue;
-      // If this is already a constant condition, don't look either
-      if (!lhs_p && !rhs_p)
-	continue;
-      // Do not remove addresses early. ie if (x == &y)
-      if (!final_p && lhs_p && TREE_CODE (gimple_cond_rhs (s)) == ADDR_EXPR)
-	continue;
+
       bool dominate_exit_p = true;
       FOR_EACH_GORI_EXPORT_NAME (m_ranger.gori (), e->src, name)
 	{
@@ -827,9 +952,10 @@ class rvrp_folder : public substitute_and_fold_engine
 {
 public:
 
-  rvrp_folder (gimple_ranger *r) : substitute_and_fold_engine (),
-				   m_unreachable (*r),
-				   m_simplifier (r, r->non_executable_edge_flag)
+  rvrp_folder (gimple_ranger *r, bool all)
+    : substitute_and_fold_engine (),
+      m_unreachable (*r, all),
+      m_simplifier (r, r->non_executable_edge_flag)
   {
     m_ranger = r;
     m_pta = new pointer_equiv_analyzer (m_ranger);
@@ -883,8 +1009,6 @@ public:
   void post_fold_bb (basic_block bb) override
   {
     m_pta->leave (bb);
-    if (cfun->after_inlining)
-      m_unreachable.maybe_register_block (bb);
   }
 
   void pre_fold_stmt (gimple *stmt) override
@@ -893,7 +1017,12 @@ public:
     // If this is the last stmt and there are inferred ranges, reparse the
     // block for transitive inferred ranges that occur earlier in the block.
     if (stmt == m_last_bb_stmt)
-      m_ranger->register_transitive_inferred_ranges (gimple_bb (stmt));
+      {
+	m_ranger->register_transitive_inferred_ranges (gimple_bb (stmt));
+	// Also check for builtin_unreachable calls.
+	if (cfun->after_inlining && gimple_code (stmt) == GIMPLE_COND)
+	  m_unreachable.maybe_register (stmt);
+      }
   }
 
   bool fold_stmt (gimple_stmt_iterator *gsi) override
@@ -928,11 +1057,11 @@ execute_ranger_vrp (struct function *fun, bool warn_array_bounds_p,
 
   set_all_edges_as_executable (fun);
   gimple_ranger *ranger = enable_ranger (fun, false);
-  rvrp_folder folder (ranger);
+  rvrp_folder folder (ranger, final_p);
   phi_analysis_initialize (ranger->const_query ());
   folder.substitute_and_fold ();
   // Remove tagged builtin-unreachable and maybe update globals.
-  folder.m_unreachable.remove_and_update_globals (final_p);
+  folder.m_unreachable.remove_and_update_globals ();
   if (dump_file && (dump_flags & TDF_DETAILS))
     ranger->dump (dump_file);
 
-- 
2.41.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] New early __builtin_unreachable processing.
  2023-09-15 14:44 [PATCH] [RFC] New early __builtin_unreachable processing Andrew MacLeod
@ 2023-09-18  6:53 ` Richard Biener
  2023-09-18 13:48   ` Andrew MacLeod
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2023-09-18  6:53 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy

On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> Ive been looking at __builtin_unreachable () regressions.  The
> fundamental problem seems to be  a lack of consistent expectation for
> when we remove it earlier than the final pass of VRP.    After looking
> through them, I think this provides a sensible approach.
>
> Ranger is pretty good at providing ranges in blocks dominated by the
> __builtin_unreachable  branch, so removing it isn't quite a critical as
> it once was.  Its also pretty good at identifying what in the block can
> be affected by the branch.
>
> This patch provide an alternate removal algorithm for earlier passes.
> it looks at *all* the exports from the block, and if the branch
> dominates every use of all the exports, AND none of those values access
> memory, VRP will remove the unreachable call, rewrite the branch, update
> all the values globally, and finally perform the simple DCE on the
> branch's ssa-name.   This is kind of what it did before, but it wasn't
> as stringent on the requirements.
>
> The memory access check is required because there are a couple of test
> cases for PRE in which there is a series of instruction leading to an
> unreachable call, and none of those ssa names are ever used in the IL
> again. The whole chunk is dead, and we update globals, however
> pointlessly.  However, one of ssa_names loads from memory, and a later
> passes commons this value with a later load, and then  the unreachable
> call provides additional information about the later load.    This is
> evident in tree-ssa/ssa-pre-34.c.   The only way I see to avoid this
> situation is to not remove the unreachable if there is a load feeding it.
>
> What this does is a more sophisticated version of what DOM does in
> all_uses_feed_or_dominated_by_stmt.  THe feeding instructions dont have
> to be single use, but they do have to be dominated by the branch or be
> single use within the branches block..
>
> If there are multiple uses in the same block as the branch, this does
> not remove the unreachable call.  If we could be sure there are no
> intervening calls or side effects, it would be allowable, but this a
> more expensive checking operation.  Ranger gets the ranges right anyway,
> so with more passes using ranger, Im not sure we'd see much benefit from
> the additional analysis.   It could always be added later.
>
> This fixes at least 110249 and 110080 (and probably others).  The only
> regression is 93917 for which I changed the testcase to adjust
> expectations:
>
> // PR 93917
> void f1(int n)
> {
>    if(n<0)
>      __builtin_unreachable();
>    f3(n);
> }
>
> void f2(int*n)
> {
>    if(*n<0)
>      __builtin_unreachable();
>    f3 (*n);
> }
>
> We were removing both unreachable calls in VRP1, but only updating the
> global values in the first case, meaning we lose information.   With the
> change in semantics, we only update the global in the first case, but we
> leave the unreachable call in the second case now (due to the load from
> memory).  Ranger still calculates the contextual range correctly as [0,
> +INF] in the second case, it just doesn't set the global value until
> VRP2 when it is removed.
>
> Does this seem reasonable?

I wonder how this addresses the fundamental issue we always faced
in that when we apply the range this range info in itself allows the
branch to the __builtin_unreachable () to be statically determined,
so when the first VRP pass sets the range the next pass evaluating
the condition will remove it (and the guarded __builtin_unreachable ()).

In principle there's nothing wrong with that if we don't lose the range
info during optimizations, but that unfortunately happens more often
than wanted and with the __builtin_unreachable () gone we've lost
the ability to re-compute them.

I think it's good to explicitly remove the branch at the point we want
rather than relying on the "next" visitor to pick up the global range.

As I read the patch we now remove __builtin_unreachable () explicitly
as soon as possible but don't really address the fundamental issue
in any way?

> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK?
>
> Andrew
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] New early __builtin_unreachable processing.
  2023-09-18  6:53 ` Richard Biener
@ 2023-09-18 13:48   ` Andrew MacLeod
  2023-09-19 12:56     ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew MacLeod @ 2023-09-18 13:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hernandez, aldy


On 9/18/23 02:53, Richard Biener wrote:
> On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>> Ive been looking at __builtin_unreachable () regressions.  The
>> fundamental problem seems to be  a lack of consistent expectation for
>> when we remove it earlier than the final pass of VRP.    After looking
>> through them, I think this provides a sensible approach.
>>
>> Ranger is pretty good at providing ranges in blocks dominated by the
>> __builtin_unreachable  branch, so removing it isn't quite a critical as
>> it once was.  Its also pretty good at identifying what in the block can
>> be affected by the branch.
>>
>> This patch provide an alternate removal algorithm for earlier passes.
>> it looks at *all* the exports from the block, and if the branch
>> dominates every use of all the exports, AND none of those values access
>> memory, VRP will remove the unreachable call, rewrite the branch, update
>> all the values globally, and finally perform the simple DCE on the
>> branch's ssa-name.   This is kind of what it did before, but it wasn't
>> as stringent on the requirements.
>>
>> The memory access check is required because there are a couple of test
>> cases for PRE in which there is a series of instruction leading to an
>> unreachable call, and none of those ssa names are ever used in the IL
>> again. The whole chunk is dead, and we update globals, however
>> pointlessly.  However, one of ssa_names loads from memory, and a later
>> passes commons this value with a later load, and then  the unreachable
>> call provides additional information about the later load.    This is
>> evident in tree-ssa/ssa-pre-34.c.   The only way I see to avoid this
>> situation is to not remove the unreachable if there is a load feeding it.
>>
>> What this does is a more sophisticated version of what DOM does in
>> all_uses_feed_or_dominated_by_stmt.  THe feeding instructions dont have
>> to be single use, but they do have to be dominated by the branch or be
>> single use within the branches block..
>>
>> If there are multiple uses in the same block as the branch, this does
>> not remove the unreachable call.  If we could be sure there are no
>> intervening calls or side effects, it would be allowable, but this a
>> more expensive checking operation.  Ranger gets the ranges right anyway,
>> so with more passes using ranger, Im not sure we'd see much benefit from
>> the additional analysis.   It could always be added later.
>>
>> This fixes at least 110249 and 110080 (and probably others).  The only
>> regression is 93917 for which I changed the testcase to adjust
>> expectations:
>>
>> // PR 93917
>> void f1(int n)
>> {
>>     if(n<0)
>>       __builtin_unreachable();
>>     f3(n);
>> }
>>
>> void f2(int*n)
>> {
>>     if(*n<0)
>>       __builtin_unreachable();
>>     f3 (*n);
>> }
>>
>> We were removing both unreachable calls in VRP1, but only updating the
>> global values in the first case, meaning we lose information.   With the
>> change in semantics, we only update the global in the first case, but we
>> leave the unreachable call in the second case now (due to the load from
>> memory).  Ranger still calculates the contextual range correctly as [0,
>> +INF] in the second case, it just doesn't set the global value until
>> VRP2 when it is removed.
>>
>> Does this seem reasonable?
> I wonder how this addresses the fundamental issue we always faced
> in that when we apply the range this range info in itself allows the
> branch to the __builtin_unreachable () to be statically determined,
> so when the first VRP pass sets the range the next pass evaluating
> the condition will remove it (and the guarded __builtin_unreachable ()).
>
> In principle there's nothing wrong with that if we don't lose the range
> info during optimizations, but that unfortunately happens more often
> than wanted and with the __builtin_unreachable () gone we've lost
> the ability to re-compute them.
>
> I think it's good to explicitly remove the branch at the point we want
> rather than relying on the "next" visitor to pick up the global range.
>
> As I read the patch we now remove __builtin_unreachable () explicitly
> as soon as possible but don't really address the fundamental issue
> in any way?


I think it pretty much addresses the issue completely.  No globals are 
updated by the unreachable branch unless it is removed.  We remove the 
unreachable early ONLY if every use of all the exports is dominated by 
the branch...    with the exception of a single use in the block used to 
define a different export. and those have to all have no other uses 
which are not dominated.  ie

  <bb 2> [local count: 1073741824]:
   y_2 = x_1(D) >> 1;
   t_3 = y_2 + 1;
   if (t_3 > 100)
     goto <bb 3>; [0.00%]
   else
     goto <bb 4>; [100.00%]

   <bb 3> [count: 0]:
   __builtin_unreachable ();

   <bb 4> [local count: 1073741824]:
   func (x_1(D), y_2, t_3);


In this case we will remove the unreachable call because we can provide 
an accurate global range for all values used in the definition chain for 
the program.

Global Exported (via early unreachable): x_1(D) = [irange] unsigned int 
[0, 199] MASK 0xff VALUE 0x0
Global Exported (via early unreachable): y_2 = [irange] unsigned int [0, 
99] MASK 0x7fffffff VALUE 0x0
Global Exported (via early unreachable): t_3 = [irange] unsigned int [1, 
100]


If conditions are not satisfied to do this early, we do not et ANY of 
the global values, and leave the unreachable alone.  ie. if that was fed 
from a memory load instead of a parameter:


  <bb 2> [local count: 1073741824]:
   x.0_1 = x;
   y_3 = x.0_1 >> 1;
   t_4 = y_3 + 1;
   if (t_4 > 100)
     goto <bb 3>; [0.00%]
   else
     goto <bb 4>; [100.00%]

   <bb 3> [count: 0]:
   __builtin_unreachable ();


we no longer remove the unreachable call, and do not sety any global 
values as a result.   that means a following pass is not goign to see 
the condition as foldable because we dont know anything about t_4's 
range globally.

We still contextually know via ranger that after the unreachable call we 
have those ranges:

=========== BB 4 ============
x.0_1   [irange] unsigned int [0, 199] MASK 0xff VALUE 0x0
y_3     [irange] unsigned int [0, 99] MASK 0x7fffffff VALUE 0x0
t_4     [irange] unsigned int [1, 100]
     <bb 4> [local count: 1073741824]:
     func (x.0_1, y_3, t_4);

but even ranger will not apply those ranges at the branch location.

The same thing happen if there is any other use of x.0_1, y_3 or t_4 
that isnt dominated by the branch.. we leave it alone.

So I don't think we lose any information... we just only ever move it to 
the global range early IFF we know that range applies to every export, 
and there is no chance there will be any commoning opportunities due to 
memory loads.

I dont see anything in the early VRP processing now that would allow a 
later pass to remove the unreachable unless it does its own analysis 
like DOM might do.

Andrew


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] New early __builtin_unreachable processing.
  2023-09-18 13:48   ` Andrew MacLeod
@ 2023-09-19 12:56     ` Richard Biener
  2023-09-19 13:41       ` Andrew MacLeod
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2023-09-19 12:56 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy

On Mon, Sep 18, 2023 at 3:48 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
>
> On 9/18/23 02:53, Richard Biener wrote:
> > On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >> Ive been looking at __builtin_unreachable () regressions.  The
> >> fundamental problem seems to be  a lack of consistent expectation for
> >> when we remove it earlier than the final pass of VRP.    After looking
> >> through them, I think this provides a sensible approach.
> >>
> >> Ranger is pretty good at providing ranges in blocks dominated by the
> >> __builtin_unreachable  branch, so removing it isn't quite a critical as
> >> it once was.  Its also pretty good at identifying what in the block can
> >> be affected by the branch.
> >>
> >> This patch provide an alternate removal algorithm for earlier passes.
> >> it looks at *all* the exports from the block, and if the branch
> >> dominates every use of all the exports, AND none of those values access
> >> memory, VRP will remove the unreachable call, rewrite the branch, update
> >> all the values globally, and finally perform the simple DCE on the
> >> branch's ssa-name.   This is kind of what it did before, but it wasn't
> >> as stringent on the requirements.
> >>
> >> The memory access check is required because there are a couple of test
> >> cases for PRE in which there is a series of instruction leading to an
> >> unreachable call, and none of those ssa names are ever used in the IL
> >> again. The whole chunk is dead, and we update globals, however
> >> pointlessly.  However, one of ssa_names loads from memory, and a later
> >> passes commons this value with a later load, and then  the unreachable
> >> call provides additional information about the later load.    This is
> >> evident in tree-ssa/ssa-pre-34.c.   The only way I see to avoid this
> >> situation is to not remove the unreachable if there is a load feeding it.
> >>
> >> What this does is a more sophisticated version of what DOM does in
> >> all_uses_feed_or_dominated_by_stmt.  THe feeding instructions dont have
> >> to be single use, but they do have to be dominated by the branch or be
> >> single use within the branches block..
> >>
> >> If there are multiple uses in the same block as the branch, this does
> >> not remove the unreachable call.  If we could be sure there are no
> >> intervening calls or side effects, it would be allowable, but this a
> >> more expensive checking operation.  Ranger gets the ranges right anyway,
> >> so with more passes using ranger, Im not sure we'd see much benefit from
> >> the additional analysis.   It could always be added later.
> >>
> >> This fixes at least 110249 and 110080 (and probably others).  The only
> >> regression is 93917 for which I changed the testcase to adjust
> >> expectations:
> >>
> >> // PR 93917
> >> void f1(int n)
> >> {
> >>     if(n<0)
> >>       __builtin_unreachable();
> >>     f3(n);
> >> }
> >>
> >> void f2(int*n)
> >> {
> >>     if(*n<0)
> >>       __builtin_unreachable();
> >>     f3 (*n);
> >> }
> >>
> >> We were removing both unreachable calls in VRP1, but only updating the
> >> global values in the first case, meaning we lose information.   With the
> >> change in semantics, we only update the global in the first case, but we
> >> leave the unreachable call in the second case now (due to the load from
> >> memory).  Ranger still calculates the contextual range correctly as [0,
> >> +INF] in the second case, it just doesn't set the global value until
> >> VRP2 when it is removed.
> >>
> >> Does this seem reasonable?
> > I wonder how this addresses the fundamental issue we always faced
> > in that when we apply the range this range info in itself allows the
> > branch to the __builtin_unreachable () to be statically determined,
> > so when the first VRP pass sets the range the next pass evaluating
> > the condition will remove it (and the guarded __builtin_unreachable ()).
> >
> > In principle there's nothing wrong with that if we don't lose the range
> > info during optimizations, but that unfortunately happens more often
> > than wanted and with the __builtin_unreachable () gone we've lost
> > the ability to re-compute them.
> >
> > I think it's good to explicitly remove the branch at the point we want
> > rather than relying on the "next" visitor to pick up the global range.
> >
> > As I read the patch we now remove __builtin_unreachable () explicitly
> > as soon as possible but don't really address the fundamental issue
> > in any way?
>
>
> I think it pretty much addresses the issue completely.  No globals are
> updated by the unreachable branch unless it is removed.  We remove the
> unreachable early ONLY if every use of all the exports is dominated by
> the branch...    with the exception of a single use in the block used to
> define a different export. and those have to all have no other uses
> which are not dominated.  ie
>
>   <bb 2> [local count: 1073741824]:
>    y_2 = x_1(D) >> 1;
>    t_3 = y_2 + 1;
>    if (t_3 > 100)
>      goto <bb 3>; [0.00%]
>    else
>      goto <bb 4>; [100.00%]
>
>    <bb 3> [count: 0]:
>    __builtin_unreachable ();
>
>    <bb 4> [local count: 1073741824]:
>    func (x_1(D), y_2, t_3);
>
>
> In this case we will remove the unreachable call because we can provide
> an accurate global range for all values used in the definition chain for
> the program.
>
> Global Exported (via early unreachable): x_1(D) = [irange] unsigned int
> [0, 199] MASK 0xff VALUE 0x0
> Global Exported (via early unreachable): y_2 = [irange] unsigned int [0,
> 99] MASK 0x7fffffff VALUE 0x0
> Global Exported (via early unreachable): t_3 = [irange] unsigned int [1,
> 100]
>
>
> If conditions are not satisfied to do this early, we do not et ANY of
> the global values, and leave the unreachable alone.  ie. if that was fed
> from a memory load instead of a parameter:
>
>
>   <bb 2> [local count: 1073741824]:
>    x.0_1 = x;
>    y_3 = x.0_1 >> 1;
>    t_4 = y_3 + 1;
>    if (t_4 > 100)
>      goto <bb 3>; [0.00%]
>    else
>      goto <bb 4>; [100.00%]
>
>    <bb 3> [count: 0]:
>    __builtin_unreachable ();
>
>
> we no longer remove the unreachable call, and do not sety any global
> values as a result.   that means a following pass is not goign to see
> the condition as foldable because we dont know anything about t_4's
> range globally.
>
> We still contextually know via ranger that after the unreachable call we
> have those ranges:
>
> =========== BB 4 ============
> x.0_1   [irange] unsigned int [0, 199] MASK 0xff VALUE 0x0
> y_3     [irange] unsigned int [0, 99] MASK 0x7fffffff VALUE 0x0
> t_4     [irange] unsigned int [1, 100]
>      <bb 4> [local count: 1073741824]:
>      func (x.0_1, y_3, t_4);
>
> but even ranger will not apply those ranges at the branch location.
>
> The same thing happen if there is any other use of x.0_1, y_3 or t_4
> that isnt dominated by the branch.. we leave it alone.
>
> So I don't think we lose any information... we just only ever move it to
> the global range early IFF we know that range applies to every export,
> and there is no chance there will be any commoning opportunities due to
> memory loads.

OK.

> I dont see anything in the early VRP processing now that would allow a
> later pass to remove the unreachable unless it does its own analysis
> like DOM might do.

Isn't it as simple as

  if (i_2 > 5) __builtin_unreachable ();

registering a global range of [6, INF] for i_2 and then the next time
we fold if (i_2 > 5) using range info will eliminate it?  Yes, historically
that required VRP or DOM since nothing else looked at ranges, not
sure how it behaves now given more match.pd patterns do look
at (global) ranges.

In any case, thanks for the explanation and OK for the patch.

Thanks,
Richard.


> Andrew
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] New early __builtin_unreachable processing.
  2023-09-19 12:56     ` Richard Biener
@ 2023-09-19 13:41       ` Andrew MacLeod
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew MacLeod @ 2023-09-19 13:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hernandez, aldy


On 9/19/23 08:56, Richard Biener wrote:
> On Mon, Sep 18, 2023 at 3:48 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>
>> OK.
>>
>> I dont see anything in the early VRP processing now that would allow a
>> later pass to remove the unreachable unless it does its own analysis
>> like DOM might do.
> Isn't it as simple as
>
>    if (i_2 > 5) __builtin_unreachable ();
>
> registering a global range of [6, INF] for i_2 and then the next time
> we fold if (i_2 > 5) using range info will eliminate it?  Yes, historically
> that required VRP or DOM since nothing else looked at ranges, not
> sure how it behaves now given more match.pd patterns do look
> at (global) ranges.

if we set the range yes.   What I meant was in the cases where we decide 
it can't be removed, we do NOT set the range globally in vrp1 now. This 
means  unless some other pass determines the range is [6, +INF] the 
unreachcable call will remain in the IL and any ranger aware pass will 
still get the contextual range info resulting from the unreachable.  We 
were sometimes removing the unreachable without being able to update 
every affected global/future optimization opportunity, which this fixes. 
Hopefully :-)   Its certainly much better at least.

In theory, if inlining were aware of global ranges and propagated them, 
we could also now remove these some of these unreachables in EVRP rather 
than VRP1...   as I think we're now sure there is no benefit to keeping 
the unreachable call when we remove it.



> In any case, thanks for the explanation and OK for the patch.

Will check it in shortly.

Andrew


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-09-19 13:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-15 14:44 [PATCH] [RFC] New early __builtin_unreachable processing Andrew MacLeod
2023-09-18  6:53 ` Richard Biener
2023-09-18 13:48   ` Andrew MacLeod
2023-09-19 12:56     ` Richard Biener
2023-09-19 13:41       ` Andrew MacLeod

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).