public inbox for gcc-patches@gcc.gnu.org
* [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
@ 2023-02-15 17:05 Andrew MacLeod
  2023-02-16  7:55 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew MacLeod @ 2023-02-15 17:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: hernandez, aldy

[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]

This patch implements the suggestion that we have an alternative 
ssa-cache which does not zero memory, and instead uses a bitmap to track 
whether a value is currently set or not.  It roughly mimics what 
path_range_query was doing internally.
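
In outline, the idea looks like this (a stand-alone sketch with
illustrative names and plain ints rather than the actual vrange types):

#include <vector>

// A "lazy" cache pairs a value table with a validity bitmap: a slot's
// contents only matter when its bit is set, so the table is never
// zeroed and a full reset only touches the bitmap.  (GCC's
// vec::safe_grow likewise leaves new slots uninitialized; validity
// comes from the bitmap.)
struct lazy_cache
{
  std::vector<bool> m_active;  // Stands in for GCC's bitmap.
  std::vector<int>  m_tab;     // Slots are meaningful only when active.

  // Record V for slot I.  Return TRUE if an entry already existed.
  bool set (unsigned i, int v)
  {
    if (i >= m_tab.size ())
      {
	m_active.resize (i + 1, false);
	m_tab.resize (i + 1);
      }
    bool existed = m_active[i];
    m_active[i] = true;
    m_tab[i] = v;
    return existed;
  }

  // Fetch slot I into V; fail cheaply when the bit is clear.
  bool get (unsigned i, int &v) const
  {
    if (i >= m_active.size () || !m_active[i])
      return false;
    v = m_tab[i];
    return true;
  }

  // Reset the whole cache by clearing only the validity bits.
  void clear () { m_active.assign (m_active.size (), false); }
};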

For sparsely used cases, especially in large programs, this is more 
efficient.  I changed path_range_query to use this, removed its old 
bitmap (and a hack or two around PHI calculations), and also utilized 
this in the assume_query class.

Performance-wise, the patch doesn't affect VRP, since that still uses 
the original version; experimentally switching VRP to the lazy version 
caused a slowdown of 2.5%.

There was a noticeable improvement elsewhere: across 230 GCC source 
files, threading ran over 12% faster!  Overall compilation improved by 
0.3%.  Not sure it makes much difference in compiler.i, but it 
shouldn't hurt.

Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk, 
or do you want to wait for the next release?

Andrew

[-- Attachment #2: 0001-Create-a-lazy-ssa_cache.patch --]
[-- Type: text/x-patch, Size: 12356 bytes --]

From a4736b402d95b184659846ba308ce51f708472d1 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Wed, 8 Feb 2023 17:43:45 -0500
Subject: [PATCH 1/2] Create a lazy ssa_cache

Sparsely used ssa name caches can benefit from using a bitmap to
determine if a name already has an entry.  Utilize it in the path query
and remove its private bitmap for tracking the same info.
Also use it in the "assume" query class.

	* gimple-range-cache.cc (ssa_global_cache::clear_global_range): Do
	not clear the vector on an out of range query.
	(ssa_lazy_cache::set_global_range): New.
	* gimple-range-cache.h (class ssa_lazy_cache): New.
	(ssa_lazy_cache::ssa_lazy_cache): New.
	(ssa_lazy_cache::~ssa_lazy_cache): New.
	(ssa_lazy_cache::get_global_range): New.
	(ssa_lazy_cache::clear_global_range): New.
	(ssa_lazy_cache::clear): New.
	(ssa_lazy_cache::dump): New.
	* gimple-range-path.cc (path_range_query::path_range_query): Do
	not allocate an ssa_global_cache object nor a has_cache bitmap.
	(path_range_query::~path_range_query): Do not free objects.
	(path_range_query::clear_cache): Remove.
	(path_range_query::get_cache): Adjust.
	(path_range_query::set_cache): Remove.
	(path_range_query::dump): Don't call through a pointer.
	(path_range_query::internal_range_of_expr): Set cache directly.
	(path_range_query::reset_path): Clear cache directly.
	(path_range_query::ssa_range_in_phi): Fold with globals only.
	(path_range_query::compute_ranges_in_phis): Simply set range.
	(path_range_query::compute_ranges_in_block): Call cache directly.
	* gimple-range-path.h (class path_range_query): Replace bitmap
	and cache pointer with lazy cache object.
	* gimple-range.h (class assume_query): Use ssa_lazy_cache.
---
 gcc/gimple-range-cache.cc | 24 ++++++++++++--
 gcc/gimple-range-cache.h  | 33 +++++++++++++++++++-
 gcc/gimple-range-path.cc  | 66 +++++++++------------------------------
 gcc/gimple-range-path.h   |  7 +----
 gcc/gimple-range.h        |  2 +-
 5 files changed, 70 insertions(+), 62 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 546262c4794..9bfbdb2c9b3 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -525,14 +525,14 @@ ssa_global_cache::set_global_range (tree name, const vrange &r)
   return m != NULL;
 }
 
-// Set the range for NAME to R in the glonbal cache.
+// Clear the range for NAME in the global cache.
 
 void
 ssa_global_cache::clear_global_range (tree name)
 {
   unsigned v = SSA_NAME_VERSION (name);
   if (v >= m_tab.length ())
-    m_tab.safe_grow_cleared (num_ssa_names + 1);
+    return;
   m_tab[v] = NULL;
 }
 
@@ -579,6 +579,26 @@ ssa_global_cache::dump (FILE *f)
     fputc ('\n', f);
 }
 
+
+// Set range of NAME to R in a lazy cache.  Return FALSE if it did not already
+// have a range.
+
+bool
+ssa_lazy_cache::set_global_range (tree name, const vrange &r)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  if (!bitmap_set_bit (active_p, v))
+    {
+      // There is already an entry, simply set it.
+      gcc_checking_assert (v < m_tab.length ());
+      return ssa_global_cache::set_global_range (name, r);
+    }
+  if (v >= m_tab.length ())
+    m_tab.safe_grow (num_ssa_names + 1);
+  m_tab[v] = m_range_allocator->clone (r);
+  return false;
+}
+
 // --------------------------------------------------------------------------
 
 
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 4ff435dc5c1..f1799b45738 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -62,11 +62,42 @@ public:
   void clear_global_range (tree name);
   void clear ();
   void dump (FILE *f = stderr);
-private:
+protected:
   vec<vrange *> m_tab;
   vrange_allocator *m_range_allocator;
 };
 
+// This is the same as the global cache, except it maintains an active bitmap
+// rather than depending on a zeroed-out vector of pointers.  This is better
+// for sparsely/lightly used caches.
+// It could be made a fully derived class, but at this point there doesn't seem
+// to be a need to take the performance hit for it.
+
+class ssa_lazy_cache : protected ssa_global_cache
+{
+public:
+  inline ssa_lazy_cache () { active_p = BITMAP_ALLOC (NULL); }
+  inline ~ssa_lazy_cache () { BITMAP_FREE (active_p); }
+  bool set_global_range (tree name, const vrange &r);
+  inline bool get_global_range (vrange &r, tree name) const;
+  inline void clear_global_range (tree name)
+    { bitmap_clear_bit (active_p, SSA_NAME_VERSION (name)); }
+  inline void clear () { bitmap_clear (active_p); }
+  inline void dump (FILE *f = stderr) { ssa_global_cache::dump (f); }
+protected:
+  bitmap active_p;
+};
+
+// Return TRUE if NAME has a range, and return it in R.
+
+bool
+ssa_lazy_cache::get_global_range (vrange &r, tree name) const
+{
+  if (!bitmap_bit_p (active_p, SSA_NAME_VERSION (name)))
+    return false;
+  return ssa_global_cache::get_global_range (r, name);
+}
+
 // This class provides all the caches a global ranger may need, and makes 
 // them available for gori-computes to query so outgoing edges can be
 // properly calculated.
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 7c45a8815cb..13303a42627 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -40,8 +40,7 @@ path_range_query::path_range_query (gimple_ranger &ranger,
 				    const vec<basic_block> &path,
 				    const bitmap_head *dependencies,
 				    bool resolve)
-  : m_cache (new ssa_global_cache),
-    m_has_cache_entry (BITMAP_ALLOC (NULL)),
+  : m_cache (),
     m_ranger (ranger),
     m_resolve (resolve)
 {
@@ -51,8 +50,7 @@ path_range_query::path_range_query (gimple_ranger &ranger,
 }
 
 path_range_query::path_range_query (gimple_ranger &ranger, bool resolve)
-  : m_cache (new ssa_global_cache),
-    m_has_cache_entry (BITMAP_ALLOC (NULL)),
+  : m_cache (),
     m_ranger (ranger),
     m_resolve (resolve)
 {
@@ -62,8 +60,6 @@ path_range_query::path_range_query (gimple_ranger &ranger, bool resolve)
 path_range_query::~path_range_query ()
 {
   delete m_oracle;
-  BITMAP_FREE (m_has_cache_entry);
-  delete m_cache;
 }
 
 // Return TRUE if NAME is an exit dependency for the path.
@@ -75,16 +71,6 @@ path_range_query::exit_dependency_p (tree name)
 	  && bitmap_bit_p (m_exit_dependencies, SSA_NAME_VERSION (name)));
 }
 
-// Mark cache entry for NAME as unused.
-
-void
-path_range_query::clear_cache (tree name)
-{
-  unsigned v = SSA_NAME_VERSION (name);
-  bitmap_clear_bit (m_has_cache_entry, v);
-}
-
-// If NAME has a cache entry, return it in R, and return TRUE.
 
 inline bool
 path_range_query::get_cache (vrange &r, tree name)
@@ -92,21 +78,7 @@ path_range_query::get_cache (vrange &r, tree name)
   if (!gimple_range_ssa_p (name))
     return get_global_range_query ()->range_of_expr (r, name);
 
-  unsigned v = SSA_NAME_VERSION (name);
-  if (bitmap_bit_p (m_has_cache_entry, v))
-    return m_cache->get_global_range (r, name);
-
-  return false;
-}
-
-// Set the cache entry for NAME to R.
-
-void
-path_range_query::set_cache (const vrange &r, tree name)
-{
-  unsigned v = SSA_NAME_VERSION (name);
-  bitmap_set_bit (m_has_cache_entry, v);
-  m_cache->set_global_range (name, r);
+  return m_cache.get_global_range (r, name);
 }
 
 void
@@ -130,7 +102,7 @@ path_range_query::dump (FILE *dump_file)
       fprintf (dump_file, "\n");
     }
 
-  m_cache->dump (dump_file);
+  m_cache.dump (dump_file);
 }
 
 void
@@ -174,7 +146,7 @@ path_range_query::internal_range_of_expr (vrange &r, tree name, gimple *stmt)
   if (m_resolve && defined_outside_path (name))
     {
       range_on_path_entry (r, name);
-      set_cache (r, name);
+      m_cache.set_global_range (name, r);
       return true;
     }
 
@@ -188,7 +160,7 @@ path_range_query::internal_range_of_expr (vrange &r, tree name, gimple *stmt)
 	  r.intersect (glob);
 	}
 
-      set_cache (r, name);
+      m_cache.set_global_range (name, r);
       return true;
     }
 
@@ -225,7 +197,7 @@ path_range_query::reset_path (const vec<basic_block> &path,
   m_path = path.copy ();
   m_pos = m_path.length () - 1;
   m_undefined_path = false;
-  bitmap_clear (m_has_cache_entry);
+  m_cache.clear ();
 
   compute_ranges (dependencies);
 }
@@ -255,7 +227,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       if (m_resolve && m_ranger.range_of_expr (r, name, phi))
 	return;
 
-      // Try to fold the phi exclusively with global or cached values.
+      // Try to fold the phi exclusively with global values.
       // This will get things like PHI <5(99), 6(88)>.  We do this by
       // calling range_of_expr with no context.
       unsigned nargs = gimple_phi_num_args (phi);
@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       for (size_t i = 0; i < nargs; ++i)
 	{
 	  tree arg = gimple_phi_arg_def (phi, i);
-	  if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+	  if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
 	    r.union_ (arg_range);
 	  else
 	    {
@@ -348,8 +320,6 @@ path_range_query::range_defined_in_block (vrange &r, tree name, basic_block bb)
 void
 path_range_query::compute_ranges_in_phis (basic_block bb)
 {
-  auto_bitmap phi_set;
-
   // PHIs must be resolved simultaneously on entry to the block
   // because any dependencies must be satisfied with values on entry.
   // Thus, we calculate all PHIs first, and then update the cache at
@@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
 
       Value_Range r (TREE_TYPE (name));
       if (range_defined_in_block (r, name, bb))
-	{
-	  unsigned v = SSA_NAME_VERSION (name);
-	  set_cache (r, name);
-	  bitmap_set_bit (phi_set, v);
-	  // Pretend we don't have a cache entry for this name until
-	  // we're done with all PHIs.
-	  bitmap_clear_bit (m_has_cache_entry, v);
-	}
+	m_cache.set_global_range (name, r);
     }
-  bitmap_ior_into (m_has_cache_entry, phi_set);
 }
 
 // Return TRUE if relations may be invalidated after crossing edge E.
@@ -408,7 +370,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
     {
       tree name = ssa_name (i);
       if (ssa_defined_in_bb (name, bb))
-	clear_cache (name);
+	m_cache.clear_global_range (name);
     }
 
   // Solve dependencies defined in this block, starting with the PHIs...
@@ -421,7 +383,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
 
       if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
 	  && range_defined_in_block (r, name, bb))
-	set_cache (r, name);
+	m_cache.set_global_range (name, r);
     }
 
   if (at_exit ())
@@ -457,7 +419,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
 	  if (get_cache (cached_range, name))
 	    r.intersect (cached_range);
 
-	  set_cache (r, name);
+	  m_cache.set_global_range (name, r);
 	  if (DEBUG_SOLVER)
 	    {
 	      fprintf (dump_file, "outgoing_edge_range_p for ");
@@ -500,7 +462,7 @@ path_range_query::adjust_for_non_null_uses (basic_block bb)
 	r.set_varying (TREE_TYPE (name));
 
       if (m_ranger.m_cache.m_exit.maybe_adjust_range (r, name, bb))
-	set_cache (r, name);
+	m_cache.set_global_range (name, r);
     }
 }
 
diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
index e8b06b60e66..34841e78c3d 100644
--- a/gcc/gimple-range-path.h
+++ b/gcc/gimple-range-path.h
@@ -54,9 +54,7 @@ private:
   path_oracle *get_path_oracle () { return (path_oracle *)m_oracle; }
 
   // Cache manipulation.
-  void set_cache (const vrange &r, tree name);
   bool get_cache (vrange &r, tree name);
-  void clear_cache (tree name);
 
   // Methods to compute ranges for the given path.
   bool range_defined_in_block (vrange &, tree name, basic_block bb);
@@ -83,10 +81,7 @@ private:
   void move_next ()	  { --m_pos; }
 
   // Range cache for SSA names.
-  ssa_global_cache *m_cache;
-
-  // Set for each SSA that has an active entry in the cache.
-  bitmap m_has_cache_entry;
+  ssa_lazy_cache m_cache;
 
   // Path being analyzed.
   auto_vec<basic_block> m_path;
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 4bf9c482921..eacb32d8ba3 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -95,7 +95,7 @@ protected:
   void calculate_phi (gphi *phi, vrange &lhs_range, fur_source &src);
   void check_taken_edge (edge e, fur_source &src);
 
-  ssa_global_cache global;
+  ssa_lazy_cache global;
   gori_compute m_gori;
 };
 
-- 
2.39.0

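For reference, here is a short usage sketch of the new class,
exercising only the members visible in the patch above (the caller and
its arguments are hypothetical):

// Hypothetical caller, for illustration only.
void
example_use (tree name, const vrange &computed)
{
  ssa_lazy_cache cache;		// Constructor only allocates the bitmap.

  Value_Range r (TREE_TYPE (name));
  if (!cache.get_global_range (r, name))	// Cheap bitmap test.
    cache.set_global_range (name, computed);	// Grows the vector lazily.

  cache.clear_global_range (name);  // Clears the bit; the slot goes stale.
  cache.clear ();  // Resets everything without touching the vector.
}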


* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
  2023-02-15 17:05 [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache Andrew MacLeod
@ 2023-02-16  7:55 ` Richard Biener
  2023-02-16  9:36   ` Aldy Hernandez
  2023-02-16 14:34   ` Andrew MacLeod
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2023-02-16  7:55 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy

On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch implements the suggestion that we have an alternative
> ssa-cache which does not zero memory, and instead uses a bitmap to track
> whether a value is currently set or not.  It roughly mimics what
> path_range_query was doing internally.
>
> For sparsely used cases, especially in large programs, this is more
> efficient.  I changed path_range_query to use this, removed its old
> bitmap (and a hack or two around PHI calculations), and also utilized
> this in the assume_query class.
>
> Performance-wise, the patch doesn't affect VRP, since that still uses
> the original version; experimentally switching VRP to the lazy version
> caused a slowdown of 2.5%.
>
> There was a noticeable improvement elsewhere: across 230 GCC source
> files, threading ran over 12% faster!  Overall compilation improved by
> 0.3%.  Not sure it makes much difference in compiler.i, but it
> shouldn't hurt.
>
> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk,
> or do you want to wait for the next release?

I see

@@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)

       Value_Range r (TREE_TYPE (name));
       if (range_defined_in_block (r, name, bb))
-       {
-         unsigned v = SSA_NAME_VERSION (name);
-         set_cache (r, name);
-         bitmap_set_bit (phi_set, v);
-         // Pretend we don't have a cache entry for this name until
-         // we're done with all PHIs.
-         bitmap_clear_bit (m_has_cache_entry, v);
-       }
+       m_cache.set_global_range (name, r);
     }
-  bitmap_ior_into (m_has_cache_entry, phi_set);
 }

 // Return TRUE if relations may be invalidated after crossing edge E.

which I think is not correct - if we have

 # _1 = PHI <..., _2>
 # _2 = PHI <..., _1>

then their effects are supposed to be executed in parallel, that is,
both PHI arguments _2 and _1 are supposed to see the "old" version.
The previous code tried to make sure the range of the new _1 doesn't
get seen when processing the argument _1 in the definition of _2.

The new version drops this, possibly resulting in wrong-code.
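
For instance, a loop that swaps two values gives rise to exactly this
kind of cycle (an illustrative sketch, not a testcase from the PR):

/* After SSA construction and copy propagation, the loop header of f
   contains roughly:

     # a_1 = PHI <a_0(entry), b_2(latch)>
     # b_2 = PHI <b_0(entry), a_1(latch)>

   Both PHIs must observe the previous iteration's values.  Caching
   a_1's freshly computed range before b_2's arguments are read would
   let b_2 see the updated a_1 instead of the incoming one.  */
int f (int a, int b, int n)
{
  for (int i = 0; i < n; i++)
    {
      int t = a;
      a = b;
      b = t;
    }
  return a - b;
}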

While I think it's appropriate to sort out compile-time issues like this
during stage4, at least the above makes me think it should be deferred
to next stage1.

Richard.

>
> Andrew


* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
  2023-02-16  7:55 ` Richard Biener
@ 2023-02-16  9:36   ` Aldy Hernandez
  2023-02-16 14:34   ` Andrew MacLeod
  1 sibling, 0 replies; 5+ messages in thread
From: Aldy Hernandez @ 2023-02-16  9:36 UTC (permalink / raw)
  To: Richard Biener, Andrew MacLeod; +Cc: gcc-patches



On 2/16/23 08:55, Richard Biener wrote:
> On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> This patch implements the suggestion that we have an alternative
>> ssa-cache which does not zero memory, and instead uses a bitmap to track
>> whether a value is currently set or not.  It roughly mimics what
>> path_range_query was doing internally.
>>
>> For sparsely used cases, especially in large programs, this is more
>> efficient.  I changed path_range_query to use this, removed its old
>> bitmap (and a hack or two around PHI calculations), and also utilized
>> this in the assume_query class.
>>
>> Performance-wise, the patch doesn't affect VRP, since that still uses
>> the original version; experimentally switching VRP to the lazy version
>> caused a slowdown of 2.5%.
>>
>> There was a noticeable improvement elsewhere: across 230 GCC source
>> files, threading ran over 12% faster!  Overall compilation improved by
>> 0.3%.  Not sure it makes much difference in compiler.i, but it
>> shouldn't hurt.
>>
>> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk,
>> or do you want to wait for the next release?
> 
> I see
> 
> @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> 
>         Value_Range r (TREE_TYPE (name));
>         if (range_defined_in_block (r, name, bb))
> -       {
> -         unsigned v = SSA_NAME_VERSION (name);
> -         set_cache (r, name);
> -         bitmap_set_bit (phi_set, v);
> -         // Pretend we don't have a cache entry for this name until
> -         // we're done with all PHIs.
> -         bitmap_clear_bit (m_has_cache_entry, v);
> -       }
> +       m_cache.set_global_range (name, r);
>       }
> -  bitmap_ior_into (m_has_cache_entry, phi_set);
>   }
> 
>   // Return TRUE if relations may be invalidated after crossing edge E.
> 
> which I think is not correct - if we have
> 
>   # _1 = PHI <..., _2>
>   # _2 = PHI <..., _1>
> 
> then their effects are supposed to be executed in parallel, that is,
> both PHI arguments _2 and _1 are supposed to see the "old" version.
> The previous code tried to make sure the range of the new _1 doesn't
> get seen when processing the argument _1 in the definition of _2.

Yes, the effects should appear in parallel, but ssa_range_in_phi(), which 
is the only thing range_defined_in_block does for PHIs, is guaranteed 
not to do any additional cache lookups.  The comment there should be 
adjusted to make this clear:

// Since PHIs are calculated in parallel at the beginning of the
// block, we must be careful to never save anything to the cache here.
// It is the caller's responsibility to adjust the cache.  Also,
// calculating the PHI's range must not trigger additional lookups.

We should instead say:

"we must be careful to never set or access the cache here"...

This was the original intent, but a subtle access to the cache crept in 
here:

       // Try to fold the phi exclusively with global or cached values.
       // This will get things like PHI <5(99), 6(88)>.  We do this by
       // calling range_of_expr with no context.
       unsigned nargs = gimple_phi_num_args (phi);
       Value_Range arg_range (TREE_TYPE (name));
       r.set_undefined ();
       for (size_t i = 0; i < nargs; ++i)
	{
	  tree arg = gimple_phi_arg_def (phi, i);
	  if (range_of_expr (arg_range, arg, /*stmt=*/NULL))

This range_of_expr call will indeed access the cache incorrectly, but 
Andrew fixed that here:

@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
        for (size_t i = 0; i < nargs; ++i)
  	{
  	  tree arg = gimple_phi_arg_def (phi, i);
-	  if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+	  if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
  	    r.union_ (arg_range);
  	  else
  	    {

...thus ensuring that the function never uses the cache.  All the lookups 
are done with the global ranger, either at the path entry or globally as 
above (with stmt=NULL).

I believe the switch from range_of_expr to m_ranger.range_of_expr is 
safe, as the original code was added to handle silly things like PHI 
<5(99), 6(88)> which shouldn't need path aware ranges.

As you've found out, the update to the cache in this case was not 
obvious at all.  Perhaps it should also be commented:

"It is safe to set the cache here, as range_defined_in_block for PHIs 
(ssa_range_in_phi) is guaranteed not to do any cache lookups."

> 
> The new version drops this, possibly resulting in wrong-code.
> 
> While I think it's appropriate to sort out compile-time issues like this
> during stage4, at least the above makes me think it should be deferred
> to next stage1.

I defer to the release managers as to whether this is safe in light of 
my explanation above :).

Aldy



* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
  2023-02-16  7:55 ` Richard Biener
  2023-02-16  9:36   ` Aldy Hernandez
@ 2023-02-16 14:34   ` Andrew MacLeod
  2023-02-17  7:54     ` Richard Biener
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew MacLeod @ 2023-02-16 14:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hernandez, aldy


On 2/16/23 02:55, Richard Biener wrote:
> On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> This patch implements the suggestion that we have an alternative
>> ssa-cache which does not zero memory, and instead uses a bitmap to track
>> whether a value is currently set or not.  It roughly mimics what
>> path_range_query was doing internally.
>>
>> For sparsely used cases, especially in large programs, this is more
>> efficient.  I changed path_range_query to use this, removed its old
>> bitmap (and a hack or two around PHI calculations), and also utilized
>> this in the assume_query class.
>>
>> Performance-wise, the patch doesn't affect VRP, since that still uses
>> the original version; experimentally switching VRP to the lazy version
>> caused a slowdown of 2.5%.
>>
>> There was a noticeable improvement elsewhere: across 230 GCC source
>> files, threading ran over 12% faster!  Overall compilation improved by
>> 0.3%.  Not sure it makes much difference in compiler.i, but it
>> shouldn't hurt.
>>
>> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk,
>> or do you want to wait for the next release?
> I see
>
> @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>
>         Value_Range r (TREE_TYPE (name));
>         if (range_defined_in_block (r, name, bb))
> -       {
> -         unsigned v = SSA_NAME_VERSION (name);
> -         set_cache (r, name);
> -         bitmap_set_bit (phi_set, v);
> -         // Pretend we don't have a cache entry for this name until
> -         // we're done with all PHIs.
> -         bitmap_clear_bit (m_has_cache_entry, v);
> -       }
> +       m_cache.set_global_range (name, r);
>       }
> -  bitmap_ior_into (m_has_cache_entry, phi_set);
>   }
>
>   // Return TRUE if relations may be invalidated after crossing edge E.
>
> which I think is not correct - if we have
>
>   # _1 = PHI <..., _2>
>   # _2 = PHI <..., _1>
>
> then their effects are supposed to be executed in parallel, that is,
> both PHI arguments _2 and _1 are supposed to see the "old" version.
> The previous code tried to make sure the range of the new _1 doesn't
> get seen when processing the argument _1 in the definition of _2.
>
> The new version drops this, possibly resulting in wrong-code.

This is dropped because it is actually handled properly in 
range_defined_in_block now (which I think Aldy was describing).

It didn't make sense to me why it was handled here like this, so I traced 
through the call chain to find out whether it was still actually needed and 
discussed it with Aldy.  I think it was mostly a leftover wart.

>
> While I think it's appropriate to sort out compile-time issues like this
> during stage4, at least the above makes me think it should be deferred
> to next stage1.

I am happy to defer it since it's a marginal increase anyway.

Andrew




* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
  2023-02-16 14:34   ` Andrew MacLeod
@ 2023-02-17  7:54     ` Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2023-02-17  7:54 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy

On Thu, Feb 16, 2023 at 3:34 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
>
> On 2/16/23 02:55, Richard Biener wrote:
> > On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >> This patch implements the suggestion that we have an alternative
> >> ssa-cache which does not zero memory, and instead uses a bitmap to track
> >> whether a value is currently set or not.  It roughly mimics what
> >> path_range_query was doing internally.
> >>
> >> For sparsely used cases, especially in large programs, this is more
> >> efficient.  I changed path_range_query to use this, removed its old
> >> bitmap (and a hack or two around PHI calculations), and also utilized
> >> this in the assume_query class.
> >>
> >> Performance-wise, the patch doesn't affect VRP, since that still uses
> >> the original version; experimentally switching VRP to the lazy version
> >> caused a slowdown of 2.5%.
> >>
> >> There was a noticeable improvement elsewhere: across 230 GCC source
> >> files, threading ran over 12% faster!  Overall compilation improved by
> >> 0.3%.  Not sure it makes much difference in compiler.i, but it
> >> shouldn't hurt.
> >>
> >> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk,
> >> or do you want to wait for the next release?
> > I see
> >
> > @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> >
> >         Value_Range r (TREE_TYPE (name));
> >         if (range_defined_in_block (r, name, bb))
> > -       {
> > -         unsigned v = SSA_NAME_VERSION (name);
> > -         set_cache (r, name);
> > -         bitmap_set_bit (phi_set, v);
> > -         // Pretend we don't have a cache entry for this name until
> > -         // we're done with all PHIs.
> > -         bitmap_clear_bit (m_has_cache_entry, v);
> > -       }
> > +       m_cache.set_global_range (name, r);
> >       }
> > -  bitmap_ior_into (m_has_cache_entry, phi_set);
> >   }
> >
> >   // Return TRUE if relations may be invalidated after crossing edge E.
> >
> > which I think is not correct - if we have
> >
> >   # _1 = PHI <..., _2>
> >   # _2 = PHI <..., _1>
> >
> > then their effects are supposed to be executed in parallel, that is,
> > both PHI arguments _2 and _1 are supposed to see the "old" version.
> > The previous code tried to make sure the range of the new _1 doesn't
> > get seen when processing the argument _1 in the definition of _2.
> >
> > The new version drops this, possibly resulting in wrong-code.
>
> This is dropped because it is actually handled properly in
> range_defined_in_block now (which I think Aldy was describing).
>
> It didn't make sense to me why it was handled here like this, so I traced
> through the call chain to find out whether it was still actually needed and
> discussed it with Aldy.  I think it was mostly a leftover wart.

Ah, thanks for checking.

> >
> > While I think it's appropriate to sort out compile-time issues like this
> > during stage4, at least the above makes me think it should be deferred
> > to next stage1.
>
> I am happy to defer it since it's a marginal increase anyway.

Sure - thus OK for stage1.

Thanks,
Richard.

>
> Andrew
>
>

