* [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
@ 2023-02-15 17:05 Andrew MacLeod
2023-02-16 7:55 ` Richard Biener
0 siblings, 1 reply; 5+ messages in thread
From: Andrew MacLeod @ 2023-02-15 17:05 UTC (permalink / raw)
To: gcc-patches; +Cc: hernandez, aldy
[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]
This patch implements the suggestion that we have an alternative
ssa-cache which does not zero memory, and instead uses a bitmap to track
whether a value is currently set or not. It roughly mimics what
path_range_query was doing internally.
For sparsely used cases, especially in large programs, this is more
efficient. I changed path_range_query to use this, and removed its old
bitmap (and a hack or two around PHI calculations), and also utilized
this in the assume_query class.
Performance wise, the patch doesn't affect VRP (since that still uses
the original version). Switching to the lazy version caused a slowdown
of 2.5% across VRP.
There was a noticeable improvement elsewhere: across 230 GCC source
files, threading ran over 12% faster! Overall compilation improved by
0.3%. Not sure it makes much difference in compiler.i, but it shouldn't
hurt.
Bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?
or do you want to wait for the next release...
Andrew
[-- Attachment #2: 0001-Create-a-lazy-ssa_cache.patch --]
[-- Type: text/x-patch, Size: 12356 bytes --]
From a4736b402d95b184659846ba308ce51f708472d1 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Wed, 8 Feb 2023 17:43:45 -0500
Subject: [PATCH 1/2] Create a lazy ssa_cache
Sparsely used ssa name caches can benefit from using a bitmap to
determine if a name already has an entry. Utilize it in the path query
and remove its private bitmap for tracking the same info.
Also use it in the "assume" query class.
* gimple-range-cache.cc (ssa_global_cache::clear_global_range): Do
not clear the vector on an out of range query.
(ssa_lazy_cache::set_global_range): New.
* gimple-range-cache.h (class ssa_lazy_cache): New.
(ssa_lazy_cache::ssa_lazy_cache): New.
(ssa_lazy_cache::~ssa_lazy_cache): New.
(ssa_lazy_cache::get_global_range): New.
(ssa_lazy_cache::clear_global_range): New.
(ssa_lazy_cache::clear): New.
(ssa_lazy_cache::dump): New.
* gimple-range-path.cc (path_range_query::path_range_query): Do
not allocate an ssa_global_cache object nor a has_cache bitmap.
(path_range_query::~path_range_query): Do not free objects.
(path_range_query::clear_cache): Remove.
(path_range_query::get_cache): Adjust.
(path_range_query::set_cache): Remove.
(path_range_query::dump): Don't call through a pointer.
(path_range_query::internal_range_of_expr): Set cache directly.
(path_range_query::reset_path): Clear cache directly.
(path_range_query::ssa_range_in_phi): Fold with globals only.
(path_range_query::compute_ranges_in_phis): Simply set range.
(path_range_query::compute_ranges_in_block): Call cache directly.
* gimple-range-path.h (class path_range_query): Replace bitmap
and cache pointer with lazy cache object.
* gimple-range.h (class assume_query): Use ssa_lazy_cache.
---
gcc/gimple-range-cache.cc | 24 ++++++++++++--
gcc/gimple-range-cache.h | 33 +++++++++++++++++++-
gcc/gimple-range-path.cc | 66 +++++++++------------------------------
gcc/gimple-range-path.h | 7 +----
gcc/gimple-range.h | 2 +-
5 files changed, 70 insertions(+), 62 deletions(-)
diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 546262c4794..9bfbdb2c9b3 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -525,14 +525,14 @@ ssa_global_cache::set_global_range (tree name, const vrange &r)
return m != NULL;
}
-// Set the range for NAME to R in the glonbal cache.
+// Set the range for NAME to R in the global cache.
void
ssa_global_cache::clear_global_range (tree name)
{
unsigned v = SSA_NAME_VERSION (name);
if (v >= m_tab.length ())
- m_tab.safe_grow_cleared (num_ssa_names + 1);
+ return;
m_tab[v] = NULL;
}
@@ -579,6 +579,26 @@ ssa_global_cache::dump (FILE *f)
fputc ('\n', f);
}
+
+// Set range of NAME to R in a lazy cache. Return FALSE if it did not already
+// have a range.
+
+bool
+ssa_lazy_cache::set_global_range (tree name, const vrange &r)
+{
+ unsigned v = SSA_NAME_VERSION (name);
+ if (!bitmap_set_bit (active_p, v))
+ {
+ // There is already an entry, simply set it.
+ gcc_checking_assert (v < m_tab.length ());
+ return ssa_global_cache::set_global_range (name, r);
+ }
+ if (v >= m_tab.length ())
+ m_tab.safe_grow (num_ssa_names + 1);
+ m_tab[v] = m_range_allocator->clone (r);
+ return false;
+}
+
// --------------------------------------------------------------------------
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 4ff435dc5c1..f1799b45738 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -62,11 +62,42 @@ public:
void clear_global_range (tree name);
void clear ();
void dump (FILE *f = stderr);
-private:
+protected:
vec<vrange *> m_tab;
vrange_allocator *m_range_allocator;
};
+// This is the same as global cache, except it maintains an active bitmap
+// rather than depending on a zero'd out vector of pointers. This is better
+// for sparsely/lightly used caches.
+// It could be made a fully derived class, but at this point there doesn't seem
+// to be a need to take the performance hit for it.
+
+class ssa_lazy_cache : protected ssa_global_cache
+{
+public:
+ inline ssa_lazy_cache () { active_p = BITMAP_ALLOC (NULL); }
+ inline ~ssa_lazy_cache () { BITMAP_FREE (active_p); }
+ bool set_global_range (tree name, const vrange &r);
+ inline bool get_global_range (vrange &r, tree name) const;
+ inline void clear_global_range (tree name)
+ { bitmap_clear_bit (active_p, SSA_NAME_VERSION (name)); } ;
+ inline void clear () { bitmap_clear (active_p); }
+ inline void dump (FILE *f = stderr) { ssa_global_cache::dump (f); }
+protected:
+ bitmap active_p;
+};
+
+// Return TRUE if NAME has a range, and return it in R.
+
+bool
+ssa_lazy_cache::get_global_range (vrange &r, tree name) const
+{
+ if (!bitmap_bit_p (active_p, SSA_NAME_VERSION (name)))
+ return false;
+ return ssa_global_cache::get_global_range (r, name);
+}
+
// This class provides all the caches a global ranger may need, and makes
// them available for gori-computes to query so outgoing edges can be
// properly calculated.
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 7c45a8815cb..13303a42627 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -40,8 +40,7 @@ path_range_query::path_range_query (gimple_ranger &ranger,
const vec<basic_block> &path,
const bitmap_head *dependencies,
bool resolve)
- : m_cache (new ssa_global_cache),
- m_has_cache_entry (BITMAP_ALLOC (NULL)),
+ : m_cache (),
m_ranger (ranger),
m_resolve (resolve)
{
@@ -51,8 +50,7 @@ path_range_query::path_range_query (gimple_ranger &ranger,
}
path_range_query::path_range_query (gimple_ranger &ranger, bool resolve)
- : m_cache (new ssa_global_cache),
- m_has_cache_entry (BITMAP_ALLOC (NULL)),
+ : m_cache (),
m_ranger (ranger),
m_resolve (resolve)
{
@@ -62,8 +60,6 @@ path_range_query::path_range_query (gimple_ranger &ranger, bool resolve)
path_range_query::~path_range_query ()
{
delete m_oracle;
- BITMAP_FREE (m_has_cache_entry);
- delete m_cache;
}
// Return TRUE if NAME is an exit dependency for the path.
@@ -75,16 +71,6 @@ path_range_query::exit_dependency_p (tree name)
&& bitmap_bit_p (m_exit_dependencies, SSA_NAME_VERSION (name)));
}
-// Mark cache entry for NAME as unused.
-
-void
-path_range_query::clear_cache (tree name)
-{
- unsigned v = SSA_NAME_VERSION (name);
- bitmap_clear_bit (m_has_cache_entry, v);
-}
-
-// If NAME has a cache entry, return it in R, and return TRUE.
inline bool
path_range_query::get_cache (vrange &r, tree name)
@@ -92,21 +78,7 @@ path_range_query::get_cache (vrange &r, tree name)
if (!gimple_range_ssa_p (name))
return get_global_range_query ()->range_of_expr (r, name);
- unsigned v = SSA_NAME_VERSION (name);
- if (bitmap_bit_p (m_has_cache_entry, v))
- return m_cache->get_global_range (r, name);
-
- return false;
-}
-
-// Set the cache entry for NAME to R.
-
-void
-path_range_query::set_cache (const vrange &r, tree name)
-{
- unsigned v = SSA_NAME_VERSION (name);
- bitmap_set_bit (m_has_cache_entry, v);
- m_cache->set_global_range (name, r);
+ return m_cache.get_global_range (r, name);
}
void
@@ -130,7 +102,7 @@ path_range_query::dump (FILE *dump_file)
fprintf (dump_file, "\n");
}
- m_cache->dump (dump_file);
+ m_cache.dump (dump_file);
}
void
@@ -174,7 +146,7 @@ path_range_query::internal_range_of_expr (vrange &r, tree name, gimple *stmt)
if (m_resolve && defined_outside_path (name))
{
range_on_path_entry (r, name);
- set_cache (r, name);
+ m_cache.set_global_range (name, r);
return true;
}
@@ -188,7 +160,7 @@ path_range_query::internal_range_of_expr (vrange &r, tree name, gimple *stmt)
r.intersect (glob);
}
- set_cache (r, name);
+ m_cache.set_global_range (name, r);
return true;
}
@@ -225,7 +197,7 @@ path_range_query::reset_path (const vec<basic_block> &path,
m_path = path.copy ();
m_pos = m_path.length () - 1;
m_undefined_path = false;
- bitmap_clear (m_has_cache_entry);
+ m_cache.clear ();
compute_ranges (dependencies);
}
@@ -255,7 +227,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
if (m_resolve && m_ranger.range_of_expr (r, name, phi))
return;
- // Try to fold the phi exclusively with global or cached values.
+ // Try to fold the phi exclusively with global values.
// This will get things like PHI <5(99), 6(88)>. We do this by
// calling range_of_expr with no context.
unsigned nargs = gimple_phi_num_args (phi);
@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
for (size_t i = 0; i < nargs; ++i)
{
tree arg = gimple_phi_arg_def (phi, i);
- if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+ if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
r.union_ (arg_range);
else
{
@@ -348,8 +320,6 @@ path_range_query::range_defined_in_block (vrange &r, tree name, basic_block bb)
void
path_range_query::compute_ranges_in_phis (basic_block bb)
{
- auto_bitmap phi_set;
-
// PHIs must be resolved simultaneously on entry to the block
// because any dependencies must be satisfied with values on entry.
// Thus, we calculate all PHIs first, and then update the cache at
@@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
Value_Range r (TREE_TYPE (name));
if (range_defined_in_block (r, name, bb))
- {
- unsigned v = SSA_NAME_VERSION (name);
- set_cache (r, name);
- bitmap_set_bit (phi_set, v);
- // Pretend we don't have a cache entry for this name until
- // we're done with all PHIs.
- bitmap_clear_bit (m_has_cache_entry, v);
- }
+ m_cache.set_global_range (name, r);
}
- bitmap_ior_into (m_has_cache_entry, phi_set);
}
// Return TRUE if relations may be invalidated after crossing edge E.
@@ -408,7 +370,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
{
tree name = ssa_name (i);
if (ssa_defined_in_bb (name, bb))
- clear_cache (name);
+ m_cache.clear_global_range (name);
}
// Solve dependencies defined in this block, starting with the PHIs...
@@ -421,7 +383,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
&& range_defined_in_block (r, name, bb))
- set_cache (r, name);
+ m_cache.set_global_range (name, r);
}
if (at_exit ())
@@ -457,7 +419,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
if (get_cache (cached_range, name))
r.intersect (cached_range);
- set_cache (r, name);
+ m_cache.set_global_range (name, r);
if (DEBUG_SOLVER)
{
fprintf (dump_file, "outgoing_edge_range_p for ");
@@ -500,7 +462,7 @@ path_range_query::adjust_for_non_null_uses (basic_block bb)
r.set_varying (TREE_TYPE (name));
if (m_ranger.m_cache.m_exit.maybe_adjust_range (r, name, bb))
- set_cache (r, name);
+ m_cache.set_global_range (name, r);
}
}
diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
index e8b06b60e66..34841e78c3d 100644
--- a/gcc/gimple-range-path.h
+++ b/gcc/gimple-range-path.h
@@ -54,9 +54,7 @@ private:
path_oracle *get_path_oracle () { return (path_oracle *)m_oracle; }
// Cache manipulation.
- void set_cache (const vrange &r, tree name);
bool get_cache (vrange &r, tree name);
- void clear_cache (tree name);
// Methods to compute ranges for the given path.
bool range_defined_in_block (vrange &, tree name, basic_block bb);
@@ -83,10 +81,7 @@ private:
void move_next () { --m_pos; }
// Range cache for SSA names.
- ssa_global_cache *m_cache;
-
- // Set for each SSA that has an active entry in the cache.
- bitmap m_has_cache_entry;
+ ssa_lazy_cache m_cache;
// Path being analyzed.
auto_vec<basic_block> m_path;
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 4bf9c482921..eacb32d8ba3 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -95,7 +95,7 @@ protected:
void calculate_phi (gphi *phi, vrange &lhs_range, fur_source &src);
void check_taken_edge (edge e, fur_source &src);
- ssa_global_cache global;
+ ssa_lazy_cache global;
gori_compute m_gori;
};
--
2.39.0
* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
2023-02-15 17:05 [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache Andrew MacLeod
@ 2023-02-16 7:55 ` Richard Biener
2023-02-16 9:36 ` Aldy Hernandez
2023-02-16 14:34 ` Andrew MacLeod
0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2023-02-16 7:55 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy
On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch implements the suggestion that we have an alternative
> ssa-cache which does not zero memory, and instead uses a bitmap to track
> whether a value is currently set or not. It roughly mimics what
> path_range_query was doing internally.
>
> For sparsely used cases, especially in large programs, this is more
> efficient. I changed path_range_query to use this, and removed its old
> bitmap (and a hack or two around PHI calculations), and also utilized
> this in the assume_query class.
>
> Performance wise, the patch doesn't affect VRP (since that still uses
> the original version). Switching to the lazy version caused a slowdown
> of 2.5% across VRP.
>
> There was a noticeable improvement elsewhere: across 230 GCC source
> files, threading ran over 12% faster! Overall compilation improved by
> 0.3%. Not sure it makes much difference in compiler.i, but it shouldn't
> hurt.
>
> bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?
> or do you want to wait for the next release...
I see
@@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
Value_Range r (TREE_TYPE (name));
if (range_defined_in_block (r, name, bb))
- {
- unsigned v = SSA_NAME_VERSION (name);
- set_cache (r, name);
- bitmap_set_bit (phi_set, v);
- // Pretend we don't have a cache entry for this name until
- // we're done with all PHIs.
- bitmap_clear_bit (m_has_cache_entry, v);
- }
+ m_cache.set_global_range (name, r);
}
- bitmap_ior_into (m_has_cache_entry, phi_set);
}
// Return TRUE if relations may be invalidated after crossing edge E.
which I think is not correct - if we have
# _1 = PHI <..., _2>
# _2 = PHI <..., _1>
then their effects are supposed to be executed in parallel, that is,
both PHI argument _2 and _1 are supposed to see the "old" version.
The previous code tried to make sure the range of the new _1 doesn't
get seen when processing the argument _1 in the definition of _2.
The new version drops this, possibly resulting in wrong-code.
While I think it's appropriate to sort out compile-time issues like this
during stage4, at least the above makes me think it should be deferred
to next stage1.
Richard.
>
> Andrew
* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
2023-02-16 7:55 ` Richard Biener
@ 2023-02-16 9:36 ` Aldy Hernandez
2023-02-16 14:34 ` Andrew MacLeod
1 sibling, 0 replies; 5+ messages in thread
From: Aldy Hernandez @ 2023-02-16 9:36 UTC (permalink / raw)
To: Richard Biener, Andrew MacLeod; +Cc: gcc-patches
On 2/16/23 08:55, Richard Biener wrote:
> On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> This patch implements the suggestion that we have an alternative
>> ssa-cache which does not zero memory, and instead uses a bitmap to track
>> whether a value is currently set or not. It roughly mimics what
>> path_range_query was doing internally.
>>
>> For sparsely used cases, especially in large programs, this is more
>> efficient. I changed path_range_query to use this, and removed its old
>> bitmap (and a hack or two around PHI calculations), and also utilized
>> this in the assume_query class.
>>
>> Performance wise, the patch doesn't affect VRP (since that still uses
>> the original version). Switching to the lazy version caused a slowdown
>> of 2.5% across VRP.
>>
>> There was a noticeable improvement elsewhere: across 230 GCC source
>> files, threading ran over 12% faster! Overall compilation improved by
>> 0.3%. Not sure it makes much difference in compiler.i, but it shouldn't
>> hurt.
>>
>> bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?
>> or do you want to wait for the next release...
>
> I see
>
> @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>
> Value_Range r (TREE_TYPE (name));
> if (range_defined_in_block (r, name, bb))
> - {
> - unsigned v = SSA_NAME_VERSION (name);
> - set_cache (r, name);
> - bitmap_set_bit (phi_set, v);
> - // Pretend we don't have a cache entry for this name until
> - // we're done with all PHIs.
> - bitmap_clear_bit (m_has_cache_entry, v);
> - }
> + m_cache.set_global_range (name, r);
> }
> - bitmap_ior_into (m_has_cache_entry, phi_set);
> }
>
> // Return TRUE if relations may be invalidated after crossing edge E.
>
> which I think is not correct - if we have
>
> # _1 = PHI <..., _2>
> # _2 = PHI <..., _1>
>
> then their effects are supposed to be executed in parallel, that is,
> both PHI argument _2 and _1 are supposed to see the "old" version.
> The previous code tried to make sure the range of the new _1 doesn't
> get seen when processing the argument _1 in the definition of _2.
Yes, the effects should appear in parallel, but ssa_range_in_phi(), which
is the only thing range_defined_in_block does for PHIs, is guaranteed
not to do any additional cache lookups. The comment there should be
adjusted to make this clear:
// Since PHIs are calculated in parallel at the beginning of the
// block, we must be careful to never save anything to the cache here.
// It is the caller's responsibility to adjust the cache. Also,
// calculating the PHI's range must not trigger additional lookups.
We should instead say:
"we must be careful to never set or access the cache here"...
This was the original intent, but a subtle access to the cache crept in
here:
// Try to fold the phi exclusively with global or cached values.
// This will get things like PHI <5(99), 6(88)>. We do this by
// calling range_of_expr with no context.
unsigned nargs = gimple_phi_num_args (phi);
Value_Range arg_range (TREE_TYPE (name));
r.set_undefined ();
for (size_t i = 0; i < nargs; ++i)
{
tree arg = gimple_phi_arg_def (phi, i);
if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
This range_of_expr call will indeed access the cache incorrectly, but
Andrew fixed that here:
@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi
*phi)
for (size_t i = 0; i < nargs; ++i)
{
tree arg = gimple_phi_arg_def (phi, i);
- if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+ if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
r.union_ (arg_range);
else
{
...thus ensuring that function never uses the cache. All the lookups
are done with the global ranger at either the path entry or globally as
above (with stmt=NULL).
I believe the switch from range_of_expr to m_ranger.range_of_expr is
safe, as the original code was added to handle silly things like PHI
<5(99), 6(88)>, which shouldn't need path-aware ranges.
As you've found out, the update to the cache in this case was not
obvious at all. Perhaps it should also be commented:
"It is safe to set the cache here, as range_defined_in_block for PHIs
(ssa_range_in_phi) is guaranteed not to do any cache lookups."
>
> The new version drops this, possibly resulting in wrong-code.
>
> While I think it's appropriate to sort out compile-time issues like this
> during stage4, at least the above makes me think it should be deferred
> to next stage1.
I defer to the release managers as to whether this is safe in light of
my explanation above :).
Aldy
* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
2023-02-16 7:55 ` Richard Biener
2023-02-16 9:36 ` Aldy Hernandez
@ 2023-02-16 14:34 ` Andrew MacLeod
2023-02-17 7:54 ` Richard Biener
1 sibling, 1 reply; 5+ messages in thread
From: Andrew MacLeod @ 2023-02-16 14:34 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches, hernandez, aldy
On 2/16/23 02:55, Richard Biener wrote:
> On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> This patch implements the suggestion that we have an alternative
>> ssa-cache which does not zero memory, and instead uses a bitmap to track
>> whether a value is currently set or not. It roughly mimics what
>> path_range_query was doing internally.
>>
>> For sparsely used cases, especially in large programs, this is more
>> efficient. I changed path_range_query to use this, and removed its old
>> bitmap (and a hack or two around PHI calculations), and also utilized
>> this in the assume_query class.
>>
>> Performance wise, the patch doesn't affect VRP (since that still uses
>> the original version). Switching to the lazy version caused a slowdown
>> of 2.5% across VRP.
>>
>> There was a noticeable improvement elsewhere: across 230 GCC source
>> files, threading ran over 12% faster! Overall compilation improved by
>> 0.3%. Not sure it makes much difference in compiler.i, but it shouldn't
>> hurt.
>>
>> bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?
>> or do you want to wait for the next release...
> I see
>
> @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>
> Value_Range r (TREE_TYPE (name));
> if (range_defined_in_block (r, name, bb))
> - {
> - unsigned v = SSA_NAME_VERSION (name);
> - set_cache (r, name);
> - bitmap_set_bit (phi_set, v);
> - // Pretend we don't have a cache entry for this name until
> - // we're done with all PHIs.
> - bitmap_clear_bit (m_has_cache_entry, v);
> - }
> + m_cache.set_global_range (name, r);
> }
> - bitmap_ior_into (m_has_cache_entry, phi_set);
> }
>
> // Return TRUE if relations may be invalidated after crossing edge E.
>
> which I think is not correct - if we have
>
> # _1 = PHI <..., _2>
> # _2 = PHI <..., _1>
>
> then their effects are supposed to be executed in parallel, that is,
> both PHI argument _2 and _1 are supposed to see the "old" version.
> The previous code tried to make sure the range of the new _1 doesn't
> get seen when processing the argument _1 in the definition of _2.
>
> The new version drops this, possibly resulting in wrong-code.
This is dropped because it is actually handled properly in
range_defined_in_block now (which I think Aldy was describing).
It didn't make sense to me why it was handled here like this, so I traced
through the call chain to find out if it was still actually needed and
discussed it with Aldy. I think it was mostly a leftover wart.
>
> While I think it's appropriate to sort out compile-time issues like this
> during stage4, at least the above makes me think it should be deferred
> to next stage1.
I am happy to defer it since it's a marginal increase anyway.
Andrew
* Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
2023-02-16 14:34 ` Andrew MacLeod
@ 2023-02-17 7:54 ` Richard Biener
0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2023-02-17 7:54 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy
On Thu, Feb 16, 2023 at 3:34 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
>
> On 2/16/23 02:55, Richard Biener wrote:
> > On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >> This patch implements the suggestion that we have an alternative
> >> ssa-cache which does not zero memory, and instead uses a bitmap to track
> >> whether a value is currently set or not. It roughly mimics what
> >> path_range_query was doing internally.
> >>
> >> For sparsely used cases, especially in large programs, this is more
> >> efficient. I changed path_range_query to use this, and removed its old
> >> bitmap (and a hack or two around PHI calculations), and also utilized
> >> this in the assume_query class.
> >>
> >> Performance wise, the patch doesn't affect VRP (since that still uses
> >> the original version). Switching to the lazy version caused a slowdown
> >> of 2.5% across VRP.
> >>
> >> There was a noticeable improvement elsewhere: across 230 GCC source
> >> files, threading ran over 12% faster! Overall compilation improved by
> >> 0.3%. Not sure it makes much difference in compiler.i, but it shouldn't
> >> hurt.
> >>
> >> bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?
> >> or do you want to wait for the next release...
> > I see
> >
> > @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> >
> > Value_Range r (TREE_TYPE (name));
> > if (range_defined_in_block (r, name, bb))
> > - {
> > - unsigned v = SSA_NAME_VERSION (name);
> > - set_cache (r, name);
> > - bitmap_set_bit (phi_set, v);
> > - // Pretend we don't have a cache entry for this name until
> > - // we're done with all PHIs.
> > - bitmap_clear_bit (m_has_cache_entry, v);
> > - }
> > + m_cache.set_global_range (name, r);
> > }
> > - bitmap_ior_into (m_has_cache_entry, phi_set);
> > }
> >
> > // Return TRUE if relations may be invalidated after crossing edge E.
> >
> > which I think is not correct - if we have
> >
> > # _1 = PHI <..., _2>
> > # _2 = PHI <..., _1>
> >
> > then their effects are supposed to be executed in parallel, that is,
> > both PHI argument _2 and _1 are supposed to see the "old" version.
> > The previous code tried to make sure the range of the new _1 doesn't
> > get seen when processing the argument _1 in the definition of _2.
> >
> > The new version drops this, possibly resulting in wrong-code.
>
> This is dropped because it is actually handled properly in
> range_defined_in_block now. (which I think Aldy was describing).
>
> It didnt make sense to me why it was handled here like this, so I traced
> through the call chain to find out if it was still actually needed and
> discussed it with Aldy. I think it was mostly a leftover wart.
Ah, thanks for checking.
> >
> > While I think it's appropriate to sort out compile-time issues like this
> > during stage4, at least the above makes me think it should be deferred
> > to next stage1.
>
> I am happy to defer it since its a marginal increase anyway.
Sure - thus OK for stage1.
Thanks,
Richard.
>
> Andrew
>
>
end of thread, other threads:[~2023-02-17 7:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
2023-02-15 17:05 [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache Andrew MacLeod
2023-02-16 7:55 ` Richard Biener
2023-02-16 9:36 ` Aldy Hernandez
2023-02-16 14:34 ` Andrew MacLeod
2023-02-17 7:54 ` Richard Biener