public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/108697] New: constructing a path-range-query is expensive
@ 2023-02-07 13:31 rguenth at gcc dot gnu.org
  2023-02-07 14:13 ` [Bug tree-optimization/108697] " aldyh at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-02-07 13:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

            Bug ID: 108697
           Summary: constructing a path-range-query is expensive
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Looking at the profile of the compiler.i testcase from PR26854 one can see
ssa_global_cache::set_global_range very high in the profile, specifically
the clearing of m_tab:

bool
ssa_global_cache::set_global_range (tree name, const vrange &r)
{
  unsigned v = SSA_NAME_VERSION (name);
  if (v >= m_tab.length ())
    m_tab.safe_grow_cleared (num_ssa_names + 1);

note this cache is allocated freshly each time a path_range_query is
allocated which makes this process (in case any global range is registered)
O(num-ssa-names) which for large functions can be very expensive.

None of the path_range_query CTORs supports sharing the cache (and I'm not
sure that would be "correct").  For the testcase at hand nearly all
ssa_global_cache objects are allocated from path_range_query and the
backwards threader.  But there are CTOR calls from the ranger_cache CTOR
as well - the backwards threader also has a ranger instance, so I wonder
if one can take the global cache from that (co-incidentially the path-range
query instance gets a reference to a ranger instance ...).

Maybe ssa_global_cache is just a very bad representation for the path
range query.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
@ 2023-02-07 14:13 ` aldyh at gcc dot gnu.org
  2023-02-07 14:31 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: aldyh at gcc dot gnu.org @ 2023-02-07 14:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #1 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Sharing a cache with say the ranger is a no go, because the path_range_query's
cache is specific to the path, but perhaps we could share the path's cache
between constructions providing a constructor that takes the cache as you
suggest.

The path query has a bitmap to determine if a cache entry is used
(m_has_cache_entry), so I think we could reuse a dirty cache since the bitmap
gets cleared at construction.  I assume the main culprit is the
path_range_query's being instantiated from class back_threader?  If so, there
could be one cache per back_threader?

Hmmm, perhaps we'd have to call ssa_global_cache::clear_global_range() before
each set_global_range() for reused caches otherwise the latter would try to
reuse slots (fits_p and all that).

Also, is it the clearing part of safe_grow_cleared() that is taking up the
time, or just safe_grow.  Cause ISTM we don't need to clear the cache for use
in path_range_query.  So we could even add a flag to use safe_grow() instead of
safe_grow_cleared() for the path query?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
  2023-02-07 14:13 ` [Bug tree-optimization/108697] " aldyh at gcc dot gnu.org
@ 2023-02-07 14:31 ` rguenth at gcc dot gnu.org
  2023-02-07 14:34 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-02-07 14:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 54418
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54418&action=edit
replace vector with hash_map

it's the clearing that shows up on the profile, I've experimented with the
attached which removes it but the -ftime-report didn't change.

Maybe that this function is top in the profile is an artifact of
callgrind (I'm usually using that because you get nice coverage & call traces)

still, an O(num-ssa-names) operation for each possible thread is a source
of "quadraticness"

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
  2023-02-07 14:13 ` [Bug tree-optimization/108697] " aldyh at gcc dot gnu.org
  2023-02-07 14:31 ` rguenth at gcc dot gnu.org
@ 2023-02-07 14:34 ` rguenth at gcc dot gnu.org
  2023-02-07 15:03 ` aldyh at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-02-07 14:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
But yes, your observation about m_has_cache_entry is correct - if that has any
value (it makes reset_path "cheap"), then it should be also relied on upon
growth.  Maybe make this bitmap an optional feature of the cache itself
and handle it transparently there?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-02-07 14:34 ` rguenth at gcc dot gnu.org
@ 2023-02-07 15:03 ` aldyh at gcc dot gnu.org
  2023-02-07 19:52 ` aldyh at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: aldyh at gcc dot gnu.org @ 2023-02-07 15:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #4 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Created attachment 54420
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54420&action=edit
reuse path_range_query SSA cache for all of back_threader class

Off of the top of my head (i.e. probably very broken ;-)), this is what I had
in mind...  Share a cache between subsequent instantiations of
path_range_query.

Note the change to the set_cache:

 path_range_query::set_cache (const vrange &r, tree name)
 {
   unsigned v = SSA_NAME_VERSION (name);
-  bitmap_set_bit (m_has_cache_entry, v);
+  if (bitmap_set_bit (m_has_cache_entry, v))
+    m_cache->clear_global_range (name);
   m_cache->set_global_range (name, r);
 }

Since we're sharing the cache, make sure we nuke whatever it had there before,
otherwise set_global_range will try to reuse the slot if it thinks it fits. 
Although...maybe that's not even necessary.  I'm not sure...you'd have to play
around with it, but I think the general idea would work.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-02-07 15:03 ` aldyh at gcc dot gnu.org
@ 2023-02-07 19:52 ` aldyh at gcc dot gnu.org
  2023-02-10  2:54 ` amacleod at redhat dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: aldyh at gcc dot gnu.org @ 2023-02-07 19:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #5 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> But yes, your observation about m_has_cache_entry is correct - if that has
> any value (it makes reset_path "cheap"), then it should be also relied on
> upon
> growth.  Maybe make this bitmap an optional feature of the cache itself
> and handle it transparently there?

BTW, if putting the bitmap in the ssa_global_cache itself works, that's fine as
well, especially if it makes the ranger's cache faster :).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-02-07 19:52 ` aldyh at gcc dot gnu.org
@ 2023-02-10  2:54 ` amacleod at redhat dot com
  2023-04-26 19:26 ` cvs-commit at gcc dot gnu.org
  2023-10-11 19:51 ` amacleod at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: amacleod at redhat dot com @ 2023-02-10  2:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #6 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Aldy Hernandez from comment #5)
> (In reply to Richard Biener from comment #3)
> > But yes, your observation about m_has_cache_entry is correct - if that has
> > any value (it makes reset_path "cheap"), then it should be also relied on
> > upon
> > growth.  Maybe make this bitmap an optional feature of the cache itself
> > and handle it transparently there?
> 
> BTW, if putting the bitmap in the ssa_global_cache itself works, that's fine
> as well, especially if it makes the ranger's cache faster :).

FYI, I have a prototype to do this... I'm just looking at whether it is better
to integrate it in the class, or derive it from a performance point of view. 
results soon.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-02-10  2:54 ` amacleod at redhat dot com
@ 2023-04-26 19:26 ` cvs-commit at gcc dot gnu.org
  2023-10-11 19:51 ` amacleod at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-04-26 19:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:0a38f677463ff8a4fb61b049263aa596ef6471a7

commit r14-275-g0a38f677463ff8a4fb61b049263aa596ef6471a7
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Tue Mar 28 11:35:26 2023 -0400

    Create a lazy ssa_cache.

    Sparsely used ssa caches can benefit from using a bitmap to
    determine if a name already has an entry.  Utilize it in the path query
    and remove its private bitmap for tracking the same info.
    Also use it in the "assume" query class.

            PR tree-optimization/108697
            * gimple-range-cache.cc (ssa_global_cache::clear_range): Do
            not clear the vector on an out of range query.
            (ssa_cache::dump): Use dump_range_query instead of get_range.
            (ssa_cache::dump_range_query): New.
            (ssa_lazy_cache::dump_range_query): New.
            (ssa_lazy_cache::set_range): New.
            * gimple-range-cache.h (ssa_cache::dump_range_query): New.
            (class ssa_lazy_cache): New.
            (ssa_lazy_cache::ssa_lazy_cache): New.
            (ssa_lazy_cache::~ssa_lazy_cache): New.
            (ssa_lazy_cache::get_range): New.
            (ssa_lazy_cache::clear_range): New.
            (ssa_lazy_cache::clear): New.
            (ssa_lazy_cache::dump): New.
            * gimple-range-path.cc (path_range_query::path_range_query): Do
            not allocate a ssa_cache object nor has_cache bitmap.
            (path_range_query::~path_range_query): Do not free objects.
            (path_range_query::clear_cache): Remove.
            (path_range_query::get_cache): Adjust.
            (path_range_query::set_cache): Remove.
            (path_range_query::dump): Don't call through a pointer.
            (path_range_query::internal_range_of_expr): Set cache directly.
            (path_range_query::reset_path): Clear cache directly.
            (path_range_query::ssa_range_in_phi): Fold with globals only.
            (path_range_query::compute_ranges_in_phis): Simply set range.
            (path_range_query::compute_ranges_in_block): Call cache directly.
            * gimple-range-path.h (class path_range_query): Replace bitmap
            and cache pointer with lazy cache object.
            * gimple-range.h (class assume_query): Use ssa_lazy_cache.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/108697] constructing a path-range-query is expensive
  2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-04-26 19:26 ` cvs-commit at gcc dot gnu.org
@ 2023-10-11 19:51 ` amacleod at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: amacleod at redhat dot com @ 2023-10-11 19:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Andrew Macleod <amacleod at redhat dot com> ---
fixed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-10-11 19:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-07 13:31 [Bug tree-optimization/108697] New: constructing a path-range-query is expensive rguenth at gcc dot gnu.org
2023-02-07 14:13 ` [Bug tree-optimization/108697] " aldyh at gcc dot gnu.org
2023-02-07 14:31 ` rguenth at gcc dot gnu.org
2023-02-07 14:34 ` rguenth at gcc dot gnu.org
2023-02-07 15:03 ` aldyh at gcc dot gnu.org
2023-02-07 19:52 ` aldyh at gcc dot gnu.org
2023-02-10  2:54 ` amacleod at redhat dot com
2023-04-26 19:26 ` cvs-commit at gcc dot gnu.org
2023-10-11 19:51 ` amacleod at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).