[Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query
@ 2022-08-03 13:53 rguenth at gcc dot gnu.org
  2022-08-03 13:53 ` [Bug tree-optimization/106514] " rguenth at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-03 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

            Bug ID: 106514
           Summary: [12/13 Regression] ranger slowness in path query
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When you bump --param max-jump-thread-duplication-stmts only so slightly, like
by a factor of two to 30, making the effective backwards threading limit 15,
the gcc.dg/pr69592.c -O2 compile-time explodes and you'll see

 backwards jump threading           :   9.17 ( 92%)   0.01 ( 20%)   9.18 ( 92%)
   24  (  0%)

note the testcase is somewhat degenerate with a series of diamonds also
"nicely" showing the backwards threader exponential behavior in exploring
the threading path space (plus ontop the quadraticness with starting on
every condition).  The current effective limit of 7 copied stmts limits the
effective thread length to a single diamond, avoiding the issue.

perf shows you

Samples: 143K of event 'cycles:u', Event count (approx.): 127516678963          
Overhead       Samples  Command  Shared Object     Symbol                       
  24.36%         34962  cc1      cc1               [.] bitmap_bit_p
  18.78%         26962  cc1      cc1               [.] bitmap_list_find_element
  14.98%         21505  cc1      cc1               [.] path_oracle::killing_def
   1.94%          2791  cc1      cc1               [.]
path_range_query::compute_ranges_in_block

so it's not the exponential search space per-se but the high overhead of
ranger, specifically the relation oracle which seems to be unbound.

path_range_query::range_defined_in_block calling path_oracle::killing_def
87 000 times gets you 600 000 000 bitmap_bit_p queries (resulting in
10 billion(!) bitmap list walk steps).

Part of the sub-optimality is probably the equiv chain becoming very long
(can we simply limit that?) and clearing bits in all the very many
bitmaps linked.  Not to say that linked lists (for relations and equivalences)
are hardly a good data structure for anything but inserts/removals :/

The bitmap (on the list) + linked list combos should probably be all replaced
with splay trees.  There's (unused) splay-tree-utils.h that seem to be
splay tree "adaptors" ontop of something with links, but libiberty splay-trees
of course work as well.  I think it's worth optimizing for small number of
elements, thus favor a balanced tree over hashing.

Note for the testcase at hand it's walking of the m_equiv list, not the
m_relations one so it might be a bit more difficult to fix this than
if the issue were the m_relations chain.

I'm also seeing missed micro-optimization like

  // Walk the equivalency list and remove SSA from any equivalencies.
  if (bitmap_bit_p (m_equiv.m_names, v))
...
  else
    bitmap_set_bit (m_equiv.m_names, v);

which can be written as

  if (!bitmap_set_bit (m_equiv.m_names, v))
....

likewise for bitmap_clear_bit.  Both return whether the bit changed
with the operation.  Of course that's just constant factor, the issue
here is complexity involving the linear list walks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
@ 2022-08-03 13:53 ` rguenth at gcc dot gnu.org
  2022-08-03 16:11 ` amacleod at redhat dot com
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-03 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.2
                 CC|                            |amacleod at redhat dot com
           Keywords|                            |compile-time-hog

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
  2022-08-03 13:53 ` [Bug tree-optimization/106514] " rguenth at gcc dot gnu.org
@ 2022-08-03 16:11 ` amacleod at redhat dot com
  2022-08-03 19:37 ` cvs-commit at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: amacleod at redhat dot com @ 2022-08-03 16:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #1 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Richard Biener from comment #0)

> 
> Part of the sub-optimality is probably the equiv chain becoming very long
> (can we simply limit that?) and clearing bits in all the very many
> bitmaps linked.  N

we certainly could.  Especially in the path version which has that killing-def
issue which doesn't exist in the normal oracle.   The path oracle basically
takes a normal oracle, then bolts the path following code onto it, and has to
deal with new defintions invalidating any existing equivalences.  I'd first
look for inefficiencies elsewhere as we didnt spend a lot of time tweaking it
once it was working.

Given the way equivalences have to be matched, Im not sure that we even need to
walk the list.  The new equivalence set for the killed def will only contain
itself, or any new equivalences encountered since the kill.   In order to be
equivalent, 2 names must be in each others set, which they won't be.  I'm not
convinced we need to remove them at all.

Im also not sure why the path oracle changes the root oracle requirement that
they be the same equivalence set, not just in each others. I think it has
something to do with the  transitory nature of the path equivalence/relations
vs the root oracles "permanent" sets.  I think we can do better here too.

And finally, Aldy has a list of all the ssa-names in the path that are relevant
to the calculations in the path.  I suspect we can reduce any equivalence sets
immediately to just those names, as any on-entry ranges should reflect existing
equivalences.  in theory :-)

We'll see if any or all of those have any effect and get back to you.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
  2022-08-03 13:53 ` [Bug tree-optimization/106514] " rguenth at gcc dot gnu.org
  2022-08-03 16:11 ` amacleod at redhat dot com
@ 2022-08-03 19:37 ` cvs-commit at gcc dot gnu.org
  2022-08-04 18:29 ` cvs-commit at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-03 19:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:19ffb35d17474bb4dd3eb78963c28d10b5362321

commit r13-1952-g19ffb35d17474bb4dd3eb78963c28d10b5362321
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Wed Aug 3 13:55:42 2022 -0400

    Do not walk equivalence set in path_oracle::killing_def.

    When killing a def in the path ranger, there is no need to walk the set
    of existing equivalences clearing bits.  An equivalence match requires
    that both ssa-names have to be in each others set.  As killing_def
    creates a new empty set contianing only the current def,  it already
    ensures false equivaelnces won't happen.

            PR tree-optimization/106514
            * value-relation.cc (path_oracle::killing_def) Do not walk the
              equivalence set clearing bits.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-08-03 19:37 ` cvs-commit at gcc dot gnu.org
@ 2022-08-04 18:29 ` cvs-commit at gcc dot gnu.org
  2022-08-05 12:06 ` cvs-commit at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-04 18:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:8e34d92ef29a175b84cc7f5185db43656ae762bb

commit r13-1965-g8e34d92ef29a175b84cc7f5185db43656ae762bb
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Thu Aug 4 12:22:59 2022 -0400

    Loop over intersected bitmaps.

    compute_ranges_in_block loops over the import list and then checks the
    same bit in exports.  It is nmore efficent to loop over the intersection
    of the 2 bitmaps.

            PR tree-optimization/106514
            * gimple-range-path.cc (path_range_query::compute_ranges_in_block):
            Use EXECUTE_IF_AND_IN_BITMAP to loop over 2 bitmaps.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-08-04 18:29 ` cvs-commit at gcc dot gnu.org
@ 2022-08-05 12:06 ` cvs-commit at gcc dot gnu.org
  2022-08-09  8:24 ` cvs-commit at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-05 12:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:47964e766270f349f5b171bcd68ff7c1e60d85d8

commit r13-1973-g47964e766270f349f5b171bcd68ff7c1e60d85d8
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Fri Aug 5 08:04:10 2022 +0200

    Inline unsupported_range constructor.

    An unsupported_range temporary is instantiated in every Value_Range
    for completeness sake and should be mostly a NOP.  However, it's
    showing up in the callgrind stats, because it's not inline.  This
    fixes the oversight.

            PR tree-optimization/106514

    gcc/ChangeLog:

            * value-range.cc (unsupported_range::unsupported_range): Move...
            * value-range.h (unsupported_range::unsupported_range): ...here.
            (unsupported_range::set_undefined): New.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2022-08-05 12:06 ` cvs-commit at gcc dot gnu.org
@ 2022-08-09  8:24 ` cvs-commit at gcc dot gnu.org
  2022-08-09  9:09 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-09  8:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:409978d58dafa689c5b3f85013e2786526160f2c

commit r13-1998-g409978d58dafa689c5b3f85013e2786526160f2c
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Aug 8 12:20:04 2022 +0200

    tree-optimization/106514 - add --param max-jump-thread-paths

    The following adds a limit for the exponential greedy search of
    the backwards jump threader.  The idea is to limit the search
    space in a way that the paths considered are the same if the search
    were in BFS order rather than DFS.  In particular it stops considering
    incoming edges into a block if the product of the in-degrees of
    blocks on the path exceeds the specified limit.

    When considering the low stmt copying limit of 7 (or 1 in the size
    optimize case) this means the degenerate case with maximum search
    space is a sequence of conditions with no actual code

      B1
       |\
       | empty
       |/
      B2
       |\
       ...
      Bn
       |\

    GIMPLE_CONDs are costed 2, an equivalent GIMPLE_SWITCH already 4, so
    we reach 7 already with 3 middle conditions (B1 and Bn do not count).
    The search space would be 2^4 == 16 to reach this.  The FSM threads
    historically allowed for a thread length of 10 but is really looking
    for a single multiway branch threaded across the backedge.  I've
    chosen the default of the new parameter to 64 which effectively
    limits the outdegree of the switch statement (the cases reaching the
    backedge) to that number (divided by 2 until I add some special
    pruning for FSM threads due to the loop header indegree).  The
    testcase ssa-dom-thread-7.c requires 56 at the moment (as said,
    some special FSM thread pruning of considered edges would bring
    it down to half of that), but we now get one more threading
    and quite some more in later threadfull.  This testcase seems to
    be difficult to check for expected transforms.

    The new testcases add the degenerate case we currently thread
    (without deciding whether that's a good idea ...) plus one with
    an approripate limit that should prevent the threading.

    This obsoletes the mentioned --param max-fsm-thread-length but
    I am not removing it as part of this patch.  When the search
    space is limited the thread stmt size limit effectively provides
    max-fsm-thread-length.

    The param with its default does not help PR106514 enough to unleash
    path searching with the higher FSM stmt count limit.

            PR tree-optimization/106514
            * params.opt (max-jump-thread-paths): New.
            * doc/invoke.texi (max-jump-thread-paths): Document.
            * tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
            Honor max-jump-thread-paths, take overall_path argument.
            (back_threader::find_paths): Pass 1 as initial overall_path.

            * gcc.dg/tree-ssa/ssa-thread-16.c: New testcase.
            * gcc.dg/tree-ssa/ssa-thread-17.c: Likewise.
            * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2022-08-09  8:24 ` cvs-commit at gcc dot gnu.org
@ 2022-08-09  9:09 ` rguenth at gcc dot gnu.org
  2022-08-09  9:38 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-09  9:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So one now needs to bump the limit to 60 to get enough samples for perf.  Then
we now see

Samples: 55K of event 'cycles:u', Event count (approx.): 49013411833            
Overhead       Samples  Command  Shared Object     Symbol                       
  51.19%         28195  cc1      cc1               [.]
path_range_query::compute_ranges_in_block
  11.67%          6427  cc1      cc1               [.]
path_range_query::adjust_for_non_null_uses
   9.20%          5069  cc1      cc1               [.]
path_range_query::range_defined_in_block
   3.39%          1869  cc1      cc1               [.] bitmap_set_bit
   1.95%          1072  cc1      cc1               [.]
back_threader::find_paths_to_names
   1.93%          1066  cc1      cc1               [.] bitmap_bit_p

the compute_ranges_in_block is also top with 30 but adjust_for_non_null_uses
pops up newly with 60.

The compute_ranges_in_block slowness is attributed to

  // ...and then the rest of the imports.
  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
    {
      tree name = ssa_name (i);
      Value_Range r (TREE_TYPE (name));

      if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
          && range_defined_in_block (r, name, bb))
^^^^

plus

  gori_compute &g = m_ranger->gori ();
  bitmap exports = g.exports (bb);
  EXECUTE_IF_AND_IN_BITMAP (m_imports, exports, 0, i, bi)
    {
      tree name = ssa_name (i);
      Value_Range r (TREE_TYPE (name));
      if (g.outgoing_edge_range_p (r, e, name, *this))
^^^^

for this testcase there seem to be a lot of imports but not many exports
so range_defined_in_block is called very many times compared to
outgoing_edge_range_p but the latter is comparatively more expensive.

For the path query I wonder why we are interested in computing (aka
updating the cache) for any but the exports?  When we
compute the exports, why is the cache not lazily computed just for
the interesting names?  AFAICS we invalidate all local defs (but even
then, why?  we get to see a def exactly once, why do we have to even
think about clearing sth we should not have seen?)

That is, in path_range_query::compute_ranges

  while (1)
    {
      basic_block bb = curr_bb ();

      compute_ranges_in_block (bb);
      adjust_for_non_null_uses (bb);

      if (at_exit ())
        break;

      move_next ();
    }

I'd expect only a small portion of the actual compute_ranges_in_block
work to be done for all blocks and the real resolving work only for
the block ending the path?  Maybe the backwards threader is just using
the wrong (expensive) API here?  It does

 m_solver->compute_ranges (path, m_imports);
 m_solver->range_of_stmt (r, cond);

--

Btw, I wondered if path-range-query can handle parts of the path being a
"black box", aka, skip to the immediate dominator instead of one of the
predecessor edges?  I _think_ analysis wise this would be quite straight
forward but of course we'd have to represent this somehow in the path.
Maybe it works by simply leaving out the intermediate blocks?  Thus,

   B
   |\
   A
  / \
 C   D
  \ /
   E
    \

the path would be from B to E but we don't care whether we go the C or D
way, and when duplicating the path we'd simply duplicate the whole diamond
instead of duplicating only one branch, say A->D, and keeping the edge
A->C to the original block C, defeating the threading of E to its successor
if we happen to go that way.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2022-08-09  9:09 ` rguenth at gcc dot gnu.org
@ 2022-08-09  9:38 ` rguenth at gcc dot gnu.org
  2022-08-10 13:51 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-09  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the testcase m_imports is so big because we have

...
  <bb 1198> [local count: 1073741824]:
  # c_1198 = PHI <c_1197(1196), c_2400(1197)>
  _599 = MEM[(unsigned int *)b_1201(D) + 2792B];
  d_2401 = _599 + d_2399;
  if (d_2399 > d_2401)
    goto <bb 1199>; [50.00%]
  else
    goto <bb 1200>; [50.00%]

  <bb 1199> [local count: 536870913]:
  c_2402 = c_1198 + 1;

  <bb 1200> [local count: 1073741824]:
  # c_1199 = PHI <c_1198(1198), c_2402(1199)>
  _600 = MEM[(unsigned int *)b_1201(D) + 2796B];
  d_2403 = _600 + d_2401;
  if (d_2401 > d_2403)
    goto <bb 1201>; [50.00%]
  else
    goto <bb 1202>; [50.00%]

so when back_threader::find_paths does ->compute_imports (.., bb 1200) we
walk up the whole d_2403 definition chain unbound (for PHIs we restrict
to edges on the path which is empty).  I realize that there's no good way
to pick up extra imports on the fly cheaply - we could handle it when
we prune local defs from the imports at which point we could add operands
but it's not clear to me that will be a good trade-off.  In fact
pruning imports looks suspicious as the final path-range query will
be limited there?  Likewise for any import we add via PHI-translation
we fail to add local def operands - we're only getting those from the
initial import compute which basically picks those from blocks dominating
the exit but no others.

I will experiment with re-wiring this.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2022-08-09  9:38 ` rguenth at gcc dot gnu.org
@ 2022-08-10 13:51 ` rguenth at gcc dot gnu.org
  2022-08-11 13:01 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-10 13:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-08-10
     Ever confirmed|0                           |1
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2022-08-10 13:51 ` rguenth at gcc dot gnu.org
@ 2022-08-11 13:01 ` cvs-commit at gcc dot gnu.org
  2022-12-13 13:56 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-11 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:16b013c9d9b4d950f89821476e791bf18c1295df

commit r13-2020-g16b013c9d9b4d950f89821476e791bf18c1295df
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Aug 9 13:37:26 2022 +0200

    tree-optimization/106514 - revisit m_import compute in backward threading

    This revisits how we compute imports later used for the ranger path
    query during backwards threading.  The compute_imports function
    of the path solver ends up pulling the SSA def chain of regular
    stmts without limit and since it starts with just the gori imports
    of the path exit it misses some interesting names to translate
    during path discovery.  In fact with a still empty path this
    compute_imports function looks like not the correct tool.

    The following instead implements what it does during the path discovery
    and since we add the exit block we seed the initial imports and
    interesting names from just the exit conditional.  When we then
    process interesting names (aka imports we did not yet see the definition
    of) we prune local defs but add their uses in a similar way as
    compute_imports would have done.

    compute_imports also is lacking in its walking of the def chain
    compared to range_def_chain::get_def_chain which for example
    handles &_1->x specially through range_op_handler and
    gimple_range_operand1, so the code copies this.  A fix for
    compute_imports will be done separately, also fixing the unbound
    walk there.

    The patch also properly unwinds m_imports during the path discovery
    backtracking and from a debugging session I have verified the two
    sets evolve as expected now while previously behaving slightly erratic.

    Fortunately the m_imports set now also is shrunken significantly for
    the PR69592 testcase (aka PR106514) so that there's overall speedup
    when increasing --param max-jump-thread-duplication-stmts as
    15 -> 30 -> 60 -> 120 from 1s -> 2s -> 13s -> 27s to with the patch
    1s -> 2s -> 4s -> 8s.

    This runs into a latent issue in X which doesn't seem to expect
    any PHI nodes with a constant argument on an edge inside the path.
    But we now have those as interesting, for example for the ICEing
    g++.dg/torture/pr100925.C which just has sth like

      if (i)
        x = 1;
      else
        x = 5;
      if (x == 1)
        ...

    where we now have the path from if (i) to if (x) and the PHI for x
    in the set of imports to consider for resolving x == 1 which IMHO
    looks exactly like what we want.  The path_range_query::ssa_range_in_phi
    papers over the issue and drops the range to varying instead of
    crashing.  I didn't want to mess with this any further in this patch
    (but I couldn't resist replacing the loop over PHI args with
    PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).

            PR tree-optimization/106514
            * tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
            Compute and unwind both m_imports and interesting on the fly during
            path discovery.
            (back_threader::find_paths): Compute the original m_imports
            from just the SSA uses of the exit conditional.  Drop
            handling single_succ_to_potentially_threadable_block.
            * gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
            constant PHI arguments without crashing.  Use
PHI_ARG_DEF_FROM_EDGE.

            * gcc.dg/tree-ssa/ssa-thread-19.c: Un-XFAIL.
            * gcc.dg/tree-ssa/ssa-thread-20.c: New testcase.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2022-08-11 13:01 ` cvs-commit at gcc dot gnu.org
@ 2022-12-13 13:56 ` rguenth at gcc dot gnu.org
  2022-12-13 14:02 ` rguenth at gcc dot gnu.org
  2023-05-08 12:25 ` [Bug tree-optimization/106514] [12/13/14 " rguenth at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-12-13 13:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Re-confirmed.  =15 vs =30 goes from

 backwards jump threading           :   0.58 ( 13%)

to

 backwards jump threading           :   7.00 ( 65%)

so it still shows exponential behavior for CFGs like

 if (a)
   ...
 if (b)
   ...
 if (c)
   ...

because we explore all paths through the CFG that fit in some size limit.
I'm not sure if there's any low hanging fruit besides this (like if we
properly avoid re-doing local computes when visiting blocks as part of a
path multiple times).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2022-12-13 13:56 ` rguenth at gcc dot gnu.org
@ 2022-12-13 14:02 ` rguenth at gcc dot gnu.org
  2023-05-08 12:25 ` [Bug tree-optimization/106514] [12/13/14 " rguenth at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-12-13 14:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #10)
> Re-confirmed.  =15 vs =30 goes from
> 
>  backwards jump threading           :   0.58 ( 13%)
> 
> to
> 
>  backwards jump threading           :   7.00 ( 65%)
> 
> so it still shows exponential behavior for CFGs like
> 
>  if (a)
>    ...
>  if (b)
>    ...
>  if (c)
>    ...
> 
> because we explore all paths through the CFG that fit in some size limit.
> I'm not sure if there's any low hanging fruit besides this (like if we
> properly avoid re-doing local computes when visiting blocks as part of a
> path multiple times).

We do have --param max-jump-thread-paths putting a hard limit on the number
of branches in a paths we explore, the default might just be a bit high
for this testcase.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/106514] [12/13/14 Regression] ranger slowness in path query
  2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2022-12-13 14:02 ` rguenth at gcc dot gnu.org
@ 2023-05-08 12:25 ` rguenth at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.3                        |12.4

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-05-08 12:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-03 13:53 [Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query rguenth at gcc dot gnu.org
2022-08-03 13:53 ` [Bug tree-optimization/106514] " rguenth at gcc dot gnu.org
2022-08-03 16:11 ` amacleod at redhat dot com
2022-08-03 19:37 ` cvs-commit at gcc dot gnu.org
2022-08-04 18:29 ` cvs-commit at gcc dot gnu.org
2022-08-05 12:06 ` cvs-commit at gcc dot gnu.org
2022-08-09  8:24 ` cvs-commit at gcc dot gnu.org
2022-08-09  9:09 ` rguenth at gcc dot gnu.org
2022-08-09  9:38 ` rguenth at gcc dot gnu.org
2022-08-10 13:51 ` rguenth at gcc dot gnu.org
2022-08-11 13:01 ` cvs-commit at gcc dot gnu.org
2022-12-13 13:56 ` rguenth at gcc dot gnu.org
2022-12-13 14:02 ` rguenth at gcc dot gnu.org
2023-05-08 12:25 ` [Bug tree-optimization/106514] [12/13/14 " rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).