public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
@ 2022-11-09  8:17 ` rguenth at gcc dot gnu.org
  2022-11-09  8:39 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-09  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #2)
> The backwards threader discovers the threads, but refuses to optimize them
> because it thinks doing so will create an irreducible loop without enough
> benefit.

The reason is now:

Checking profitability of path (backwards):  bb:3 (2 insns) bb:7 (4 insns) bb:6
(1 insns) (latch) bb:5
  Control statement insns: 2
  Overall: 5 insns

 Registering value_relation (path_oracle) (j_17 < m_23(D)) (root: bb5)
path: 5->6->7->3->xx REJECTED (unreachable)

it seems the path oracle is confused here.


  <bb 3> [local count: 955630225]:
  if (running_14 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 4> [local count: 477815112]:
  _1 = (long unsigned int) i_16;
  _2 = _1 * 4;
  _3 = p_25(D) + _2;
  _4 = *_3;
  _5 = (long unsigned int) j_17;
  _6 = _5 * 4;
  _7 = q_26(D) + _6;
  _8 = *_7;
  _9 = _4 * _8;
  sum_27 = _9 + sum_11;
  if (sum_27 > 19999)
    goto <bb 5>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 5> [local count: 238907556]:

  <bb 6> [local count: 955630225]:
  # sum_10 = PHI <sum_11(3), 20000(5), sum_27(4)>
  # running_13 = PHI <0(3), 0(5), 1(4)>
  j_28 = j_17 + 1;

  <bb 7> [local count: 1073741824]:
  # sum_11 = PHI <sum_12(11), sum_10(6)>
  # running_14 = PHI <running_15(11), running_13(6)>
  # j_17 = PHI <0(11), j_28(6)>
  if (j_17 < m_23(D))
    goto <bb 3>; [89.00%]
  else
    goto <bb 8>; [11.00%]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
  2022-11-09  8:17 ` [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops rguenth at gcc dot gnu.org
@ 2022-11-09  8:39 ` rguenth at gcc dot gnu.org
  2022-11-09  9:01 ` aldyh at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-09  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's

edge
back_threader::maybe_register_path (back_threader_profitability &profit)
{
  edge taken_edge = find_taken_edge (m_path);

  if (taken_edge && taken_edge != UNREACHABLE_EDGE)
    {
      if (m_visited_bbs.contains (taken_edge->dest))
        {
          // Avoid circular paths by indicating there is nothing to
          // see in this direction.
          taken_edge = UNREACHABLE_EDGE;

not sure why though?  If we remove the above we get

Checking profitability of path (backwards):  bb:3 (2 insns) bb:7 (4 insns) bb:6
(1 insns) (latch) bb:5
  Control statement insns: 2
  Overall: 5 insns

 Registering value_relation (path_oracle) (j_17 < m_23(D)) (root: bb5)
Checking profitability of path (backwards):
Path crosses loop header but does not exit it:   Cancelling jump thread: (5, 6)
incoming edge;  (6, 7) normal (back) (7, 3) normal (3, 6) nocopy;

path: 5->6->7->3->xx REJECTED

so we refuse the thread because .. well, not exactly sure why.
r12-4526-gd8edfadfc7a979 added

+  if (crossed_loop_header)
+    {
+      cancel_thread (&path, "Path crosses loop header but does not exit it");
+      return true;
+    }

From the look we want to avoid creating sub-loops here and that's indeed
what would happen here if we elide this.  It also makes the loop have two
exits instead of one and one jump thread is still not done but we eventually
get to do the desired simplification.

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 2a8cfa3ee01..31c3eef9bb3 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -249,7 +249,7 @@ back_threader::maybe_register_path
(back_threader_profitabil
ity &profit)

   if (taken_edge && taken_edge != UNREACHABLE_EDGE)
     {
-      if (m_visited_bbs.contains (taken_edge->dest))
+      if (0 && m_visited_bbs.contains (taken_edge->dest))
        {
          // Avoid circular paths by indicating there is nothing to
          // see in this direction.
diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
index 59c268a3567..14df3ee42a7 100644
--- a/gcc/tree-ssa-threadupdate.cc
+++ b/gcc/tree-ssa-threadupdate.cc
@@ -2765,7 +2765,6 @@ jt_path_registry::cancel_invalid_paths
(vec<jump_thread_ed
ge *> &path)
   bool seen_latch = false;
   int loops_crossed = 0;
   bool crossed_latch = false;
-  bool crossed_loop_header = false;
   // Use ->dest here instead of ->src to ignore the first block.  The
   // first block is allowed to be in a different loop, since it'll be
   // redirected.  See similar comment in profitable_path_p: "we don't
@@ -2799,14 +2798,6 @@ jt_path_registry::cancel_invalid_paths
(vec<jump_thread_edge *> &path)
          ++loops_crossed;
        }

-      // ?? Avoid threading through loop headers that remain in the
-      // loop, as such threadings tend to create sub-loops which
-      // _might_ be OK ??.
-      if (e->dest->loop_father->header == e->dest
-         && !flow_loop_nested_p (exit->dest->loop_father,
-                                 e->dest->loop_father))
-       crossed_loop_header = true;
-
       if (flag_checking && !m_backedge_threads)
        gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
     }
@@ -2842,11 +2833,6 @@ jt_path_registry::cancel_invalid_paths
(vec<jump_thread_edge *> &path)
       cancel_thread (&path, "Path rotates loop");
       return true;
     }
-  if (crossed_loop_header)
-    {
-      cancel_thread (&path, "Path crosses loop header but does not exit it");
-      return true;
-    }
   return false;
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
  2022-11-09  8:17 ` [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops rguenth at gcc dot gnu.org
  2022-11-09  8:39 ` rguenth at gcc dot gnu.org
@ 2022-11-09  9:01 ` aldyh at gcc dot gnu.org
  2022-11-09 12:54 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: aldyh at gcc dot gnu.org @ 2022-11-09  9:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #5 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> It's
> 
> edge
> back_threader::maybe_register_path (back_threader_profitability &profit)
> {
>   edge taken_edge = find_taken_edge (m_path);
> 
>   if (taken_edge && taken_edge != UNREACHABLE_EDGE)
>     {
>       if (m_visited_bbs.contains (taken_edge->dest))
>         {
>           // Avoid circular paths by indicating there is nothing to
>           // see in this direction.
>           taken_edge = UNREACHABLE_EDGE;
> 
> not sure why though?  If we remove the above we get

There was a test we were failing because we were threading ircular paths, but
you know more about this than me ;-).  So if there are no regressions, feel
free to nuke it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2022-11-09  9:01 ` aldyh at gcc dot gnu.org
@ 2022-11-09 12:54 ` rguenth at gcc dot gnu.org
  2022-11-10 11:22 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-09 12:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm testing a patch removing the premature rejection of the cycle in the
backwards threader but will leave the cancellation code for now.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2022-11-09 12:54 ` rguenth at gcc dot gnu.org
@ 2022-11-10 11:22 ` rguenth at gcc dot gnu.org
  2022-11-10 13:13 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-10 11:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
commit 837be6c7cfb49e16a18ef8f6c44d98bfa6d2396b
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Nov 9 13:52:58 2022 +0100

    tree-optimization/84646 - remove premature thread path rejection

    This removes a premature rejection that's done later in a different
    way.

            PR tree-optimization/84646
            * tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
            Remove premature cycle rejection.


The last threadfull pass now performs the desired threading but we lack
a later pass that elides the endless loop that remains:

<bb 9> [local count: 477815113]:
# sum_10 = PHI <sum_51(7), sum_10(9)>
# ivtmp.9_24 = PHI <ivtmp.9_53(7), ivtmp.9_31(9)>
ivtmp.9_31 = ivtmp.9_24 + 4;
if (_15 != ivtmp.9_31)
  goto <bb 9>; [89.00%]
else
  goto <bb 10>; [11.00%]

<bb 10> [local count: 118111600]:
# sum_33 = PHI <sum_10(9), sum_35(11), 20000(6), sum_27(8)>
# running_37 = PHI <0(9), running_38(11), 0(6), running_38(8)>

the loop isn't removed by DCE because sum_10 is needed.  This case looks
like a genuine missed copy propagation or value numbering since the
value is always equal to sum_51.  But after threadfull2 we have none
of those.  VRP is no longer doing copy propagation, we end up with

  <bb 11> [local count: 477815113]:
  # sum_10 = PHI <sum_51(8), sum_48(12)>
  # ivtmp.9_24 = PHI <ivtmp.9_53(8), ivtmp.9_50(12)>
  ivtmp.9_31 = ivtmp.9_24 + 4;
  if (_15 != ivtmp.9_31)
    goto <bb 12>; [89.00%]
  else
    goto <bb 13>; [11.00%]

  <bb 12> [local count: 425255451]:
  # sum_48 = PHI <sum_10(11)>
  # ivtmp.9_50 = PHI <ivtmp.9_31(11)>
  goto <bb 11>; [100.00%]

  <bb 13> [local count: 118111600]:
  # sum_33 = PHI <sum_10(11), sum_35(14), 20000(6), sum_27(9)>

there.  A copyprop pass doesn't handle this degenerate case, non-iterating
FRE neither, nor iterating FRE.  Both CCP and FRE fall into the trap
of starting sum_10 as 20000 and on iteration the above makes sum_10
varying.  FRE would handle the first quoted IL with sum_48 removed though
(even when not iterating).  Currently it's forwprop that turns the 2nd
into the first by means of copy propagating.  The idea was that VRP would
do the job fully clearing out copies but appearantly that no longer happens.

We've had copy_prop in place of CCP but CCP doesn't cleanup this singleton
PHI copy, investigating why.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2022-11-10 11:22 ` rguenth at gcc dot gnu.org
@ 2022-11-10 13:13 ` cvs-commit at gcc dot gnu.org
  2022-11-10 13:37 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-10 13:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:203b127fccc9abe5373c9e3cc03a476c35b1f594

commit r13-3877-g203b127fccc9abe5373c9e3cc03a476c35b1f594
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Nov 10 14:08:35 2022 +0100

    Restore CCP copy propagation

    The following restores copy propagation in CCP for the case the
    lattice was constant before trying to transition to a copy.  At
    some point we changed to use the meet operator to handle
    integer constant -> integer constant transitions but that screws
    up the const -> copy lattice transition.

            PR tree-optimization/84646
            * tree-ssa-ccp.cc (set_lattice_value): Make sure we
            allow a const -> copy transition and avoid using meet
            in that case.

            * gcc.dg/tree-ssa/ssa-ccp-42.c: New testcase.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2022-11-10 13:13 ` cvs-commit at gcc dot gnu.org
@ 2022-11-10 13:37 ` rguenth at gcc dot gnu.org
  2022-11-10 14:19 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-10 13:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I have a patch to make forwprop do the loop PHI elimination but then there's no
CD-DCE pass after that to remove the loop itself :/  It might be tempting to
swap cddce3 and dce7 or turn dce7 into a cd-dce ...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2022-11-10 13:37 ` rguenth at gcc dot gnu.org
@ 2022-11-10 14:19 ` cvs-commit at gcc dot gnu.org
  2022-11-10 15:13 ` rguenth at gcc dot gnu.org
  2022-11-11 13:32 ` cvs-commit at gcc dot gnu.org
  9 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-10 14:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:f1b76811f2c3773e8cabcc07932bf13e82e264db

commit r13-3879-gf1b76811f2c3773e8cabcc07932bf13e82e264db
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Nov 10 15:02:37 2022 +0100

    better PHI copy propagation for forwprop

    We can handle _1 = PHI <_1, _2> as a copy.

            PR tree-optimization/84646
            * tree-ssa-forwprop.cc (pass_forwprop::execute): Improve
            copy propagation across PHIs.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2022-11-10 14:19 ` cvs-commit at gcc dot gnu.org
@ 2022-11-10 15:13 ` rguenth at gcc dot gnu.org
  2022-11-11 13:32 ` cvs-commit at gcc dot gnu.org
  9 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-10 15:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxue at gcc dot gnu.org

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
So with the last patch I have we eliminate the empty loop that's created by
threading but the result is still (or now "again") the imperfect result
mentioned in the original description - we fail to exit the outer loop.

The main thing the patches in this series did is restore the threading
that did the inner loop optimization and the required cleanup.  I don't
think that this particular thread itself can be enhanced to cover exiting
the outer loop.  In particular we ask to thread across the loop exit but
we know nothing about that apart from the code in the remaining iterations
having no side-effect.

I'm not sure which kind of pass/transform would be suited to cover this
in a more general way than jump threading does.  We do have loop splitting
which handles this as part of splitting on a "semi-invariant" condition
but that fails quite early because it's

   if (running)
     {
       if (other)
         running = 0;
     }

and we don't seem to handle the conditional "semi-invariant" condition
case.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops
       [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2022-11-10 15:13 ` rguenth at gcc dot gnu.org
@ 2022-11-11 13:32 ` cvs-commit at gcc dot gnu.org
  9 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-11 13:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:be2c74fdcd0e8d66c3667008ba2561ab5dcc379b

commit r13-3897-gbe2c74fdcd0e8d66c3667008ba2561ab5dcc379b
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Nov 10 15:04:10 2022 +0100

    Make last DCE remove empty loops

    The following makes the last DCE pass CD-DCE and in turn the
    last CD-DCE pass a DCE one.  That ensues we remove empty loops
    that become empty between the two.  I've also moved the tail-call
    pass after DCE since DCE can only improve things here.

    The two testcases were the only ones scanning cddce3 so I've
    changed them to scan the dce7 pass that's now in this place.
    The testcases scanning dce7 also work when that's in the earlier
    position.

            PR tree-optimization/84646
            * tree-ssa-dce.cc (pass_dce::set_pass_param): Add param
            wheter to run update-address-taken.
            (pass_dce::execute): Honor it.
            * passes.def: Exchange last DCE and CD-DCE invocations.
            Swap pass_tail_calls and the last DCE.

            * g++.dg/tree-ssa/pr106922.C: Continue to scan earlier DCE dump.
            * gcc.dg/tree-ssa/20030808-1.c: Likewise.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-11-11 13:32 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-84646-4@http.gcc.gnu.org/bugzilla/>
2022-11-09  8:17 ` [Bug tree-optimization/84646] Missed optimisation for hoisting conditions outside nested loops rguenth at gcc dot gnu.org
2022-11-09  8:39 ` rguenth at gcc dot gnu.org
2022-11-09  9:01 ` aldyh at gcc dot gnu.org
2022-11-09 12:54 ` rguenth at gcc dot gnu.org
2022-11-10 11:22 ` rguenth at gcc dot gnu.org
2022-11-10 13:13 ` cvs-commit at gcc dot gnu.org
2022-11-10 13:37 ` rguenth at gcc dot gnu.org
2022-11-10 14:19 ` cvs-commit at gcc dot gnu.org
2022-11-10 15:13 ` rguenth at gcc dot gnu.org
2022-11-11 13:32 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).