From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id A9BD83846020; Mon, 14 Jun 2021 12:56:19 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A9BD83846020 From: "rguenth at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/100934] [9/10/11/12 Regression] wrong code at -O3 during unrolling since r9-6299 Date: Mon, 14 Jun 2021 12:56:19 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Version: 12.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: rguenth at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org X-Bugzilla-Target-Milestone: 9.5 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to bug_status Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jun 2021 12:56:19 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D100934 Richard Biener changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot = gnu.org Status|NEW |ASSIGNED --- Comment #8 from Richard Biener --- (In reply to Jeffrey A. Law from comment #7) > So when we're finding jump threads we know if we thread through the loop > latch and we note when that's going to create an irreducible region. We > generally suppress threading through the latch before the loop optimizers > have run, but allow it afterwards. >=20 > But I'm not aware of a really good place to adjust the loop bound estimat= es, > particularly for the backwards threader. THe backwards threader uses > copy_bbs API, so much of the guts of what's happening is opaque. >=20 > Peek at jump_thread_path_registry:::duplicate_thread_path. All the > backwards threader bits go through there at some point. OK, so it looks like mark_threaded_blocks already has code to deal with thi= s: if (crossed_headers > 1) { vect_free_loop_info_assumptions ((*path)[path->length () - 1]->e->dest->loop_father= ); break; but mark_threaded_blocks is only called after we process all FSM paths thou= gh it's handling of the case wouldn't necessarily fix this instance where we just duplicate the loop header incoming its backedge and to one of its destination still in the same loop. It looks like applying the FSM threads goes through different code paths than the regular threading... But the issue at hand must be more subtle since invalidating the number of iterations on the loop that is then removed of course doesn't change anything. Still it is likely going to be an issue. Possible fix for that but not this testcase: diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index a86302be18e..4e31fa8f5c2 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2379,12 +2379,15 @@ jump_thread_path_registry::duplicate_thread_path (e= dge entry, missuses of the functions. I.e. if you ask to copy something weird, it will work, but the state of structures probably will not be correct. */ + bool loop_header_duplicated =3D false; for (i =3D 0; i < n_region; i++) { /* We do not handle subloops, i.e. all the blocks must belong to the same loop. */ if (region[i]->loop_father !=3D loop) return false; + if (region[i]->loop_father =3D=3D loop) + loop_header_duplicated =3D true; } initialize_original_copy_tables (); @@ -2501,6 +2504,11 @@ jump_thread_path_registry::duplicate_thread_path (ed= ge entry, /* Add the other PHI node arguments. */ add_phi_args_after_copy (region_copy, n_region, NULL); + /* Invalidate loop niter information if the loop header was part of the + thread path. */ + if (loop_header_duplicated) + vect_free_loop_info_assumptions (loop); + free (region_copy); adjust_paths_after_duplication (current_path_no); The actual issue seems to be that the DOM pass following the problematic threading threads additional paths through loop headers which ends up rotating the loop and also affecting niter estimates since it crosses a irreducible region boundary. Ah - but we have _not_ marked irreducible regions but we are querying them from DOM threading! diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 075b1ccb9de..c231e6c8467 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -727,7 +727,8 @@ pass_dominator::execute (function *fun) gcc.dg/tree-ssa/pr21417.c can't be threaded if loop preheader is missing. We should improve jump threading in future then LOOPS_HAVE_PREHEADERS won't be needed here. */ - loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES); + loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES + | LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS); /* We need accurate information regarding back edges in the CFG for jump threading; this may include back edges that are not part of fixes the testcase.=