From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102879] [12 Regression] Dead Code Elimination Regression at -O3
Date: Thu, 10 Mar 2022 09:42:43 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102879

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
           Priority|P3                          |P1

--- Comment #3 from Richard Biener ---
There's an interesting missed value-numbering optimization here: call_may_clobber_ref_p_1 considers the call to foo () as possibly clobbering 'c', even though 'c' does not escape the TU.
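A minimal sketch of the store/call/load pattern at stake, assuming a file-local 'c' whose address is never taken. This is not the bug's testcase: store_call_load, the NULL-initialised hook and the body given to foo () are hypothetical, since in the real case foo () is external and its body is unknown. Value numbering could fold the final load of 'c' to the stored constant only if the call provably cannot write 'c'. The sketch also shows that "does not escape" is not sufficient on its own, because foo () may still reach 'c' through an exported pointer to a function of the TU.

```c
#include <assert.h>
#include <stddef.h>

/* File-local variable; its address is never taken, so it does not
   escape the translation unit.  */
static int c;

/* Writes c directly.  Outside code can only reach it through the
   exported function pointer below.  */
static void
bar (void)
{
  c = 3;
}

/* Hypothetical exported pointer; starts as NULL so that both the
   harmless and the clobbering case can be exercised.  */
void (*hook) (void) = NULL;

/* Hypothetical stand-in for the external foo (): it calls *hook if set.
   In the real case the compiler cannot see this body.  */
void
foo (void)
{
  if (hook)
    hook ();
}

/* Store, call, load: the final load of c may be folded to 1 only if
   foo () provably cannot clobber c.  */
int
store_call_load (void)
{
  c = 1;
  foo ();
  return c;
}
```

With hook left NULL, store_call_load () returns 1; after hook = bar; it returns 3, which is why the call has to be treated as a possible clobber unless such paths are ruled out.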
Since 'foo' is external there's no IPA reference or modref data, but we do know that !may_be_aliased (base), so we could amend

  /* If the reference is based on a decl that is not aliased the call
     cannot possibly clobber it.  */
  if (DECL_P (base)
      && !may_be_aliased (base)
      /* But local non-readonly statics can be modified through recursion
         or the call may implement a threading barrier which we must
         treat as may-def.  */
      && (TREE_READONLY (base) || !is_global_var (base)))
    return false;

to constrain the "But local ..." part (note that nested functions make 'local' difficult to express, so we use !is_global_var). Of course the threading barrier issue would still exist, but then the call itself isn't clobbering it; it just serves as a barrier for code motion. I'm not sure what kind of transforms we have to forbid.

Now, we _do_ have to ensure that foo () cannot access 'c', which it might do, for example, if there's

  void bar () { c = 3; }
  void (*hook) () = bar;

and foo calls the exported *hook. In the end we have

  c/1 (c) @0x7ffff7ff3180
    Type: variable definition analyzed
    Visibility: semantic_interposition prevailing_def_ironly
    References:
    Referring: main/4 (write) main/4 (read)
    Availability: available
    Varpool flags: used-by-single-function

(semantic_interposition!?). The used-by-single-function flag might be the "trick" to use here. Maybe we can also compute a non-recursive flag on main/4 saying that control flow cannot possibly be (indirectly) recursive. For the threading issue we might need a flag on functions like not-called-by-address-taken-functions (which includes not being address-taken itself); that should practically rule out being a thread.

Anyway, the testcase in GCC 11 relies on cunrolli unrolling the inner loop and cunroll unrolling the outer loop, while GCC 12 no longer unrolls the outer loop because

  size: 18-3, last_iteration: 17-3
    Loop size: 18
    Estimated size after unrolling: 19
  Not unrolling loop 1: contains call and code would grow.
while GCC 11 has

  size: 17-3, last_iteration: 16-3
    Loop size: 17
    Estimated size after unrolling: 18
  Making edge 14->9 impossible by redistributing probability to other edges.
  Making edge 4->5 impossible by redistributing probability to other edges.
  t.c:8:21: optimized: loop with 1 iterations completely unrolled (header execution count 134197598)
  Exit condition of peeled iterations was eliminated.
  Last iteration exit edge was proved true.
  Forced exit to be taken: if (0 != 0)

The difference is get_loop_hot_path (). On trunk it is presented with a loop body in which some extra path duplication has occurred, duplicating the store to d and directing the path to foo (): the edge into the call has 66% probability vs. 33% for skipping it. On the GCC 11 branch the situation is reversed, with 67% for the skip over the call.

On trunk threadfull1 duplicates the path with the store to 'd', and that is also what wrecks the edge probabilities. I think that's what we definitely need to fix here: the profile wreckage done by threadfull1.
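A simplified model of why the flipped probabilities matter, assuming a hypothetical three-block loop (the header branches either to a block containing the call or straight to the latch) with the edge probabilities quoted above. It walks the hot path greedily from the header, in the spirit of get_loop_hot_path () (follow the most probable successor that stays in the loop and is not yet on the path), and then applies roughly the check behind the "contains call and code would grow" message: refuse complete unrolling when the size would grow and a call lies on the hot path. struct block, make_loop, refuses_unrolling and the latch probabilities are hypothetical; this is not GCC's code and it ignores the other unrolling limits.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SUCCS 2
#define MAX_BLOCKS 8

/* Hypothetical, simplified basic block: up to two successors with edge
   probabilities in percent, a per-edge flag telling whether the edge
   leaves the loop, and whether the block contains a non-pure call.  */
struct block
{
  int nsuccs;
  int succ[MAX_SUCCS];
  int prob[MAX_SUCCS];
  bool exits[MAX_SUCCS];
  bool has_call;
};

/* Walk the hot path from the loop header (block 0): at each block follow
   the most probable successor edge that stays in the loop and leads to a
   block not yet on the path; stop when there is none.  Returns the
   number of blocks with a call on that path.  */
static int
calls_on_hot_path (const struct block *cfg)
{
  bool visited[MAX_BLOCKS] = { false };
  int bb = 0, calls = 0;

  for (;;)
    {
      visited[bb] = true;
      if (cfg[bb].has_call)
        calls++;

      int best = -1;
      for (int i = 0; i < cfg[bb].nsuccs; i++)
        {
          if (cfg[bb].exits[i] || visited[cfg[bb].succ[i]])
            continue;
          if (best < 0 || cfg[bb].prob[i] > cfg[bb].prob[best])
            best = i;
        }
      if (best < 0)
        break;
      bb = cfg[bb].succ[best];
    }
  return calls;
}

/* Simplified growth check: refuse complete unrolling when the estimated
   size grows and a call lies on the hot path.  */
static bool
refuses_unrolling (const struct block *cfg, int size, int unrolled_size)
{
  return unrolled_size > size && calls_on_hot_path (cfg) > 0;
}

/* Build the hypothetical loop: block 0 (header) branches to block 1,
   which calls foo (), or skips it; block 1 falls through to block 2
   (latch), which either returns to the header or exits to block 3.  */
static void
make_loop (struct block cfg[3], int p_call, int p_skip)
{
  cfg[0] = (struct block) { 2, { 1, 2 }, { p_call, p_skip }, { false, false }, false };
  cfg[1] = (struct block) { 1, { 2, 0 }, { 100, 0 }, { false, false }, true };
  cfg[2] = (struct block) { 2, { 0, 3 }, { 90, 10 }, { false, true }, false };
}
```

In this model, the trunk-like profile (66% into the call) puts the call on the hot path, so the growth from 18 to 19 is refused; the GCC 11-like profile (67% for the skip) keeps the call off the hot path, so growing from 17 to 18 is allowed.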