From: "amacleod at redhat dot com"
Subject: [Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
Date: Thu, 12 Oct 2023 16:32:19 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622

--- Comment #6 from Andrew Macleod ---
Interesting. The "fix" turns out to be:

commit 9ea74d235c7e7816b996a17c61288f02ef767985
Author: Richard Biener
Date:   Thu Sep 14 09:31:23 2023 +0200

    tree-optimization/111294 - better DCE after forwprop

    The following adds more aggressive DCE to forwprop to clean up dead
    stmts when folding a stmt leaves some operands unused.
    The patch uses simple_dce_from_worklist for this purpose, queueing
    original operands before substitution and folding, but only if we
    folded the stmt.

    This removes one dead stmt biasing threading costs in a later pass
    but it doesn't resolve the optimization issue in the PR yet.

That a DCE change "fixes" this implies something pathological was triggering in VRP, so I dug a little deeper...

It seems to be a massive number of partial equivalences generated by sequences like:

  _5 = (unsigned int) _1;
  _10 = (unsigned int) _1;
  _15 = (unsigned int) _1;
  _20 = (unsigned int) _1;
  _25 = (unsigned int) _1;
  <...>

for a couple of hundred statements. These all become members of a single partial equivalence set, and whenever, say, the range of _1 changes, we end up doing obscene amounts of pointless looping and recomputing of ranges of everything in the set.

The intent of partial equivalences is to allow us to reflect known subranges through casts, not to build up large groups the way a full equivalence does. There should be a limit on the size of a set; we lose most of the usefulness once the grouping gets too large.
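The cost pattern described above can be illustrated with a small toy model. This is a hedged sketch, not GCC's implementation: `PartialEquivSets`, `add_cast`, `revisit_on_change`, `total_work` and the cap value of 8 are all hypothetical, and the comment does not say what limit (if any) should be chosen. The model assumes that every range change of a set member revisits all other members of its set, so with one source and N casts the total work grows roughly as N squared; a size cap bounds it by refusing to record relations once a set is full.

```python
class PartialEquivSets:
    """Toy model of partial-equivalence sets with an optional size cap.

    SSA names are plain strings such as "_1". This is a hypothetical
    illustration, NOT GCC's actual data structure.
    """

    def __init__(self, max_members=None):
        self.max_members = max_members  # None means "no limit"
        self.members = {}               # set id -> list of SSA names
        self.set_of = {}                # SSA name -> set id

    def add_cast(self, src, name):
        """Record `name = (T) src`; return False if the cap drops the relation."""
        sid = self.set_of.get(src)
        if sid is None:
            sid = len(self.members)
            self.members[sid] = [src]
            self.set_of[src] = sid
        group = self.members[sid]
        if self.max_members is not None and len(group) >= self.max_members:
            return False  # set is full: do not grow it further
        group.append(name)
        self.set_of[name] = sid
        return True

    def revisit_on_change(self, name):
        """Number of other members revisited when the range of `name` changes."""
        sid = self.set_of.get(name)
        return 0 if sid is None else len(self.members[sid]) - 1


def total_work(n_casts, max_members=None):
    """Simulate _5 = (unsigned int) _1; _10 = ...; then one range change per name."""
    pe = PartialEquivSets(max_members)
    names = [f"_{5 * (i + 1)}" for i in range(n_casts)]  # _5, _10, _15, ...
    for name in names:
        pe.add_cast("_1", name)
    return sum(pe.revisit_on_change(n) for n in ["_1"] + names)


print(total_work(200))                 # 40200
print(total_work(200, max_members=8))  # 56
```

In this model, doubling the casts to 400 raises the unbounded figure to 160400 (about four times as much), while the capped figure stays at 56. The trade-off is that casts beyond the cap get no subrange information from the set, which matches the point above that very large groups contribute little anyway.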