From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled
Date: Mon, 28 Sep 2020 06:59:30 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #31 from Richard Biener ---
(In reply to Kewen Lin from comment #29)
> (In reply to Hongtao.liu from comment #28)
> > > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement
> >
> > Yes, it's the place.
> >
> > > is UB to UH conversion statement, further check if the def of the input UB
> > > is MEM.
> >
> > Only if there's no multi-use for UB. More generally, it's quite difficult to
> > guess later optimizations for the purpose of more accurate vectorization
> > cost model, :(.
>
> Yeah, it's hard sadly.  The generic cost modeling is rough,
> ix86_add_stmt_cost is more fine-grain (at least than what we have on Power
> :)), if you want to check it more, it seems doable in target specific hook
> finish_cost where you can get the whole vinfo object, but it could end up
> with very heavy analysis and might not be worthy.
>
> Do you mind to check if it can also fix this degradation on x86 to run FRE
> and DSE just after cunroll?  I found it worked for Power, hoped it can help
> there too.

Btw, we could try sth like adding a TODO_force_next_scalar_cleanup
to be returned from passes that see cleanup opportunities and have the
pass manager queue that up, looking for a special marked pass and
enabling that so we could have

          NEXT_PASS (pass_predcom);
          NEXT_PASS (pass_complete_unroll);
          NEXT_PASS (pass_scalar_cleanup);
          PUSH_INSERT_PASSES_WITHIN (pass_scalar_cleanup);
            NEXT_PASS (pass_fre, false /* may_iterate */);
            NEXT_PASS (pass_dse);
          POP_INSERT_PASSES ();

with pass_scalar_cleanup gate() returning false otherwise.  Eventually
pass properties would match this better, or sth else.

That said, running a cleanup on the whole function should be done
via a separate pass - running a cleanup on a sub-CFG can be done
from within another pass.  But mind that sub-CFG cleanup really has
to be of O(size-of-sub-CFG), otherwise it doesn't help.
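
To make the queuing idea concrete, here is a rough standalone model of it. This is not GCC code and does not use GCC's pass manager API. The Pass struct, run_pipeline, make_pipeline and the "later_pass" entry are all made up for illustration, and the flag value is arbitrary. It only shows the control flow: a pass returns the TODO flag, the manager remembers it, and the specially marked cleanup pass (with FRE and DSE nested inside) runs only when a request is pending.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical flag, named after the proposal; the value is arbitrary.
constexpr unsigned TODO_force_next_scalar_cleanup = 1u << 0;

// Toy stand-in for a pass; not GCC's opt_pass.
struct Pass {
  std::string name;
  std::function<unsigned()> execute;    // returns TODO flags
  bool gated_cleanup;                   // true for the specially marked pass
  std::vector<std::string> sub_passes;  // names of passes nested within it
};

// Runs the passes in order and returns the names of those that ran.
std::vector<std::string> run_pipeline(const std::vector<Pass>& passes) {
  std::vector<std::string> trace;
  bool cleanup_requested = false;
  for (const Pass& p : passes) {
    if (p.gated_cleanup) {
      // Models gate(): false unless an earlier pass queued a request.
      if (!cleanup_requested)
        continue;
      cleanup_requested = false;
      trace.push_back(p.name);
      for (const std::string& sub : p.sub_passes)
        trace.push_back(sub);
      continue;
    }
    assert(static_cast<bool>(p.execute));
    trace.push_back(p.name);
    if (p.execute() & TODO_force_next_scalar_cleanup)
      cleanup_requested = true;  // the manager queues the cleanup
  }
  return trace;
}

// Builds a pipeline shaped like the pass list above. "later_pass" is a
// placeholder for whatever follows the cleanup group.
std::vector<Pass> make_pipeline(bool unroll_sees_cleanup) {
  auto nothing = [] { return 0u; };
  auto unroll = [unroll_sees_cleanup] {
    return unroll_sees_cleanup ? TODO_force_next_scalar_cleanup : 0u;
  };
  return {
      {"predcom", nothing, false, {}},
      {"cunroll", unroll, false, {}},
      {"scalar_cleanup", nothing, true, {"fre", "dse"}},
      {"later_pass", nothing, false, {}},
  };
}
```

Calling run_pipeline (make_pipeline (true)) visits the cleanup group right after cunroll, while make_pipeline (false) skips it entirely, so functions where unrolling exposed nothing pay no extra compile time. The model resets the request once the cleanup has run; whether a real implementation would do the same is an open design choice.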