From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104271] [12 Regression] 538.imagick_r run-time at -Ofast -march=native regressed by 26% on Intel Cascade Lake server CPU
Date: Tue, 29 Mar 2022 10:39:01 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jamborm at gcc dot gnu.org

--- Comment #8 from Richard Biener ---
(In reply to cuilili from comment #7)
> Created attachment 52706 [details]
> Add a heuristic for eliminate redundant load and store in inline pass.
>
> Hi Richard,
>
> Could you help take a look? This is my first time adding code in mid-end,
> hope you can give me some advice, thank you!
>
> I add a INLINE_HINT_eliminate_load_and_store hint in to inline pass. when
> callee's memory access is caller's local memory parameter and access size is
> greater than the target threshold, we will enable the hint. with the hint,
> inlining_insns_auto will enlarge the bound. The target hook is only enabled
> for x86 now.
>
> With the patch applied
> Icelake server: 538.imagic_r get 15.18% improvement for multicopy and 40.78%
> improvement for single copy with no measurable changes for other benchmarks.
>
> Casecadelake: 538.imagic_r get 12.4% improvement for multicopy with and code
> size increased by 0.4%. With no measurable changes for other benchmarks.
>
> Znver3 server: 538.imagic_r get 9.6% improvement for multicopy with and code
> size increased by 0.5%. With no measurable changes for other benchmarks.

It's an interesting idea; note that Honza knows IPA modref and inlining
better than I do. What I doubt is that you can use the IPA modref info
directly to determine whether inlining is likely to elide a store/load
pair, since IIRC the modref info describes the whole function. IPA SRA
might already perform this kind of analysis, confined to the call context,
and its results might already be available here (and eventually IPA SRA
might even consider passing the stored/loaded values by value?).

But yes, recording a stream of up to N (independent?) stores before each
call, plus a stream of up to M (independent, hoistable to function
start?) loads at each function start, would make such an analysis possible.
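
To make the last point concrete, here is a rough sketch of what matching
the two streams could look like. It is standalone C, not GCC code: the
record type mem_access, the functions matched_bytes and want_inline_hint,
and the threshold parameter are all hypothetical names made up for this
illustration, and requiring exact offset/size matches is a simplifying
assumption.

```c
#include <stddef.h>

/* Hypothetical summary record (not a GCC type): one memory access made
   through a pointer argument of the call.  */
struct mem_access
{
  int arg;        /* index of the pointer argument the access goes through */
  size_t offset;  /* byte offset from that pointer */
  size_t size;    /* access size in bytes */
};

/* Sum the sizes of the callee-entry loads that exactly match a store the
   caller performed right before the call.  Each match is a store/load
   pair that inlining could forward through a register instead.  */
size_t
matched_bytes (const struct mem_access *stores, size_t n_stores,
               const struct mem_access *loads, size_t n_loads)
{
  size_t bytes = 0;
  for (size_t i = 0; i < n_loads; i++)
    for (size_t j = 0; j < n_stores; j++)
      if (loads[i].arg == stores[j].arg
          && loads[i].offset == stores[j].offset
          && loads[i].size == stores[j].size)
        {
          bytes += loads[i].size;
          break;  /* count each load at most once */
        }
  return bytes;
}

/* Hypothetical hint: nonzero when the matched bytes exceed a
   target-provided threshold (assumed here to be a byte count).  */
int
want_inline_hint (const struct mem_access *stores, size_t n_stores,
                  const struct mem_access *loads, size_t n_loads,
                  size_t threshold)
{
  return matched_bytes (stores, n_stores, loads, n_loads) > threshold;
}
```

For a caller that stores three doubles into a local array and a callee
that loads the first two at entry, matched_bytes gives 16. A real
implementation would also have to handle partial overlaps, aliasing and
stores that do not dominate the call, none of which this sketch models.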