From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/99785] Awful lot of time spent building gl.cc in Firefox
Date: Fri, 26 Mar 2021 09:33:04 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99785

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
            Version|unknown                     |11.0

--- Comment #10 from Richard Biener ---
Did anybody check the actual output from clang as to whether it performs the
desired optimizations?
I only have clang 9 around and that rejects the TU (maybe there are
clang-specific code paths and the preprocessed source is not representative
here).

Inlining blend_pixels without first constant propagating 'blend_key' (I
suppose on all call paths that's eventually supposed to be constant propagated
somehow?) looks quite stupid given the large switch. Sure, saving %xmm around
calls can have a cost but trashing icache should be worse. If all of this is
auto-generated the auto-generation might also be able to improve the blend_key
dispatch.

Another strategy might be to not put always_inline on everything (because
that in turn will cause exponential growth) but instead inline everything
into the finally important function(s) via 'flatten'. That is, with
always_inline everywhere you do something like

static __attribute__((always_inline)) inline void large_leaf () { /* large */ }
static __attribute__((always_inline)) inline void inter1 () { large_leaf (); }
static __attribute__((always_inline)) inline void inter2 () { inter1 (); inter1 (); }
static __attribute__((always_inline)) inline void inter3 () { inter2 (); inter2 (); }

and what you get is (intermediate) 8 copies of the large_leaf body. Which is
because we inline expand from leafs rather than first inlining the small
always-inline wrappers (and throwing them away before inlining into them).

I suppose we could try to not inline into always-inline functions at the
expense of needing to iterate on inlined always-inline bodies. Or somehow at
least delay inlining large bodies into always-inline bodies.

Anyway, marking such large functions as always-inline is asking for trouble.
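
For illustration, the 'flatten' variant of the chain above could look roughly
like the sketch below. This is untested against the Firefox sources. The name
hot_entry and the counter are hypothetical stand-ins I made up for the
example; only the flatten attribute itself is an existing GCC feature. The
wrappers carry no always_inline, and only the finally important function asks
for everything to be inlined into it.

```c
/* Sketch only: hot_entry and counter are hypothetical names for this example. */
static int counter;

/* Stands in for a large body; deliberately not marked always_inline. */
static void large_leaf (void) { counter += 1; }

/* Plain wrappers: no always_inline, so no forced intermediate copies. */
static inline void inter1 (void) { large_leaf (); }
static inline void inter2 (void) { inter1 (); inter1 (); }
static inline void inter3 (void) { inter2 (); inter2 (); }

/* flatten asks GCC to inline the calls in this function's body,
   recursively, where that is possible. */
__attribute__((flatten)) int hot_entry (void)
{
  counter = 0;
  inter3 ();        /* reaches large_leaf four times via inter2/inter1 */
  return counter;
}
```

Whether this ends up cheaper than the always_inline version for gl.cc would
need measuring; the point is only that the decision to inline the large body
is made once, at the function that matters.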