From: "grasland at lal dot in2p3.fr"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/100811] New: Consider not omitting frame pointers by default on targets with many registers
Date: Fri, 28 May 2021 11:34:38 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100811

            Bug ID: 100811
           Summary: Consider not omitting frame pointers by default on
                    targets with many registers
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: grasland at lal dot in2p3.fr
  Target Milestone: ---

Since at least GCC 4 (Bugzilla's duplicate search points me to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13822), GCC has been omitting
frame pointers by default when optimizations are enabled, unless
the extra -fno-omit-frame-pointer flag is specified. As far as I know, the
rationale for doing this was that:

- On architectures with very few general purpose registers like 32-bit x86,
  strictly following frame pointer retention discipline has a prohibitive
  performance cost.
- Debuggers do not need frame pointers to do their job, as they can leverage
  DWARF or PDB debug information instead.

While these arguments are valid, I would like to make the case that frame
pointers may be worth keeping by default on hardware architectures where this
is not too expensive (like x86_64), for the purpose of making software
performance analysis easier.

Unlike debuggers, sampling profilers like perf cannot afford the luxury of
walking the process stack using DWARF any time a sample is taken, as that would
take too much time and bias the measured performance profile. Instead, when
using DWARF for stack unwinding purposes, they have to take stack samples and
post-process them after the fact. Furthermore, since copying the full program
stack on every sample would generate an unbearable volume of data, they usually
can only afford to copy the top of the stack (upper 64KB at maximum for perf),
which will lead to corrupted stack traces when application stacks get deep or
there are lots of / large stack-allocated objects.

For all these reasons, DWARF-based stack unwinding is a somewhat unreliable
technique in profiling, where it's really hard to get >90% of your profile's
stack traces to be correctly reconstructed all the way to _start or _clone. The
remaining misreconstructed stack traces will translate into profile bias
(underestimated "children" overhead measurements), and thus performance
analysis mistakes.

To make matters worse, DWARF-based unwinding is relatively complex, and not
every useful runtime performance analysis tool supports it.
For example, BPF-based tracing tools, which are nowadays becoming popular due
to their highly appealing ability to instrument every kernel or user function
on the fly, do not currently support DWARF-based stack unwinding, most likely
because feeding the DWARF debug info into the kernel-based BPF program would
either be prohibitively expensive, a security hole, or a source of recursive
tracing incidents (tracing tool generates syscalls of the kind that it is
tracing, creating an infinite loop).

Therefore, I think -fno-omit-frame-pointer should be the default on
architectures where the price to pay is not too high (like x86_64), which
should ensure that modern performance analysis tooling works on all popular
Linux distributions without rebuilding the entire world. In this scheme,
-fomit-frame-pointer would remain as a default option for targets where it is
really needed (like legacy 32-bit x86), and as a specialist option for those
cases where the extra ~1% of performance is really truly needed and worth its
cost.

What do you think?
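
To make the two unwinding strategies discussed above a bit more concrete, here
is a rough toy model in C. It is only a sketch under simplifying assumptions,
not what perf or any BPF tool actually does: the `frame_record` struct and the
helpers `walk_frame_chain` and `frames_recoverable` are hypothetical names
invented for this illustration. The first helper follows a chain of saved
frame pointers, which is roughly how a frame-pointer-based unwinder proceeds.
The second counts how many frames fit entirely inside a fixed-size copy of the
top of the stack, which is the constraint the copy-and-post-process approach
runs into.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the pair a frame-pointer-based unwinder reads at each
 * step: the caller's saved frame pointer and the return address.
 * Hypothetical type, for illustration only. */
struct frame_record {
    const struct frame_record *caller;  /* NULL marks the outermost frame */
    const void *return_address;
};

/* Follow the chain of saved frame pointers, storing at most `max` return
 * addresses into `out`. Returns the number of frames visited. Each step is
 * two pointer reads, with no debug info needed. */
static size_t walk_frame_chain(const struct frame_record *fp,
                               const void **out, size_t max)
{
    size_t depth = 0;
    assert(out != NULL || max == 0);
    while (fp != NULL && depth < max) {
        out[depth++] = fp->return_address;
        fp = fp->caller;
    }
    return depth;
}

/* Model of the copy-then-unwind approach: frame_sizes[0] is the innermost
 * frame. Returns how many frames lie entirely within the first `copy_limit`
 * bytes copied from the top of the stack; deeper frames are lost. */
static size_t frames_recoverable(const size_t *frame_sizes, size_t n_frames,
                                 size_t copy_limit)
{
    size_t used = 0;
    assert(frame_sizes != NULL || n_frames == 0);
    for (size_t i = 0; i < n_frames; ++i) {
        if (frame_sizes[i] > copy_limit - used)
            return i;  /* this frame is cut off by the copy window */
        used += frame_sizes[i];
    }
    return n_frames;
}
```

In this model, a stack of twenty 512-byte frames where the frame at index 5
instead holds a 60 KiB buffer (61440 bytes) yields only 9 recoverable frames
under a 64 KiB copy limit, whereas the frame-pointer walk visits every frame
in the chain regardless of how large the individual frames are.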