From: "Maciej W. Rozycki"
To: Jakub Jelinek
cc: Segher Boessenkool, Paul Koning, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Turn on LRA on all targets
Date: Sat, 17 Feb 2024 00:38:33 +0000 (GMT)
References: <283c45ca085ced958cbce6e64331252c83a5899f.1682268126.git.segher@kernel.crashing.org>
 <20230423203328.GL19790@gate.crashing.org>
 <2A759520-2D62-472E-A97F-35E09B6E50F5@comcast.net>
 <20240216134748.GF19790@gate.crashing.org>

On Fri, 16 Feb 2024, Maciej W. Rozycki wrote:

> On Fri, 16 Feb 2024, Jakub Jelinek wrote:
>
> > > There is no function prologue to optimise in the VAX case, because all
> > > the frame setup has already been made by the CALLS instruction itself in
> > > the caller. The first machine instruction of the callee is technically
> > > already past the "prologue". And then RET serves as the whole function
> > > "epilogue".
> >
> > So, what is the problem with DWARF unwinding? Just make sure to emit
> > appropriate instructions describing the saving of the corresponding
> > registers at specific points based on CFA at the start of the function
> > (so that it appears in CIE instructions) and that should be all that is
> > needed, no?
>
> I may not remember all the issues correctly offhand as it's been a while
> since I looked into it, but as I recall DWARF handling code has not been
> prepared for all the frame to have been already allocated and initialised
> at a function's entry point, and also at least DWARF-4 is IIRC required to
> have statics at offsets positive from FP (for a stack growing downwards).

There is a further complication actually where lazy binding is in use.
In that case a function that has been jumped to indirectly from the lazy
resolver will often have a different number of statics saved in the frame
from where the function has been called directly via a fully resolved PLT
GOT entry. This is because at the time the lazy resolver is being called
it is not known what statics the ultimate callee wants to save, as that is
not a part of the ABI. Therefore the worst case is assumed and the
resolver requests all the statics (R6-R11) to be saved, observing that
saving more statics than required makes no change to code semantics; it
just hurts performance (but calls to the lazy resolver are rare, so this
is not a big deal). Conversely, when the function has already been
resolved, the PLT GOT entry points at the callee instead, which will then
only save the statics it has requested itself, knowing them to be used.

Obviously a frame that has all the statics saved will have a different
size of its variable part, and slots will have been assigned differently
there, from the case where only some statics have been saved. Of course
it does not matter for regular code execution, as RET will always
correctly interpret a stack frame and restore exactly those statics that
have been saved in the frame, but for unwinding the actual frame contents
have to be interpreted.

I am not sure if this run-time dependent frame layout can even be
described in DWARF terms, so I am leaning towards concluding that a native
unwinder is the only feasible way to go.

For those who are unaware of how a function tells VAX hardware which
statics are to be saved: the information is embedded at the function's
address in the form of a 16-bit data quantity, which is a register save
bitmask (an entry mask in VAX-speak) for registers R0-R11; a 1 in the mask
requests that the corresponding register be saved in the callee's frame by
the CALLS instruction.
Once the frame has been built by CALLS, control is then passed to the
location immediately following the bitmask, which is the function's actual
entry point, i.e. the PC is set to the function's address + 2.

  Maciej