Date: Mon, 31 Jul 2023 12:43:14 +0000 (UTC)
From: Michael Matz
To: Thomas Koenig
cc: gcc mailing list
Subject: Re: Calling convention for Intel APX extension

Hello,

On Sun, 30 Jul 2023, Thomas Koenig wrote:

> > I've recently submitted a patch that adds some attributes that basically
> > say "these-and-those regs aren't clobbered by this function" (I did them
> > for not clobbered xmm8-15). Something similar could be used for the new
> > GPRs as well. Then it would be a matter of ensuring that the interesting
> > functions are marked with that attributes (and then of course do the
> > necessary call-save/restore).
>
> Interesting.
>
> Taking this a bit further: The compiler knows which registers it used
> (and which ones might get clobbered by called functions) and could
> generate such information automatically and embed it in the assembly
> file, and the assembler could, in turn, put it into the object file.
>
> A linker (or LTO) could then check this and elide save/restore pairs
> where they are not needed.

LTO with interprocedural register allocation (-fipa-ra) already does this.
Doing it without LTO is possible to implement in the way you suggest, but
it is very hard to make effective: saving/restoring of registers might be
scheduled in non-trivial ways, and getting rid of instruction bytes within
function bodies at link time is itself hard: it needs excessive
meta-information to be effective (e.g. all jumps that potentially cross
the removed bytes must get relocations). So you either limit the ways
that prologues and epilogues are emitted to help the linker (thereby
limiting the effectiveness of the xlogues that stay unchanged), or you
emit more meta-info than the instruction bytes themselves, bloating
object files for dubious outcomes.

> It would probably be impossible for calls into shared libraries, since
> the saved registers might change from version to version.

The above scheme could be extended to also allow introducing stubs
(wrappers) for shared lib functions, handled by the dynamic loader. But
then you would run into hard problems related to function addresses and
their uniqueness.

> Still, potential gains could be substantial, and it could have an
> effect which could come close to inlining, while actually saving space
> instead of using extra.
>
> Comments?

I think it would be an interesting experiment to implement such a scheme
fully, just to see how effective it would be in practice. But it's very
non-trivial to do, and my guess is that it won't be super effective, at
least outside of extreme cases like the SSE regs, where none are
callee-saved and which can be handled differently, e.g. with the explicit
attributes. So it could be a typical research paper topic :-)


Ciao,
Michael.