From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <fw@deneb.enyo.de>
Received: from albireo.enyo.de (albireo.enyo.de [37.24.231.21])
	by sourceware.org (Postfix) with ESMTPS id 87FF1385275F
	for <libc-alpha@sourceware.org>; Sat, 17 Sep 2022 14:45:54 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 87FF1385275F
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=deneb.enyo.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=deneb.enyo.de
Received: from [172.17.203.2] (port=51317 helo=deneb.enyo.de)
	by albireo.enyo.de ([172.17.140.2]) with esmtps (TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256)
	id 1oZZ4l-005BMW-0b; Sat, 17 Sep 2022 14:45:47 +0000
Received: from fw by deneb.enyo.de with local (Exim 4.94.2)
	(envelope-from <fw@deneb.enyo.de>)
	id 1oZZ4k-0006M2-M7; Sat, 17 Sep 2022 16:45:46 +0200
From: Florian Weimer <fw@deneb.enyo.de>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Chris Kennelly <ckennelly@google.com>,  libc-coord@lists.openwall.com,
  "carlos@redhat.com" <carlos@redhat.com>,  libc-alpha
 <libc-alpha@sourceware.org>,  szabolcs.nagy@arm.com
Subject: Re: [libc-coord] Re: RSEQ symbols: __rseq_size, __rseq_flags vs
 __rseq_feature_size
References: <def16bfb-369f-865d-5c45-d3368415765d@efficios.com>
	<87y1uj49t4.fsf@mid.deneb.enyo.de>
	<CAEE+ybkXCYoX73ksO0yutpc+4QZ_RuKnLxquVymzwj01d0=x-g@mail.gmail.com>
	<87fsgryphl.fsf@mid.deneb.enyo.de>
	<a9fe2b30-9253-9120-3627-aed4ebf95973@efficios.com>
Date: Sat, 17 Sep 2022 16:45:46 +0200
In-Reply-To: <a9fe2b30-9253-9120-3627-aed4ebf95973@efficios.com> (Mathieu
	Desnoyers's message of "Sat, 17 Sep 2022 13:51:44 +0200")
Message-ID: <87mtayvz39.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libc-alpha.sourceware.org>

* Mathieu Desnoyers:

> On 2022-09-16 23:32, Florian Weimer wrote:
>> * Chris Kennelly:
>> 
>>>> If the kernel does not currently overwrite the padding, glibc can do
>>>> its own per-thread initialization there to support its malloc
>>>> implementation (because the padding is undefined today from an
>>>> application perspective).  That is, we would initialize these
>>>> invisible vCPU IDs the same way we assign arenas today.  That would
>>>> cover this specific malloc use case only, of course.
>> 
>>> If a user program updates to a new kernel before glibc does, would it be
>>> able to easily take advantage of it?
>> 
>> No, as far as I understand it, there is presently no signaling from
>> kernel to applications that bypasses the rseq area registration.  So
>> the only thing you could do is to unregister and re-register with a
>> compatible value.  And that is of course quite undefined and assumes
>> that you can do this early enough during the life-time of each thread.
>> 
>> But if we have the extension handshake, I'll expect us to backport it
>> quite widely, after some time to verify that it works with CRIU etc.
>
> I don't think this is what Chris is asking here.
>
> I think the requirement here is to make sure that the extensibility 
> scheme we come up with will allow to extend struct rseq simply by 
> upgrading the kernel, without any need to upgrade glibc. (that's indeed 
> a requirement of mine). So a new application and a new kernel can use a 
> newly available extended field, even with an old glibc.

I took it for granted that we'd like to get libc out of the picture
for future changes, so I interpreted Chris' question in the context of
the initial switch (i.e., enabling rseq features that need
extensibility on currently released glibc, without upgrading glibc).

> If we want to keep the kernel ABI as simple as we can, then we just 
> expose the rseq feature size (and required alignment), and don't expose 
> any rseq feature flags. This in turn means that glibc would have to 
> somehow expose the rseq feature size in its ABI. If glibc decides 
> against exposing this rseq feature size symbol, then it would be up to 
> the application to combine information about __rseq_size and 
> getauxval(rseq feature size) to figure out which fields are actually 
> populated. It would "work", but chances are that some users will get it 
> wrong. It seems simpler for a user to simply do:
>
> if (__rseq_feature_size >= offsetofend(struct rseq, vm_vcpu_id))
>
> to validate whether a given field is indeed populated.
>
> The rseq feature size approach would scale to very large feature 
> numbers. It would *not* allow deprecation of fields after they are 
> published, but I see this as a gain in simplicity for users of the ABI, 
> even though we lose a knob as kernel developers.

I think glibc can register rseq with a new flag once it sees
AT_RSEQ_FEATURE_SIZE in the auxiliary vector (even if it's 32).  That
flag would naturally end up in __rseq_flags.  For future extensions,
__rseq_size should work directly.
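
To make that concrete, here is a rough sketch of the handshake as I
read it, not actual glibc or kernel code.  Everything prefixed with
sketch_ or SKETCH_ is hypothetical, including the flag bit and the
position of vm_vcpu_id in the former padding.  It also assumes that
the first step hands over all of the padding at once, so the flag
alone covers everything up to byte 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel-style helper: offset of the first byte after MEMBER.  */
#define offsetofend(TYPE, MEMBER) \
  (offsetof (TYPE, MEMBER) + sizeof (((TYPE *) 0)->MEMBER))

/* Hypothetical layout: the four original fields, followed by one new
   field carved out of what is padding today.  */
struct rseq_sketch
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
  uint64_t rseq_cs;
  uint32_t flags;
  uint32_t vm_vcpu_id;          /* Assumed position, illustration only.  */
};

#define SKETCH_ORIG_FEATURE_END 20u  /* End of the original fields.  */
#define SKETCH_ORIG_SIZE 32u         /* Size registered today.  */
#define SKETCH_FLAG_FEATURE_SIZE (1u << 0)  /* Hypothetical flag bit.  */

static_assert (offsetofend (struct rseq_sketch, flags)
               == SKETCH_ORIG_FEATURE_END,
               "original fields end at byte 20");

/* What libc would publish as __rseq_size and __rseq_flags.  */
struct sketch_plan
{
  uint32_t size;
  uint32_t flags;
};

/* libc side.  AUXV_FEATURE_SIZE is the value found for
   AT_RSEQ_FEATURE_SIZE in the auxiliary vector, or 0 if the entry is
   absent (old kernel).  */
struct sketch_plan
sketch_plan_registration (unsigned long auxv_feature_size)
{
  struct sketch_plan plan = { SKETCH_ORIG_SIZE, 0 };
  if (auxv_feature_size != 0)
    {
      /* Register with the new flag, even if the size is still 32.  */
      plan.flags |= SKETCH_FLAG_FEATURE_SIZE;
      if (auxv_feature_size > SKETCH_ORIG_SIZE)
        plan.size = (uint32_t) auxv_feature_size;
    }
  return plan;
}

/* Application side.  Is the field ending at FIELD_END populated?  */
int
sketch_field_populated (struct sketch_plan plan, size_t field_end)
{
  if (field_end <= SKETCH_ORIG_FEATURE_END)
    return 1;                   /* Original fields are always there.  */
  if (plan.size > SKETCH_ORIG_SIZE)
    return field_end <= plan.size;  /* The size alone is enough.  */
  /* Size is still 32: only the flag says the padding is in use.  */
  return (plan.flags & SKETCH_FLAG_FEATURE_SIZE) != 0
         && field_end <= plan.size;
}
```

An application would then test
sketch_field_populated (plan, offsetofend (struct rseq_sketch, vm_vcpu_id))
before reading the field.  Once the size moves past 32, the flag test
drops out and only the size comparison remains.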

But as I said, we had better use all the padding at once during the
first step.  Or we could add even more stuff to move past the current
32; then we wouldn't need the flag dance. 8-)