On Fri, Sep 16, 2022 at 11:29 AM Florian Weimer wrote:
> * Mathieu Desnoyers:
>
> > /*
> >  * C) Check only rseq flags. 32 features at most. One mask and one
> >  *    comparison.
> >  */
> >
> > void fC(void)
> > {
> >         if (likely(__rseq_flags & __RSEQ_FLAG_FEATURE_VM_VCPU_ID)) {
> >                 /* Use rseq with vcpu_id. */
> >                 asm volatile ("ud2\n\t");
> >         } else {
> >                 /* Fallback. */
> >                 asm volatile ("int3\n\t");
> >         }
> > }
>
> I think it has to be this because we cannot lower __rseq_flags below
> 32 now, not if rseq is active.
>
> If you don't find a better use for the remaining 32 bits of padding,
> maybe put the PID or TID there, so that we can create a
> system-call-less version of getpid/gettid. So the flag would just say
> that the padding is now completely used.
>
> Going forward, we can use the size increasing above 32 as a support
> indicator.
>
> > I can think of 4 approaches that applications will use to detect
> > availability of their specific rseq feature for each rseq critical
> > section:
> >
> > 1) Dynamically check whether the feature is implemented at runtime
> >    with conditional branches. Those using this approach will probably
> >    not want the overhead of the two comparisons in approach (A)
> >    above. Applications and libraries should probably use their own
> >    copy of the glibc symbols for speed purposes.

TCMalloc, which has an implementation of this, uses an offset to adjust
which field it reads (cpu_id versus vcpu_id).

> > 2) Implement the entire function as an IFUNC and select whether a
> >    rseq or non-rseq implementation should be used at C startup. The
> >    tradeoff here is code size vs. speed, and using IFUNC for things
> >    like malloc may add additional constraints on the startup order.

IFUNC has significant performance overheads as well. For frequently
used code (like memcpy), avoiding them has been an optimization for us
(https://research.google/pubs/pub50338/), even if it leaves some
nominal microbenchmark performance on the table.

> > 3) Code rewrite (dynamic code patching) between rseq and non-rseq
> >    code. This may be frowned upon in the security area and may not
> >    always be possible depending on the context.
> >
> > 4) JIT compilation of specialized rseq vs non-rseq code. Not
> >    generally available in C.
> >
> > I suspect that glibc may rely on approaches 1+2 depending on the
> > situation, and many applications may use approach (1) for simplicity
> > reasons.
>
> If the kernel does not currently overwrite the padding, glibc can do
> its own per-thread initialization there to support its malloc
> implementation (because the padding is undefined today from an
> application perspective). That is, we would initialize these
> invisible vCPU IDs the same way we assign arenas today. That would
> cover this specific malloc use case only, of course.

If a user program updates to a new kernel before glibc does, would it
be able to easily take advantage of it?

Chris
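
To make the offset trick from approach (1) concrete: a minimal sketch,
assuming a hypothetical vcpu_id field in struct rseq. The field, its
offset, and cpu_field_offset below are assumptions for illustration,
not TCMalloc's actual code or an existing kernel ABI.

#include <stddef.h>
#include <stdint.h>
#include <linux/rseq.h>

/* Byte offset of the CPU-number field to read. Chosen once at startup:
 * offsetof(struct rseq, cpu_id) on older kernels, or the offset of the
 * (assumed) vcpu_id field when the feature flag is advertised. */
static ptrdiff_t cpu_field_offset = offsetof(struct rseq, cpu_id);

static inline uint32_t read_cpu_number(const struct rseq *rs)
{
        /* One unconditional load per access; the feature check is
         * paid once at startup rather than on every read. */
        return *(const uint32_t *)((const char *)rs + cpu_field_offset);
}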
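
For approach (2), a minimal sketch using the GNU ifunc attribute.
rseq_vcpu_id_available() is a stand-in name for whatever detection
logic is used, and the do_critical_section* bodies are placeholders.

static int do_critical_section_rseq(void)
{
        /* rseq-based fast path would go here. */
        return 0;
}

static int do_critical_section_fallback(void)
{
        /* Non-rseq fallback would go here. */
        return 1;
}

/* Hypothetical detection helper; any feature check works here. */
extern _Bool rseq_vcpu_id_available(void);

/* The resolver runs once, at relocation time, and picks the
 * implementation; subsequent calls jump straight to the chosen
 * function with no per-call branch. */
static int (*resolve_do_critical_section(void))(void)
{
        return rseq_vcpu_id_available() ? do_critical_section_rseq
                                        : do_critical_section_fallback;
}

int do_critical_section(void)
        __attribute__((ifunc("resolve_do_critical_section")));

The relocation-time resolver is one source of the startup-order
constraints mentioned above: it runs very early, so the detection logic
cannot depend on late initialization.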
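
And a sketch of the system-call-less gettid idea, assuming a
hypothetical layout where the TID has been published in the rseq area's
padding. tid_offset is invented for illustration and is not a defined
ABI; __rseq_offset is the real glibc 2.35+ symbol from <sys/rseq.h>.

#include <stddef.h>
#include <stdint.h>
#include <sys/rseq.h>   /* glibc >= 2.35: __rseq_offset, __rseq_size */

static inline uint32_t fast_gettid(ptrdiff_t tid_offset)
{
        /* __rseq_offset locates the rseq area relative to the thread
         * pointer; __builtin_thread_pointer() requires a recent
         * GCC/Clang on the target architecture. tid_offset points into
         * the currently unused padding and is purely hypothetical. */
        const char *rseq_area =
                (const char *)__builtin_thread_pointer() + __rseq_offset;
        return *(const uint32_t *)(rseq_area + tid_offset);
}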