public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Maybe we should get rid of ifuncs
@ 2024-04-23 18:14 Zack Weinberg
  2024-04-23 18:39 ` enh
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Zack Weinberg @ 2024-04-23 18:14 UTC (permalink / raw)
  To: libc-alpha

I've been thinking about the XZ exploit (two versions of the compression
library `liblzma` included Trojan horse code that injected a back
door into sshd; see https://research.swtch.com/xz-timeline) and what
it means for glibc, and what I've come to is we should reconsider the
entire idea of ifuncs.

The SSH protocol does not use XZ compression.  liblzma.so was loaded
into sshd's address space because some Linux distributions patched
sshd to use libsystemd, and some libsystemd functions (having to do
with systemd's "journal" logging subsystem, IIUC) do use liblzma, but
by itself that wouldn't have been enough to give the exploit control,
because the patched sshd doesn't use any of those functions.  But
these same Linux distributions also compile libsshd with -z now
(ironically, as a hardening measure, together with -z relro) and that
means the resolvers for all the ifuncs in *all* the loaded shared
libraries will be invoked, early enough in process startup that the
PLT and GOT are still writable.  The XZ exploit used an ifunc resolver
to rewrite a whole bunch of PLT entries, intercepting both calls
within sshd proper, and calls from sshd to libcrypto.so
(i.e. OpenSSL's general-purpose cryptography library).

Ifuncs were already a problem -- resolvers are arbitrary application
code that gets called from deep within the guts of the dynamic loader,
possibly while internal locks are held (I don't know for sure).
In -z now mode, they are called not just before the core C library is
fully initialized, but before symbol resolution is complete, meaning
that they can't necessarily make *any* function calls; we've had any
number of bug reports about this.  They seem to be less troublesome in
lazy binding mode, as far as I can tell, but I can still imagine them
causing trouble (e.g. due to recursive invocation of the lazy symbol
resolution machinery, or due to injecting non-async-signal-safe code
into a call, from a signal handler, to a function that's *supposed* to
be async-signal-safe).  The glibc wiki page for ifuncs
(https://sourceware.org/glibc/wiki/GNU_IFUNC) warns readers that ifunc
resolvers are subject to severe restrictions that aren't documented or
even agreed upon.

As far as I know, the only legitimate (non-malicious) use case anyone
wants for ifuncs is to allow a library to select one of several
implementations of a single function, based on the characteristics of
the CPU -- such as how glibc itself selects the best available
implementation of `memcpy` for the CPU.  It seems to me that we ought
to be able to come up with a completely declarative mechanism for this
use case.  Perhaps a library could supply an array of candidate
implementations of a function, each paired with a bit vector that
declares all of the CPU capabilities that that implementation
requires, sorted from most to least stringent, and the dynamic loader
could run down the list and pick the first one that will work.  This
would avoid all the problems with calling application code from the
guts of the loader.  And, in -z relro -z now mode, it would mean that
no application code could run before the PLT and GOT are made
read-only, closing the path that the XZ trojan used to hook itself
into sshd.  We'd have to keep STT_GNU_IFUNC support around for at
least a few releases, but we could officially deprecate it and provide
a tunable and/or a build-time switch to disable it.

To figure out if this is a workable idea, questions for you all:
(1) Are there other use cases for ifuncs that I don't know about?
(2) Are there existing ifuncs that perform CPU-capability-based
function selection that *could not* be replaced with an array of bit
vectors like what I sketched in the previous paragraph?

zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:14 Maybe we should get rid of ifuncs Zack Weinberg
@ 2024-04-23 18:39 ` enh
  2024-04-23 19:46   ` Palmer Dabbelt
  2024-04-24 13:56   ` Zack Weinberg
  2024-04-23 18:52 ` Sam James
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: enh @ 2024-04-23 18:39 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha

On Tue, Apr 23, 2024 at 11:14 AM Zack Weinberg <zack@owlfolio.org> wrote:
>
> I've been thinking about the XZ exploit (two versions of the compression
> library `liblzma` included Trojan horse code that injected a back
> door into sshd; see https://research.swtch.com/xz-timeline) and what
> it means for glibc, and what I've come to is we should reconsider the
> entire idea of ifuncs.
>
> The SSH protocol does not use XZ compression.  liblzma.so was loaded
> into sshd's address space because some Linux distributions patched
> sshd to use libsystemd, and some libsystemd functions (having to do
> with systemd's "journal" logging subsystem, IIUC) do use liblzma, but
> by itself that wouldn't have been enough to give the exploit control,
> because the patched sshd doesn't use any of those functions.  But
> these same Linux distributions also compile libsshd with -z now
> (ironically, as a hardening measure, together with -z relro) and that
> means the resolvers for all the ifuncs in *all* the loaded shared
> libraries will be invoked, early enough in process startup that the
> PLT and GOT are still writable.  The XZ exploit used an ifunc resolver
> to rewrite a whole bunch of PLT entries, intercepting both calls
> within sshd proper, and calls from sshd to libcrypto.so
> (i.e. OpenSSL's general-purpose cryptography library).
>
> Ifuncs were already a problem -- resolvers are arbitrary application
> code that gets called from deep within the guts of the dynamic loader,
> possibly while internal locks are held (I don't know for sure).
> In -z now mode, they are called not just before the core C library is
> fully initialized, but before symbol resolution is complete, meaning
> that they can't necessarily make *any* function calls; we've had any
> number of bug reports about this.

(apparently FreeBSD uses two passes to avoid this, but bionic has the
same issue as glibc, and iirc musl doesn't implement ifuncs.)

> They seem to be less troublesome in
> lazy binding mode, as far as I can tell, but I can still imagine them
> causing trouble (e.g. due to recursive invocation of the lazy symbol
> resolution machinery, or due to injecting non-async-signal-safe code
> into a call, from a signal handler, to a function that's *supposed* to
> be async-signal-safe).  The glibc wiki page for ifuncs
> (https://sourceware.org/glibc/wiki/GNU_IFUNC) warns readers that ifunc
> resolvers are subject to severe restrictions that aren't documented or
> even agreed upon.
>
> As far as I know, the only legitimate (non-malicious) use case anyone
> wants for ifuncs is to allow a library to select one of several
> implementations of a single function, based on the characteristics of
> the CPU -- such as how glibc itself selects the best available
> implementation of `memcpy` for the CPU.  It seems to me that we ought
> to be able to come up with a completely declarative mechanism for this
> use case.  Perhaps a library could supply an array of candidate
> implementations of a function, each paired with a bit vector that
> declares all of the CPU capabilities that that implementation
> requires, sorted from most to least stringent, and the dynamic loader
> could run down the list and pick the first one that will work.  This
> would avoid all the problems with calling application code from the
> guts of the loader.  And, in -z relro -z now mode, it would mean that
> no application code could run before the PLT and GOT are made
> read-only, closing the path that the XZ trojan used to hook itself
> into sshd.  We'd have to keep STT_GNU_IFUNC support around for at
> least a few releases, but we could officially deprecate it and provide
> a tunable and/or a build-time switch to disable it.
>
> To figure out if this is a workable idea, questions for you all:
> (1) Are there other use cases for ifuncs that I don't know about?

one thing i think is interesting (having been looking at ifuncs while
adding riscv64 support to Android) is that afaict bionic [Android's
libc] is basically the only current user. every library i'm aware of
(and certainly every library that's part of the OS) does _not_ use
ifuncs, if only because iOS/macOS has no equivalent (and Windows
too?), and if you've got to have the manual function pointer
manipulation implementation for them...

that said, for llvm at least there's work on function multi-versioning
where the compiler basically writes the ifunc resolver. but (a) that's
not quite finished yet (?) and (b) i haven't seen anyone _use_ it yet
and (c) is at least by definition of being machine-generated pretty
regular.

> (2) Are there existing ifuncs that perform CPU-capability-based
> function selection that *could not* be replaced with an array of bit
> vectors like what I sketched in the previous paragraph?

arm32 was a horrific mess where SoCs and their kernels were pretty
confused about what they did/didn't have. Android's libc ifunc
resolvers basically had to end up with "look, just write down in this
file what you want us to use, and we'll use those" so the _device_
owners could override the mess the SoC vendors had made.

arm64 was slightly awkward when hwcap2 appeared out of nowhere...

...and riscv64 is the pathological case of that where you don't have
hwcap but do have __riscv_hwprobe(2) and an unbounded set of keys.

but that's not a "no" :-)

i'm curious if anyone has real-world examples of hand-written ifunc
resolvers _not_ in a libc?

> zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:14 Maybe we should get rid of ifuncs Zack Weinberg
  2024-04-23 18:39 ` enh
@ 2024-04-23 18:52 ` Sam James
  2024-04-23 18:54 ` Florian Weimer
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Sam James @ 2024-04-23 18:52 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha, Carlos O'Donell, Lasse Collin, Fangrui Song

[-- Attachment #1: Type: text/plain, Size: 3916 bytes --]

"Zack Weinberg" <zack@owlfolio.org> writes:

> I've been thinking about the XZ exploit (two versions of the compression
> library `liblzma` included Trojan horse code that injected a back
> door into sshd; see https://research.swtch.com/xz-timeline) and what
> it means for glibc, and what I've come to is we should reconsider the
> entire idea of ifuncs.
>
> [...]
>
> Ifuncs were already a problem -- resolvers are arbitrary application
> code that gets called from deep within the guts of the dynamic loader,
> possibly while internal locks are held (I don't know for sure).
> In -z now mode, they are called not just before the core C library is
> fully initialized, but before symbol resolution is complete, meaning
> that they can't necessarily make *any* function calls; we've had any
> number of bug reports about this.  They seem to be less troublesome in
> lazy binding mode, as far as I can tell, but I can still imagine them
> causing trouble (e.g. due to recursive invocation of the lazy symbol
> resolution machinery, or due to injecting non-async-signal-safe code
> into a call, from a signal handler, to a function that's *supposed* to
> be async-signal-safe).

Yeah, no example comes to mind for lazy binding but I can see it
happening.

> The glibc wiki page for ifuncs
> (https://sourceware.org/glibc/wiki/GNU_IFUNC) warns readers that ifunc
> resolvers are subject to severe restrictions that aren't documented or
> even agreed upon.
>

There's a huge number of caveats and it's *very* difficult to use them
safely.

https://gcc.gnu.org/PR114115 was an example of a legitimate (no evil) issue of
IFUNC misuse with no compiler diagnostic.

Carlos observed in https://gcc.gnu.org/PR70082 that:
> In 2005 the GNU IFUNC support was documented and added to GCC via the
> ifunc attribute. To be honest this was a mistake, the feature is not
> documented and the implementation has so many caveats that it is incredibly difficult to use.

I don't really think the situation has improved substantially since
2005 or 2016 when Carlos (CC'd) made that comment.


> As far as I know, the only legitimate (non-malicious) use case anyone
> wants for ifuncs is to allow a library to select one of several
> implementations of a single function, based on the characteristics of
> the CPU -- such as how glibc itself selects the best available
> implementation of `memcpy` for the CPU.  It seems to me that we ought
> to be able to come up with a completely declarative mechanism for this
> use case.  Perhaps a library could supply an array of candidate
> implementations of a function, each paired with a bit vector that
> declares all of the CPU capabilities that that implementation
> requires, sorted from most to least stringent, and the dynamic loader
> could run down the list and pick the first one that will work.  This
> would avoid all the problems with calling application code from the
> guts of the loader.  And, in -z relro -z now mode, it would mean that
> no application code could run before the PLT and GOT are made
> read-only, closing the path that the XZ trojan used to hook itself
> into sshd.  We'd have to keep STT_GNU_IFUNC support around for at
> least a few releases, but we could officially deprecate it and provide
> a tunable and/or a build-time switch to disable it.
>

The only other thing I can say is that maskray (CC'd) mentioned recently that
there's an alternative model of IFUNC which Apple are using, but I only
looked a bit at how it works, so I'm not in the best position to
describe it.

Thanks for bringing this up.

> To figure out if this is a workable idea, questions for you all:
> (1) Are there other use cases for ifuncs that I don't know about?
> (2) Are there existing ifuncs that perform CPU-capability-based
> function selection that *could not* be replaced with an array of bit
> vectors like what I sketched in the previous paragraph?
>
> zw

thanks,
sam

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 377 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:14 Maybe we should get rid of ifuncs Zack Weinberg
  2024-04-23 18:39 ` enh
  2024-04-23 18:52 ` Sam James
@ 2024-04-23 18:54 ` Florian Weimer
  2024-04-24 13:53   ` Zack Weinberg
  2024-04-23 19:26 ` Andreas Schwab
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2024-04-23 18:54 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha

* Zack Weinberg:

> I've been thinking about the XZ exploit (two versions of the compression
> library `liblzma` included Trojan horse code that injected a back
> door into sshd; see https://research.swtch.com/xz-timeline) and what
> it means for glibc, and what I've come to is we should reconsider the
> entire idea of ifuncs.
>
> The SSH protocol does not use XZ compression.  liblzma.so was loaded
> into sshd's address space because some Linux distributions patched
> sshd to use libsystemd, and some libsystemd functions (having to do
> with systemd's "journal" logging subsystem, IIUC) do use liblzma, but
> by itself that wouldn't have been enough to give the exploit control,
> because the patched sshd doesn't use any of those functions.  But
> these same Linux distributions also compile libsshd with -z now
> (ironically, as a hardening measure, together with -z relro) and that
> means the resolvers for all the ifuncs in *all* the loaded shared
> libraries will be invoked, early enough in process startup that the
> PLT and GOT are still writable.  The XZ exploit used an ifunc resolver
> to rewrite a whole bunch of PLT entries, intercepting both calls
> within sshd proper, and calls from sshd to libcrypto.so
> (i.e. OpenSSL's general-purpose cryptography library).

GOT rewriting wasn't required.  OpenSSL itself has support for hooking
the relevant functions:

  <https://openssl.org/docs/man3.0/man3/RSA_set_method.html>

For some of the link orders I've seen, plain ELF symbol interposition
would have worked as well.  We don't know if such a thing gets detected
in practice.

> Ifuncs were already a problem -- resolvers are arbitrary application
> code that gets called from deep within the guts of the dynamic loader,
> possibly while internal locks are held (I don't know for sure).

Internal locks are held.  That also happens for ELF constructors (but is
perhaps fixable there).

> In -z now mode, they are called not just before the core C library is
> fully initialized, but before symbol resolution is complete, meaning
> that they can't necessarily make *any* function calls; we've had any
> number of bug reports about this.

We are getting closer to be able to fully initialize libc before IFUNC
resolvers run.  It should be a relatively short patch (a few dozen
lines), at least for Linux.  Since commit 78ca44da0160a0b442f ("elf:
Relocate libc.so early during startup and dlmopen (bug 31083)") we
already relocate libc out of order.

Beyond libc, there are still issues around symbol interposition (or
underlinking) and execution of IFUNC resolvers implemented in
yet-to-be-relocated shared objects.

In the other direction, I think it would be valuable to offer a mode
where we run ELF constructors when .data.rel.ro is still writable.

(In general, I'd be worried to chase last month's problem.)

> As far as I know, the only legitimate (non-malicious) use case anyone
> wants for ifuncs is to allow a library to select one of several
> implementations of a single function, based on the characteristics of
> the CPU -- such as how glibc itself selects the best available
> implementation of `memcpy` for the CPU.  It seems to me that we ought
> to be able to come up with a completely declarative mechanism for this
> use case.

Selection rules for string functions can be quite complicated, depending
not just on advertised CPU capabilities but also on CPU models (and
preferences derived from that).  For each new selection criteria, we'd
have to update glibc to implement it before it becomes usable by
applications.

I'm not sure how valuable it is to prevent code execution at the
relocation stage when a few microseconds later, the ELF constructor is
invoked, which by definition contains arbitrary code.

> And, in -z relro -z now mode, it would mean that no application code
> could run before the PLT and GOT are made read-only, closing the path
> that the XZ trojan used to hook itself into sshd.

It's possible to revert RELRO.  We do that for static dlopen on some
architectures (something that could be avoided with early libc
initialization described above).

Thanks,
Florian


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:14 Maybe we should get rid of ifuncs Zack Weinberg
                   ` (2 preceding siblings ...)
  2024-04-23 18:54 ` Florian Weimer
@ 2024-04-23 19:26 ` Andreas Schwab
  2024-04-24 13:54   ` Zack Weinberg
  2024-04-24  1:41 ` Richard Henderson
  2024-04-30  8:42 ` Simon Josefsson
  5 siblings, 1 reply; 16+ messages in thread
From: Andreas Schwab @ 2024-04-23 19:26 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha

libgcc on powerpc uses ifuncs for selection of IEEE 128-bit float
software vs hardware implementation (they are linked into every binary
using long double arithmetic).

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:39 ` enh
@ 2024-04-23 19:46   ` Palmer Dabbelt
  2024-04-24 13:56   ` Zack Weinberg
  1 sibling, 0 replies; 16+ messages in thread
From: Palmer Dabbelt @ 2024-04-23 19:46 UTC (permalink / raw)
  To: enh; +Cc: zack, libc-alpha

On Tue, 23 Apr 2024 11:39:21 PDT (-0700), enh@google.com wrote:
> On Tue, Apr 23, 2024 at 11:14 AM Zack Weinberg <zack@owlfolio.org> wrote:
>>
>> I've been thinking about the XZ exploit (two versions of the compression
>> library `liblzma` included Trojan horse code that injected a back
>> door into sshd; see https://research.swtch.com/xz-timeline) and what
>> it means for glibc, and what I've come to is we should reconsider the
>> entire idea of ifuncs.
>>
>> The SSH protocol does not use XZ compression.  liblzma.so was loaded
>> into sshd's address space because some Linux distributions patched
>> sshd to use libsystemd, and some libsystemd functions (having to do
>> with systemd's "journal" logging subsystem, IIUC) do use liblzma, but
>> by itself that wouldn't have been enough to give the exploit control,
>> because the patched sshd doesn't use any of those functions.  But
>> these same Linux distributions also compile libsshd with -z now
>> (ironically, as a hardening measure, together with -z relro) and that
>> means the resolvers for all the ifuncs in *all* the loaded shared
>> libraries will be invoked, early enough in process startup that the
>> PLT and GOT are still writable.  The XZ exploit used an ifunc resolver
>> to rewrite a whole bunch of PLT entries, intercepting both calls
>> within sshd proper, and calls from sshd to libcrypto.so
>> (i.e. OpenSSL's general-purpose cryptography library).
>>
>> Ifuncs were already a problem -- resolvers are arbitrary application
>> code that gets called from deep within the guts of the dynamic loader,
>> possibly while internal locks are held (I don't know for sure).
>> In -z now mode, they are called not just before the core C library is
>> fully initialized, but before symbol resolution is complete, meaning
>> that they can't necessarily make *any* function calls; we've had any
>> number of bug reports about this.
>
> (apparently FreeBSD uses two passes to avoid this, but bionic has the
> same issue as glibc, and iirc musl doesn't implement ifuncs.)
>
>> They seem to be less troublesome in
>> lazy binding mode, as far as I can tell, but I can still imagine them
>> causing trouble (e.g. due to recursive invocation of the lazy symbol
>> resolution machinery, or due to injecting non-async-signal-safe code
>> into a call, from a signal handler, to a function that's *supposed* to
>> be async-signal-safe).  The glibc wiki page for ifuncs
>> (https://sourceware.org/glibc/wiki/GNU_IFUNC) warns readers that ifunc
>> resolvers are subject to severe restrictions that aren't documented or
>> even agreed upon.
>>
>> As far as I know, the only legitimate (non-malicious) use case anyone
>> wants for ifuncs is to allow a library to select one of several
>> implementations of a single function, based on the characteristics of
>> the CPU -- such as how glibc itself selects the best available
>> implementation of `memcpy` for the CPU.  It seems to me that we ought
>> to be able to come up with a completely declarative mechanism for this
>> use case.  Perhaps a library could supply an array of candidate
>> implementations of a function, each paired with a bit vector that
>> declares all of the CPU capabilities that that implementation
>> requires, sorted from most to least stringent, and the dynamic loader
>> could run down the list and pick the first one that will work.  This
>> would avoid all the problems with calling application code from the
>> guts of the loader.  And, in -z relro -z now mode, it would mean that
>> no application code could run before the PLT and GOT are made
>> read-only, closing the path that the XZ trojan used to hook itself
>> into sshd.  We'd have to keep STT_GNU_IFUNC support around for at
>> least a few releases, but we could officially deprecate it and provide
>> a tunable and/or a build-time switch to disable it.
>>
>> To figure out if this is a workable idea, questions for you all:
>> (1) Are there other use cases for ifuncs that I don't know about?
>
> one thing i think is interesting (having been looking at ifuncs while
> adding riscv64 support to Android) is that afaict bionic [Android's
> libc] is basically the only current user. every library i'm aware of
> (and certainly every library that's part of the OS) does _not_ use
> ifuncs, if only because iOS/macOS has no equivalent (and Windows
> too?), and if you've got to have the manual function pointer
> manipulation implementation for them...
>
> that said, for llvm at least there's work on function multi-versioning
> where the compiler basically writes the ifunc resolver. but (a) that's
> not quite finished yet (?) and (b) i haven't seen anyone _use_ it yet
> and (c) is at least by definition of being machine-generated pretty
> regular.
>
>> (2) Are there existing ifuncs that perform CPU-capability-based
>> function selection that *could not* be replaced with an array of bit
>> vectors like what I sketched in the previous paragraph?
>
> arm32 was a horrific mess where SoCs and their kernels were pretty
> confused about what they did/didn't have. Android's libc ifunc
> resolvers basically had to end up with "look, just write down in this
> file what you want us to use, and we'll use those" so the _device_
> owners could override the mess the SoC vendors had made.
>
> arm64 was slightly awkward when hwcap2 appeared out of nowhere...
>
> ...and riscv64 is the pathological case of that where you don't have
> hwcap but do have __riscv_hwprobe(2) and an unbounded set of keys.

I don't really have an opinion on the global glibc side of things here, 
but I don't think RISC-V is a good argument for keeping anything around 
if people don't want it otherwise.  We're just not that important of 
a target right now, there's no competitive hardware and the 
fragmentation is so pathologically bad that we'll just end up adding 
some crazy constraints to whatever people are trying to do.  Hopefully 
that will change at some point, but for now it's hard to make much of an 
argument for adding global complexity to improve RISC-V performance.

That said, I think some sort of "list of required features -> 
implementation pairs" would match what we're doing with IFUNCs.  The 
"required feature" set would have to be very big as we have many 
ISA/vendor features in RISC-V, and the features are a bit more complex 
than just a bitmap (some are multiple bits, possibly even quantitative) 
otherwise that's just what we're doing with the IFUNCs I can think of.

Right now we're just using IFUNCs because we can write adhoc arbitrary C 
code to encode those rules, but I don't see any reason we couldn't have 
an interface that's more specific to the features and priority rules.  
We'll need to end up doing something along those lines when we do 
function multi-versioning in GCC, so maybe we just piggyback on that 
effort?

> but that's not a "no" :-)
>
> i'm curious if anyone has real-world examples of hand-written ifunc
> resolvers _not_ in a libc?
>
>> zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:14 Maybe we should get rid of ifuncs Zack Weinberg
                   ` (3 preceding siblings ...)
  2024-04-23 19:26 ` Andreas Schwab
@ 2024-04-24  1:41 ` Richard Henderson
  2024-04-24 14:43   ` Zack Weinberg
  2024-04-30  8:42 ` Simon Josefsson
  5 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2024-04-24  1:41 UTC (permalink / raw)
  To: Zack Weinberg, libc-alpha

On 4/23/24 11:14, Zack Weinberg wrote:
> (2) Are there existing ifuncs that perform CPU-capability-based
> function selection that*could not*  be replaced with an array of bit
> vectors like what I sketched in the previous paragraph?

How much is in actual use, I have no idea.  However:

Even x86 cpuid generates a staggeringly large bit vector.

Similarly the arm cpu id register space (some of which is reflected in hwcap; all of which 
is available via el0 trap-to-read-filtered-value).

In principal I think the idea of declarative selection is good. It's what I would have 
chosen once upon a time if I had been clever enough to invent the correct (almost 
certainly target-specific) data structure(s).


r~

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:54 ` Florian Weimer
@ 2024-04-24 13:53   ` Zack Weinberg
  0 siblings, 0 replies; 16+ messages in thread
From: Zack Weinberg @ 2024-04-24 13:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Tue Apr 23, 2024 at 2:54 PM EDT, Florian Weimer wrote:
> * Zack Weinberg:
> > The XZ exploit used an ifunc resolver to rewrite a whole bunch of
> > PLT entries, intercepting both calls within sshd proper, and calls
> > from sshd to libcrypto.so (i.e. OpenSSL's general-purpose
> > cryptography library).
>
> GOT rewriting wasn't required.  OpenSSL itself has support for hooking
> the relevant functions:
>
>   <https://openssl.org/docs/man3.0/man3/RSA_set_method.html>
>
> For some of the link orders I've seen, plain ELF symbol interposition
> would have worked as well.  We don't know if such a thing gets detected
> in practice.

I haven't looked closely at the actual Trojan payload myself; I didn't
know it overtly defined symbols that collide with libcrypto entry
points (thus potentially interposing those symbols).

My goal with this proposal is not to make XZ-type exploits
*impossible*, but rather, to make them easier to detect.  Calls to
RSA_set_method, calls to the "newer OSSL_PROVIDER API" as mentioned in
the above manpage, and symbol collisions between libraries that have
no business interposing on each other: these can all be mechanically
detected, at least in principle.

> > Ifuncs were already a problem -- resolvers are arbitrary application
> > code that gets called from deep within the guts of the dynamic loader,
> > possibly while internal locks are held (I don't know for sure).
>
> Internal locks are held.

Yikes! That by itself seems like a strong argument for replacing this
mechanism.

> > In -z now mode, they are called not just before the core C library is
> > fully initialized, but before symbol resolution is complete, meaning
> > that they can't necessarily make *any* function calls; we've had any
> > number of bug reports about this.
>
> We are getting closer to be able to fully initialize libc before IFUNC
> resolvers run.

It's good to know that it's *possible* to make ifunc resolvers more
robust, but right now I'm not sure it's *worthwhile*.  Unless someone
has a use case that isn't CPU-based code selection, I currently think
replacing ifuncs with something declarative would be a good idea
regardless of potential security benefits.

> In the other direction, I think it would be valuable to offer a mode
> where we run ELF constructors when .data.rel.ro is still writable.

What uses do you see for this mode?

> (In general, I'd be worried to chase last month's problem.)

Well, this is hardening -- every little bit helps.  As I said above,
my security goal here is to make it so potential interception of
function calls to library A by library B is easily machine detectable.
We would still need to build the machine, of course, but that's well
within the scope of existing projects (e.g. archive-wide Lintian scans).

> I'm not sure how valuable it is to prevent code execution at the
> relocation stage when a few microseconds later, the ELF constructor is
> invoked, which by definition contains arbitrary code.

I see it as valuable precisely because ELF constructors (should)
run after the PLT and GOT are made read-only.

I do have a sketch of what it would take to move ld.so completely
out of process, if you're interested in bigger hammers.

> It's possible to revert RELRO.  We do that for static dlopen on some
> architectures (something that could be avoided with early libc
> initialization described above).

What would it take to get rid of that possibility?

zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 19:26 ` Andreas Schwab
@ 2024-04-24 13:54   ` Zack Weinberg
  0 siblings, 0 replies; 16+ messages in thread
From: Zack Weinberg @ 2024-04-24 13:54 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: libc-alpha

On Tue Apr 23, 2024 at 3:26 PM EDT, Andreas Schwab wrote:
> libgcc on powerpc uses ifuncs for selection of IEEE 128-bit float
> software vs hardware implementation (they are linked into every binary
> using long double arithmetic).

That's still selecting machine code based on characteristics of the
CPU, isn't it?

zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:39 ` enh
  2024-04-23 19:46   ` Palmer Dabbelt
@ 2024-04-24 13:56   ` Zack Weinberg
  2024-04-24 14:25     ` enh
  1 sibling, 1 reply; 16+ messages in thread
From: Zack Weinberg @ 2024-04-24 13:56 UTC (permalink / raw)
  To: enh; +Cc: libc-alpha

On Tue Apr 23, 2024 at 2:39 PM EDT, enh wrote:
> one thing i think is interesting (having been looking at ifuncs while
> adding riscv64 support to Android) is that afaict bionic [Android's
> libc] is basically the only current user. every library i'm aware of
> (and certainly every library that's part of the OS) does _not_ use
> ifuncs, if only because iOS/macOS has no equivalent (and Windows
> too?), and if you've got to have the manual function pointer
> manipulation implementation for them...

I assume you meant "every library that's *not* part of the OS"?

> that said, for llvm at least there's work on function multi-versioning
> where the compiler basically writes the ifunc resolver. but (a) that's
> not quite finished yet (?) and (b) i haven't seen anyone _use_ it yet
> and (c) is at least by definition of being machine-generated pretty
> regular.

Do you know where to find the llvm work in progress?

zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-24 13:56   ` Zack Weinberg
@ 2024-04-24 14:25     ` enh
  0 siblings, 0 replies; 16+ messages in thread
From: enh @ 2024-04-24 14:25 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha

On Wed, Apr 24, 2024 at 6:56 AM Zack Weinberg <zack@owlfolio.org> wrote:
>
> On Tue Apr 23, 2024 at 2:39 PM EDT, enh wrote:
> > one thing i think is interesting (having been looking at ifuncs while
> > adding riscv64 support to Android) is that afaict bionic [Android's
> > libc] is basically the only current user. every library i'm aware of
> > (and certainly every library that's part of the OS) does _not_ use
> > ifuncs, if only because iOS/macOS has no equivalent (and Windows
> > too?), and if you've got to have the manual function pointer
> > manipulation implementation for them...
>
> I assume you meant "every library that's *not* part of the OS"?

no, for the OS [except the already-mentioned exception of libc] there
are _no_ ifuncs. for apps, well, there were none in a _sampling_ of
popular apps i did, but doing a complete survey of all apps ... well,
there are a _lot_ of apps.

> > that said, for llvm at least there's work on function multi-versioning
> > where the compiler basically writes the ifunc resolver. but (a) that's
> > not quite finished yet (?) and (b) i haven't seen anyone _use_ it yet
> > and (c) is at least by definition of being machine-generated pretty
> > regular.
>
> Do you know where to find the llvm work in progress?

the docs are here:
https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

https://github.com/android/ndk/issues/1909 implies that -- despite the
"beta" in the docs -- enough code might be in llvm and then downstream
in the ndk that arm64 FMV is now usable. (but as i said, i haven't
used it yet, and i don't yet know of anyone who does.)

> zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-24  1:41 ` Richard Henderson
@ 2024-04-24 14:43   ` Zack Weinberg
  2024-04-24 15:09     ` enh
  2024-04-28  0:24     ` Peter Bergner
  0 siblings, 2 replies; 16+ messages in thread
From: Zack Weinberg @ 2024-04-24 14:43 UTC (permalink / raw)
  To: Richard Henderson, libc-alpha

On Tue Apr 23, 2024 at 9:41 PM EDT, Richard Henderson wrote:
> On 4/23/24 11:14, Zack Weinberg wrote:
> > (2) Are there existing ifuncs that perform CPU-capability-based
> > function selection that*could not*  be replaced with an array of bit
> > vectors like what I sketched in the previous paragraph?
>
> How much is in actual use, I have no idea.  However:
> Even x86 cpuid generates a staggeringly large bit vector.

Oof, yeah.  sizeof(struct cpu_features) == 488 on x86_64, and over
half of that is cpuid dumps.  It probably _could_ be compacted,
but as Florian pointed out any compaction we implement means glibc
has to be updated for new CPU features (but then again we have to
do that _anyway_...)

Another thing that looking at cpu_features makes obvious is that
several architectures include numbers that can't easily be reduced
to one-hot representation.  It'd be reasonable to want to dispatch on
cache line size, for instance.  I don't like the idea of embedding
something even vaguely resembling a bytecode interpreter in ld.so,
and yet...

I'm very curious what the plan for function multiversioning in GCC
and LLVM is, and how close to declarative it gets.

zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-24 14:43   ` Zack Weinberg
@ 2024-04-24 15:09     ` enh
  2024-04-28  0:24     ` Peter Bergner
  1 sibling, 0 replies; 16+ messages in thread
From: enh @ 2024-04-24 15:09 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Richard Henderson, libc-alpha

On Wed, Apr 24, 2024 at 7:43 AM Zack Weinberg <zack@owlfolio.org> wrote:
>
> On Tue Apr 23, 2024 at 9:41 PM EDT, Richard Henderson wrote:
> > On 4/23/24 11:14, Zack Weinberg wrote:
> > > (2) Are there existing ifuncs that perform CPU-capability-based
> > > function selection that*could not*  be replaced with an array of bit
> > > vectors like what I sketched in the previous paragraph?
> >
> > How much is in actual use, I have no idea.  However:
> > Even x86 cpuid generates a staggeringly large bit vector.
>
> Oof, yeah.  sizeof(struct cpu_features) == 488 on x86_64, and over
> half of that is cpuid dumps.  It probably _could_ be compacted,
> but as Florian pointed out any compaction we implement means glibc
> has to be updated for new CPU features (but then again we have to
> do that _anyway_...)
>
> Another thing that looking at cpu_features makes obvious is that
> several architectures include numbers that can't easily be reduced
> to one-hot representation.  It'd be reasonable to want to dispatch on
> cache line size, for instance.  I don't like the idea of embedding
> something even vaguely resembling a bytecode interpreter in ld.so,
> and yet...
>
> I'm very curious what the plan for function multiversioning in GCC
> and LLVM is, and how close to declarative it gets.

https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
is the arm64 spec. the generated code is basically an ifunc resolver
that does "if ((hwcap & ... ) && (hwcap2 & ...))".

you can see the basic idea in https://reviews.llvm.org/D155026.

> zw

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-24 14:43   ` Zack Weinberg
  2024-04-24 15:09     ` enh
@ 2024-04-28  0:24     ` Peter Bergner
  2024-05-02  2:59       ` Michael Meissner
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Bergner @ 2024-04-28  0:24 UTC (permalink / raw)
  To: Zack Weinberg, Richard Henderson, libc-alpha; +Cc: Michael Meissner

On 4/24/24 9:43 AM, Zack Weinberg wrote:
> I'm very curious what the plan for function multiversioning in GCC
> and LLVM is, and how close to declarative it gets.

GCC (at least on powerpc) already supports it via the target_clones
attribute.  See gcc/testsuite/gcc.target/powerpc/clone*.c for examples.
Basically, it looks like (from clone3.c):

__attribute__((target_clones("cpu=power10,cpu=power9,default")))
long mod_func (long a, long b)
{
  return (a % b) + s;
}

long mod_func_or (long a, long b, long c)
{
  return mod_func (a, b) | c;
}


Mike knows how this works better than I, but GCC automatically emits an
ifunc resolver for the different clones and looks to use the HWCAP*
architecture mask associated with the cpu we're compiling for.
The "default" function being called in the case our ifunc resolver
doesn't match any of the HWCAP* masks from the cpus we're compiling
for.

Mike, it seems like this is more of a "cpu" clone and not a true HWCAP
test, so this specific thing doesn't (at least currently) work for
something like __attribute__((target_clones("vsx,mma,default"))) ?
Or did I misread the code?


I'll note I'm pretty sure we (IBM/powerpc) have added ifunc usage to
OpenBLAS and some other libraries outside of glibc.


Peter



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-23 18:14 Maybe we should get rid of ifuncs Zack Weinberg
                   ` (4 preceding siblings ...)
  2024-04-24  1:41 ` Richard Henderson
@ 2024-04-30  8:42 ` Simon Josefsson
  5 siblings, 0 replies; 16+ messages in thread
From: Simon Josefsson @ 2024-04-30  8:42 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

"Zack Weinberg" <zack@owlfolio.org> writes:

> To figure out if this is a workable idea, questions for you all:
> (1) Are there other use cases for ifuncs that I don't know about?
> (2) Are there existing ifuncs that perform CPU-capability-based
> function selection that *could not* be replaced with an array of bit
> vectors like what I sketched in the previous paragraph?

Interesting take.  I have recently looked at lib25519 and libmceliece
which uses ifuncs to select crypto primitives.  While I think it may be
possible to rewrite it for something you suggest, I think libraries like
this suggests that some developer find the ifunc interface useful and
that rewriting things will require some discussion.  Take this as a +0.9.

https://salsa.debian.org/debian/lib25519
https://salsa.debian.org/debian/libmceliece

Some relevant selector logic code:
https://salsa.debian.org/debian/libmceliece/-/blob/main/scripts-build/dispatch
https://salsa.debian.org/debian/libmceliece/-/blob/main/cpuid/amd64.c

/Simon

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 255 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Maybe we should get rid of ifuncs
  2024-04-28  0:24     ` Peter Bergner
@ 2024-05-02  2:59       ` Michael Meissner
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2024-05-02  2:59 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Zack Weinberg, Richard Henderson, libc-alpha, Michael Meissner

On Sat, Apr 27, 2024 at 07:24:05PM -0500, Peter Bergner wrote:
> On 4/24/24 9:43 AM, Zack Weinberg wrote:
> > I'm very curious what the plan for function multiversioning in GCC
> > and LLVM is, and how close to declarative it gets.
> 
> GCC (at least on powerpc) already supports it via the target_clones
> attribute.  See gcc/testsuite/gcc.target/powerpc/clone*.c for examples.
> Basically, it looks like (from clone3.c):
> 
> __attribute__((target_clones("cpu=power10,cpu=power9,default")))
> long mod_func (long a, long b)
> {
>   return (a % b) + s;
> }
> 
> long mod_func_or (long a, long b, long c)
> {
>   return mod_func (a, b) | c;
> }
> 
> 
> Mike knows how this works better than I, but GCC automatically emits an
> ifunc resolver for the different clones and looks to use the HWCAP*
> architecture mask associated with the cpu we're compiling for.
> The "default" function being called in the case our ifunc resolver
> doesn't match any of the HWCAP* masks from the cpus we're compiling
> for.

Sorry, I've been in and out of the hospital with my wife.

> Mike, it seems like this is more of a "cpu" clone and not a true HWCAP
> test, so this specific thing doesn't (at least currently) work for
> something like __attribute__((target_clones("vsx,mma,default"))) ?
> Or did I misread the code?

There are 3 things GCC provides:

1) Is the ability to write an ifunc function.  Any call to func is always
indirect.  The loader calls resolver at program/shared library load to get the
address of the function to use:

	extern int func_power10 (void);
	extern int func_power9 (void);
	extern int func_default (void);

	int func (void) __attribute__ ((__ifunc__ ("resolver")));

	void *
	resolver (void)
	{
	  if (__builtin_cpu_supports ("arch_3_1"))
	    return (void *) func_power10;

	  else if (__builtin_cpu_supports ("arch_3_00"))
	    return (void *) func_power9;

	  else
	    return (void *) func_default;
	}

2) The ability to change the target defaults for a particular function:

	int func_power10 (void) __attribute__((__target__("cpu=power10")));

	int func_power10 (void)
	{
	  // this function will be compiled for power10
	}

GCC allows the stuff inside __attribute__ to have 2 prefix underscores and 2
suffix underscores or not.  I prefer to always use the underscore prefixes and
suffixes just in case the user defined a 'target' macro (i.e. the stuff within
attributes is subject to macro replacement).

An alternative is to use #pragmas to change the defaults for a bit:

	#pragma GCC push_options
	#pragma GCC target ("cpu=power10")

	int func_power10 (void)
	{
	  // compiled with power10 options
	}

	#pragma GCC target ("cpu=power9")

	int func_power9 (void)
	{
	  // compiled with power9 options
	}

	#pragma GCC pop_options

	int func_default (void)
	{
	  // compiled with the default options
	}


3) The ability to use target clones, where the compiler constructs the ifunc
function, and recompiles the function multiple times with different target
defaults.

	extern int func (void)
	  __attribute__((__target_clones__("cpu=power10,cpu=power9,default")));

	int func (void) {
	  // 3 versions of func are compiled along with an ifunc resolver.
	}

Note, 'default' must always be listed in the target clones.  You can only
specify one option (i.e. you can't do something like compile -mcpu=power9 and
-mtune=power10 into one option).  So in practice, only -mcpu=<xxx> options are
useful.

If we need better fine grained support, we could have -mcpu options that adds
or subtracts the options.

The automatic ifunc only looks at hwcap/hwcap2 bits, and it sorts it so that it
checks for power10 first, etc.  At present, we have target clone support for:

	power6
	power7
	power8
	power9
	power10

Note since there is no real hwcap bit for power11, with my current patches for
power11, if you do:

	extern int func (void)
	  __attribute__((__target_clones__("cpu=power11,cpu=power10,cpu=power9,default")));

it will compile both power11 and power10 clones, but the resolver will only
call the power10 clone because we don't have a separate hwcap bit for power11
(that I know of).  If we do have a separate hwcap bit, it is easy to add
support for power11.

Now one thing that I thought had been done, but it appears no longer being done
is that the #ifdefs (i.e. _ARCH_PWR10, etc.) aren't changed when compiling the
target clone.

> 
> I'll note I'm pretty sure we (IBM/powerpc) have added ifunc usage to
> OpenBLAS and some other libraries outside of glibc.
> 
> 
> Peter
> 
> 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-05-02  2:59 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-23 18:14 Maybe we should get rid of ifuncs Zack Weinberg
2024-04-23 18:39 ` enh
2024-04-23 19:46   ` Palmer Dabbelt
2024-04-24 13:56   ` Zack Weinberg
2024-04-24 14:25     ` enh
2024-04-23 18:52 ` Sam James
2024-04-23 18:54 ` Florian Weimer
2024-04-24 13:53   ` Zack Weinberg
2024-04-23 19:26 ` Andreas Schwab
2024-04-24 13:54   ` Zack Weinberg
2024-04-24  1:41 ` Richard Henderson
2024-04-24 14:43   ` Zack Weinberg
2024-04-24 15:09     ` enh
2024-04-28  0:24     ` Peter Bergner
2024-05-02  2:59       ` Michael Meissner
2024-04-30  8:42 ` Simon Josefsson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).