[Bug dynamic-link/30007] rfe: dlopen to specified address

From: "stsp at users dot sourceforge.net" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug dynamic-link/30007] rfe: dlopen to specified address
Date: Tue, 28 Mar 2023 09:14:33 +0000
Message-ID: <bug-30007-131-J92F2oSSKV@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-30007-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=30007

--- Comment #28 from Stas Sergeev <stsp at users dot sourceforge.net> ---
(In reply to Jonathon Anderson from comment #27)
> Briefly summarizing the main points from the original email in the mailing
> list [1]:

You are creatively summarizing. :)
To me, all Carlos's concerns were addressed
and yours are completely new to me.

> > dlmem() works at a lower level of abstraction than the rest of the dl* APIs, i.e. memory instead of solibs/objects. That has widespread impacts across many users of Glibc, including but not limited to security, LD_AUDIT, and developer tools (GDB). Some reasons follow:

I think we need _all_ the reasons for such
broad claims, not "some".

> >   - dlmem() does not ensure that the passed memory is a correctly mmap()'d object. It would be strongly preferable that the API ensures we CAN'T end up in an inconsistent state, instead of making it UB if the user slips up.

That's not a valid assumption.
The refactors in my patch were not done for
lack of anything better to do, but precisely
to have a common path for dlopen() and dlmem().
All ELF sanity checks done by dlopen() are
also applied to dlmem().

> >   - dlmem() removes the "file descriptor" abstraction out of the link_map.

Could you please clarify?
I don't remember an fd field in struct
link_map, and the object name, which is
there, is supported by dlmem().

> A lot of tooling has to change to fit this new reality, both inside and outside Glibc: LD_AUDIT, developer tools (e.g. GDB), etc.

This needs clarification; I don't
understand that part. What should they
change, and why? Maybe gdb needs to be
able to trap dlmem() to auto-load debug
symbols; yes, that's what I acknowledged
long ago. But anything other than that?

> >   - dlmem() skips many syscalls, removing the kernel-side auditable events required for security tooling.

There are 2 use cases.
1 is when dlmem() skips nothing, in the
sense that you yourself need to mmap()
an ELF beforehand. So the kernel still sees
everything, and even /proc/self/map_files
is correct.
2 is when the memory buffer comes from
some other world, like from a VM. In that
case it doesn't matter that an extra call
like memfd_create() is not made, as verifying
the code source is impossible anyway.

> In contrast, "dlopenfd" requires both memfd_create() (or similar) and mmap() of that fd, allowing e.g. FFI/JIT to be locked down by a security seccomp filter.

You can still lock down your JIT with a
seccomp filter. Not sure why you need
memfd_create() to do that.

> Adding my own concern as well:

They were all your own though. :)

>   - dlmem() seems to expect the user to parse the program headers and
> mmap() the binary as required. That requires the application to re-implement
> a core, delicate piece of ld.so...

Not sure what you are talking about.
My patch adds quite comprehensive test cases
that try to cover the basic scenarios. So it
would help if you referred to a particular test
of mine that does something like this, as I
don't remember any that does.
Like I said before, dlmem() uses essentially
the same code path in glibc as dlopen() does.
And only a few small refactors were needed to
accomplish that.

> and do so correctly. From an API design
> perspective, that seems like a very poor choice of abstraction.

If I knew what you are referring to, maybe
I could answer. :)

> AFAICS none of these issues have been resolved in the latest patches.

This is because, as I said above, your summary
of Carlos's concerns is "creative". I addressed
his concerns: I dropped LD_AUDIT bits and I showed
how to implement fdlopen() and dlopen_with_offset().

> Some
> of these issues are intrinsic to the dlmem() semantics. So if another,
> better API will work for your case, that certainly would be preferred.

I am all for discussing any better API that can
work for me.

> > The primary problem is that this API
> > doesn't allow to preserve the user's
> > mapping. It is only using that mapping
> > to specify the reloc address, while
> > dlmem() can optionally preserve it (I
> > use the separate flag for that).
> This is precisely one of the concerns with dlmem(). Why must the user's
> mapping be preserved? So that the mirroring can be set up before the object
> is loaded?

Indeed.
This behavior is optional.

> Would replacing the dla_premap hook with some kind of custom-mmap()
> (dla_mmap()) hook fit your use case better? That could allow you to set up
> mirroring *as* the object is loaded, instead of before.

With the only difference being that it gives
the user 100 times more work? :) Instead of
dealing with mmap flags and file copies,
he has 1 small and simple callback in my
implementation.

> FWIW, do you need page-mirroring at all if you can just choose the reloc
> address to be within the VM space?

Yes, because the VM sees the pointers as if
VM_window_start==0. So all pointers there
would be incorrect and not passable to the
32bit world. The reloc address is planned to
be within MAP_32BIT.

> Presumably you won't intercept (all of) your own syscalls, primarily you're
> aiming for the syscalls while the 3rd-party "ancient code" is loading. So
> isn't it pretty much the same?

This is where the 64bit library does the
loading. The foreign code all runs under
KVM, so I don't even need a seccomp filter
for it. You propose that I intercept my own
syscalls, and this is what no other project
does.

> > Most of dl_audit framework can be implemented
> > with syscall interception, but why don't you
> > want to do that?
> Because (1) LD_AUDIT hearkens back to the days of Solaris and so is already
> on literally every GNU/Linux box in active use, and because (2) symbol
> binding (la_symbind) is done completely in userspace and can't be
> intercepted by syscalls.
> 
> Very different situation.

Which is why I said "most", not "all".
You actually can implement most/some parts
of LD_AUDIT via syscall trapping, leaving
things like symbind or la_activity in glibc,
but you don't want to do that.

> > Mirrored and also reloc address specified.
> > AFAICT fdlopen()+memfd gives neither.
> And based on prior comments, I assume you also want to preserve user
> mappings here.

Only for the sake of mirroring.
It's a broader feature of course, but I
only need it for mirroring.

> Note that the mprotect() calls are only if(__glibc_unlikely((c->prot &
> PROT_WRITE) == 0)).

Well, and otherwise (when PROT_WRITE is set)
I'd need the file copy. Which means I always
need one of the two.

> > Contrary to what you say, no one is
> > intercepting his own syscalls.
> I beg to disagree. Many projects filter or intercept their own syscalls.
> This *specific* approach hasn't been done before (I would point you to it if
> it was), but intercepting (or at least filtering) syscalls in-process is
> nothing new.

I think it's only done when that process
executes alien code. And even that is
likely wine-specific: I would be very
surprised if any other alien code can
execute a "syscall" instruction. For
example, JS code can't execute a syscall,
so, as you already confirmed, chromium
mostly does filtering to catch occasional
bugs of its own.
What I don't believe you can ever find is
some project intercepting its own
syscalls and "emulating" them as if it were
alien code running. More generally, I don't
think anyone uses that technique to extend
functionality. They either implement it
for security reasons (chromium), or for
debugging reasons (gdb), or for emulation
of alien code (wine). Extending
functionality at the syscall level looks like
a gross hack, given that a very simple
high-level API suits well.

> > which will
> > basically mean to just copy the initially
> > memory-based solib into a file on hdd rather
> > than to even properly use memfd.
> Why is the HDD required here, can't you just copy to a memfd file? That's
> what I suggested above.

There are 2 "files" in that picture.
One memfd comes from the solib in memory,
and another memfd seems to come from your
suggestion. So I won't be able to even use
the solib's memfd properly, and will instead
have to copy it to the file on hdd (or to
the second memfd).

> But it seems like your latest patches are shorter than I had remembered, I
> stand corrected. IIRC at one point there was a 1300-addition patch, which is
> where my comment came from, but that seems to have been cleaned up now.
> Great! :D

Thanks!
Knowing that the patches are at least
being looked at is a big relief. :)

> I don't run the show here... but AFAIK the code here is carefully, heavily,
> manually optimized to generate the best performance with a wide range of C
> compilers. Carelessly refactoring it and especially adding additional
> function calls will destroy a lot of that work. (Although I dislike the
> spaghetti as much as you do. :P)

Well, if not for musl, which demonstrated
the possibility of writing a libc without any
spaghetti code (or with small and structured,
but completely obfuscated code, as in uclibc),
I would believe that argument. :)

