From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
	id EBB15385840E; Tue, 28 Mar 2023 01:29:48 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org EBB15385840E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1679966988;
	bh=YbJZPdYtsaz1NjyBRpwGtX3SNmlP4s7bqtwPAYyJQjY=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=lYyNcNpq512uZeeye0KoGAhe4nBu0Z9QrpI/EGbirozXZyl8+JpMnmhYBpZKCnXJb
	 U8RhT1ppjq3npZ9vStBxSsVNhtKd/uNobwFZMcACgY7qpChKt1/4GfFgATYUcF5YM6
	 rXBSUZeXJ9JvlVWYna+At588tFipGCQ2WKq/vAcc=
From: "stsp at users dot sourceforge.net"
 <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug dynamic-link/30007] rfe: dlopen to specified address
Date: Tue, 28 Mar 2023 01:29:47 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: dynamic-link
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: stsp at users dot sourceforge.net
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-30007-131-27niIYGRA6@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-30007-131@http.sourceware.org/bugzilla/>
References: <bug-30007-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <glibc-bugs.sourceware.org>

https://sourceware.org/bugzilla/show_bug.cgi?id=30007
--- Comment #24 from Stas Sergeev <stsp at users dot sourceforge.net> ---
(In reply to Jonathon Anderson from comment #23)
> AFAIK your patches will be looked at once a use case that requires it is
> solidified, that can't be solved with current tech nor any better proposed
> API. So far, it has been unclear why the primary function of dlmem() is
> needed for your use case. Why do you need to load solibs straight from
> memory at all?

While this is quite handy for my
use-case (the solib image comes from a VM,
so it's already in memory and has no
host fd), the primary problem is that
any file-based API destroys the existing
mapping by definition.
So I chose dlmem() because it both suits
my case surprisingly well and has the
potential to preserve the user's mapping.
Other than that, it's completely agnostic
of my use-case. It just allows loading
into the user's buffer.


> No. I'm certain it works for unaligned SHT_NOBITS sections, any changes made
> to one side of the "mirror" are reflected in the other. (Although there is
> another flaw I missed before, an updated version of the technique is towards
> the bottom of this message. :P)

I think it's the same problem that you
are now trying to avoid by introducing the
writable file. An unaligned SHT_NOBITS section
results in re-protecting the file-backed
MAP_PRIVATE page into a writable one.


> There is not yet a solid use case for the primary function of this API, the
> fact that it "loads an solib from memory." This primary functionality is the
> main source of concern originally raised by Carlos O'Donell, and AFAICT
> hasn't been resolved.

Could you please explain the concern
itself? I mean, what is the problem
with having an API to dlmem() from memory?
Is it a security concern, or something else?
What justifies the straight "no" or
"no unless you disprove 1024+ tricks
to do the same with unportable syscall-
intercepting techniques"?


> The following API is close to your use case but doesn't raise the same
> concerns as dlmem(). Does this solve your problem, if not what's missing?
>     void *dlopen4(const char *filename, int flags, const struct dlopen4_args
> *ext, size_t ext_size /* = sizeof(struct dlopen3_args) */);
>     void *dlmopen5(Lmid_t lmid, const char *filename, int flags, const
> struct dlopen4_args *ext, size_t ext_size /* = sizeof(struct dlopen3_args)
> */);
>     struct dlopen4_args {
>       /* If not NULL, function called before mmap when loading the object
> [and its dependencies?].
>          Returns the base of a mmapped range of given length and alignment.
> This mapping will be
>          overwritten by the loaded object.  */
>       void *(*dla_premap)(void *preferred_addr, size_t length, size_t align,
> void *userdata);
>       /* User data passed to dla_premap.  */
>       void *dla_premap_userdata;
>     };

The primary problem is that this API
doesn't allow preserving the user's
mapping. It only uses that mapping
to specify the reloc address, while
dlmem() can optionally preserve it (I
use a separate flag for that).
The secondary problem is "filename",
but yes, I know you'll suggest getting
it from /proc/self/fd.
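
To make the difference concrete, here is a rough sketch of what a
dla_premap callback for the proposed API might look like. Note that
dlopen4() and dla_premap exist only in the proposal quoted above, not
in glibc; the name premap_reserve and the assumption that length is a
multiple of the page size are mine. The callback can only reserve
address space: by the proposal's own comment, the loader would then
map the object over it, so whatever the user had there is lost.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical callback matching the proposed dla_premap signature.
   Reserves `length` bytes of PROT_NONE address space aligned to `align`
   (a power of two).  Assumes `length` is a multiple of the page size. */
void *premap_reserve(void *preferred_addr, size_t length, size_t align,
                     void *userdata)
{
    (void)userdata;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(length != 0 && length % page == 0);

    /* Over-allocate so that an aligned start always fits inside. */
    size_t span = length + align;
    unsigned char *raw = mmap(preferred_addr, span, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    unsigned char *base = (unsigned char *)start;
    size_t head = (size_t)(base - raw);
    size_t tail = span - head - length;

    /* Give back the unused parts before and after the aligned range. */
    if (head != 0)
        munmap(raw, head);
    if (tail != 0)
        munmap(base + length, tail);
    return base;
}
```

So the callback chooses the address, but it has no way to say "keep
my pages", which is the part dlmem() can do optionally.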


> These are niceties, but I think we can agree a direct implementation of
> dlopen_with_offset() would be better for the use cases that need it. It
> would also require far less refactors than dlmem().

I can remove all refactors and replace
them with copy/pastes. Much more code,
but no change to the existing code.
Will that be any better?
OTOH all the refactors I did just take some
code chunk and move it to a separate func
with a different indentation level.
These diffs should be looked at with
some tool that ignores indentation.
Only then would it be clear how small they
are.


> As I mentioned before, syscall interception is a technique used in many
> VM-adjacent and widely used technologies, to name a few: containers
> Windows emulation (Wine), browser sandboxes
> (Firefox/Chromium),

I wonder whether the above ones actually
do syscall interception, or just use
bpf filters to prevent malicious code
from using syscalls?


> Given all this, I consider it much easier to write a syscall interception
> code than to write a shim library to translate between 32- and 64-bit call
> ABIs. FWIW. :D

It's a bit strange to intercept the syscalls
of your own code. I am quite sure none of
the projects you mentioned actually do this.
They intercept the syscalls of some 3rd-party
code running alongside, but never their own syscalls.
gdb/strace definitely intercept the syscalls
of the debuggee, same with the rest of the projects.
Most of the dl_audit framework can be implemented
with syscall interception, but why don't you
want to do that?


> > dlopenfd()+memfd doesn't give even the
> > possibility of specifying the reloc address,
> > and that's a very minimal, insufficient requirement.
> Because you need the pages to be mirrored? Or is there another requirement
> here?

Mirrored and also reloc address specified.
AFAICT fdlopen()+memfd gives neither.


> There are a number of cases that need to be handled. The "base case" is
> (MAP_SHARED & ~MAP_ANONYMOUS & ~MAP_FIXED),

Not used by libdl AFAIK, so skipping.


> If flags contains MAP_ANONYMOUS, an extra step (0) is added before step (1).

That's quite clear.


> If flags contains MAP_PRIVATE, extra steps are once again needed. If this is
> a read-only mapping (~PROT_WRITE) and assuming mprotect() is not used later
> to add write access (IIRC I have not observed Glibc's ld.so do so with
> strace),

But this is exactly what happens if you
have an unaligned SHT_NOBITS section. It
goes to the same page that used MAP_PRIVATE
to load an ELF segment. glibc then re-protects
that page and memsets that part. Even if you
haven't seen that with strace, I pointed to the
exact code that does this.
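
The effect can be reproduced outside of ld.so with a minimal sketch.
This is not glibc code: the function name and the scratch file in /tmp
are made up for the illustration, and the memset of the upper half of
the page merely stands in for zeroing an unaligned .bss tail. Once the
MAP_PRIVATE page is made writable and written to, it becomes a private
copy, and a second mapping of the same file no longer sees the change.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write made through a re-protected MAP_PRIVATE page shows
   up in a second mapping of the same file, 0 if it does not, -1 on a
   setup error. */
int private_write_is_mirrored(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/nobits-demo-XXXXXX";   /* illustrative scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)page) != 0) { close(fd); return -1; }

    /* Stand-in for the segment's file content: fill the page with 0xAA. */
    unsigned char *init = mmap(NULL, page, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (init == MAP_FAILED) { close(fd); return -1; }
    memset(init, 0xAA, page);
    munmap(init, page);

    /* "loader" models the loader's read-only private mapping,
       "mirror" models the other side of the supposed mirror. */
    unsigned char *loader = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, 0);
    unsigned char *mirror = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (loader == MAP_FAILED || mirror == MAP_FAILED)
        return -1;
    assert(loader[page - 1] == 0xAA && mirror[page - 1] == 0xAA);

    /* Re-protect and zero the tail of the page, as for unaligned NOBITS. */
    if (mprotect(loader, page, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memset(loader + page / 2, 0, page / 2);

    int mirrored = (mirror[page - 1] == 0);
    munmap(loader, page);
    munmap(mirror, page);
    return mirrored;
}
```

On Linux I would expect this to return 0: the zeroed tail stays in the
private copy and the other mapping still shows the file content.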


> then simply replace MAP_PRIVATE with MAP_SHARED in step (0) and the
> rest will work.

If the page is never re-protected, then
MAP_SHARED is not even needed. You can
just have 2 private mappings of the same file.


> Otherwise, if flags contains MAP_PRIVATE and prot contains PROT_WRITE, the

AFAIK there is no such case.
PROT_WRITE is applied later with mprotect()
if you have an unaligned SHT_NOBITS section,
but is AFAICS never applied initially.


> That's it, that's the entire technique. It's a powerful approach reminiscent
> of container tech, which I find fitting for a use case messing with a VM.
> It's a straightforward technique with good similar examples in the
> open-source community, for example strace's --inject= options. It's a small
> technique, I would budget at around 100-300 lines for a PoC implementation.
> It's not a performant approach, but presumably your apps aren't
> dlopen()/dlclose()'ing solibs like there's no tomorrow. What's wrong with it?

Contrary to what you say, no one is
intercepting their own syscalls.
And the SHT_NOBITS section problem is not
yet addressed, although of course you will
propose intercepting mprotect() as well to
cover it.


> > Of course now I have some very bad feeling
> > that your next proposal will be "trap
> > all mmaps, not just the first one"...
> > Well, before you do that, consider the
> > following:
> > 1. Some mappings are converted from
> >    file-based to anonymous via mprotect+memset.
> The fact that the pages are mirrored handles this, changes in one are

Pages are not mirrored in the case of a
MAP_PRIVATE mapping that was later
re-protected to r/w. Of course you
can always use MAP_SHARED beforehand,
and make a writable file copy, which
basically means just copying the initially
memory-based solib into a file on disk rather
than even properly using memfd.


> IIRC ld.so only uses mprotect() to mark the RELRO segments as read-only, so
> they don't need to be mirrored in a simple PoC implementation. At least for
> simple cases, YMMV.

Not sure if the unaligned SHT_NOBITS
(that causes re-protect to R/W) is a
"simple case" or not.


> > I am very surprised you make the claims like
> > "your patch is very difficult to review"
> > w/o even looking into the very small patches
> > that mostly split the huge multi-thousands-line
> > funcs into a reusable parts...
> Your patch is difficult to review for reasons that have to do with the API
> and use case, not the implementation. It's also a refactor touching over a
> thousand lines, that's enough reason to make it hard to review. :P

If indentation is ignored, then my patches
touch a dozen lines. They are just
moves of large chunks of code into separate
funcs.

-- 
You are receiving this mail because:
You are on the CC list for the bug.