Re: [PATCH 2/3] dlfcn,elf: implement dlmem() and audit [BZ #11767]

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Carlos O'Donell <carlos@redhat.com>
To: Stas Sergeev <stsp2@yandex.ru>,
	libc-alpha@sourceware.org,
	Paul Pluzhnikov <ppluzhnikov@google.com>
Subject: Re: [PATCH 2/3] dlfcn,elf: implement dlmem() and audit [BZ #11767]
Date: Mon, 20 Feb 2023 10:50:44 -0500	[thread overview]
Message-ID: <e89233f4-f5a0-a31b-3bda-0b076c8eb8e7@redhat.com> (raw)
In-Reply-To: <20230215165541.1107137-3-stsp2@yandex.ru>

On 2/15/23 11:55, Stas Sergeev via Libc-alpha wrote:
> This patch adds the following function:
> void *dlmem(const unsigned char *buffer, size_t size, int flags);

Stas,

Thank you very much for this contribution. This patch series was raised during the
weekly patch review and I wanted to take the time review before your implementation
moves too far forward.

My understanding of your use case is based on the discussion you had in bug 30007,
and I also reviewed bug 11767 for background. I also reviewed the discussions on
libc-alpha from Joseph Myers, Florian Weimer, and Andreas Schwab. Florian has open
questions about gdb and audit requirements which I didn't see addressed in this
series.

tl;dr

My position is that the semantics of dlmem() is at a level of abstraction that
impacts the present security, audit, and developer requirements, and that something
like BSDs fdlopen() or Google's dlopen_with_offset() would serve as a better
solution overall.

Details
=======

There are several paths forward for the uses cases discussed, but two of them
come to mind and have seen some discussion on this list:

(a) fdlopen() as in BSD.

* All normal operations that dlopen would do are done, but starting from an fd.

* Mirrors other new APIs e.g. signalfd, pidfd etc. Maybe we call dlopenfd() on Linux.

(b) dlmem() as in your patch.

The semantic issues I see with dlmem() are as follows:

* No guarantee that the memory that is loaded meets the preconditions for the
  dynamic loader and application uses.

  - We could argue that it is the application that needs to meet those requirements
    or the system is in an undefined state, however the better an API is the more
    it ensures by design that we cannot be in such states.

* The API pushes the abstraction of a "file descriptor" completely out of the design
  and in doing so introduces a level of change that requires a new LD_AUDIT interface,
  and likely other tooling changes.

If instead the semantics are raised up a level to fdlopen() then we have:

* Application developer can still do everything they wanted, but they need one
  additional syscall, memfd_create() to turn the memory into a file descriptor
  and that has the added benefit of being a kernel-side auditable event which is
  already used in JIT/FFI e.g. libffi.

  - Kernel support appeared in Linux 3.17 and glibc 2.27 (2018).

  - Kernel lockdown of memfd_create() is a known issue and discussed by JITs
    with kernel developers e.g. OpenJDK uses memfd_create(), libffi uses
    memfd_create().

* Additionally users with the use case that they have an embedded DSO on disk
  now need to do less work in their application because they can use fdlopen()
  with an fd already open and at a suitable offset. Note that the fd passed is
  never closed.

* No new LD_AUDIT interfaces reuqired so auditors and developer tooling does
  not need to be updated.

  - Though now it's possible to skip la_objsearch for the opening of the object,
    but that was always a possible scenario.

As of today I  Google has dlopen_with_offset(), almost fdlopen(),
in google/grte/v5-2.27/master, but it has not been contributed for general
inclusion. This to me seems to indicate that industry best practice today
continues to be around the use of a file descriptor.

My suggestion therefore is to attempt to refactor what you have around an API
that is like fdlopen(), and see what the implementation and performance looks
like in glibc.

Thoughts?

> It is the same as dlopen() but allows to dynamic-link solibs from
> the memory buffer, rather than from a file as dlopen() does.
> 
> "buffer" arg is the pointer to the solib image in memory.
> "size" is the solib image size. Must be smaller-or-equal to the
> actual buffer size.
> "flags" is the same flags argument used in dlopen().
> 
> The idea behind the implementation is very simple: where the
> dlopen() would mmap() the file, dlmem() does anonymous
> mmap()+memcpy().
> Unfortunately the glibc code was too bound to the file reads,
> so the patch looks bigger than it should. Some refactorings
> were needed to avoid big copy/pasts and code duplications.
> In particular, _dl_map_object_from_fd() was split and now calls
> 2 functions that are also called from __dl_map_object_from_mem().
> The same treatment was applied to open_verify() - the part that
> can be shared, was split into do_open_verify().
> 
> This patch adds a test-case named tst-dlmem.
> 
> This patch also adds the _dl_audit_premap_dlmem LD_AUDIT
> extension. It passes map length to la_premap_dlmem() and gets
> back an fd. If returned fd==-1 then the dlmem() space is
> anonymously mapped, which is a default behavior w/o audit.
> But it is possible to create a shm area of the needed size
> with shm_open()/ftruncate() and return that fd. Then you
> will be able to access the solib image via that fd, so that
> you can mmap() it to another process or another address of
> the same process.
> 
> The test-case for that functionality is called tst-audit-dlmem.
> 
> The test-suite was run on x86_64/64 and showed no regressions.
> 
> Signed-off-by: Stas Sergeev <stsp2@yandex.ru>

-- 
Cheers,
Carlos.

next prev parent reply	other threads:[~2023-02-20 15:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-15 16:55 [PATCH v7 0/3] implement dlmem() with audit extensions Stas Sergeev
2023-02-15 16:55 ` [PATCH 1/3] elf: strdup() l_name if no realname [BZ #30100] Stas Sergeev
2023-02-15 16:55 ` [PATCH 2/3] dlfcn,elf: implement dlmem() and audit [BZ #11767] Stas Sergeev
2023-02-20 15:50   ` Carlos O'Donell [this message]
2023-02-20 16:35     ` stsp
2023-03-18 17:04     ` stsp
2023-02-15 16:55 ` [PATCH 3/3] elf/dl-audit: add _dl_audit_premap fn [BZ #30007] Stas Sergeev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e89233f4-f5a0-a31b-3bda-0b076c8eb8e7@redhat.com \
    --to=carlos@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=ppluzhnikov@google.com \
    --cc=stsp2@yandex.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).