From: stsp
To: Carlos O'Donell, libc-alpha@sourceware.org, Paul Pluzhnikov
Date: Mon, 20 Feb 2023 21:35:06 +0500
Subject: Re: [PATCH 2/3] dlfcn,elf: implement dlmem() and audit [BZ #11767]

Hi Carlos, thanks for the feedback!

On 20.02.2023 20:50, Carlos O'Donell wrote:
> My understanding of your use case is based on the discussion you had in bug 30007,
> and I also reviewed bug 11767 for background. I also reviewed the discussions on
> libc-alpha from Joseph Myers, Florian Weimer, and Andreas Schwab. Florian has open
> questions about gdb and audit requirements which I didn't see addressed in this
> series.

I may have misunderstood Florian if he meant that I need to address something in particular. The problem I see with gdb is that it doesn't recognize such a library load and therefore doesn't autoload the debug info for the solib. Loading the debuginfo manually might be a temporary workaround. If something else was meant and I need to make changes to the patch, please explain once again what it is.

Overall, if there are requested changes pending, please restate them, as I am currently not aware of any.

> My position is that the semantics of dlmem() is at a level of abstraction that
> impacts the present security, audit, and developer requirements, and that something
> like BSD's fdlopen() or Google's dlopen_with_offset() would serve as a better
> solution overall.
>
> Details
> =======
>
> There are several paths forward for the use cases discussed, but two of them
> come to mind and have seen some discussion on this list:

Let me first assert (and explain below) that neither of them provides the functionality that I need and have in dlmem().

> (a) fdlopen() as in BSD.
>
> * All normal operations that dlopen would do are done, but starting from an fd.
>
> * Mirrors other new APIs e.g. signalfd, pidfd etc. Maybe we call dlopenfd() on Linux.
fdlopen() is a very good API. But it serves a different purpose and gives me nothing for my use case.

> (b) dlmem() as in your patch.
>
> The semantic issues I see with dlmem() are as follows:
>
> * No guarantee that the memory that is loaded meets the preconditions for the
> dynamic loader and application uses.

I am not sure what you mean. dlmem() works very simply: you give it a buffer containing the file image, and it gives you back a buffer with the relocated solib image. What preconditions?

> * The API pushes the abstraction of a "file descriptor" completely out of the design
> and in doing so introduces a level of change that requires a new LD_AUDIT interface,
> and likely other tooling changes.

No, let me clarify. The patch adds 2 new audit interfaces:

- premap_dlmem: lets you feed in the backing-store fd for the solib image, which is what makes dlmem() unique. By definition, you simply can't do something like that with any fd-based API.
- premap: a generic audit interface that can just as well be used with dlopen(). It lets you specify the relocation address for a PIC solib.

So no, neither of the above is _required_ for dlmem() to work. But one of them extends dlmem() in a way you can't achieve with any competing API, and the other is a generic extension, because I need to be able to set the relocation address for PIC solibs no matter what medium they are loaded from.

Here I feel that the design is not properly understood, and maybe I need to address that with a write-up? Or is the short explanation above sufficient?

Note that I provided examples for both extensions. One example feeds in a backing-store fd and maps the solib into 2 processes. It then verifies that when you change a variable in one process, the change is visible in the other process. How would you do the same with any competing API? It is impossible.

Another example shows that you can use the MAP_32BIT flag to relocate the solib below 4 GB.
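For readers unfamiliar with MAP_32BIT, here is a minimal sketch of only the address-reservation side of that example, assuming x86-64 Linux. It shows how a caller could obtain a region in the low address space with mmap(); how such an address would be handed to the loader through the premap audit hook is defined by the patch and is not reproduced here. The helper names reserve_low_region() and is_below_4g() are hypothetical.

```c
#define _GNU_SOURCE            /* exposes MAP_32BIT in <sys/mman.h> */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve an inaccessible region of `size` bytes in the
   low address space. On x86-64 Linux, mmap(2) documents that MAP_32BIT puts
   the mapping in the first 2 GB, which is inside the 4 GB range.
   Returns NULL on failure, or when MAP_32BIT is not defined on this platform. */
static void *reserve_low_region(size_t size)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)size;
    return NULL;
#endif
}

/* Returns 1 if the range [p, p + size) lies entirely below 4 GB. */
static int is_below_4g(const void *p, size_t size)
{
    uint64_t start = (uint64_t)(uintptr_t)p;
    return start + (uint64_t)size <= UINT64_C(0x100000000);
}
```

A loader-side consumer would then map the solib segments into the reserved range (for example with MAP_FIXED) instead of letting the kernel choose an address.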
This is a generic extension: it works with dlopen() too, and of course it could work with fdlopen() or anything else. But the first extension (the backing-store fd) is unique to dlmem(). That is why I need dlmem(). :)

> If instead the semantics are raised up a level to fdlopen() then we have:
>
> * Application developer can still do everything they wanted,

No, no! And that's the point. How would you re-create my solib-sharing example with anything but my dlmem() implementation?

> but they need one
> additional syscall, memfd_create() to turn the memory into a file descriptor
> and that has the added benefit of being a kernel-side auditable event which is
> already used in JIT/FFI e.g. libffi.
>
> - Kernel support appeared in Linux 3.17 and glibc 2.27 (2018).

I am not even sure why memfd_create() keeps being mentioned. I have used shm_open() for such things, and I can even dlopen() from a shm object.

> * Additionally users with the use case that they have an embedded DSO on disk
> now need to do less work in their application because they can use fdlopen()
> with an fd already open and at a suitable offset. Note that the fd passed is
> never closed.

I am all for fdlopen()! It's a good API. Please implement it. :) But it's different, and in this particular case I don't need it and couldn't use it even if it were available.

> * No new LD_AUDIT interfaces required so auditors and developer tooling does
> not need to be updated.

They are not _required_. But they add 2 valuable extensions. In fact, dlmem() itself is of no interest without those 2 extensions. So I need these extensions, and dlmem() is what makes them (or at least one of them) possible. The second extension is generic, but I need both, so I see no way around dlmem().

> - Though now it's possible to skip la_objsearch for the opening of the object,
> but that was always a possible scenario.
>
> As of today Google has dlopen_with_offset(), almost fdlopen(),
> in google/grte/v5-2.27/master, but it has not been contributed for general
> inclusion.
> This to me seems to indicate that industry best practice today
> continues to be around the use of a file descriptor.
>
> My suggestion therefore is to attempt to refactor what you have around an API
> that is like fdlopen(), and see what the implementation and performance looks
> like in glibc.
>
> Thoughts?

Only if you can show how to relocate the solib into a shared buffer, which IMO is not possible without dlmem().
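To make the "shared buffer" point concrete, here is a minimal sketch of only the kernel-level mechanism that the backing-store example relies on: two processes mapping the same fd with MAP_SHARED see each other's writes. It uses memfd_create() (Linux 3.17+, glibc 2.27+) as the backing store and fork() merely as a cheap way to get a second process; an unrelated process that received the same fd (for example over a Unix socket) could map it the same way. It does not load or relocate a solib, and the helper name shared_write_visible() is hypothetical.

```c
#define _GNU_SOURCE            /* memfd_create() is a GNU extension */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one page of a memfd MAP_SHARED, let a forked
   child store `value` into it, and return what the parent reads afterwards.
   Returns -1 on any failure. */
static int shared_write_visible(int value)
{
    int fd = memfd_create("backing-store", 0);
    if (fd < 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    /* Every mapping of this fd with MAP_SHARED refers to the same pages. */
    int *var = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping keeps the memfd alive */
    if (var == MAP_FAILED)
        return -1;
    *var = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(var, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        *var = value;          /* child changes the shared variable */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        munmap(var, (size_t)page);
        return -1;
    }
    int seen = *var;           /* parent observes the child's write */
    munmap(var, (size_t)page);
    return seen;
}
```

In the dlmem() design described above, the relocated solib image itself would be placed in such an fd-backed mapping, so that a variable inside the solib is shared in the same way.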