From: Andy Lutomirski
Date: Mon, 17 Nov 2014 00:06:00 -0000
Subject: Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall
To: Rich Felker
Cc: libc-alpha, musl@lists.openwall.com, Andrew Morton, David Drysdale, Linux API, Christoph Hellwig
In-Reply-To: <20141116233202.GA22465@brightrain.aerifal.cx>
References: <20141116195246.GX22465@brightrain.aerifal.cx> <20141116220859.GY22465@brightrain.aerifal.cx> <20141116233202.GA22465@brightrain.aerifal.cx>

On Sun, Nov 16, 
2014 at 3:32 PM, Rich Felker wrote:
> On Sun, Nov 16, 2014 at 02:34:32PM -0800, Andy Lutomirski wrote:
>> On Sun, Nov 16, 2014 at 2:08 PM, Rich Felker wrote:
>> > On Sun, Nov 16, 2014 at 01:20:39PM -0800, Andy Lutomirski wrote:
>> >> On Nov 16, 2014 11:53 AM, "Rich Felker" wrote:
>> >> >
>> >> > On Fri, Nov 14, 2014 at 02:54:19PM +0000, David Drysdale wrote:
>> >> > > Hi,
>> >> > >
>> >> > > Over at the LKML[1] we've been discussing a possible new syscall, execveat(2),
>> >> > > and it would be good to hear a glibc perspective about it (and whether there
>> >> > > are any interface changes that would make it easier to use from userspace).
>> >> > >
>> >> > > The syscall prototype is:
>> >> > >   int execveat(int fd, const char *pathname,
>> >> > >                char *const argv[], char *const envp[],
>> >> > >                int flags); /* AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW */
>> >> > > and it works similarly to execve(2) except:
>> >> > >  - the executable to run is identified by the combination of fd+pathname, like
>> >> > >    other *at(2) syscalls
>> >> > >  - there's an extra flags field to control behaviour.
>> >> > > (I've attached a text version of the suggested man page below)
>> >> > >
>> >> > > One particular benefit of this is that it allows an fexecve(3) implementation
>> >> > > that doesn't rely on /proc being accessible, which is useful for sandboxed
>> >> > > applications. (However, that does only work for non-interpreted programs:
>> >> > > the name passed to a script interpreter is of the form "/dev/fd//"
>> >> > > or "/dev/fd/", so the executed interpreter will normally still need /proc
>> >> > > access to load the script file).
>> >> > >
>> >> > > How does this sound from a glibc perspective?
>> >> >
>> >> > I've been following the discussions so far and everything looks mostly
>> >> > okay. 
There are still issues to be resolved with the different
>> >> > semantics between Linux O_PATH and what POSIX requires for O_EXEC (and
>> >> > O_SEARCH) but as long as the intent is that, once O_EXEC is defined to
>> >> > save the permissions at the time of open and cause them to be used in
>> >> > place of the current file permissions at the time of execveat
>> >>
>> >> Is something missing here?
>> >>
>> >> FWIW, I don't understand O_PATH or O_EXEC very well, so from my POV,
>> >> help would be appreciated.
>> >
>> > Yes. POSIX requires that permission checks for execution (fexecve with
>> > O_EXEC file descriptors) and directory-search (*at functions with
>> > O_SEARCH file descriptors) succeed if the open operation succeeded --
>> > the permissions check is required to take place at open time rather
>> > than at exec/search time. There's a separate discussion about how to
>> > make this work on the kernel side.
>>
>> It may be worth making this work as part of adding execveat to the
>> kernel. Does the kernel even have O_EXEC right now?
>
> No. The proposal is that O_EXEC and O_SEARCH would both be equal to
> O_PATH|3 (3 being the rarely-used O_ACCMODE for "neither read nor
> write, but some weird ioctls are accepted") which gracefully falls
> back for both current kernels with O_PATH (in which case the 3 is
> ignored and the discrepancy from POSIX is just the time at which
> permissions are checked) and for pre-O_PATH kernels (in which case the
> access mode used is 3, and read/write ops fail on the fd, but it's
> still usable for fexecve and *at functions with /proc-based fallback
> implementations).
>
> I would be happy to see this work get done at the same time.
>
>> >> > One major issue however is FD_CLOEXEC with scripts. Last I checked,
>> >> > this didn't work because the file is already closed by the time the
>> >> > interpreter runs. 
The intended usage of fexecve is almost certainly to
>> >> > call it with the file descriptor set close-on-exec; otherwise, there
>> >> > would be no clean way to close it, since the program being executed
>> >> > doesn't know that it's being executed via fexecve. So this is a
>> >> > serious problem that needs to be solved if it hasn't already. I have
>> >> > some ideas I could offer, but I'm not an expert on the kernel side
>> >> > things so I'm not sure they'd be correct.
>> >>
>> >> Bring on the ideas.
>> >
>> > My thought is that when the kernel opens the binary and sees that it's
>> > a script that needs an interpreter, the kernel should not pass
>> > /proc/self/fd/%d to the interpreter, but instead should pass the name
>> > of a new magic symlink in /proc/self that's connected to the inode for
>> > the script to be executed but that ceases to exist as soon as it's
>> > opened. In theory this could also be used for suid scripts to make
>> > them secure.
>>
>> This doesn't help if /proc is not mounted, which is an important use case.
>
> I don't know what can be done in this case short of some really ugly
> hacks, like giving open() special behavior when the pathname points to
> a magic address in the argv region, or having the kernel create temp
> files in some magic path.
>
>> >> FWIW, I've often thought that interpreter binaries should mark
>> >> themselves as such to enable better interactions with the kernel.
>> >
>> > That's hard since users expect to be able to use arbitrary
>> > interpreters (and sometimes even pass through multiple ones, e.g.
>> > #!/usr/bin/env perl).
>>
>> Hmm. I'd be okay with old interpreters having a somewhat degraded experience.
>>
>> I guess that #!/some/interpreted/script isn't allowed, but maybe
>> #!/usr/bin/env some-interpreted-script should work.
>>
>> It could be that all that's really needed is some convention to tell
>> an interpreter that it should use fd N as a script *and close it*. 
>> Something like /dev/fd_and_close/N could work, but that has all kinds
>> of problems.
>>
>> Alternatively, if we could have a way to mark an fd so that it's
>> close-on-exec after exec, that would solve the nesting problem, as
>> long as every interpreter in the chain does it. And the kernel could
>> certainly implement execve on a close-on-exec fd by passing /dev/fd/N
>> where N is a close-on-exec fd, at least in the non-nested case.
>
> This doesn't solve the problem of needing /proc though (/dev/fd is
> just a link to /proc/self/fd).
>

Al Viro was talking about having a special fs just for /dev/fd. And
interpreters could special-case path names of a certain form.

--Andy
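
For concreteness, here is a rough sketch of the userspace side discussed in this thread: an fexecve-style helper that first tries execveat with AT_EMPTY_PATH and only falls back to the /proc/self/fd/N path when the kernel reports ENOSYS, plus the O_PATH|3 value proposed for O_EXEC. The names sketch_fexecve and SKETCH_O_EXEC are made up for illustration and are not glibc or musl interfaces. The code assumes Linux headers that define SYS_execveat and uses the raw syscall rather than any libc wrapper. It does nothing about the FD_CLOEXEC-with-scripts problem.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical name: the value proposed in the thread for O_EXEC/O_SEARCH.
 * Kernels with O_PATH ignore the access-mode bits; older kernels see
 * access mode 3. */
#define SKETCH_O_EXEC (O_PATH | 3)

/* Hypothetical helper, not a libc function.
 * Returns -1 with errno set on failure; does not return on success. */
static int sketch_fexecve(int fd, char *const argv[], char *const envp[])
{
#ifdef SYS_execveat
    /* Empty pathname + AT_EMPTY_PATH: execute the file fd refers to. */
    syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
    if (errno != ENOSYS)
        return -1;          /* real failure, e.g. EBADF or EACCES */
#endif
    /* Assumed fallback for kernels without execveat: needs /proc mounted. */
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    execve(path, argv, envp);
    return -1;
}
```

A caller would open the binary with something like open(path, SKETCH_O_EXEC | O_CLOEXEC) and pass the resulting fd to sketch_fexecve. As noted above, the close-on-exec part only works for non-interpreted programs.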