* [RFC] Possible new execveat(2) Linux syscall
@ 2014-11-14 14:54 David Drysdale
  2014-11-16 19:52 ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: David Drysdale @ 2014-11-14 14:54 UTC (permalink / raw)
  To: libc-alpha
  Cc: Andrew Morton, Christoph Hellwig, Rich Felker, Linux API, Andy Lutomirski

Hi,

Over at the LKML[1] we've been discussing a possible new syscall, execveat(2),
and it would be good to hear a glibc perspective about it (and whether there
are any interface changes that would make it easier to use from userspace).

The syscall prototype is:
  int execveat(int fd, const char *pathname,
               char *const argv[], char *const envp[],
               int flags); /* AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW */
and it works similarly to execve(2) except:
 - the executable to run is identified by the combination of fd+pathname, like
   other *at(2) syscalls
 - there's an extra flags field to control behaviour.
(I've attached a text version of the suggested man page below)

One particular benefit of this is that it allows an fexecve(3) implementation
that doesn't rely on /proc being accessible, which is useful for sandboxed
applications. (However, that does only work for non-interpreted programs:
the name passed to a script interpreter is of the form "/dev/fd/<fd>/<path>"
or "/dev/fd/<fd>", so the executed interpreter will normally still need /proc
access to load the script file).

How does this sound from a glibc perspective?

Thanks,
David

[1] https://lkml.org/lkml/2014/11/7/512, with earlier discussions at
    https://lkml.org/lkml/2014/11/6/469, https://lkml.org/lkml/2014/10/22/275
    and https://lkml.org/lkml/2014/10/17/428

----

EXECVEAT(2)              Linux Programmer's Manual              EXECVEAT(2)

NAME
       execveat - execute program relative to a directory file descriptor

SYNOPSIS
       #include <unistd.h>

       int execveat(int fd, const char *pathname,
                    char *const argv[], char *const envp[],
                    int flags);

DESCRIPTION
       The execveat() system call executes the program pointed to by the
       combination of fd and pathname. The execveat() system call
       operates in exactly the same way as execve(2), except for the
       differences described in this manual page.

       If the pathname given in pathname is relative, then it is
       interpreted relative to the directory referred to by the file
       descriptor fd (rather than relative to the current working
       directory of the calling process, as is done by execve(2) for a
       relative pathname).

       If pathname is relative and fd is the special value AT_FDCWD, then
       pathname is interpreted relative to the current working directory
       of the calling process (like execve(2)).

       If pathname is absolute, then fd is ignored.

       If pathname is an empty string and the AT_EMPTY_PATH flag is
       specified, then the file descriptor fd specifies the file to be
       executed.

       flags can either be 0, or include the following flags:

       AT_EMPTY_PATH
              If pathname is an empty string, operate on the file referred
              to by fd (which may have been obtained using the open(2)
              O_PATH flag).

       AT_SYMLINK_NOFOLLOW
              If the file identified by fd and a non-NULL pathname is a
              symbolic link, then the call fails with the error EINVAL.

RETURN VALUE
       On success, execveat() does not return. On error -1 is returned,
       and errno is set appropriately.

ERRORS
       The same errors that occur for execve(2) can also occur for
       execveat(). The following additional errors can occur for
       execveat():

       EBADF  fd is not a valid file descriptor.

       ENOENT The program identified by fd and pathname requires the use
              of an interpreter program (such as a script starting with
              "#!") but the file descriptor fd was opened with the
              O_CLOEXEC flag and so the program file is inaccessible to
              the launched interpreter.

       EINVAL Invalid flag specified in flags.

       ENOTDIR
              pathname is relative and fd is a file descriptor referring
              to a file other than a directory.

VERSIONS
       execveat() was added to Linux in kernel 3.???.

NOTES
       In addition to the reasons explained in openat(2), the execveat()
       system call is also needed to allow fexecve(3) to be implemented on
       systems that do not have the /proc filesystem mounted.

SEE ALSO
       execve(2), fexecve(3)

Linux                           2014-04-02                      EXECVEAT(2)

^ permalink raw reply	[flat|nested] 14+ messages in thread
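To make the fexecve(3) use case and the script caveat from the message above concrete, here is a minimal C sketch. It assumes a kernel and libc headers that define SYS_execveat for a syscall with the proposed prototype; where the headers lack it, the helper simply fails with ENOSYS. The helper names fexecve_via_execveat and script_name_for_interpreter are hypothetical and come from neither the thread nor any libc. The second helper only builds the two name forms that the message says a script interpreter would be handed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* AT_EMPTY_PATH (needs _GNU_SOURCE on glibc) */
#include <stdio.h>
#include <sys/syscall.h>  /* SYS_execveat, if the installed headers have it */
#include <unistd.h>       /* syscall() */

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000  /* Linux value, for headers that hide it */
#endif

/* Hypothetical helper: fexecve(3)-like call that does not go through /proc.
 * Like execve(2), it returns only on failure (-1, errno set). */
static int fexecve_via_execveat(int fd, char *const argv[], char *const envp[])
{
#ifdef SYS_execveat
    /* Empty pathname plus AT_EMPTY_PATH: execute the file fd refers to. */
    return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
#else
    (void)fd; (void)argv; (void)envp;
    errno = ENOSYS;  /* assumption: headers predate the syscall */
    return -1;
#endif
}

/* Hypothetical helper: builds the name an interpreter would receive for a
 * script started via (fd, pathname), per the two forms in the message.
 * Covers only a real fd with an empty or relative pathname.
 * Returns 0 on success, -1 if buf is too small. */
static int script_name_for_interpreter(char *buf, size_t size,
                                       int fd, const char *pathname)
{
    int n;

    assert(buf != NULL && pathname != NULL);
    if (pathname[0] == '\0')
        n = snprintf(buf, size, "/dev/fd/%d", fd);     /* AT_EMPTY_PATH case */
    else
        n = snprintf(buf, size, "/dev/fd/%d/%s", fd, pathname);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Because the interpreter is handed a /dev/fd name, it still has to resolve that name itself, which is why the message says scripts normally still need /proc, and why the draft man page lists ENOENT for a script whose fd was opened with O_CLOEXEC.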
* Re: [RFC] Possible new execveat(2) Linux syscall
  2014-11-14 14:54 [RFC] Possible new execveat(2) Linux syscall David Drysdale
@ 2014-11-16 19:52 ` Rich Felker
  2014-11-16 21:21   ` Andy Lutomirski
  2014-11-21 10:13   ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Rich Felker @ 2014-11-16 19:52 UTC
  To: David Drysdale
  Cc: libc-alpha, Andrew Morton, Christoph Hellwig, Linux API, Andy Lutomirski, musl

On Fri, Nov 14, 2014 at 02:54:19PM +0000, David Drysdale wrote:
> One particular benefit of this is that it allows an fexecve(3) implementation
> that doesn't rely on /proc being accessible, which is useful for sandboxed
> applications. (However, that does only work for non-interpreted programs:
> the name passed to a script interpreter is of the form "/dev/fd/<fd>/<path>"
> or "/dev/fd/<fd>", so the executed interpreter will normally still need /proc
> access to load the script file).
>
> How does this sound from a glibc perspective?

I've been following the discussions so far and everything looks mostly
okay. There are still issues to be resolved with the different
semantics between Linux O_PATH and what POSIX requires for O_EXEC (and
O_SEARCH) but as long as the intent is that, once O_EXEC is defined to
save the permissions at the time of open and cause them to be used in
place of the current file permissions at the time of execveat

One major issue, however, is FD_CLOEXEC with scripts. Last I checked,
this didn't work because the file is already closed by the time the
interpreter runs. The intended usage of fexecve is almost certainly to
call it with the file descriptor set close-on-exec; otherwise, there
would be no clean way to close it, since the program being executed
doesn't know that it's being executed via fexecve. So this is a
serious problem that needs to be solved if it hasn't already. I have
some ideas I could offer, but I'm not an expert on the kernel side of
things so I'm not sure they'd be correct.

Rich
* Re: [RFC] Possible new execveat(2) Linux syscall
  2014-11-16 19:52 ` Rich Felker
@ 2014-11-16 21:21   ` Andy Lutomirski
  2014-11-16 22:09     ` Rich Felker
  2014-11-21 10:13   ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2014-11-16 21:21 UTC
  To: Rich Felker
  Cc: libc-alpha, musl, Andrew Morton, David Drysdale, Linux API, Christoph Hellwig

On Nov 16, 2014 11:53 AM, "Rich Felker" <dalias@aerifal.cx> wrote:
>
> I've been following the discussions so far and everything looks mostly
> okay. There are still issues to be resolved with the different
> semantics between Linux O_PATH and what POSIX requires for O_EXEC (and
> O_SEARCH) but as long as the intent is that, once O_EXEC is defined to
> save the permissions at the time of open and cause them to be used in
> place of the current file permissions at the time of execveat

Is something missing here?

FWIW, I don't understand O_PATH or O_EXEC very well, so from my POV,
help would be appreciated.

>
> One major issue, however, is FD_CLOEXEC with scripts. Last I checked,
> this didn't work because the file is already closed by the time the
> interpreter runs. The intended usage of fexecve is almost certainly to
> call it with the file descriptor set close-on-exec; otherwise, there
> would be no clean way to close it, since the program being executed
> doesn't know that it's being executed via fexecve. So this is a
> serious problem that needs to be solved if it hasn't already. I have
> some ideas I could offer, but I'm not an expert on the kernel side of
> things so I'm not sure they'd be correct.

Bring on the ideas.

FWIW, I've often thought that interpreter binaries should mark
themselves as such to enable better interactions with the kernel.

--Andy
* Re: [RFC] Possible new execveat(2) Linux syscall
  2014-11-16 21:21   ` Andy Lutomirski
@ 2014-11-16 22:09     ` Rich Felker
  2014-11-16 22:34       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-11-16 22:09 UTC
  To: Andy Lutomirski
  Cc: libc-alpha, musl, Andrew Morton, David Drysdale, Linux API, Christoph Hellwig

On Sun, Nov 16, 2014 at 01:20:39PM -0800, Andy Lutomirski wrote:
> > I've been following the discussions so far and everything looks mostly
> > okay. There are still issues to be resolved with the different
> > semantics between Linux O_PATH and what POSIX requires for O_EXEC (and
> > O_SEARCH) but as long as the intent is that, once O_EXEC is defined to
> > save the permissions at the time of open and cause them to be used in
> > place of the current file permissions at the time of execveat
>
> Is something missing here?
>
> FWIW, I don't understand O_PATH or O_EXEC very well, so from my POV,
> help would be appreciated.

Yes. POSIX requires that permission checks for execution (fexecve with
O_EXEC file descriptors) and directory-search (*at functions with
O_SEARCH file descriptors) succeed if the open operation succeeded --
the permissions check is required to take place at open time rather
than at exec/search time. There's a separate discussion about how to
make this work on the kernel side.

> Bring on the ideas.

My thought is that when the kernel opens the binary and sees that it's
a script that needs an interpreter, the kernel should not pass
/proc/self/fd/%d to the interpreter, but instead should pass the name
of a new magic symlink in /proc/self that's connected to the inode for
the script to be executed but that ceases to exist as soon as it's
opened. In theory this could also be used for suid scripts to make
them secure.

> FWIW, I've often thought that interpreter binaries should mark
> themselves as such to enable better interactions with the kernel.

That's hard since users expect to be able to use arbitrary
interpreters (and sometimes even pass through multiple ones, e.g.
#!/usr/bin/env perl).

Rich
* Re: [RFC] Possible new execveat(2) Linux syscall
  2014-11-16 22:09     ` Rich Felker
@ 2014-11-16 22:34       ` Andy Lutomirski
  2014-11-16 23:32         ` [musl] " Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2014-11-16 22:34 UTC
  To: Rich Felker
  Cc: libc-alpha, musl, Andrew Morton, David Drysdale, Linux API, Christoph Hellwig

On Sun, Nov 16, 2014 at 2:08 PM, Rich Felker <dalias@aerifal.cx> wrote:
> Yes. POSIX requires that permission checks for execution (fexecve with
> O_EXEC file descriptors) and directory-search (*at functions with
> O_SEARCH file descriptors) succeed if the open operation succeeded --
> the permissions check is required to take place at open time rather
> than at exec/search time. There's a separate discussion about how to
> make this work on the kernel side.

It may be worth making this work as part of adding execveat to the
kernel. Does the kernel even have O_EXEC right now?

> My thought is that when the kernel opens the binary and sees that it's
> a script that needs an interpreter, the kernel should not pass
> /proc/self/fd/%d to the interpreter, but instead should pass the name
> of a new magic symlink in /proc/self that's connected to the inode for
> the script to be executed but that ceases to exist as soon as it's
> opened. In theory this could also be used for suid scripts to make
> them secure.

This doesn't help if /proc is not mounted, which is an important use case.

> That's hard since users expect to be able to use arbitrary
> interpreters (and sometimes even pass through multiple ones, e.g.
> #!/usr/bin/env perl).
>

Hmm. I'd be okay with old interpreters having a somewhat degraded experience.

I guess that #!/some/interpreted/script isn't allowed, but maybe
#!/usr/bin/env some-interpreted-script should work.

It could be that all that's really needed is some convention to tell
an interpreter that it should use fd N as a script *and close it*.
Something like /dev/fd_and_close/N could work, but that has all kinds
of problems.

Alternatively, if we could have a way to mark an fd so that it's
close-on-exec after exec, that would solve the nesting problem, as
long as every interpreter in the chain does it. And the kernel could
certainly implement execve on a close-on-exec fd by passing /dev/fd/N
where N is a close-on-exec fd, at least in the non-nested case.

--Andy

> Rich

-- 
Andy Lutomirski
AMA Capital Management, LLC
* Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall
  2014-11-16 22:34       ` Andy Lutomirski
@ 2014-11-16 23:32         ` Rich Felker
  2014-11-17  0:06           ` Andy Lutomirski
  2014-11-17 15:42           ` David Drysdale
  0 siblings, 2 replies; 14+ messages in thread
From: Rich Felker @ 2014-11-16 23:32 UTC
  To: Andy Lutomirski
  Cc: libc-alpha, musl, Andrew Morton, David Drysdale, Linux API, Christoph Hellwig

On Sun, Nov 16, 2014 at 02:34:32PM -0800, Andy Lutomirski wrote:
> It may be worth making this work as part of adding execveat to the
> kernel. Does the kernel even have O_EXEC right now?

No. The proposal is that O_EXEC and O_SEARCH would both be equal to
O_PATH|3 (3 being the rarely-used O_ACCMODE for "neither read nor
write, but some weird ioctls are accepted") which gracefully falls
back for both current kernels with O_PATH (in which case the 3 is
ignored and the discrepancy from POSIX is just the time at which
permissions are checked) and for pre-O_PATH kernels (in which case the
access mode used is 3, and read/write ops fail on the fd, but it's
still usable for fexecve and *at functions with /proc-based fallback
implementations).

I would be happy to see this work get done at the same time.

> This doesn't help if /proc is not mounted, which is an important use case.

I don't know what can be done in this case short of some really ugly
hacks, like giving open() special behavior when the pathname points to
a magic address in the argv region, or having the kernel create temp
files in some magic path.

> Hmm. I'd be okay with old interpreters having a somewhat degraded experience.
>
> I guess that #!/some/interpreted/script isn't allowed, but maybe
> #!/usr/bin/env some-interpreted-script should work.
>
> It could be that all that's really needed is some convention to tell
> an interpreter that it should use fd N as a script *and close it*.
> Something like /dev/fd_and_close/N could work, but that has all kinds
> of problems.
>
> Alternatively, if we could have a way to mark an fd so that it's
> close-on-exec after exec, that would solve the nesting problem, as
> long as every interpreter in the chain does it. And the kernel could
> certainly implement execve on a close-on-exec fd by passing /dev/fd/N
> where N is a close-on-exec fd, at least in the non-nested case.

This doesn't solve the problem of needing /proc though (/dev/fd is
just a link to /proc/self/fd).

Rich
* Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-16 23:32 ` [musl] " Rich Felker @ 2014-11-17 0:06 ` Andy Lutomirski 2014-11-17 15:42 ` David Drysdale 1 sibling, 0 replies; 14+ messages in thread From: Andy Lutomirski @ 2014-11-17 0:06 UTC (permalink / raw) To: Rich Felker Cc: libc-alpha, musl, Andrew Morton, David Drysdale, Linux API, Christoph Hellwig On Sun, Nov 16, 2014 at 3:32 PM, Rich Felker <dalias@aerifal.cx> wrote: > On Sun, Nov 16, 2014 at 02:34:32PM -0800, Andy Lutomirski wrote: >> On Sun, Nov 16, 2014 at 2:08 PM, Rich Felker <dalias@aerifal.cx> wrote: >> > On Sun, Nov 16, 2014 at 01:20:39PM -0800, Andy Lutomirski wrote: >> >> On Nov 16, 2014 11:53 AM, "Rich Felker" <dalias@aerifal.cx> wrote: >> >> > >> >> > On Fri, Nov 14, 2014 at 02:54:19PM +0000, David Drysdale wrote: >> >> > > Hi, >> >> > > >> >> > > Over at the LKML[1] we've been discussing a possible new syscall, execveat(2), >> >> > > and it would be good to hear a glibc perspective about it (and whether there >> >> > > are any interface changes that would make it easier to use from userspace). >> >> > > >> >> > > The syscall prototype is: >> >> > > int execveat(int fd, const char *pathname, >> >> > > char *const argv[], char *const envp[], >> >> > > int flags); /* AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW */ >> >> > > and it works similarly to execve(2) except: >> >> > > - the executable to run is identified by the combination of fd+pathname, like >> >> > > other *at(2) syscalls >> >> > > - there's an extra flags field to control behaviour. >> >> > > (I've attached a text version of the suggested man page below) >> >> > > >> >> > > One particular benefit of this is that it allows an fexecve(3) implementation >> >> > > that doesn't rely on /proc being accessible, which is useful for sandboxed >> >> > > applications. 
(However, that does only work for non-interpreted programs: >> >> > > the name passed to a script interpreter is of the form "/dev/fd/<fd>/<path>" >> >> > > or "/dev/fd/<fd>", so the executed interpreter will normally still need /proc >> >> > > access to load the script file). >> >> > > >> >> > > How does this sound from a glibc perspective? >> >> > >> >> > I've been following the discussions so far and everything looks mostly >> >> > okay. There are still issues to be resolved with the different >> >> > semantics between Linux O_PATH and what POSIX requires for O_EXEC (and >> >> > O_SEARCH) but as long as the intent is that, once O_EXEC is defined to >> >> > save the permissions at the time of open and cause them to be used in >> >> > place of the current file permissions at the time of execveat >> >> >> >> Is something missing here? >> >> >> >> FWIW, I don't understand O_PATH or O_EXEC very well, so from my POV, >> >> help would be appreciated. >> > >> > Yes. POSIX requires that permission checks for execution (fexecve with >> > O_EXEC file descriptors) and directory-search (*at functions with >> > O_SEARCH file descriptors) succeed if the open operation succeeded -- >> > the permissions check is required to take place at open time rather >> > than at exec/search time. There's a separate discussion about how to >> > make this work on the kernel side. >> >> It may be worth making this work as part of adding execveat to the >> kernel. Does the kernel even have O_EXEC right now? > > No. 
The proposal is that O_EXEC and O_SEARCH would both be equal to > O_PATH|3 (3 being the rarely-used O_ACCMODE for "neither read nor > write, but some weird ioctls are accepted") which gracefully falls > back for both current kernels with O_PATH (in which case the 3 is > ignored and the discrepancy from POSIX is just the time at which > permissions are checked) and for pre-O_PATH kernels (in which case the > access mode used is 3, and read/write ops fail on the fd, but it's > still usable for fexecve and *at functions with /proc-based fallback > implementations). > > I would be happy to see this work get done at the same time. > >> >> > One major issue however is FD_CLOEXEC with scripts. Last I checked, >> >> > this didn't work because the file is already closed by the time the >> >> > interpreter runs. The intended usage of fexecve is almost certainly to >> >> > call it with the file descriptor set close-on-exec; otherwise, there >> >> > would be no clean way to close it, since the program being executed >> >> > doesn't know that it's being executed via fexecve. So this is a >> >> > serious problem that needs to be solved if it hasn't already. I have >> >> > some ideas I could offer, but I'm not an expert on the kernel side >> >> > things so I'm not sure they'd be correct. >> >> >> >> Bring on the ideas. >> > >> > My thought is that when the kernel opens the binary and sees that it's >> > a script that needs an interpreter, the kernel should not pass >> > /proc/self/fd/%d to the interpreter, but instead should pass the name >> > of a new magic symlink in /proc/self that's connected to the inode for >> > the script to be executed but that ceases to exist as soon as it's >> > opened. In theory this could also be used for suid scripts to make >> > them secure. >> >> This doesn't help if /proc is not mounted, which is an important use case.
> > I don't know what can be done in this case short of some really ugly > hacks, like giving open() special behavior when the pathname points to > a magic address in the argv region, or having the kernel create temp > files in some magic path. > >> >> FWIW, I've often thought that interpreter binaries should mark >> >> themselves as such to enable better interactions with the kernel. >> > >> > That's hard since users expect to be able to use arbitrary >> > interpreters (and sometimes even pass through multiple ones, e.g. >> > #!/usr/bin/env perl). >> >> Hmm. I'd be okay with old interpreters having a somewhat degraded experience. >> >> I guess that #!/some/interpreted/script isn't allowed, but maybe >> #!/usr/bin/env some-interpreted-script should work. >> >> It could be that all that's really needed is some convention to tell >> an interpreter that it should use fd N as a script *and close it*. >> Something like /dev/fd_and_close/N could work, but that has all kinds >> of problems. >> >> Alternatively, if we could have a way to mark an fd so that it's >> close-on-exec after exec, that would solve the nesting problem, as >> long as every interpreter in the chain does it. And the kernel could >> certainly implement execve on a close-on-exec fd by passing /dev/fd/N >> where N is a close-on-exec fd, at least in the non-nested case. > > This doesn't solve the problem of needing /proc though (/dev/fd is > just a link to /proc/self/fd). > Al Viro was talking about having a special fs just for /dev/fd. And interpreters could special-case path names of a certain form. --Andy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-16 23:32 ` [musl] " Rich Felker 2014-11-17 0:06 ` Andy Lutomirski @ 2014-11-17 15:42 ` David Drysdale 2014-11-17 18:30 ` Rich Felker 1 sibling, 1 reply; 14+ messages in thread From: David Drysdale @ 2014-11-17 15:42 UTC (permalink / raw) To: Rich Felker Cc: Andy Lutomirski, libc-alpha, musl, Andrew Morton, Linux API, Christoph Hellwig On Sun, Nov 16, 2014 at 11:32 PM, Rich Felker <dalias@aerifal.cx> wrote: > On Sun, Nov 16, 2014 at 02:34:32PM -0800, Andy Lutomirski wrote: >> On Sun, Nov 16, 2014 at 2:08 PM, Rich Felker <dalias@aerifal.cx> wrote: >> > On Sun, Nov 16, 2014 at 01:20:39PM -0800, Andy Lutomirski wrote: >> >> On Nov 16, 2014 11:53 AM, "Rich Felker" <dalias@aerifal.cx> wrote: >> >> > >> >> > On Fri, Nov 14, 2014 at 02:54:19PM +0000, David Drysdale wrote: >> >> > > Hi, >> >> > > >> >> > > Over at the LKML[1] we've been discussing a possible new syscall, execveat(2), >> >> > > and it would be good to hear a glibc perspective about it (and whether there >> >> > > are any interface changes that would make it easier to use from userspace). >> >> > > >> >> > > The syscall prototype is: >> >> > > int execveat(int fd, const char *pathname, >> >> > > char *const argv[], char *const envp[], >> >> > > int flags); /* AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW */ >> >> > > and it works similarly to execve(2) except: >> >> > > - the executable to run is identified by the combination of fd+pathname, like >> >> > > other *at(2) syscalls >> >> > > - there's an extra flags field to control behaviour. >> >> > > (I've attached a text version of the suggested man page below) >> >> > > >> >> > > One particular benefit of this is that it allows an fexecve(3) implementation >> >> > > that doesn't rely on /proc being accessible, which is useful for sandboxed >> >> > > applications. 
(However, that does only work for non-interpreted programs: >> >> > > the name passed to a script interpreter is of the form "/dev/fd/<fd>/<path>" >> >> > > or "/dev/fd/<fd>", so the executed interpreter will normally still need /proc >> >> > > access to load the script file). >> >> > > >> >> > > How does this sound from a glibc perspective? >> >> > >> >> > I've been following the discussions so far and everything looks mostly >> >> > okay. There are still issues to be resolved with the different >> >> > semantics between Linux O_PATH and what POSIX requires for O_EXEC (and >> >> > O_SEARCH) but as long as the intent is that, once O_EXEC is defined to >> >> > save the permissions at the time of open and cause them to be used in >> >> > place of the current file permissions at the time of execveat >> >> >> >> Is something missing here? >> >> >> >> FWIW, I don't understand O_PATH or O_EXEC very well, so from my POV, >> >> help would be appreciated. >> > >> > Yes. POSIX requires that permission checks for execution (fexecve with >> > O_EXEC file descriptors) and directory-search (*at functions with >> > O_SEARCH file descriptors) succeed if the open operation succeeded -- >> > the permissions check is required to take place at open time rather >> > than at exec/search time. There's a separate discussion about how to >> > make this work on the kernel side. I'm not familiar with O_EXEC either, I'm afraid, so to be clear -- does O_EXEC mean the permission check is explicitly skipped later, at execute time? In other words, if you open(O_EXEC) an executable then remove the execute bit from the file, does a subsequent fexecve() still work? If it does, then from an implementation perspective that presumably implies the need for a record of the permission check in the struct file (and that this property would be inherited by any dup()ed file descriptors). From a security perspective, having a gap between time-of-check and time-of-use always sounds worrying... 
>> >> It may be worth making this work as part of adding execveat to the >> kernel. Does the kernel even have O_EXEC right now? > > No. The proposal is that O_EXEC and O_SEARCH would both be equal to > O_PATH|3 (3 being the rarely-used O_ACCMODE for "neither read nor > write, but some weird ioctls are accepted") which gracefully falls > back for both current kernels with O_PATH (in which case the 3 is > ignored and the discrepancy from POSIX is just the time at which > permissions are checked) and for pre-O_PATH kernels (in which case the > access mode used is 3, and read/write ops fail on the fd, but it's > still usable for fexecve and *at functions with /proc-based fallback > implementations). > > I would be happy to see this work get done at the same time. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-17 15:42 ` David Drysdale @ 2014-11-17 18:30 ` Rich Felker 2014-11-21 10:10 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2014-11-17 18:30 UTC (permalink / raw) To: David Drysdale Cc: Andy Lutomirski, libc-alpha, musl, Andrew Morton, Linux API, Christoph Hellwig On Mon, Nov 17, 2014 at 03:42:15PM +0000, David Drysdale wrote: > I'm not familiar with O_EXEC either, I'm afraid, so to be clear -- does > O_EXEC mean the permission check is explicitly skipped later, at execute > time? In other words, if you open(O_EXEC) an executable then remove the > execute bit from the file, does a subsequent fexecve() still work? Yes. It's just like how read and write permissions work. If you open a file for read then remove read permissions, or open it for write then remove write permissions, the existing permissions to the open file are not lost. Of course open with O_EXEC/O_SEARCH needs to fail if the caller does not have +x access to the file/directory at the time of open. > If it does, then from an implementation perspective that presumably implies > the need for a record of the permission check in the struct file (and that > this property would be inherited by any dup()ed file descriptors). From a > security perspective, having a gap between time-of-check and time-of-use > always sounds worrying... This record already exists for read and write. All that's needed is for an extra bit to be added to record exec/search permission. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
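The read-permission analogy in the message above can be checked directly. The following is a minimal sketch, assuming a Linux system with a writable /tmp; the function name `read_survives_chmod` is made up for this illustration and is not part of any proposal in the thread. It opens a file for reading, strips every permission bit from the file, and then reads through the descriptor that was opened earlier.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Returns 1 if a descriptor opened for reading is still readable after the
 * file's mode has been changed to 0000, 0 if the read fails, and -1 if the
 * temporary file could not be set up. */
int read_survives_chmod(void)
{
    char path[] = "/tmp/perm_sketch_XXXXXX";
    char buf[4] = {0};
    int result = -1;
    int rfd;

    int wfd = mkstemp(path);            /* created with mode 0600 */
    if (wfd < 0)
        return -1;
    if (write(wfd, "abc", 3) != 3)
        goto out;

    rfd = open(path, O_RDONLY);         /* read permission is checked here */
    if (rfd < 0)
        goto out;

    if (chmod(path, 0) == 0)            /* drop every permission bit */
        result = (read(rfd, buf, 3) == 3 && buf[0] == 'a') ? 1 : 0;
    close(rfd);
out:
    close(wfd);
    unlink(path);
    return result;
}
```

Calling `read_survives_chmod()` should return 1: the access granted at open time stays with the open file description. The O_EXEC semantics described in the thread would extend the same rule to execute and search permission.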
* Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-17 18:30 ` Rich Felker @ 2014-11-21 10:10 ` Christoph Hellwig 0 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2014-11-21 10:10 UTC (permalink / raw) To: Rich Felker Cc: David Drysdale, Andy Lutomirski, libc-alpha, musl, Andrew Morton, Linux API, Christoph Hellwig On Mon, Nov 17, 2014 at 01:30:10PM -0500, Rich Felker wrote: > On Mon, Nov 17, 2014 at 03:42:15PM +0000, David Drysdale wrote: > > I'm not familiar with O_EXEC either, I'm afraid, so to be clear -- does > > O_EXEC mean the permission check is explicitly skipped later, at execute > > time? In other words, if you open(O_EXEC) an executable then remove the > > execute bit from the file, does a subsequent fexecve() still work? > > Yes. It's just like how read and write permissions work. If you open a > file for read then remove read permissions, or open it for write then > remove write permissions, the existing permissions to the open file > are not lost. Of course open with O_EXEC/O_SEARCH needs to fail if the > caller does not have +x access to the file/directory at the time of > open. Adding an FMODE_EXEC similar to FMODE_READ/WRITE would be trivial. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-16 19:52 ` Rich Felker 2014-11-16 21:21 ` Andy Lutomirski @ 2014-11-21 10:13 ` Christoph Hellwig 2014-11-21 13:50 ` David Drysdale 2014-11-21 14:11 ` Rich Felker 1 sibling, 2 replies; 14+ messages in thread From: Christoph Hellwig @ 2014-11-21 10:13 UTC (permalink / raw) To: Rich Felker Cc: David Drysdale, libc-alpha, Andrew Morton, Christoph Hellwig, Linux API, Andy Lutomirski, musl On Sun, Nov 16, 2014 at 02:52:46PM -0500, Rich Felker wrote: > I've been following the discussions so far and everything looks mostly > okay. There are still issues to be resolved with the different > semantics between Linux O_PATH and what POSIX requires for O_EXEC (and > O_SEARCH) but as long as the intent is that, once O_EXEC is defined to > save the permissions at the time of open and cause them to be used in > place of the current file permissions at the time of execveat As far as I can tell we only need the little patch below to make Linux O_PATH a valid O_SEARCH implementation. Rich, you said you wanted to look over it? For O_EXEC my interpretation is that we basically just need this new execveat syscall + a patch to add FMODE_EXEC and enforce it. So we wouldn't even need the O_PATH|3 hack. 
But unless someone more familiar with the arcane details of the POSIX language verifies it I'm tempted to give up trying to help to implement these flags :(

diff --git a/fs/open.c b/fs/open.c
index d6fd3ac..ee24720 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -512,7 +512,7 @@ out_unlock:

 SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	int err = -EBADF;

 	if (f.file) {
@@ -633,7 +633,7 @@ SYSCALL_DEFINE3(lchown, const char __user *, filename, uid_t, user, gid_t, group

 SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	int error = -EBADF;

 	if (!f.file)

^ permalink raw reply [flat|nested] 14+ messages in thread
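For reference, an fexecve(3)-style wrapper on top of the proposed syscall might look roughly like the sketch below. It assumes the prototype given in the RFC, and it assumes the syscall is reachable as `SYS_execveat` through `<sys/syscall.h>`, which only holds on kernels and headers that ship it. The names `sketch_fexecve` and `run_sh_exit7` are hypothetical, and `/bin/sh` is used only as a convenient non-interpreted binary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: execute the file referred to by fd without going
 * through /proc. Like execve(2), it only returns (with -1) on failure. */
int sketch_fexecve(int fd, char *const argv[], char *const envp[])
{
    return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
}

/* Hypothetical demo: run `sh -c "exit 7"` through sketch_fexecve in a child
 * process. Returns the child's exit status, or -1 on setup failure. */
int run_sh_exit7(void)
{
    char *const argv[] = { "sh", "-c", "exit 7", NULL };
    char *const envp[] = { NULL };
    int status;
    pid_t pid;

    /* O_CLOEXEC is fine here because /bin/sh is not an interpreted script. */
    int fd = open("/bin/sh", O_PATH | O_CLOEXEC);
    if (fd < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        sketch_fexecve(fd, argv, envp);
        _exit(127);                     /* only reached if the exec failed */
    }
    close(fd);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

An exit status of 127 from `run_sh_exit7()` would mean the exec itself failed, for example on a kernel without the syscall. Running a script this way with a close-on-exec descriptor runs into the interpreter problem discussed earlier in the thread.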
* Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-21 10:13 ` Christoph Hellwig @ 2014-11-21 13:50 ` David Drysdale 2014-11-21 14:15 ` [musl] " Rich Felker 2014-11-21 14:11 ` Rich Felker 1 sibling, 1 reply; 14+ messages in thread From: David Drysdale @ 2014-11-21 13:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Rich Felker, libc-alpha, Andrew Morton, Linux API, Andy Lutomirski, musl On Fri, Nov 21, 2014 at 10:13 AM, Christoph Hellwig <hch@infradead.org> wrote: > On Sun, Nov 16, 2014 at 02:52:46PM -0500, Rich Felker wrote: >> I've been following the discussions so far and everything looks mostly >> okay. There are still issues to be resolved with the different >> semantics between Linux O_PATH and what POSIX requires for O_EXEC (and >> O_SEARCH) but as long as the intent is that, once O_EXEC is defined to >> save the permissions at the time of open and cause them to be used in >> place of the current file permissions at the time of execveat > > As far as I can tell we only need the little patch below to make Linux > O_PATH a valid O_SEARCH implementation. Rich, you said you wanted to > look over it? > > For O_EXEC my interpretation is that we basically just need this new > execveat syscall + a patch to add FMODE_EXEC and enforce it. So we > wouldn't even need the O_PATH|3 hack. But unless someone more familiar > with the arcane details of the POSIX language verifies it I'm tempted to > give up trying to help to implement these flags :( I'm not particularly familiar with POSIX details either, but I thought the O_PATH|3 hack would be needed for the interaction with O_ACCMODE -- just using FMODE_EXEC as O_EXEC would confuse existing code that examines (flags & O_ACCMODE). From [1]: "Applications shall specify exactly one of the ...five ... file access modes ... O_EXEC / O_RDONLY / O_RDWR / O_SEARCH / O_WRONLY" (and O_EXEC and O_SEARCH are allowed to be the same value, as one only applies to files and the other only applies to directories).
As O_ACCMODE is 3, there are only 4 possible access modes that work with any existing code that checks (flags & O_ACCMODE), and 3 of the values are taken (0=O_RDONLY, 1=O_WRONLY, 2=O_RDWR). So I guess that's where the idea for the |3 hack comes from. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html ^ permalink raw reply [flat|nested] 14+ messages in thread
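The masking argument above can be written out as a small sketch. It assumes Linux values for the open flags. `SKETCH_OLD_ACCMODE` and `SKETCH_O_EXEC` are hypothetical names: the first stands for the historical mask value 3 (used instead of the header's `O_ACCMODE`, which some C libraries define differently), the second for the `O_PATH|3` value proposed in this thread, which is not defined by any header.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>

/* Historical mask value that already-compiled code would be using. */
#define SKETCH_OLD_ACCMODE 3

/* Value proposed in this thread; hypothetical, for illustration only. */
#define SKETCH_O_EXEC (O_PATH | 3)

/* What legacy code sees when it masks open flags with the old mask. */
int legacy_access_mode(int flags)
{
    return flags & SKETCH_OLD_ACCMODE;
}

/* Returns 1 if legacy code would mistake the flags for one of the three
 * classic access modes (O_RDONLY, O_WRONLY, O_RDWR), 0 otherwise. */
int looks_like_classic_mode(int flags)
{
    int mode = legacy_access_mode(flags);
    return mode == O_RDONLY || mode == O_WRONLY || mode == O_RDWR;
}
```

With these definitions, `looks_like_classic_mode(SKETCH_O_EXEC)` is 0 because the low two bits are 3, whereas a flag that only adds a new high bit, such as plain `O_PATH`, masks down to 0 and is indistinguishable from `O_RDONLY` for old code.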
* Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-21 13:50 ` David Drysdale @ 2014-11-21 14:15 ` Rich Felker 0 siblings, 0 replies; 14+ messages in thread From: Rich Felker @ 2014-11-21 14:15 UTC (permalink / raw) To: David Drysdale Cc: Christoph Hellwig, libc-alpha, Andrew Morton, Linux API, Andy Lutomirski, musl On Fri, Nov 21, 2014 at 01:49:35PM +0000, David Drysdale wrote: > On Fri, Nov 21, 2014 at 10:13 AM, Christoph Hellwig <hch@infradead.org> wrote: > > On Sun, Nov 16, 2014 at 02:52:46PM -0500, Rich Felker wrote: > >> I've been following the discussions so far and everything looks mostly > >> okay. There are still issues to be resolved with the different > >> semantics between Linux O_PATH and what POSIX requires for O_EXEC (and > >> O_SEARCH) but as long as the intent is that, once O_EXEC is defined to > >> save the permissions at the time of open and cause them to be used in > >> place of the current file permissions at the time of execveat > > > > As far as I can tell we only need the little patch below to make Linux > > O_PATH a valid O_SEARCH implementation. Rich, you said you wanted to > > look over it? > > > > For O_EXEC my interpretation is that we basically just need this new > > execveat syscall + a patch to add FMODE_EXEC and enforce it. So we > > wouldn't even need the O_PATH|3 hack. But unless someone more familiar > > with the arcane details of the POSIX language verifies it I'm tempted to > > give up trying to help to implement these flags :( > > I'm not particularly familiar with POSIX details either, but I thought the > O_PATH|3 hack would be needed for the interaction with O_ACCMODE -- just > using FMODE_EXEC as O_EXEC would confuse existing code that examines > (flags & O_ACCMODE). To conform to POSIX, O_ACCMODE needs to contain all the bits of O_RDONLY|O_WRONLY|O_RDWR|O_SEARCH|O_EXEC.
Certainly it's possible that code compiled with an old definition of O_ACCMODE as 3 could inherit (or otherwise obtain) a file descriptor in O_SEARCH/O_EXEC mode, so it's preferable to have the low 2 bits be distinct from the existing access modes, but O_ACCMODE's definition (at least in userspace) really does need to be updated to equal O_PATH|3. > From [1]: > "Applications shall specify exactly one of the ...five ... file access > modes ... O_EXEC / O_RDONLY / O_RDWR / O_SEARCH / O_WRONLY" > (and O_EXEC and O_SEARCH are allowed to be the same value, > as one only applies to files and the other only applies to directories). > > As O_ACCMODE is 3, there are only 4 possible access modes that work > with any existing code that checks (flags & O_ACCMODE), and 3 of the > values are taken (0=O_RDONLY, 1=O_WRONLY, 2=O_RDWR). So I > guess that's where the idea for the |3 hack comes from. 3 is "taken" too, but it's a mostly-undocumented hack. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Re: [RFC] Possible new execveat(2) Linux syscall 2014-11-21 10:13 ` Christoph Hellwig 2014-11-21 13:50 ` David Drysdale @ 2014-11-21 14:11 ` Rich Felker 1 sibling, 0 replies; 14+ messages in thread From: Rich Felker @ 2014-11-21 14:11 UTC (permalink / raw) To: Christoph Hellwig Cc: David Drysdale, libc-alpha, Andrew Morton, Linux API, Andy Lutomirski, musl On Fri, Nov 21, 2014 at 02:13:18AM -0800, Christoph Hellwig wrote: > On Sun, Nov 16, 2014 at 02:52:46PM -0500, Rich Felker wrote: > > I've been following the discussions so far and everything looks mostly > > okay. There are still issues to be resolved with the different > > semantics between Linux O_PATH and what POSIX requires for O_EXEC (and > > O_SEARCH) but as long as the intent is that, once O_EXEC is defined to > > save the permissions at the time of open and cause them to be used in > > place of the current file permissions at the time of execveat > > As far as I can tell we only need the little patch below to make Linux > O_PATH a valid O_SEARCH implementation. Rich, you said you wanted to > look over it? I think the below looks correct, but it's not complete. The *at functions also need to use FMODE_EXEC rather than rechecking +x permissions at the time of the operation. > For O_EXEC my interpretation is that we basically just need this new > execveat syscall + a patch to add FMODE_EXEC and enforce it. So we > wouldn't even need the O_PATH|3 hack. But unless someone more familiar > with the arcane details of the POSIX language verifies it I'm tempted to > give up trying to help to implement these flags :( O_EXEC/O_SEARCH cannot be equal to O_PATH, because of differing semantics on open. With O_NOFOLLOW, O_PATH yields a file descriptor referring to the symlink itself. With O_EXEC or O_SEARCH, O_NOFOLLOW is required to make open fail if the target is a symlink. It would be a serious regression to eliminate the ability of O_PATH to open symlinks like this.
Note that enforcing O_NOFOLLOW failure on symlinks can be implemented in userspace instead of (or in addition to, for better behavior with old kernels) kernelspace, but it still requires a different value from O_PATH or userspace would be eliminating access to an important O_PATH feature. Further, O_PATH|3 was the best value I could find to yield nearly reasonable fallback behavior on most old kernels. Simply using 3 fails to open directories and files to which the caller does not have write permission (mode 3 is a nearly-undocumented hack for opening devices for ioctl-only read-write access, it seems). On pre-O_PATH kernels, using O_PATH|3 would fall back to this failing case, yielding spurious failure-to-open for all O_SEARCH and some O_EXEC operations, but those kernels are old enough to be irrelevant to most users anyway. On kernels that do have O_PATH, using O_PATH|3 ignores the 3 and yields the current O_PATH semantics, which are nearly correct. Of course O_PATH|1 or O_PATH|2 would also work in principle, as would adding a completely new bit in addition to O_PATH, but these all seem less desirable. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
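The O_NOFOLLOW difference described in the message above can be observed with a short probe. This is a sketch under assumptions: it expects a Linux kernel recent enough that fstat(2) works on O_PATH descriptors, and a writable /tmp. The function name `probe_nofollow` and its out-parameters are invented for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical probe: creates a symlink in a fresh temporary directory and
 * opens it two ways.
 *   *path_is_symlink: 1 if O_PATH|O_NOFOLLOW gave an fd for the link itself
 *   *plain_errno:     errno from O_RDONLY|O_NOFOLLOW (0 if it succeeded)
 * Returns 0 on success, -1 if the setup or the O_PATH open failed. */
int probe_nofollow(int *path_is_symlink, int *plain_errno)
{
    char dir[] = "/tmp/nofollow_sketch_XXXXXX";
    char link[64];
    struct stat st;
    int fd, rc = -1;

    if (!mkdtemp(dir))
        return -1;
    snprintf(link, sizeof link, "%s/link", dir);
    if (symlink("/", link) != 0)
        goto out_dir;

    /* O_PATH with O_NOFOLLOW: the descriptor refers to the symlink. */
    fd = open(link, O_PATH | O_NOFOLLOW);
    if (fd < 0)
        goto out_link;
    if (fstat(fd, &st) != 0) {
        close(fd);
        goto out_link;
    }
    *path_is_symlink = S_ISLNK(st.st_mode) ? 1 : 0;
    close(fd);

    /* An ordinary access mode with O_NOFOLLOW: open is expected to fail. */
    fd = open(link, O_RDONLY | O_NOFOLLOW);
    if (fd < 0) {
        *plain_errno = errno;
    } else {
        *plain_errno = 0;
        close(fd);
    }
    rc = 0;
out_link:
    unlink(link);
out_dir:
    rmdir(dir);
    return rc;
}
```

On current Linux kernels the first open succeeds and reports a symlink, while the second fails with ELOOP. The failing behaviour is the one POSIX requires for O_EXEC and O_SEARCH, which is why those flags need a value that can be told apart from plain O_PATH.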