public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-01 16:21 [PATCH 0/3] posix: Execute file function fixes Adhemerval Zanella
@ 2016-02-01 16:21 ` Adhemerval Zanella
  2016-02-02 13:06   ` Florian Weimer
  2016-08-31 21:13   ` Rasmus Villemoes
  2016-02-01 16:21 ` [PATCH v4 2/3] posix: execvpe cleanup Adhemerval Zanella
  2016-02-01 16:21 ` [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p} Adhemerval Zanella
  2 siblings, 2 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-01 16:21 UTC (permalink / raw)
  To: libc-alpha

This patch implements a new posix_spawn{p} implementation for Linux.  The main
difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK
flags and a direct allocated stack.  The new stack and start function solves
most the vfork limitation (possible parent clobber due stack spilling).  The
remaning issue are related to signal handling:

  1. That no signal handlers must run in child context, to avoid corrupt
     parent's state.
  2. Child must synchronize with parent to enforce stack deallocation and
     to possible return execv issues.

The first one is solved by blocking all signals in child, even NPTL-internal
ones (SIGCANCEL and SIGSETXID).  The second issue is done by a stack allocation
in parent and a synchronization with using a pipe or waitpid (in case or error).
The pipe has the advantage of allowing the child signal an exec error (checked
with new tst-spawn2 test).

There is an inherent race condition in pipe2 usage for architectures that do not
support the syscall directly.  In such cases the a pipe plus fctnl is used
instead and it may lead to file descriptor leak in parent (as decribed by fcntl
documentation).

The child process stack is allocate with a mmap with MAP_STACK flag using
default architecture stack size.  Although it is slower than use a stack buffer
from parent, it allows some slack for the compatibility code to run scripts
with no shebang (which may use a buffer with size depending of argument list
count).

Performance should be similar to the vfork default posix implementation and
way faster than fork path (vfork on mostly linux ports are basically
clone with CLONE_VM plus CLONE_VFORK).  The only difference is the syscalls
required for the stack allocation/deallocation.

It fixes BZ#10354, BZ#14750, and BZ#18433.

Changes from previous version:

- Remove O_CLOCEXEC define (since one can assume it always defined within
  glibc/linux).

Tested on i386, x86_64, powerpc64le, and aarch64.

	[BZ #14750]
	[BZ #10354]
	[BZ #18433]
	* include/sched.h (__clone): Add hidden prototype.
	(__clone2): Likewise.
	* include/unistd.h (__dup): Likewise.
	* posix/Makefile (tests): Add tst-spawn2.
	* posix/tst-spawn2.c: New file.
	* sysdeps/posix/dup.c (__dup): Add hidden definition.
	* sysdeps/unix/sysv/linux/aarch64/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/alpha/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/hppa/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/i386/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/ia64/clone2.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/m68k/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/microblaze/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/mips/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/nios2/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S (__clone):
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S (__clone):
	Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/sh/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/tile/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/x86_64/clone.S (__clone): Likewise.
	* sysdeps/unix/sysv/linux/nptl-signals.h
	(____nptl_is_internal_signal): New function.
	* sysdeps/unix/sysv/linux/spawni.c: New file.
---
 include/sched.h                                   |   2 +
 include/unistd.h                                  |   1 +
 posix/Makefile                                    |   2 +-
 posix/tst-spawn2.c                                |  73 ++++
 sysdeps/posix/dup.c                               |   2 +-
 sysdeps/unix/sysv/linux/aarch64/clone.S           |   1 +
 sysdeps/unix/sysv/linux/alpha/clone.S             |   1 +
 sysdeps/unix/sysv/linux/arm/clone.S               |   1 +
 sysdeps/unix/sysv/linux/hppa/clone.S              |   1 +
 sysdeps/unix/sysv/linux/i386/clone.S              |   1 +
 sysdeps/unix/sysv/linux/ia64/clone2.S             |   2 +
 sysdeps/unix/sysv/linux/m68k/clone.S              |   1 +
 sysdeps/unix/sysv/linux/microblaze/clone.S        |   1 +
 sysdeps/unix/sysv/linux/mips/clone.S              |   1 +
 sysdeps/unix/sysv/linux/nios2/clone.S             |   1 +
 sysdeps/unix/sysv/linux/nptl-signals.h            |  10 +
 sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S |   1 +
 sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S |   1 +
 sysdeps/unix/sysv/linux/s390/s390-32/clone.S      |   2 +
 sysdeps/unix/sysv/linux/s390/s390-64/clone.S      |   2 +
 sysdeps/unix/sysv/linux/sh/clone.S                |   1 +
 sysdeps/unix/sysv/linux/sparc/sparc32/clone.S     |   1 +
 sysdeps/unix/sysv/linux/sparc/sparc64/clone.S     |   1 +
 sysdeps/unix/sysv/linux/spawni.c                  | 424 ++++++++++++++++++++++
 sysdeps/unix/sysv/linux/tile/clone.S              |   1 +
 sysdeps/unix/sysv/linux/x86_64/clone.S            |   1 +
 27 files changed, 568 insertions(+), 2 deletions(-)
 create mode 100644 posix/tst-spawn2.c
 create mode 100644 sysdeps/unix/sysv/linux/spawni.c

diff --git a/include/sched.h b/include/sched.h
index 4f59397..b4d7406 100644
--- a/include/sched.h
+++ b/include/sched.h
@@ -19,7 +19,9 @@ extern int __sched_rr_get_interval (__pid_t __pid, struct timespec *__t);
 /* These are Linux specific.  */
 extern int __clone (int (*__fn) (void *__arg), void *__child_stack,
 		    int __flags, void *__arg, ...);
+libc_hidden_proto (__clone)
 extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base,
 		     size_t __child_stack_size, int __flags, void *__arg, ...);
+libc_hidden_proto (__clone2)
 #endif
 #endif
diff --git a/include/unistd.h b/include/unistd.h
index 5152f64..d2802b2 100644
--- a/include/unistd.h
+++ b/include/unistd.h
@@ -81,6 +81,7 @@ char *__canonicalize_directory_name_internal (const char *__thisdir,
 					      size_t __size) attribute_hidden;
 
 extern int __dup (int __fd);
+libc_hidden_proto (__dup)
 extern int __dup2 (int __fd, int __fd2);
 libc_hidden_proto (__dup2)
 extern int __dup3 (int __fd, int __fd2, int flags);
diff --git a/posix/Makefile b/posix/Makefile
index e251363..906817e 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -93,7 +93,7 @@ tests		:= tstgetopt testfnm runtests runptests	     \
 xtests		:= bug-ga2
 ifeq (yes,$(build-shared))
 test-srcs	:= globtest
-tests           += wordexp-test tst-exec tst-spawn
+tests           += wordexp-test tst-exec tst-spawn tst-spawn2
 endif
 tests-static	= tst-exec-static tst-spawn-static
 tests		+= $(tests-static)
diff --git a/posix/tst-spawn2.c b/posix/tst-spawn2.c
new file mode 100644
index 0000000..41bfde8
--- /dev/null
+++ b/posix/tst-spawn2.c
@@ -0,0 +1,73 @@
+/* Further tests for spawn in case of invalid binary paths.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+#include <error.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/wait.h>
+
+#include <stdio.h>
+
+int
+posix_spawn_test (void)
+{
+  /* Check if posix_spawn correctly returns an error and an invalid pid
+     by trying to spawn an invalid binary.  */
+
+  const char *program = "/path/to/invalid/binary";
+  char * const args[] = { 0 };
+  pid_t pid = -1;
+
+  int ret = posix_spawn (&pid, program, 0, 0, args, environ);
+  if (ret != ENOENT)
+    error (EXIT_FAILURE, errno, "posix_spawn");
+
+  /* POSIX states the value returned on pid variable in case of an error
+     is not specified.  GLIBC will update the value iff the child 
+     execution is successful.  */
+  if (pid != -1)
+    error (EXIT_FAILURE, errno, "posix_spawn returned pid != -1");
+
+  /* Check if no child is actually created.  */
+  ret = waitpid (-1, NULL, 0);
+  if (ret != -1 || errno != ECHILD)
+    error (EXIT_FAILURE, errno, "waitpid");
+
+  /* Same as before, but with posix_spawnp.  */
+  char *args2[] = { (char*) program, 0 };
+
+  ret = posix_spawnp (&pid, args2[0], 0, 0, args2, environ);
+  if (ret != ENOENT)
+    error (EXIT_FAILURE, errno, "posix_spawnp");
+  
+  if (pid != -1)
+    error (EXIT_FAILURE, errno, "posix_spawnp returned pid != -1");
+
+  ret = waitpid (-1, NULL, 0);
+  if (ret != -1 || errno != ECHILD)
+    error (EXIT_FAILURE, errno, "waitpid");
+
+  return 0;
+}
+
+#define TEST_FUNCTION  posix_spawn_test ()
+#include "../test-skeleton.c"
+
diff --git a/sysdeps/posix/dup.c b/sysdeps/posix/dup.c
index 9525e76..abf3ff5 100644
--- a/sysdeps/posix/dup.c
+++ b/sysdeps/posix/dup.c
@@ -26,5 +26,5 @@ __dup (int fd)
 {
   return fcntl (fd, F_DUPFD, 0);
 }
-
+libc_hidden_def (__dup)
 weak_alias (__dup, dup)
diff --git a/sysdeps/unix/sysv/linux/aarch64/clone.S b/sysdeps/unix/sysv/linux/aarch64/clone.S
index 596fb9c..8f3ab40 100644
--- a/sysdeps/unix/sysv/linux/aarch64/clone.S
+++ b/sysdeps/unix/sysv/linux/aarch64/clone.S
@@ -93,4 +93,5 @@ thread_start:
 	cfi_endproc
 	.size thread_start, .-thread_start
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/alpha/clone.S b/sysdeps/unix/sysv/linux/alpha/clone.S
index ec9c06d..556aa63 100644
--- a/sysdeps/unix/sysv/linux/alpha/clone.S
+++ b/sysdeps/unix/sysv/linux/alpha/clone.S
@@ -137,4 +137,5 @@ thread_start:
 	cfi_endproc
 	.end thread_start
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/arm/clone.S b/sysdeps/unix/sysv/linux/arm/clone.S
index 460a8f5..c103123 100644
--- a/sysdeps/unix/sysv/linux/arm/clone.S
+++ b/sysdeps/unix/sysv/linux/arm/clone.S
@@ -93,4 +93,5 @@ PSEUDO_END (__clone)
 
 	.fnend
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/hppa/clone.S b/sysdeps/unix/sysv/linux/hppa/clone.S
index e80fd8d..0a18137 100644
--- a/sysdeps/unix/sysv/linux/hppa/clone.S
+++ b/sysdeps/unix/sysv/linux/hppa/clone.S
@@ -175,4 +175,5 @@ ENTRY(__clone)
 
 PSEUDO_END(__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/i386/clone.S b/sysdeps/unix/sysv/linux/i386/clone.S
index 7d818c1..ef447d1 100644
--- a/sysdeps/unix/sysv/linux/i386/clone.S
+++ b/sysdeps/unix/sysv/linux/i386/clone.S
@@ -139,4 +139,5 @@ L(nomoregetpid):
 	cfi_startproc
 PSEUDO_END (__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/ia64/clone2.S b/sysdeps/unix/sysv/linux/ia64/clone2.S
index 9b257b3..fc046eb 100644
--- a/sysdeps/unix/sysv/linux/ia64/clone2.S
+++ b/sysdeps/unix/sysv/linux/ia64/clone2.S
@@ -96,6 +96,8 @@ ENTRY(__clone2)
 	ret			/* Not reached.		*/
 PSEUDO_END(__clone2)
 
+libc_hidden_def (__clone2)
+
 /* For now we leave __clone undefined.  This is unlikely to be a	*/
 /* problem, since at least the i386 __clone in glibc always failed	*/
 /* with a 0 sp (eventhough the kernel explicitly handled it).		*/
diff --git a/sysdeps/unix/sysv/linux/m68k/clone.S b/sysdeps/unix/sysv/linux/m68k/clone.S
index 8b40df2..33474cc 100644
--- a/sysdeps/unix/sysv/linux/m68k/clone.S
+++ b/sysdeps/unix/sysv/linux/m68k/clone.S
@@ -127,4 +127,5 @@ donepid:
 	cfi_startproc
 PSEUDO_END (__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/microblaze/clone.S b/sysdeps/unix/sysv/linux/microblaze/clone.S
index 035d88b..cbc95ce 100644
--- a/sysdeps/unix/sysv/linux/microblaze/clone.S
+++ b/sysdeps/unix/sysv/linux/microblaze/clone.S
@@ -69,4 +69,5 @@ L(thread_start):
 	nop
 PSEUDO_END(__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone,clone)
diff --git a/sysdeps/unix/sysv/linux/mips/clone.S b/sysdeps/unix/sysv/linux/mips/clone.S
index 755e8cc..feb8250 100644
--- a/sysdeps/unix/sysv/linux/mips/clone.S
+++ b/sysdeps/unix/sysv/linux/mips/clone.S
@@ -166,4 +166,5 @@ L(gotpid):
 
 	END(__thread_start)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/nios2/clone.S b/sysdeps/unix/sysv/linux/nios2/clone.S
index 4da5c19..e4d3970 100644
--- a/sysdeps/unix/sysv/linux/nios2/clone.S
+++ b/sysdeps/unix/sysv/linux/nios2/clone.S
@@ -104,4 +104,5 @@ thread_start:
 
 	cfi_startproc
 PSEUDO_END (__clone)
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/nptl-signals.h b/sysdeps/unix/sysv/linux/nptl-signals.h
index 7560a21..01f34c2 100644
--- a/sysdeps/unix/sysv/linux/nptl-signals.h
+++ b/sysdeps/unix/sysv/linux/nptl-signals.h
@@ -16,6 +16,8 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <signal.h>
+
 /* The signal used for asynchronous cancelation.  */
 #define SIGCANCEL       __SIGRTMIN
 
@@ -29,5 +31,13 @@
 /* Signal used to implement the setuid et.al. functions.  */
 #define SIGSETXID       (__SIGRTMIN + 1)
 
+
+/* Return is sig is used internally.  */
+static inline int
+__nptl_is_internal_signal (int sig)
+{
+  return (sig == SIGCANCEL) || (sig == SIGTIMER) || (sig == SIGSETXID);
+}
+
 /* Used to communicate with signal handler.  */
 extern struct xid_command *__xidcmd attribute_hidden;
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S
index 28948ea..eb973db 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S
@@ -108,4 +108,5 @@ L(badargs):
 	cfi_startproc
 END (__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
index c8c6de8..959fdb7 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S
@@ -126,4 +126,5 @@ L(parent):
 
 END (__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/clone.S b/sysdeps/unix/sysv/linux/s390/s390-32/clone.S
index cb2afbb..f774e90 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/clone.S
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/clone.S
@@ -70,4 +70,6 @@ thread_start:
 	xc	0(4,%r15),0(%r15)
 	basr    %r14,%r1        /* jump to fn */
 	DO_CALL (exit, 1)
+
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/clone.S b/sysdeps/unix/sysv/linux/s390/s390-64/clone.S
index eddab35..928a881 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/clone.S
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/clone.S
@@ -73,4 +73,6 @@ thread_start:
 	xc	0(8,%r15),0(%r15)
 	basr	%r14,%r1	/* jump to fn */
 	DO_CALL	(exit, 1)
+
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/sh/clone.S b/sysdeps/unix/sysv/linux/sh/clone.S
index 53cc08b..a73dd7d 100644
--- a/sysdeps/unix/sysv/linux/sh/clone.S
+++ b/sysdeps/unix/sysv/linux/sh/clone.S
@@ -123,4 +123,5 @@ ENTRY(__clone)
 	.word	TID - TLS_PRE_TCB_SIZE
 PSEUDO_END (__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S b/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S
index 6f41f0f..68f8202 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/clone.S
@@ -100,4 +100,5 @@ __thread_start:
 
 	.size	__thread_start, .-__thread_start
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S b/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S
index e2d3914..cecffa7 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/clone.S
@@ -96,4 +96,5 @@ __thread_start:
 
 	.size	__thread_start, .-__thread_start
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
new file mode 100644
index 0000000..407d696
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -0,0 +1,424 @@
+/* POSIX spawn interface.  Linux version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <spawn.h>
+#include <fcntl.h>
+#include <paths.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <sys/wait.h>
+#include <sys/param.h>
+#include <sys/mman.h>
+#include <not-cancel.h>
+#include <local-setxid.h>
+#include <shlib-compat.h>
+#include <nptl/pthreadP.h>
+#include <dl-sysdep.h>
+#include <ldsodefs.h>
+#include "spawn_int.h"
+
+/* The Linux implementation of posix_spawn{p} uses the clone syscall directly
+   with CLONE_VM and CLONE_VFORK flags and an allocated stack.  The new stack
+   and start function solves most the vfork limitation (possible parent
+   clobber due stack spilling). The remaning issue are:
+
+   1. That no signal handlers must run in child context, to avoid corrupt
+      parent's state.
+   2. The parent must ensure child's stack freeing.
+   3. Child must synchronize with parent to enforce 2. and to possible
+      return execv issues.
+
+   The first issue is solved by blocking all signals in child, even the
+   NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and third issue
+   is done by a stack allocation in parent and a synchronization with using
+   a pipe or waitpid (in case or error).  The pipe has the advantage of
+   allowing the child the signal an exec error.  */
+
+
+/* The Unix standard contains a long explanation of the way to signal
+   an error after the fork() was successful.  Since no new wait status
+   was wanted there is no way to signal an error using one of the
+   available methods.  The committee chose to signal an error by a
+   normal program exit with the exit code 127.  */
+#define SPAWN_ERROR	127
+
+/* We need to block both SIGCANCEL and SIGSETXID.  */
+#define SIGALL_SET \
+  ((__sigset_t) { .__val = {[0 ...  _SIGSET_NWORDS-1 ] =  -1 } })
+
+#ifdef __ia64__
+# define CLONE(__fn, __stack, __flags, __args) \
+  __clone2 (__fn, __stack, __flags, __args, 0, 0, 0)
+#else
+# define CLONE(__fn, __stack, __flags, __args) \
+  __clone (__fn, __stack, __flags, __args)
+#endif
+
+#if _STACK_GROWS_DOWN
+# define STACK(__stack, __stack_size) (__stack + __stack_size)
+#elif _STACK_GROWS_UP
+# define STACK(__stack, __stack_size) (__stack)
+#endif
+
+/* Issues a 'pipe2' when the architecture supports or a 'pipe' plus O_CLOEXEC
+   otherwise.  When pipe2 syscall is not available the pipe plus fcntl have a
+   inherent race condition (pipe2 is the right solution to avoid it).  */
+
+#ifdef __ASSUME_PIPE2
+# define __have_pipe2 1
+#endif
+
+static int
+__do_pipe2_cloexec (int pipe_fds[2])
+{
+  /* Try pipe2 even if __ASSUME_PIPE2 is not define and returning an error
+     iff the call returns ENOSYS.  */
+  int r = -1;
+  if (__have_pipe2 >= 0)
+    {
+      r = __pipe2 (pipe_fds, O_CLOEXEC);
+#ifndef __ASSUME_PIPE2
+      if (__have_pipe2 == 0)
+	__have_pipe2 = r != -1 || errno != ENOSYS ? 1 : -1;
+#endif
+
+      if (__have_pipe2 > 0)
+	return r;
+    }
+  if (__have_pipe2 < 0)
+    if (__pipe (pipe_fds) < 0)
+      return -1;
+
+  /* Then apply FD_CLOCEXEC if it is supported in the pipe call case.  */
+  if (__have_pipe2 < 0)
+    {
+      if ((r = __fcntl (pipe_fds[0], F_SETFD, FD_CLOEXEC)) == -1
+	  || (r = __fcntl (pipe_fds[1], F_SETFD, FD_CLOEXEC)) == -1)
+	{
+	  close_not_cancel (pipe_fds[0]);
+	  close_not_cancel (pipe_fds[1]);
+	  return r;
+	}
+    }
+
+  return r;
+}
+
+struct posix_spawn_args
+{
+  int pipe[2];
+  sigset_t oldmask;
+  const char *file;
+  int (*exec) (const char *, char *const *, char *const *);
+  const posix_spawn_file_actions_t *fa;
+  const posix_spawnattr_t *restrict attr;
+  char *const *argv;
+  char *const *envp;
+  int xflags;
+};
+
+/* Older version requires that shell script without shebang definition
+   to be called explicity using /bin/sh (_PATH_BSHELL).  */
+static void
+maybe_script_execute (struct posix_spawn_args *args)
+{
+  if (SHLIB_COMPAT (libc, GLIBC_2_2, GLIBC_2_15)
+      && (args->xflags & SPAWN_XFLAGS_TRY_SHELL) && errno == ENOEXEC)
+    {
+      char *const *argv = args->argv;
+      int argc = 0;
+      while (argv[argc++] != NULL)
+	continue;
+
+      /* Construct an argument list for the shell.  */
+      char *new_argv[argc];
+      new_argv[0] = (char *) _PATH_BSHELL;
+      new_argv[1] = (char *) args->file;
+      while (argc > 1)
+	{
+	  new_argv[argc] = argv[argc - 1];
+	  --argc;
+	}
+
+      /* Execute the shell.  */
+      args->exec (new_argv[0], new_argv, args->envp);
+    }
+}
+
+/* Function used in the clone call to setup the signals mask, posix_spawn
+   attributes, and file actions.  It run on its own stack (provided by the
+   posix_spawn call).  */
+static int
+__spawni_child (void *arguments)
+{
+  struct posix_spawn_args *args = arguments;
+  const posix_spawnattr_t *restrict attr = args->attr;
+  const posix_spawn_file_actions_t *file_actions = args->fa;
+  int p = args->pipe[1];
+  int ret;
+
+  close_not_cancel (args->pipe[0]);
+
+  /* The child must ensure that no signal handler are enabled because it shared
+     memory with parent, so the signal disposition must be either SIG_DFL or
+     SIG_IGN.  It does by iterating over all signals and although it could 
+     possibly be more optimized (by tracking which signal potentially have a
+     signal handler), it might requires system specific solutions (since the
+     sigset_t data type can be very different on different architectures).  */
+  struct sigaction sa;
+  memset (&sa, '\0', sizeof (sa));
+
+  sigset_t hset;
+  __sigprocmask (SIG_BLOCK, 0, &hset);
+  for (int sig = 1; sig < _NSIG; ++sig)
+    {
+      if ((attr->__flags & POSIX_SPAWN_SETSIGDEF)
+	  && sigismember (&attr->__sd, sig))
+	{
+	  sa.sa_handler = SIG_DFL;
+	}
+      else if (sigismember (&hset, sig))
+	{
+	  if (__nptl_is_internal_signal (sig))
+	    sa.sa_handler = SIG_IGN;
+	  else
+	    {
+	      __libc_sigaction (sig, 0, &sa);
+	      if (sa.sa_handler == SIG_IGN)
+		continue;
+	      sa.sa_handler = SIG_DFL;
+	    }
+	}
+      else
+	continue;
+
+      __libc_sigaction (sig, &sa, 0);
+    }
+
+#ifdef _POSIX_PRIORITY_SCHEDULING
+  /* Set the scheduling algorithm and parameters.  */
+  if ((attr->__flags & (POSIX_SPAWN_SETSCHEDPARAM | POSIX_SPAWN_SETSCHEDULER))
+      == POSIX_SPAWN_SETSCHEDPARAM)
+    {
+      if ((ret = __sched_setparam (0, &attr->__sp)) == -1)
+	goto fail;
+    }
+  else if ((attr->__flags & POSIX_SPAWN_SETSCHEDULER) != 0)
+    {
+      if ((ret = __sched_setscheduler (0, attr->__policy, &attr->__sp)) == -1)
+	goto fail;
+    }
+#endif
+
+  /* Set the process group ID.  */
+  if ((attr->__flags & POSIX_SPAWN_SETPGROUP) != 0
+      && (ret = __setpgid (0, attr->__pgrp)) != 0)
+    goto fail;
+
+  /* Set the effective user and group IDs.  */
+  if ((attr->__flags & POSIX_SPAWN_RESETIDS) != 0
+      && ((ret = local_seteuid (__getuid ())) != 0
+	  || (ret = local_setegid (__getgid ())) != 0))
+    goto fail;
+
+  /* Execute the file actions.  */
+  if (file_actions != 0)
+    {
+      int cnt;
+      struct rlimit64 fdlimit;
+      bool have_fdlimit = false;
+
+      for (cnt = 0; cnt < file_actions->__used; ++cnt)
+	{
+	  struct __spawn_action *action = &file_actions->__actions[cnt];
+
+	  /* Dup the pipe fd onto an unoccupied one to avoid any file
+	     operation to clobber it.  */
+	  if ((action->action.close_action.fd == p)
+	      || (action->action.open_action.fd == p)
+	      || (action->action.dup2_action.fd == p))
+	    {
+	      if ((ret = __dup (p)) < 0)
+		goto fail;
+	      p = ret;
+	    }
+
+	  switch (action->tag)
+	    {
+	    case spawn_do_close:
+	      if ((ret =
+		   close_not_cancel (action->action.close_action.fd)) != 0)
+		{
+		  if (!have_fdlimit)
+		    {
+		      __getrlimit64 (RLIMIT_NOFILE, &fdlimit);
+		      have_fdlimit = true;
+		    }
+
+		  /* Signal errors only for file descriptors out of range.  */
+		  if (action->action.close_action.fd < 0
+		      || action->action.close_action.fd >= fdlimit.rlim_cur)
+		    goto fail;
+		}
+	      break;
+
+	    case spawn_do_open:
+	      {
+		ret = open_not_cancel (action->action.open_action.path,
+				       action->action.
+				       open_action.oflag | O_LARGEFILE,
+				       action->action.open_action.mode);
+
+		if (ret == -1)
+		  goto fail;
+
+		int new_fd = ret;
+
+		/* Make sure the desired file descriptor is used.  */
+		if (ret != action->action.open_action.fd)
+		  {
+		    if ((ret = __dup2 (new_fd, action->action.open_action.fd))
+			!= action->action.open_action.fd)
+		      goto fail;
+
+		    if ((ret = close_not_cancel (new_fd)) != 0)
+		      goto fail;
+		  }
+	      }
+	      break;
+
+	    case spawn_do_dup2:
+	      if ((ret = __dup2 (action->action.dup2_action.fd,
+				 action->action.dup2_action.newfd))
+		  != action->action.dup2_action.newfd)
+		goto fail;
+	      break;
+	    }
+	}
+    }
+
+  /* Set the initial signal mask of the child if POSIX_SPAWN_SETSIGMASK
+     is set, otherwise restore the previous one.  */
+  __sigprocmask (SIG_SETMASK, (attr->__flags & POSIX_SPAWN_SETSIGMASK)
+		 ? &attr->__ss : &args->oldmask, 0);
+
+  args->exec (args->file, args->argv, args->envp);
+
+  /* This is compatibility function required to enable posix_spawn run
+     script without shebang definition for older posix_spawn versions
+     (2.15).  */
+  maybe_script_execute (args);
+
+  ret = -errno;
+
+fail:
+  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
+  ret = -ret;
+  if (ret)
+    while (write_not_cancel (p, &ret, sizeof ret) < 0)
+      continue;
+  exit (SPAWN_ERROR);
+}
+
+/* Spawn a new process executing PATH with the attributes describes in *ATTRP.
+   Before running the process perform the actions described in FILE-ACTIONS. */
+static int
+__spawnix (pid_t * pid, const char *file,
+	   const posix_spawn_file_actions_t * file_actions,
+	   const posix_spawnattr_t * attrp, char *const argv[],
+	   char *const envp[], int xflags,
+	   int (*exec) (const char *, char *const *, char *const *))
+{
+  pid_t new_pid;
+  struct posix_spawn_args args;
+  int ec;
+
+  if (__do_pipe2_cloexec (args.pipe))
+    return errno;
+
+  /* Allocates a child stack with sufficient size to handle a large argument
+     set in both the 'maybe_script_execute' compatibility code and on the
+     execvpe (posix_spawnp) where it might allocate stack size when path
+     does not contain a slash.  */
+  const int prot = (PROT_READ | PROT_WRITE
+		    | ((GL (dl_stack_flags) & PF_X) ? PROT_EXEC : 0));
+
+  const int stack_size = ARCH_STACK_DEFAULT_SIZE;
+  void *stack = __mmap (NULL, stack_size, prot,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
+  if (__glibc_unlikely (stack == MAP_FAILED))
+    {
+      close_not_cancel (args.pipe[0]);
+      close_not_cancel (args.pipe[1]);
+      return errno;
+    }
+
+  /* Disable asynchronous cancellation.  */
+  int cs = LIBC_CANCEL_ASYNC ();
+
+  args.file = file;
+  args.exec = exec;
+  args.fa = file_actions;
+  args.attr = attrp ? attrp : &(const posix_spawnattr_t) { 0 };
+  args.argv = argv;
+  args.envp = envp;
+  args.xflags = xflags;
+
+  __sigprocmask (SIG_BLOCK, &SIGALL_SET, &args.oldmask);
+
+  new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
+		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
+
+  close_not_cancel (args.pipe[1]);
+
+  if (new_pid > 0)
+    {
+      if (__read (args.pipe[0], &ec, sizeof ec) != sizeof ec)
+	ec = 0;
+      else
+	__waitpid (new_pid, NULL, 0);
+    }
+  else
+    ec = -new_pid;
+
+  __munmap (stack, stack_size);
+
+  close_not_cancel (args.pipe[0]);
+
+  if (!ec && new_pid)
+    *pid = new_pid;
+
+  __sigprocmask (SIG_SETMASK, &args.oldmask, 0);
+
+  LIBC_CANCEL_RESET (cs);
+
+  return ec;
+}
+
+/* Spawn a new process executing PATH with the attributes describes in *ATTRP.
+   Before running the process perform the actions described in FILE-ACTIONS. */
+int
+__spawni (pid_t * pid, const char *file,
+	  const posix_spawn_file_actions_t * acts,
+	  const posix_spawnattr_t * attrp, char *const argv[],
+	  char *const envp[], int xflags)
+{
+  if (xflags & SPAWN_XFLAGS_USE_PATH)
+    return __spawnix (pid, file, acts, attrp, argv, envp, xflags, __execvpe);
+  return __spawnix (pid, file, acts, attrp, argv, envp, xflags, __execve);
+}
diff --git a/sysdeps/unix/sysv/linux/tile/clone.S b/sysdeps/unix/sysv/linux/tile/clone.S
index 02fe3a8..a300eb5 100644
--- a/sysdeps/unix/sysv/linux/tile/clone.S
+++ b/sysdeps/unix/sysv/linux/tile/clone.S
@@ -219,4 +219,5 @@ ENTRY (__clone)
 	}
 PSEUDO_END (__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
diff --git a/sysdeps/unix/sysv/linux/x86_64/clone.S b/sysdeps/unix/sysv/linux/x86_64/clone.S
index 382568a..1a50957 100644
--- a/sysdeps/unix/sysv/linux/x86_64/clone.S
+++ b/sysdeps/unix/sysv/linux/x86_64/clone.S
@@ -115,4 +115,5 @@ L(thread_start):
 	cfi_startproc;
 PSEUDO_END (__clone)
 
+libc_hidden_def (__clone)
 weak_alias (__clone, clone)
-- 
1.9.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4 2/3] posix: execvpe cleanup
  2016-02-01 16:21 [PATCH 0/3] posix: Execute file function fixes Adhemerval Zanella
  2016-02-01 16:21 ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Adhemerval Zanella
@ 2016-02-01 16:21 ` Adhemerval Zanella
  2016-02-01 16:21 ` [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p} Adhemerval Zanella
  2 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-01 16:21 UTC (permalink / raw)
  To: libc-alpha

This patch removes all the dynamic allocation on execvpe code and
instead use direct stack allocation.  This is QoI approach to make
it possible use in scenarios where memory is shared with parent
(vfork or clone with CLONE_VM).

For default process spawn (script file without a shebang), stack
allocation is bounded by NAME_MAX plus PATH_MAX plus 1.  Large
file arguments returns an error (ENAMETOOLONG).  This differs than
current GLIBC pratice in general, but it used to limit stack
allocation for large inputs.  Also, path in PATH environment variable
larger than PATH_MAX are ignored.

The shell direct execution exeception, where execve returns ENOEXEC,
might requires a large stack allocation due large input argument list.

Tested on i686, x86_64, powerpc64le, and aarch64.

Changes from previous version:

- Remove arbitrary limit on stack allocation for argument handling
  (it is arbitrary and does no impose any limit since it does not 
  consider current stack size neither stack size limit).

	* posix/execvpe.c (__execvpe): Remove dynamic allocation.
	* posix/Makefile (tests): Add tst-execvpe{1,2,3,4,5,6}.
	* posix/tst-execvp1.c (do_test): Use a macro to call execvp.
	* posix/tst-execvp2.c (do_test): Likewise.
	* posix/tst-execvp3.c (do_test): Likewise.
	* posix/tst-execvp4.c (do_test): Likewise.
	* posix/tst-execvpe1.c: New file.
	* posix/tst-execvpe2.c: Likewise.
	* posix/tst-execvpe3.c: Likewise.
	* posix/tst-execvpe4.c: Likewise.
	* posix/tst-execvpe5.c: Likewise.
	* posix/tst-execvpe6.c: Likewise.
---
 posix/Makefile       |   3 +
 posix/execvpe.c      | 238 +++++++++++++++++++++------------------------------
 posix/tst-execvp1.c  |   6 +-
 posix/tst-execvp2.c  |   5 +-
 posix/tst-execvp3.c  |   5 +-
 posix/tst-execvp4.c  |   6 +-
 posix/tst-execvpe1.c |  20 +++++
 posix/tst-execvpe2.c |  20 +++++
 posix/tst-execvpe3.c |  20 +++++
 posix/tst-execvpe4.c |  20 +++++
 posix/tst-execvpe5.c | 130 ++++++++++++++++++++++++++++
 posix/tst-execvpe6.c |  82 ++++++++++++++++++
 13 files changed, 422 insertions(+), 146 deletions(-)
 create mode 100644 posix/tst-execvpe1.c
 create mode 100644 posix/tst-execvpe2.c
 create mode 100644 posix/tst-execvpe3.c
 create mode 100644 posix/tst-execvpe4.c
 create mode 100644 posix/tst-execvpe5.c
 create mode 100644 posix/tst-execvpe6.c

diff --git a/posix/Makefile b/posix/Makefile
index f94e023..e251363 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -82,6 +82,8 @@ tests		:= tstgetopt testfnm runtests runptests	     \
 		   tst-execv1 tst-execv2 tst-execl1 tst-execl2 \
 		   tst-execve1 tst-execve2 tst-execle1 tst-execle2 \
 		   tst-execvp3 tst-execvp4 tst-rfc3484 tst-rfc3484-2 \
+		   tst-execvpe1 tst-execvpe2 tst-execvpe3 tst-execvpe4 \
+		   tst-execvpe5 tst-execvpe6 \
 		   tst-rfc3484-3 \
 		   tst-getaddrinfo3 tst-fnmatch2 tst-cpucount tst-cpuset \
 		   bug-getopt1 bug-getopt2 bug-getopt3 bug-getopt4 \
@@ -228,6 +230,7 @@ tstgetopt-ARGS = -a -b -cfoobar --required foobar --optional=bazbug \
 
 tst-exec-ARGS = -- $(host-test-program-cmd)
 tst-exec-static-ARGS = $(tst-exec-ARGS)
+tst-execvpe5-ARGS = -- $(host-test-program-cmd)
 tst-spawn-ARGS = -- $(host-test-program-cmd)
 tst-spawn-static-ARGS = $(tst-spawn-ARGS)
 tst-dir-ARGS = `pwd` `cd $(common-objdir)/$(subdir); pwd` `cd $(common-objdir); pwd` $(objpfx)tst-dir
diff --git a/posix/execvpe.c b/posix/execvpe.c
index 61697a7..ff11b28 100644
--- a/posix/execvpe.c
+++ b/posix/execvpe.c
@@ -15,7 +15,6 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <alloca.h>
 #include <unistd.h>
 #include <stdarg.h>
 #include <stdbool.h>
@@ -23,15 +22,28 @@
 #include <string.h>
 #include <errno.h>
 #include <paths.h>
+#include <confstr.h>
+#include <sys/param.h>
 
+#ifndef PATH_MAX
+# ifdef MAXPATHLEN
+#  define PATH_MAX MAXPATHLEN
+# else
+#  define PATH_MAX 1024
+# endif
+#endif
 
 /* The file is accessible but it is not an executable file.  Invoke
    the shell to interpret it as a script.  */
 static void
-internal_function
-scripts_argv (const char *file, char *const argv[], int argc, char **new_argv)
+maybe_script_execute (const char *file, char *const argv[], char *const envp[])
 {
+  int argc = 0;
+  while (argv[argc++] != NULL)
+    continue;
+
   /* Construct an argument list for the shell.  */
+  char *new_argv[argc];
   new_argv[0] = (char *) _PATH_BSHELL;
   new_argv[1] = (char *) file;
   while (argc > 1)
@@ -39,6 +51,9 @@ scripts_argv (const char *file, char *const argv[], int argc, char **new_argv)
       new_argv[argc] = argv[argc - 1];
       --argc;
     }
+
+  /* Execute the shell.  */
+  __execve (new_argv[0], new_argv, envp);
 }
 
 
@@ -47,170 +62,109 @@ scripts_argv (const char *file, char *const argv[], int argc, char **new_argv)
 int
 __execvpe (const char *file, char *const argv[], char *const envp[])
 {
+  /* We check the simple case first. */
   if (*file == '\0')
     {
-      /* We check the simple case first. */
       __set_errno (ENOENT);
       return -1;
     }
 
+  /* Don't search when it contains a slash.  */
   if (strchr (file, '/') != NULL)
     {
-      /* Don't search when it contains a slash.  */
       __execve (file, argv, envp);
 
       if (errno == ENOEXEC)
-	{
-	  /* Count the arguments.  */
-	  int argc = 0;
-	  while (argv[argc++])
-	    ;
-	  size_t len = (argc + 1) * sizeof (char *);
-	  char **script_argv;
-	  void *ptr = NULL;
-	  if (__libc_use_alloca (len))
-	    script_argv = alloca (len);
-	  else
-	    script_argv = ptr = malloc (len);
-
-	  if (script_argv != NULL)
-	    {
-	      scripts_argv (file, argv, argc, script_argv);
-	      __execve (script_argv[0], script_argv, envp);
-
-	      free (ptr);
-	    }
-	}
+        maybe_script_execute (file, argv, envp);
+
+      return -1;
     }
-  else
+
+  const char *path = getenv ("PATH");
+  if (!path)
+    path = CS_PATH;
+  /* Although GLIBC does not enforce NAME_MAX, we set it as the maximum
+     size to avoid unbounded stack allocation.  Same applies for
+     PATH_MAX.  */
+  size_t file_len = __strnlen (file, NAME_MAX + 1);
+  size_t path_len = __strnlen (path, PATH_MAX - 1) + 1;
+
+  if ((file_len > NAME_MAX)
+      || !__libc_alloca_cutoff (path_len + file_len + 1))
     {
-      size_t pathlen;
-      size_t alloclen = 0;
-      char *path = getenv ("PATH");
-      if (path == NULL)
-	{
-	  pathlen = confstr (_CS_PATH, (char *) NULL, 0);
-	  alloclen = pathlen + 1;
-	}
-      else
-	pathlen = strlen (path);
+      errno = ENAMETOOLONG;
+      return -1;
+    }
 
-      size_t len = strlen (file) + 1;
-      alloclen += pathlen + len + 1;
+  const char *subp;
+  bool got_eacces = false;
+  for (const char *p = path; ; p = subp)
+    {
+      char buffer[path_len + file_len + 1];
 
-      char *name;
-      char *path_malloc = NULL;
-      if (__libc_use_alloca (alloclen))
-	name = alloca (alloclen);
-      else
-	{
-	  path_malloc = name = malloc (alloclen);
-	  if (name == NULL)
-	    return -1;
-	}
+      subp = __strchrnul (p, ':');
 
-      if (path == NULL)
+      /* PATH is larger than PATH_MAX and thus potentially larger than
+	 the stack allocation.  */
+      if (subp - p >= path_len)
 	{
-	  /* There is no `PATH' in the environment.
-	     The default search path is the current directory
-	     followed by the path `confstr' returns for `_CS_PATH'.  */
-	  path = name + pathlen + len + 1;
-	  path[0] = ':';
-	  (void) confstr (_CS_PATH, path + 1, pathlen);
+          /* If there is only one path, bail out.  */
+	  if (!*subp)
+	    break;
+	  /* Otherwise skip to next one.  */
+	  continue;
 	}
 
-      /* Copy the file name at the top.  */
-      name = (char *) memcpy (name + pathlen + 1, file, len);
-      /* And add the slash.  */
-      *--name = '/';
+      /* Set current path considered plus a '/'.  */
+      memcpy (buffer, p, subp - p);
+      buffer[subp - p] = '/';
+      /* And the file to execute.  */
+      memcpy (buffer + (subp - p) + (subp > p), file, file_len + 1);
+
+      __execve (buffer, argv, envp);
+
+      if (errno == ENOEXEC)
+        maybe_script_execute (buffer, argv, envp);
 
-      char **script_argv = NULL;
-      void *script_argv_malloc = NULL;
-      bool got_eacces = false;
-      char *p = path;
-      do
+      switch (errno)
 	{
-	  char *startp;
-
-	  path = p;
-	  p = __strchrnul (path, ':');
-
-	  if (p == path)
-	    /* Two adjacent colons, or a colon at the beginning or the end
-	       of `PATH' means to search the current directory.  */
-	    startp = name + 1;
-	  else
-	    startp = (char *) memcpy (name - (p - path), path, p - path);
-
-	  /* Try to execute this name.  If it works, execve will not return. */
-	  __execve (startp, argv, envp);
-
-	  if (errno == ENOEXEC)
-	    {
-	      if (script_argv == NULL)
-		{
-		  /* Count the arguments.  */
-		  int argc = 0;
-		  while (argv[argc++])
-		    ;
-		  size_t arglen = (argc + 1) * sizeof (char *);
-		  if (__libc_use_alloca (alloclen + arglen))
-		    script_argv = alloca (arglen);
-		  else
-		    script_argv = script_argv_malloc = malloc (arglen);
-		  if (script_argv == NULL)
-		    {
-		      /* A possible EACCES error is not as important as
-			 the ENOMEM.  */
-		      got_eacces = false;
-		      break;
-		    }
-		  scripts_argv (startp, argv, argc, script_argv);
-		}
-
-	      __execve (script_argv[0], script_argv, envp);
-	    }
-
-	  switch (errno)
-	    {
-	    case EACCES:
-	      /* Record the we got a `Permission denied' error.  If we end
-		 up finding no executable we can use, we want to diagnose
-		 that we did find one but were denied access.  */
-	      got_eacces = true;
-	    case ENOENT:
-	    case ESTALE:
-	    case ENOTDIR:
-	      /* Those errors indicate the file is missing or not executable
-		 by us, in which case we want to just try the next path
-		 directory.  */
-	    case ENODEV:
-	    case ETIMEDOUT:
-	      /* Some strange filesystems like AFS return even
-		 stranger error numbers.  They cannot reasonably mean
-		 anything else so ignore those, too.  */
-	      break;
-
-	    default:
-	      /* Some other error means we found an executable file, but
-		 something went wrong executing it; return the error to our
-		 caller.  */
-	      return -1;
-	    }
+	  case EACCES:
+	  /* Record the we got a 'Permission denied' error.  If we end
+             up finding no executable we can use, we want to diagnose
+             that we did find one but were denied access.  */
+	    got_eacces = true;
+	  case ENOENT:
+	  case ESTALE:
+	  case ENOTDIR:
+	  /* Those errors indicate the file is missing or not executable
+	     by us, in which case we want to just try the next path
+	     directory.  */
+	  case ENODEV:
+	  case ETIMEDOUT:
+          /* Some strange filesystems like AFS return even
+             stranger error numbers.  They cannot reasonably mean
+             anything else so ignore those, too.  */
+	    break;
+
+          default:
+	  /* Some other error means we found an executable file, but
+	     something went wrong executing it; return the error to our
+	     caller.  */
+	    return -1;
 	}
-      while (*p++ != '\0');
-
-      /* We tried every element and none of them worked.  */
-      if (got_eacces)
-	/* At least one failure was due to permissions, so report that
-	   error.  */
-	__set_errno (EACCES);
 
-      free (script_argv_malloc);
-      free (path_malloc);
+      if (*subp++ == '\0')
+	break;
     }
 
-  /* Return the error from the last attempt (probably ENOENT).  */
+  /* We tried every element and none of them worked.  */
+  if (got_eacces)
+    /* At least one failure was due to permissions, so report that
+       error.  */
+    __set_errno (EACCES);
+
   return -1;
+
 }
+
 weak_alias (__execvpe, execvpe)
diff --git a/posix/tst-execvp1.c b/posix/tst-execvp1.c
index ecc673d..fe98ce5 100644
--- a/posix/tst-execvp1.c
+++ b/posix/tst-execvp1.c
@@ -3,6 +3,10 @@
 #include <stdlib.h>
 #include <unistd.h>
 
+#ifndef EXECVP
+# define EXECVP(__file, __argv) execvp (__file, __argv)
+#endif
+
 static int
 do_test (void)
 {
@@ -19,7 +23,7 @@ do_test (void)
 
   char *argv[] = { (char *) "does-not-exist", NULL };
   errno = 0;
-  execvp (argv[0], argv);
+  EXECVP (argv[0], argv);
 
   if (errno != ENOENT)
     {
diff --git a/posix/tst-execvp2.c b/posix/tst-execvp2.c
index 7e0f5d8..9be8733 100644
--- a/posix/tst-execvp2.c
+++ b/posix/tst-execvp2.c
@@ -14,6 +14,9 @@ static int do_test (void);
 #define TEST_FUNCTION do_test ()
 #include "../test-skeleton.c"
 
+#ifndef EXECVP
+# define EXECVP(__file, __argv)  execvp (__file, __argv)
+#endif
 
 static char *copy;
 
@@ -70,7 +73,7 @@ do_test (void)
 
   char *argv[] = { basename (copy), NULL };
   errno = 0;
-  execvp (argv[0], argv);
+  EXECVP (argv[0], argv);
 
   if (errno != EACCES)
     {
diff --git a/posix/tst-execvp3.c b/posix/tst-execvp3.c
index 5ebc879..43f7c34 100644
--- a/posix/tst-execvp3.c
+++ b/posix/tst-execvp3.c
@@ -12,6 +12,9 @@ static int do_test (void);
 
 #include "../test-skeleton.c"
 
+#ifndef EXECVP
+# define EXECVP(__file, __argv)  execvp (__file, __argv)
+#endif
 
 static char *fname;
 
@@ -35,7 +38,7 @@ do_test (void)
     }
 
   char *argv[] = { fname, NULL };
-  execvp (basename (fname), argv);
+  EXECVP (basename (fname), argv);
 
   /* If we come here, the execvp call failed.  */
   return 1;
diff --git a/posix/tst-execvp4.c b/posix/tst-execvp4.c
index 531fab2..116624f 100644
--- a/posix/tst-execvp4.c
+++ b/posix/tst-execvp4.c
@@ -5,6 +5,10 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#ifndef EXECVP
+# define EXECVP(__file, __argv)  execvp (__file, __argv)
+#endif
+
 static int
 do_test (void)
 {
@@ -27,7 +31,7 @@ do_test (void)
 
   unsetenv ("PATH");
   char *argv[] = { buf + 9, NULL };
-  execvp (argv[0], argv);
+  EXECVP (argv[0], argv);
   return 0;
 }
 
diff --git a/posix/tst-execvpe1.c b/posix/tst-execvpe1.c
new file mode 100644
index 0000000..7b386c8
--- /dev/null
+++ b/posix/tst-execvpe1.c
@@ -0,0 +1,20 @@
+/* Check ENOENT failure for execvpe.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define EXECVP(__file, __argv) execvpe (__file, __argv, NULL)
+#include <posix/tst-execvp1.c>
diff --git a/posix/tst-execvpe2.c b/posix/tst-execvpe2.c
new file mode 100644
index 0000000..3c99fb1
--- /dev/null
+++ b/posix/tst-execvpe2.c
@@ -0,0 +1,20 @@
+/* Check EACCES for execvpe.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define EXECVP(__file, __argv) execvpe (__file, __argv, NULL)
+#include <posix/tst-execvp2.c>
diff --git a/posix/tst-execvpe3.c b/posix/tst-execvpe3.c
new file mode 100644
index 0000000..8380fd3
--- /dev/null
+++ b/posix/tst-execvpe3.c
@@ -0,0 +1,20 @@
+/* Check script execution without shebang for execvpe.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define EXECVP(__file, __argv) execvpe (__file, __argv, NULL)
+#include <posix/tst-execvp3.c>
diff --git a/posix/tst-execvpe4.c b/posix/tst-execvpe4.c
new file mode 100644
index 0000000..08fdaf0
--- /dev/null
+++ b/posix/tst-execvpe4.c
@@ -0,0 +1,20 @@
+/* Check unexistent binary for execvpe.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define EXECVP(__file, __argv) execvpe (__file, __argv, NULL)
+#include <posix/tst-execvp4.c>
diff --git a/posix/tst-execvpe5.c b/posix/tst-execvpe5.c
new file mode 100644
index 0000000..09f0fb0
--- /dev/null
+++ b/posix/tst-execvpe5.c
@@ -0,0 +1,130 @@
+/* General tests for execpve.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <wait.h>
+
+
+/* Nonzero if the program gets called via `exec'.  */
+static int restart;
+
+
+#define CMDLINE_OPTIONS \
+  { "restart", no_argument, &restart, 1 },
+
+/* Prototype for our test function.  */
+extern void do_prepare (int argc, char *argv[]);
+extern int do_test (int argc, char *argv[]);
+
+#include "../test-skeleton.c"
+
+#define EXECVPE_KEY    "EXECVPE_ENV"
+#define EXECVPE_VALUE  "execvpe_test"
+
+
+static int
+handle_restart (void)
+{
+  /* First check if only one variable is passed on execvpe.  */
+  int env_count = 0;
+  for (char **e = environ; *e != NULL; ++e)
+    if (++env_count == INT_MAX)
+      error (EXIT_FAILURE, 0, "environment variable number overflow");
+  if (env_count != 1)
+    error (EXIT_FAILURE, 0, "wrong number of environment variables");
+
+  /* Check if the combinarion os "EXECVPE_ENV=execvpe_test"  */
+  const char *env = getenv (EXECVPE_KEY);
+  if (env == NULL)
+    error (EXIT_FAILURE, 0, "test environment variable not found");
+
+  if (strncmp (env, EXECVPE_VALUE, sizeof (EXECVPE_VALUE)))
+    error (EXIT_FAILURE, 0, "test environment variable with wrong value");
+
+  return 0;
+}
+
+
+int
+do_test (int argc, char *argv[])
+{
+  pid_t pid;
+  int status;
+
+  /* We must have
+     - one or four parameters left if called initially
+       + path for ld.so		optional
+       + "--library-path"	optional
+       + the library path	optional
+       + the application name
+  */
+
+  if (restart)
+    {
+      if (argc != 1)
+	error (EXIT_FAILURE, 0, "wrong number of arguments (%d)", argc);
+
+      return handle_restart ();
+    }
+
+  if (argc != 2 && argc != 5)
+    error (EXIT_FAILURE, 0, "wrong number of arguments (%d)", argc);
+
+  /* We want to test the `execvpe' function.  To do this we restart the
+     program with an additional parameter.  */
+  pid = fork ();
+  if (pid == 0)
+    {
+      /* This is the child.  Construct the command line.  */
+
+      /* We cast here to char* because the test itself does not modify neither
+	 the argument nor the environment list.  */
+      char *envs[] = { (char*)(EXECVPE_KEY "=" EXECVPE_VALUE), NULL };
+      if (argc == 5)
+	{
+	  char *args[] = { argv[1], argv[2], argv[3], argv[4],
+			   (char*)"--direct", (char*)"--restart", NULL };
+	  execvpe (args[0], args, envs);
+	}
+      else
+	{
+	  char *args[] = { argv[1], argv[1],
+			   (char*)"--direct", (char*)"--restart", NULL };
+	  execvpe (args[0], args, envs);
+	}
+
+      error (EXIT_FAILURE, errno, "cannot exec");
+    }
+  else if (pid == (pid_t) -1)
+    error (EXIT_FAILURE, errno, "cannot fork");
+
+  /* Wait for the child.  */
+  if (waitpid (pid, &status, 0) != pid)
+    error (EXIT_FAILURE, errno, "wrong child");
+
+  if (WTERMSIG (status) != 0)
+    error (EXIT_FAILURE, 0, "Child terminated incorrectly");
+  status = WEXITSTATUS (status);
+
+  return status;
+}
diff --git a/posix/tst-execvpe6.c b/posix/tst-execvpe6.c
new file mode 100644
index 0000000..e099a2f
--- /dev/null
+++ b/posix/tst-execvpe6.c
@@ -0,0 +1,82 @@
+/* Check execvpe script argument handling.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <sys/param.h>
+
+static char *fname;
+
+static void do_prepare (void);
+#define PREPARE(argc, argv) do_prepare ()
+static int do_test (void);
+#define TEST_FUNCTION do_test ()
+
+#include "../test-skeleton.c"
+
+static void
+do_prepare (void)
+{
+  int fd = create_temp_file ("testscript", &fname);
+  dprintf (fd, "echo foo\n");
+  fchmod (fd, 0700);
+  close (fd);
+}
+
+static int
+do_test (void)
+{
+  if  (setenv ("PATH", test_dir, 1) != 0)
+    {
+      puts ("setenv failed");
+      return 1;
+    }
+
+  /* To limit stack allocation for argument construction in case of
+     script without shebang execvpe limits total number of argument
+     to NCARGS.  */
+  const size_t max_args = NCARGS + 1;
+  char **args = malloc ((NCARGS + 1) * sizeof (char*));
+  if (args == NULL)
+    {
+      puts ("malloc failed");
+      return 1;
+    }
+
+  args[0] = fname;
+  for (int i=1; i < max_args; ++i)
+    args[i] = strdup ("a");
+  args[max_args] = NULL;
+
+  execvpe (basename (fname), args, NULL);
+
+  for (int i=1; i < max_args; ++i)
+    free (args[i]);
+  free (args);
+
+  if (errno != E2BIG)
+    {
+      printf ("errno = %d (%m), expected E2BIG\n", errno);
+      return 1;
+    }
+
+  return 0;
+}
-- 
1.9.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 0/3] posix: Execute file function fixes
@ 2016-02-01 16:21 Adhemerval Zanella
  2016-02-01 16:21 ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Adhemerval Zanella
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-01 16:21 UTC (permalink / raw)
  To: libc-alpha

Hi,

This patchset add some cleanup and fixes for the exec{l,le,lp,vpe}
general function implementation and fixes long standing bugs for
posix_spawn{p} on Linux.  It is basically my previous 2 patchset
for execvpe and posix_spawn{p} along with the execl{e,p} fixes.

For exe{l,le,lp,vpe} function main difference is using stack allocation
instead of dynamic one for argument handling.  The main difference from
previous patch iteration is it does not add any memory stack allocation
constraints due:

1. Current GLIBC logic to limit stack allocation through __MAX_ALLOCA_CUTOFF
   is arbitrary and does no impose any limit (it does not consider current
   stack size neither stack size limit).

2. Memory allocation constraints associated with the functions make
   stack allocation the only sane option.  All exec function family are
   defined to be async-safe, where they can be called either through a
   signal handler or in vfork child, and they can not really on dynamic
   memory allocation (either through malloc or directly by mmap).

The posix_spawn{p} is a new implementation for Linux which aims to
fix some long-standing bug regarding signal handling.  It also
tries to avoid dynamic memory allocation by either relying on the
exec{l,vpe} functions with a dynamic mmap memory region allocate
to use along with direct created child (using clone syscall).


Adhemerval Zanella (3):
  posix: Remove dynamic memory allocation from execl{e,p}
  posix: execvpe cleanup
  posix: New Linux posix_spawn{p} implementation

 include/sched.h                                   |   2 +
 include/unistd.h                                  |   1 +
 posix/Makefile                                    |   5 +-
 posix/execl.c                                     |  65 +---
 posix/execle.c                                    |  66 +---
 posix/execlp.c                                    |  64 +---
 posix/execvpe.c                                   | 238 +++++-------
 posix/tst-execvp1.c                               |   6 +-
 posix/tst-execvp2.c                               |   5 +-
 posix/tst-execvp3.c                               |   5 +-
 posix/tst-execvp4.c                               |   6 +-
 posix/tst-execvpe1.c                              |  20 +
 posix/tst-execvpe2.c                              |  20 +
 posix/tst-execvpe3.c                              |  20 +
 posix/tst-execvpe4.c                              |  20 +
 posix/tst-execvpe5.c                              | 130 +++++++
 posix/tst-execvpe6.c                              |  82 +++++
 posix/tst-spawn2.c                                |  73 ++++
 sysdeps/posix/dup.c                               |   2 +-
 sysdeps/unix/sysv/linux/aarch64/clone.S           |   1 +
 sysdeps/unix/sysv/linux/alpha/clone.S             |   1 +
 sysdeps/unix/sysv/linux/arm/clone.S               |   1 +
 sysdeps/unix/sysv/linux/hppa/clone.S              |   1 +
 sysdeps/unix/sysv/linux/i386/clone.S              |   1 +
 sysdeps/unix/sysv/linux/ia64/clone2.S             |   2 +
 sysdeps/unix/sysv/linux/m68k/clone.S              |   1 +
 sysdeps/unix/sysv/linux/microblaze/clone.S        |   1 +
 sysdeps/unix/sysv/linux/mips/clone.S              |   1 +
 sysdeps/unix/sysv/linux/nios2/clone.S             |   1 +
 sysdeps/unix/sysv/linux/nptl-signals.h            |  10 +
 sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S |   1 +
 sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S |   1 +
 sysdeps/unix/sysv/linux/s390/s390-32/clone.S      |   2 +
 sysdeps/unix/sysv/linux/s390/s390-64/clone.S      |   2 +
 sysdeps/unix/sysv/linux/sh/clone.S                |   1 +
 sysdeps/unix/sysv/linux/sparc/sparc32/clone.S     |   1 +
 sysdeps/unix/sysv/linux/sparc/sparc64/clone.S     |   1 +
 sysdeps/unix/sysv/linux/spawni.c                  | 424 ++++++++++++++++++++++
 sysdeps/unix/sysv/linux/tile/clone.S              |   1 +
 sysdeps/unix/sysv/linux/x86_64/clone.S            |   1 +
 41 files changed, 1055 insertions(+), 285 deletions(-)
 create mode 100644 posix/tst-execvpe1.c
 create mode 100644 posix/tst-execvpe2.c
 create mode 100644 posix/tst-execvpe3.c
 create mode 100644 posix/tst-execvpe4.c
 create mode 100644 posix/tst-execvpe5.c
 create mode 100644 posix/tst-execvpe6.c
 create mode 100644 posix/tst-spawn2.c
 create mode 100644 sysdeps/unix/sysv/linux/spawni.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-01 16:21 [PATCH 0/3] posix: Execute file function fixes Adhemerval Zanella
  2016-02-01 16:21 ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Adhemerval Zanella
  2016-02-01 16:21 ` [PATCH v4 2/3] posix: execvpe cleanup Adhemerval Zanella
@ 2016-02-01 16:21 ` Adhemerval Zanella
  2016-02-01 16:52   ` Joseph Myers
  2 siblings, 1 reply; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-01 16:21 UTC (permalink / raw)
  To: libc-alpha

GLIBC execl{e,p} implementation might use malloc if the total number of i
arguments exceed initial assumption size (1024).  This might lead to
issue in two situations:

1. execl/execle is stated to be async-signal-safe by POSIX [1].  However
   if execl is used in a signal handler with a large argument set (that
   may call malloc internally) and the resulting call fails, it might
   lead malloc in the program in a bad state.

2. If the functions are used in a vfork/clone(VFORK) situation it also
   might issue malloc internal bad state.

This patch fixes it by using stack allocation instead.  It also fixes
BZ#19534.

Tested on x86_64.

Changes from previous version:

- Remove arbitrary limit on stack allocation for argument handling
  (it is arbitrary and does no impose any limit since it does not 
  consider current stack size neither stack size limit).

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html

	[BZ #19534]
	* posix/execl.c (execl): Remove dynamic memory allocation.
	* posix/execle.c (execle): Likewise.
	* posix/execlp.c (execlp): Likewise.
---
 posix/execl.c  | 65 +++++++++++++++++----------------------------------------
 posix/execle.c | 66 ++++++++++++++++++----------------------------------------
 posix/execlp.c | 64 +++++++++++++++++---------------------------------------
 4 files changed, 65 insertions(+), 137 deletions(-)

diff --git a/posix/execl.c b/posix/execl.c
index 102d19d..8b8a324 100644
--- a/posix/execl.c
+++ b/posix/execl.c
@@ -16,58 +16,31 @@
    <http://www.gnu.org/licenses/>.  */
 
 #include <unistd.h>
+#include <errno.h>
 #include <stdarg.h>
-#include <stddef.h>
-#include <stdlib.h>
-#include <string.h>
-
-#include <stackinfo.h>
-
+#include <sys/param.h>
 
 /* Execute PATH with all arguments after PATH until
    a NULL pointer and environment from `environ'.  */
 int
 execl (const char *path, const char *arg, ...)
 {
-#define INITIAL_ARGV_MAX 1024
-  size_t argv_max = INITIAL_ARGV_MAX;
-  const char *initial_argv[INITIAL_ARGV_MAX];
-  const char **argv = initial_argv;
-  va_list args;
-
-  argv[0] = arg;
-
-  va_start (args, arg);
-  unsigned int i = 0;
-  while (argv[i++] != NULL)
-    {
-      if (i == argv_max)
-	{
-	  argv_max *= 2;
-	  const char **nptr = realloc (argv == initial_argv ? NULL : argv,
-				       argv_max * sizeof (const char *));
-	  if (nptr == NULL)
-	    {
-	      if (argv != initial_argv)
-		free (argv);
-	      va_end (args);
-	      return -1;
-	    }
-	  if (argv == initial_argv)
-	    /* We have to copy the already filled-in data ourselves.  */
-	    memcpy (nptr, argv, i * sizeof (const char *));
-
-	  argv = nptr;
-	}
-
-      argv[i] = va_arg (args, const char *);
-    }
-  va_end (args);
-
-  int ret = __execve (path, (char *const *) argv, __environ);
-  if (argv != initial_argv)
-    free (argv);
-
-  return ret;
+  int argc;
+  va_list ap;
+  va_start (ap, arg);
+  for (argc = 1; va_arg (ap, const char *); argc++)
+    continue;
+  va_end (ap);
+
+  int i;
+  char *argv[argc+1];
+  va_start (ap, arg);
+  argv[0] = (char*) arg;
+  for (i = 1; i < argc; i++)
+     argv[i] = va_arg (ap, char *);
+  argv[i] = NULL;
+  va_end (ap);
+
+  return __execve (path, argv, __environ);
 }
 libc_hidden_def (execl)
diff --git a/posix/execle.c b/posix/execle.c
index 8edc03a..1a0c9ee 100644
--- a/posix/execle.c
+++ b/posix/execle.c
@@ -17,57 +17,31 @@
 
 #include <unistd.h>
 #include <stdarg.h>
-#include <stddef.h>
-#include <stdlib.h>
-#include <string.h>
-
-#include <stackinfo.h>
+#include <errno.h>
+#include <sys/param.h>
 
 /* Execute PATH with all arguments after PATH until a NULL pointer,
    and the argument after that for environment.  */
 int
 execle (const char *path, const char *arg, ...)
 {
-#define INITIAL_ARGV_MAX 1024
-  size_t argv_max = INITIAL_ARGV_MAX;
-  const char *initial_argv[INITIAL_ARGV_MAX];
-  const char **argv = initial_argv;
-  va_list args;
-  argv[0] = arg;
-
-  va_start (args, arg);
-  unsigned int i = 0;
-  while (argv[i++] != NULL)
-    {
-      if (i == argv_max)
-	{
-	  argv_max *= 2;
-	  const char **nptr = realloc (argv == initial_argv ? NULL : argv,
-				       argv_max * sizeof (const char *));
-	  if (nptr == NULL)
-	    {
-	      if (argv != initial_argv)
-		free (argv);
-	      va_end (args);
-	      return -1;
-	    }
-	  if (argv == initial_argv)
-	    /* We have to copy the already filled-in data ourselves.  */
-	    memcpy (nptr, argv, i * sizeof (const char *));
-
-	  argv = nptr;
-	}
-
-      argv[i] = va_arg (args, const char *);
-    }
-
-  const char *const *envp = va_arg (args, const char *const *);
-  va_end (args);
-
-  int ret = __execve (path, (char *const *) argv, (char *const *) envp);
-  if (argv != initial_argv)
-    free (argv);
-
-  return ret;
+  int argc;
+  va_list ap;
+  va_start (ap, arg);
+  for (argc = 1; va_arg (ap, const char *); argc++)
+    continue;
+  va_end (ap);
+
+  int i;
+  char *argv[argc+1];
+  char **envp;
+  va_start (ap, arg);
+  argv[0] = (char*) arg;
+  for (i = 1; i < argc; i++)
+     argv[i] = va_arg (ap, char *);
+  envp = va_arg (ap, char **);
+  va_end (ap);
+
+  return __execve (path, argv, envp);
 }
 libc_hidden_def (execle)
diff --git a/posix/execlp.c b/posix/execlp.c
index 6700994..a0e1859 100644
--- a/posix/execlp.c
+++ b/posix/execlp.c
@@ -17,11 +17,8 @@
 
 #include <unistd.h>
 #include <stdarg.h>
-#include <stddef.h>
-#include <stdlib.h>
-#include <string.h>
-
-#include <stackinfo.h>
+#include <errno.h>
+#include <sys/param.h>
 
 /* Execute FILE, searching in the `PATH' environment variable if
    it contains no slashes, with all arguments after FILE until a
@@ -29,45 +26,22 @@
 int
 execlp (const char *file, const char *arg, ...)
 {
-#define INITIAL_ARGV_MAX 1024
-  size_t argv_max = INITIAL_ARGV_MAX;
-  const char *initial_argv[INITIAL_ARGV_MAX];
-  const char **argv = initial_argv;
-  va_list args;
-
-  argv[0] = arg;
-
-  va_start (args, arg);
-  unsigned int i = 0;
-  while (argv[i++] != NULL)
-    {
-      if (i == argv_max)
-	{
-	  argv_max *= 2;
-	  const char **nptr = realloc (argv == initial_argv ? NULL : argv,
-				       argv_max * sizeof (const char *));
-	  if (nptr == NULL)
-	    {
-	      if (argv != initial_argv)
-		free (argv);
-	      va_end (args);
-	      return -1;
-	    }
-	  if (argv == initial_argv)
-	    /* We have to copy the already filled-in data ourselves.  */
-	    memcpy (nptr, argv, i * sizeof (const char *));
-
-	  argv = nptr;
-	}
-
-      argv[i] = va_arg (args, const char *);
-    }
-  va_end (args);
-
-  int ret = execvp (file, (char *const *) argv);
-  if (argv != initial_argv)
-    free (argv);
-
-  return ret;
+  int argc;
+  va_list ap;
+  va_start (ap, arg);
+  for (argc = 1; va_arg (ap, const char *); argc++)
+    continue;
+  va_end (ap);
+
+  int i;
+  char *argv[argc+1];
+  va_start (ap, arg);
+  argv[0] = (char*) arg;
+  for (i = 1; i < argc; i++)
+     argv[i] = va_arg (ap, char *);
+  argv[i] = NULL;
+  va_end (ap);
+
+  return __execvpe (file, argv, __environ);
 }
 libc_hidden_def (execlp)
-- 
1.9.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-01 16:21 ` [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p} Adhemerval Zanella
@ 2016-02-01 16:52   ` Joseph Myers
  2016-02-01 17:18     ` Adhemerval Zanella
  2016-02-02 16:33     ` Rich Felker
  0 siblings, 2 replies; 45+ messages in thread
From: Joseph Myers @ 2016-02-01 16:52 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On Mon, 1 Feb 2016, Adhemerval Zanella wrote:

> +  char *argv[argc+1];
> +  va_start (ap, arg);
> +  argv[0] = (char*) arg;
> +  for (i = 1; i < argc; i++)
> +     argv[i] = va_arg (ap, char *);
> +  argv[i] = NULL;

I don't see how you're ensuring this stack allocation is safe (i.e. if 
it's too big, it doesn't corrupt memory that's in use by other threads).  
Can't it jump beyond any guard page and start overwriting other memory, 
possibly in use by another thread, before reaching unmapped memory?  I'd 
expect safe large stack allocations (as with -fstack-check) to need to 
touch the pages in the right order (and doing that safely probably means 
using -fstack-check).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-01 16:52   ` Joseph Myers
@ 2016-02-01 17:18     ` Adhemerval Zanella
  2016-02-01 17:54       ` Joseph Myers
  2016-02-02 11:24       ` Florian Weimer
  2016-02-02 16:33     ` Rich Felker
  1 sibling, 2 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-01 17:18 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha



On 01-02-2016 14:52, Joseph Myers wrote:
> On Mon, 1 Feb 2016, Adhemerval Zanella wrote:
> 
>> +  char *argv[argc+1];
>> +  va_start (ap, arg);
>> +  argv[0] = (char*) arg;
>> +  for (i = 1; i < argc; i++)
>> +     argv[i] = va_arg (ap, char *);
>> +  argv[i] = NULL;
> 
> I don't see how you're ensuring this stack allocation is safe (i.e. if 
> it's too big, it doesn't corrupt memory that's in use by other threads).  
> Can't it jump beyond any guard page and start overwriting other memory, 
> possibly in use by another thread, before reaching unmapped memory?  I'd 
> expect safe large stack allocations (as with -fstack-check) to need to 
> touch the pages in the right order (and doing that safely probably means 
> using -fstack-check).
> 

Right, it is not ensuring the safeness. Is '-fstack-check' the suffice 
option to ensure it or do we need a more strict one?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-01 17:18     ` Adhemerval Zanella
@ 2016-02-01 17:54       ` Joseph Myers
  2016-02-01 18:09         ` Adhemerval Zanella
  2016-02-02 11:24       ` Florian Weimer
  1 sibling, 1 reply; 45+ messages in thread
From: Joseph Myers @ 2016-02-01 17:54 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On Mon, 1 Feb 2016, Adhemerval Zanella wrote:

> Right, it is not ensuring the safeness. Is '-fstack-check' the suffice 
> option to ensure it or do we need a more strict one?

I think it's the right option, but don't know if it works reliably across 
supported architectures and GCC versions (or, at least, does not generate 
wrong code even if the checks aren't fully safe in all cases) - it's not 
very widely used.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-01 17:54       ` Joseph Myers
@ 2016-02-01 18:09         ` Adhemerval Zanella
  0 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-01 18:09 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha



On 01-02-2016 15:53, Joseph Myers wrote:
> On Mon, 1 Feb 2016, Adhemerval Zanella wrote:
> 
>> Right, it is not ensuring the safeness. Is '-fstack-check' the suffice 
>> option to ensure it or do we need a more strict one?
> 
> I think it's the right option, but don't know if it works reliably across 
> supported architectures and GCC versions (or, at least, does not generate 
> wrong code even if the checks aren't fully safe in all cases) - it's not 
> very widely used.
> 

I think we can use this option, since it supported on gcc (and I also do not
have any other better option in mind that fits in these function memory
constraints).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-01 17:18     ` Adhemerval Zanella
  2016-02-01 17:54       ` Joseph Myers
@ 2016-02-02 11:24       ` Florian Weimer
  2016-02-02 12:46         ` Szabolcs Nagy
  2016-02-02 12:47         ` Adhemerval Zanella
  1 sibling, 2 replies; 45+ messages in thread
From: Florian Weimer @ 2016-02-02 11:24 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Joseph Myers, libc-alpha

On 02/01/2016 06:18 PM, Adhemerval Zanella wrote:

> Right, it is not ensuring the safeness. Is '-fstack-check' the suffice 
> option to ensure it or do we need a more strict one?

In my tests, the initial stack banging probe is sometimes more than a
page away from the current stack pointer, so it does not look safe to me.

For posix_spawn, it's probably simpler for now to compute the shell
invocation in the parent.  That is, perform two vforks in case the first
execve fails.

Florian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-02 11:24       ` Florian Weimer
@ 2016-02-02 12:46         ` Szabolcs Nagy
  2016-02-02 12:47         ` Adhemerval Zanella
  1 sibling, 0 replies; 45+ messages in thread
From: Szabolcs Nagy @ 2016-02-02 12:46 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella; +Cc: Joseph Myers, libc-alpha, nd

On 02/02/16 11:24, Florian Weimer wrote:
> On 02/01/2016 06:18 PM, Adhemerval Zanella wrote:
> 
>> Right, it is not ensuring the safeness. Is '-fstack-check' the suffice 
>> option to ensure it or do we need a more strict one?
> 
> In my tests, the initial stack banging probe is sometimes more than a
> page away from the current stack pointer, so it does not look safe to me.

i think that can be fixed by

memset(argv, 0, sizeof argv);

even if it clobbers other threads it will hit the guard page eventually.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-02 11:24       ` Florian Weimer
  2016-02-02 12:46         ` Szabolcs Nagy
@ 2016-02-02 12:47         ` Adhemerval Zanella
  1 sibling, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-02 12:47 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Joseph Myers, libc-alpha



On 02-02-2016 09:24, Florian Weimer wrote:
> On 02/01/2016 06:18 PM, Adhemerval Zanella wrote:
> 
>> Right, it is not ensuring the safeness. Is '-fstack-check' the suffice 
>> option to ensure it or do we need a more strict one?
> 
> In my tests, the initial stack banging probe is sometimes more than a
> page away from the current stack pointer, so it does not look safe to me.

I am not aware of a better option, if any, to avoid stack overflow in case of
a exec function call with large arguments.  Also, check at least on x86_64
I do see trying to overflow the stack with new implementation does force a
segfault on the stack protector code (it tries to orq a memory beyond stack).

> 
> For posix_spawn, it's probably simpler for now to compute the shell
> invocation in the parent.  That is, perform two vforks in case the first
> execve fails.
> 
> Florian
> 

I do not think it would require to clone(VFORK) twice in parent: we can
calculate the total argument size prior any call and calculate the
total stack to mmap based on this plus a slack for possible local
variables (128 or 256 bytes or even higher). I will add some latency
in any posix_spawn{p} case.

Another option is to just create a guard page in the end of the allocated
stack page (as pthread_create does) to force a segfaults in case of
a stack allocation higher. However, as for execl using stack-check will
also force a segfault in case of stack overflow in this case.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-01 16:21 ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Adhemerval Zanella
@ 2016-02-02 13:06   ` Florian Weimer
  2016-02-02 13:31     ` Adhemerval Zanella
  2016-08-31 21:13   ` Rasmus Villemoes
  1 sibling, 1 reply; 45+ messages in thread
From: Florian Weimer @ 2016-02-02 13:06 UTC (permalink / raw)
  To: adhemerval.zanella; +Cc: libc-alpha

On 02/01/2016 05:21 PM, Adhemerval Zanella wrote:

> +  new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
> +		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);

Does this set up new per-thread variables?  Otherwise, errno in the
parent and child will be same and the code still has races.

Florian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-02 13:06   ` Florian Weimer
@ 2016-02-02 13:31     ` Adhemerval Zanella
  2016-02-03 11:07       ` Torvald Riegel
  0 siblings, 1 reply; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-02 13:31 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha



On 02-02-2016 11:05, Florian Weimer wrote:
> On 02/01/2016 05:21 PM, Adhemerval Zanella wrote:
> 
>> +  new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
>> +		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
> 
> Does this set up new per-thread variables?  Otherwise, errno in the
> parent and child will be same and the code still has races.
> 
> Florian
> 

Could you elaborate? In my understanding there is no requirement of
using CLONE_SETTLS to avoid races: CLONE_VFORK will suspend the 
calling process and even with child using the same TLS namespace as
the parent there will be no concurrent access between them.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-01 16:52   ` Joseph Myers
  2016-02-01 17:18     ` Adhemerval Zanella
@ 2016-02-02 16:33     ` Rich Felker
  2016-02-07 21:28       ` Rasmus Villemoes
  1 sibling, 1 reply; 45+ messages in thread
From: Rich Felker @ 2016-02-02 16:33 UTC (permalink / raw)
  To: libc-alpha

On Mon, Feb 01, 2016 at 04:52:15PM +0000, Joseph Myers wrote:
> On Mon, 1 Feb 2016, Adhemerval Zanella wrote:
> 
> > +  char *argv[argc+1];
> > +  va_start (ap, arg);
> > +  argv[0] = (char*) arg;
> > +  for (i = 1; i < argc; i++)
> > +     argv[i] = va_arg (ap, char *);
> > +  argv[i] = NULL;
> 
> I don't see how you're ensuring this stack allocation is safe (i.e. if 
> it's too big, it doesn't corrupt memory that's in use by other threads).  

There's no obligation to. If you pass something like a million
arguments to a variadic function, the compiler will generate code in
the caller that overflows the stack before the callee is even reached.
The size of the vla used in execl is exactly the same size as the
argument block on the stack used for passing arguments to execl from
its caller, and it's nobody's fault but the programmer's if this is
way too big. It's not a runtime variable.

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-02 13:31     ` Adhemerval Zanella
@ 2016-02-03 11:07       ` Torvald Riegel
  2016-02-03 12:05         ` Adhemerval Zanella
  0 siblings, 1 reply; 45+ messages in thread
From: Torvald Riegel @ 2016-02-03 11:07 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Florian Weimer, libc-alpha

On Tue, 2016-02-02 at 11:31 -0200, Adhemerval Zanella wrote:
> 
> On 02-02-2016 11:05, Florian Weimer wrote:
> > On 02/01/2016 05:21 PM, Adhemerval Zanella wrote:
> > 
> >> +  new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
> >> +		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
> > 
> > Does this set up new per-thread variables?  Otherwise, errno in the
> > parent and child will be same and the code still has races.
> > 
> > Florian
> > 
> 
> Could you elaborate? In my understanding there is no requirement of
> using CLONE_SETTLS to avoid races: CLONE_VFORK will suspend the 
> calling process and even with child using the same TLS namespace as
> the parent there will be no concurrent access between them.

Whatever you agree on eventually, it sounds as if this should be
documented as part of the concurrency notes for this function.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-03 11:07       ` Torvald Riegel
@ 2016-02-03 12:05         ` Adhemerval Zanella
  2016-02-03 12:06           ` Torvald Riegel
  0 siblings, 1 reply; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-03 12:05 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: Florian Weimer, libc-alpha



On 03-02-2016 09:06, Torvald Riegel wrote:
> On Tue, 2016-02-02 at 11:31 -0200, Adhemerval Zanella wrote:
>>
>> On 02-02-2016 11:05, Florian Weimer wrote:
>>> On 02/01/2016 05:21 PM, Adhemerval Zanella wrote:
>>>
>>>> +  new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
>>>> +		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
>>>
>>> Does this set up new per-thread variables?  Otherwise, errno in the
>>> parent and child will be same and the code still has races.
>>>
>>> Florian
>>>
>>
>> Could you elaborate? In my understanding there is no requirement of
>> using CLONE_SETTLS to avoid races: CLONE_VFORK will suspend the 
>> calling process and even with child using the same TLS namespace as
>> the parent there will be no concurrent access between them.
> 
> Whatever you agree on eventually, it sounds as if this should be
> documented as part of the concurrency notes for this function.
> 

I think it is worth to comment internally on the function implementation
about the CLONE_VFORK synchronization, but since it is transparent
to user (the function will either spawn a new process or fails and it can
be either through a syscall, {v}fork/exec or any other mechanism),
I do not see why expose these implementation details explicit. 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-03 12:05         ` Adhemerval Zanella
@ 2016-02-03 12:06           ` Torvald Riegel
  2016-02-03 12:14             ` Adhemerval Zanella
  0 siblings, 1 reply; 45+ messages in thread
From: Torvald Riegel @ 2016-02-03 12:06 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Florian Weimer, libc-alpha

On Wed, 2016-02-03 at 10:05 -0200, Adhemerval Zanella wrote:
> 
> On 03-02-2016 09:06, Torvald Riegel wrote:
> > On Tue, 2016-02-02 at 11:31 -0200, Adhemerval Zanella wrote:
> >>
> >> On 02-02-2016 11:05, Florian Weimer wrote:
> >>> On 02/01/2016 05:21 PM, Adhemerval Zanella wrote:
> >>>
> >>>> +  new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
> >>>> +		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
> >>>
> >>> Does this set up new per-thread variables?  Otherwise, errno in the
> >>> parent and child will be same and the code still has races.
> >>>
> >>> Florian
> >>>
> >>
> >> Could you elaborate? In my understanding there is no requirement of
> >> using CLONE_SETTLS to avoid races: CLONE_VFORK will suspend the 
> >> calling process and even with child using the same TLS namespace as
> >> the parent there will be no concurrent access between them.
> > 
> > Whatever you agree on eventually, it sounds as if this should be
> > documented as part of the concurrency notes for this function.
> > 
> 
> I think it is worth to comment internally on the function implementation
> about the CLONE_VFORK synchronization, but since it is transparent
> to user (the function will either spawn a new process or fails and it can
> be either through a syscall, {v}fork/exec or any other mechanism),
> I do not see why expose these implementation details explicit. 

I meant the internal documentation; IOW, comments in the code.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-03 12:06           ` Torvald Riegel
@ 2016-02-03 12:14             ` Adhemerval Zanella
  0 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-03 12:14 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: Florian Weimer, libc-alpha



On 03-02-2016 10:06, Torvald Riegel wrote:
> On Wed, 2016-02-03 at 10:05 -0200, Adhemerval Zanella wrote:
>>
>> On 03-02-2016 09:06, Torvald Riegel wrote:
>>> On Tue, 2016-02-02 at 11:31 -0200, Adhemerval Zanella wrote:
>>>>
>>>> On 02-02-2016 11:05, Florian Weimer wrote:
>>>>> On 02/01/2016 05:21 PM, Adhemerval Zanella wrote:
>>>>>
>>>>>> +  new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
>>>>>> +		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
>>>>>
>>>>> Does this set up new per-thread variables?  Otherwise, errno in the
>>>>> parent and child will be same and the code still has races.
>>>>>
>>>>> Florian
>>>>>
>>>>
>>>> Could you elaborate? In my understanding there is no requirement of
>>>> using CLONE_SETTLS to avoid races: CLONE_VFORK will suspend the 
>>>> calling process and even with child using the same TLS namespace as
>>>> the parent there will be no concurrent access between them.
>>>
>>> Whatever you agree on eventually, it sounds as if this should be
>>> documented as part of the concurrency notes for this function.
>>>
>>
>> I think it is worth to comment internally on the function implementation
>> about the CLONE_VFORK synchronization, but since it is transparent
>> to user (the function will either spawn a new process or fails and it can
>> be either through a syscall, {v}fork/exec or any other mechanism),
>> I do not see why expose these implementation details explicit. 
> 
> I meant the internal documentation; IOW, comments in the code.
> 

Fair enough, I will add them.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-02 16:33     ` Rich Felker
@ 2016-02-07 21:28       ` Rasmus Villemoes
  2016-02-09 11:36         ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Rasmus Villemoes @ 2016-02-07 21:28 UTC (permalink / raw)
  To: libc-alpha

On Tue, Feb 02 2016, Rich Felker <dalias@libc.org> wrote:

> On Mon, Feb 01, 2016 at 04:52:15PM +0000, Joseph Myers wrote:
>> On Mon, 1 Feb 2016, Adhemerval Zanella wrote:
>> 
>> > +  char *argv[argc+1];
>> > +  va_start (ap, arg);
>> > +  argv[0] = (char*) arg;
>> > +  for (i = 1; i < argc; i++)
>> > +     argv[i] = va_arg (ap, char *);
>> > +  argv[i] = NULL;
>> 
>> I don't see how you're ensuring this stack allocation is safe (i.e. if 
>> it's too big, it doesn't corrupt memory that's in use by other threads).  
>
> There's no obligation to. If you pass something like a million
> arguments to a variadic function, the compiler will generate code in
> the caller that overflows the stack before the callee is even reached.
> The size of the vla used in execl is exactly the same size as the
> argument block on the stack used for passing arguments to execl from
> its caller, and it's nobody's fault but the programmer's if this is
> way too big. It's not a runtime variable.

This is true, and maybe it's not worth the extra complication, but if
we're willing to make arch-specific versions of execl and execle we can
avoid the double stack use and the time spent copying the argv
array. That won't remove the possible stack overflow, of course, but
then it'll in all likelihood happen in the user code and not glibc.

On i386 it's pretty straight-forward since all args are already passed
on the stack, while on x86-64 we can, roughly speaking, make a local
change of the ABI by just stashing the return address somewhere else so
that we can prepend the few passed-by-register to the stack-passed
parameters. I suppose something similar is possible on most other
architectures.

Something like this passes some quick testing (nevermind the z_ prefix;
I just used that to avoid any clashes with the actual functions).

i386:

.globl z_execl ; .align 4,0x90 ; z_execl:
	pushl z_environ
	lea 0x0c(%esp), %eax
	push %eax
	pushl 0x0c(%esp)

	call z_execve

	add $0x0c, %esp
	ret
.type z_execl, @function ; .size z_execl, .-z_execl

.globl z_execle ; .align 4,0x90 ; z_execle:
	lea 0x0c(%esp), %eax // eax = address of first vararg (&argv[1])
	movl (%eax), %edx // we'll test this
1:	add $0x4, %eax // move on to the next word
	test %edx,%edx // have we found NULL?
	movl (%eax), %edx // load the next word into edx
	jnz 1b
	
	pushl %edx
	lea 0x0c(%esp), %eax // same offset as above, but because of the push this is now &argv[0]
	push %eax
	pushl 0x0c(%esp)

	call z_execve

	add $0x0c, %esp
	ret
.type z_execle, @function ; .size z_execle, .-z_execle
	

x86-64:

.globl z_execl  ; .align 4,0x90 ; z_execl:
	movq (%rsp),%rax // rax is caller-saved and dead at this point
	sub $0x28, %rsp // Only allocates room for five pointers, we store six!
	movq %rax, (%rsp) // Stash the return address here.
	movq %rsi, 0x8(%rsp)  // Put the paramaters passed via registers
	movq %rdx, 0x10(%rsp) // at the bottom of the array.
	movq %rcx, 0x18(%rsp)
	movq %r8,  0x20(%rsp)
	movq %r9,  0x28(%rsp) // Overwrites return address, adjacent to stack-passed parameters (arg 7+)

	lea 0x8(%rsp),%rsi // Put the address of our argv array in rsi.
	movq z_environ, %rdx // Third argument is the global environment.
	call z_execve // %rdi is unchanged since the beginning

	// If all goes well, this never returns. Otherwise, we need to clean up.
	// %rax should be preserved, but we're free to use any other caller-saved register.
	
	movq (%rsp), %r8 // Fetch the return address.
	add $0x28, %rsp // Clean up the stack.
	movq %r8, (%rsp) // Put the return address back where it belongs.
	ret
.type z_execl, @function ; .size z_execl, .-z_execl

.globl z_execle ; .align 4,0x90 ; z_execle:
	movq (%rsp),%rax
	sub $0x28, %rsp
	movq %rax, (%rsp)
	movq %rsi, 0x8(%rsp)
	movq %rdx, 0x10(%rsp)
	movq %rcx, 0x18(%rsp)
	movq %r8,  0x20(%rsp)
	movq %r9,  0x28(%rsp)

	// Find the env argument; it is the array element after the
	// argv NULL pointer, which cannot be located before argv[1].
	lea 0x10(%rsp),%r8
	movq (%r8), %rdx
1:	add $0x8, %r8
	test %rdx, %rdx
	movq (%r8), %rdx
	jnz 1b
	
	lea 0x8(%rsp),%rsi
	call z_execve

	movq (%rsp), %r8 
	add $0x28, %rsp
	movq %r8, (%rsp)
	ret
.type z_execle, @function ; .size z_execle, .-z_execle


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-07 21:28       ` Rasmus Villemoes
@ 2016-02-09 11:36         ` Richard Henderson
  2016-02-09 13:30           ` Szabolcs Nagy
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2016-02-09 11:36 UTC (permalink / raw)
  To: Rasmus Villemoes, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1918 bytes --]

On 02/08/2016 08:28 AM, Rasmus Villemoes wrote:
> On Tue, Feb 02 2016, Rich Felker <dalias@libc.org> wrote:
>
>> On Mon, Feb 01, 2016 at 04:52:15PM +0000, Joseph Myers wrote:
>>> On Mon, 1 Feb 2016, Adhemerval Zanella wrote:
>>>
>>>> +  char *argv[argc+1];
>>>> +  va_start (ap, arg);
>>>> +  argv[0] = (char*) arg;
>>>> +  for (i = 1; i < argc; i++)
>>>> +     argv[i] = va_arg (ap, char *);
>>>> +  argv[i] = NULL;
>>>
>>> I don't see how you're ensuring this stack allocation is safe (i.e. if
>>> it's too big, it doesn't corrupt memory that's in use by other threads).
>>
>> There's no obligation to. If you pass something like a million
>> arguments to a variadic function, the compiler will generate code in
>> the caller that overflows the stack before the callee is even reached.
>> The size of the vla used in execl is exactly the same size as the
>> argument block on the stack used for passing arguments to execl from
>> its caller, and it's nobody's fault but the programmer's if this is
>> way too big. It's not a runtime variable.
>
> This is true, and maybe it's not worth the extra complication, but if
> we're willing to make arch-specific versions of execl and execle we can
> avoid the double stack use and the time spent copying the argv
> array. That won't remove the possible stack overflow, of course, but
> then it'll in all likelihood happen in the user code and not glibc.

I like the idea.  It's a quality of implementation issue, wherein by re-using 
the data that's (mostly) on the stack already we don't need twice again the 
amount of stack space for any given call.

I think that Adhemerval's patch should go in as a default implementation, and 
various targets can implement the assembly as desired.

I've tested the following on x86_64 and i686.  I've compile-tested for x32 (but 
need a more complete x32 installation for testing), and alpha (testing is just 
slow).

Thoughts?


r~

[-- Attachment #2: 0001-Move-posix-execl-files-to-sysdeps.patch --]
[-- Type: text/x-patch, Size: 977 bytes --]

From f685eeedc6d7baf1bf822ef058e06bc1bcae4541 Mon Sep 17 00:00:00 2001
From: Richard Henderson <rth@twiddle.net>
Date: Tue, 9 Feb 2016 11:53:32 +1100
Subject: [PATCH 1/5] Move posix/execl files to sysdeps/

This will allow them to be overridable.
---
 {posix => sysdeps/posix}/execl.c  | 0
 {posix => sysdeps/posix}/execle.c | 0
 {posix => sysdeps/posix}/execlp.c | 0
 3 files changed, 0 insertions(+), 0 deletions(-)
 rename {posix => sysdeps/posix}/execl.c (100%)
 rename {posix => sysdeps/posix}/execle.c (100%)
 rename {posix => sysdeps/posix}/execlp.c (100%)

diff --git a/posix/execl.c b/sysdeps/posix/execl.c
similarity index 100%
rename from posix/execl.c
rename to sysdeps/posix/execl.c
diff --git a/posix/execle.c b/sysdeps/posix/execle.c
similarity index 100%
rename from posix/execle.c
rename to sysdeps/posix/execle.c
diff --git a/posix/execlp.c b/sysdeps/posix/execlp.c
similarity index 100%
rename from posix/execlp.c
rename to sysdeps/posix/execlp.c
-- 
2.5.0


[-- Attachment #3: 0002-x86_64-Implement-execl-e-p-without-double-stack-allo.patch --]
[-- Type: text/x-patch, Size: 7218 bytes --]

From 447f711575e70c09fd883f28d86e175993025277 Mon Sep 17 00:00:00 2001
From: Richard Henderson <rth@twiddle.net>
Date: Tue, 9 Feb 2016 12:53:17 +1100
Subject: [PATCH 2/5] x86_64: Implement execl{,e,p} without double stack
 allocation

---
 include/unistd.h                           |  1 +
 posix/execv.c                              |  1 +
 sysdeps/unix/sysv/linux/x86_64/64/execl.S  | 65 +++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/64/execle.S | 66 ++++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/64/execlp.S | 52 +++++++++++++++++++++++
 5 files changed, 185 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/execl.S
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/execle.S
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/execlp.S

diff --git a/include/unistd.h b/include/unistd.h
index 5152f64..21b8135 100644
--- a/include/unistd.h
+++ b/include/unistd.h
@@ -9,6 +9,7 @@ rtld_hidden_proto (_exit, __noreturn__)
 libc_hidden_proto (alarm)
 libc_hidden_proto (confstr)
 libc_hidden_proto (execl)
+libc_hidden_proto (execv)
 libc_hidden_proto (execle)
 libc_hidden_proto (execlp)
 libc_hidden_proto (execvp)
diff --git a/posix/execv.c b/posix/execv.c
index 16c0a02..faca14f 100644
--- a/posix/execv.c
+++ b/posix/execv.c
@@ -24,3 +24,4 @@ execv (const char *path, char *const argv[])
 {
   return __execve (path, argv, __environ);
 }
+libc_hidden_def (execv)
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/execl.S b/sysdeps/unix/sysv/linux/x86_64/64/execl.S
new file mode 100644
index 0000000..29b4923
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/execl.S
@@ -0,0 +1,65 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execl)
+	/* Move return address into a register.  */
+	pop	%rax
+	cfi_adjust_cfa_offset(-8)
+	cfi_register(%rip, %rax)
+
+	/* Save the portions of the argv argument list in registers.  */
+	push	%r9
+	cfi_adjust_cfa_offset(8)
+	push	%r8
+	cfi_adjust_cfa_offset(8)
+	push	%rcx
+	cfi_adjust_cfa_offset(8)
+	push	%rdx
+	cfi_adjust_cfa_offset(8)
+	push	%rsi
+	cfi_adjust_cfa_offset(8)
+
+	/* Load the address of the argv array.  */
+	mov	%rsp, %rsi
+
+	/* Restore return address to the stack.  */
+	push	%rax
+	cfi_adjust_cfa_offset(8)
+	cfi_rel_offset(%rip, 0)
+
+	/* Load __environ for the env parameter.  */
+#ifdef PIC
+	mov	__environ@GOTPCREL(%rip), %rdx
+	mov	(%rdx), %rdx
+#else
+	mov	__environ(%rip), %rdx
+#endif
+
+	DO_CALL (execve, 3)
+
+	/* All returns are errors.  */
+	SYSCALL_SET_ERRNO
+	or	$-1, %rax
+
+	/* Pop all of the extra stack space in one go.  */
+	ret	$40
+
+END(execl)
+
+libc_hidden_def (execl)
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/execle.S b/sysdeps/unix/sysv/linux/x86_64/64/execle.S
new file mode 100644
index 0000000..c2fc5c9
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/execle.S
@@ -0,0 +1,66 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execle)
+	/* Move return address into a register.  */
+	pop	%rax
+	cfi_adjust_cfa_offset(-8)
+	cfi_register(%rip, %rax)
+
+	/* Save the portions of the argv argument list in registers.  */
+	push	%r9
+	cfi_adjust_cfa_offset(8)
+	push	%r8
+	cfi_adjust_cfa_offset(8)
+	push	%rcx
+	cfi_adjust_cfa_offset(8)
+	push	%rdx
+	cfi_adjust_cfa_offset(8)
+	push	%rsi
+	cfi_adjust_cfa_offset(8)
+
+	/* Load the address of the argv array.  */
+	mov	%rsp, %rsi
+
+	/* Restore return address to the stack.  */
+	push	%rax
+	cfi_adjust_cfa_offset(8)
+	cfi_rel_offset(%rip, 0)
+
+	/* Find the env argument.  It is the array element after the argv
+	   NULL terminator, which cannot be located before argv[1].  */
+	lea	8(%rsi), %rax
+	mov	(%rax), %rdx
+1:	add	$8, %rax
+	test	%rdx, %rdx
+	mov	(%rax), %rdx
+	jnz	1b
+
+	DO_CALL (execve, 3)
+
+	/* All returns are errors.  */
+	SYSCALL_SET_ERRNO
+	or	$-1, %rax
+
+	/* Pop all of the extra stack space in one go.  */
+	ret	$40
+
+END(execle)
+
+libc_hidden_def (execle)
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/execlp.S b/sysdeps/unix/sysv/linux/x86_64/64/execlp.S
new file mode 100644
index 0000000..caa0895
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/execlp.S
@@ -0,0 +1,52 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execlp)
+	/* Move return address into a register.  */
+	pop	%rax
+	cfi_adjust_cfa_offset(-8)
+	cfi_register(%rip, %rax)
+
+	/* Save the portions of the argv argument list in registers.  */
+	push	%r9
+	cfi_adjust_cfa_offset(8)
+	push	%r8
+	cfi_adjust_cfa_offset(8)
+	push	%rcx
+	cfi_adjust_cfa_offset(8)
+	push	%rdx
+	cfi_adjust_cfa_offset(8)
+	push	%rsi
+	cfi_adjust_cfa_offset(8)
+
+	/* Load the address of the argv array.  */
+	mov	%rsp, %rsi
+
+	/* Restore return address to the stack.  */
+	push	%rax
+	cfi_adjust_cfa_offset(8)
+	cfi_rel_offset(%rip, 0)
+
+	call	HIDDEN_JUMPTARGET (execvp)
+
+	/* Pop all of the extra stack space in one go.  */
+	ret	$40
+END(execlp)
+
+libc_hidden_def (execlp)
-- 
2.5.0


[-- Attachment #4: 0003-i386-Implement-execl-e-p-without-double-stack-alloca.patch --]
[-- Type: text/x-patch, Size: 4113 bytes --]

From 072dce81771f971f49f2480630842a2874a2c860 Mon Sep 17 00:00:00 2001
From: Richard Henderson <rth@twiddle.net>
Date: Tue, 9 Feb 2016 13:12:33 +1100
Subject: [PATCH 3/5] i386: Implement execl{,e,p} without double stack
 allocation

---
 sysdeps/unix/sysv/linux/i386/execl.S  | 39 +++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/i386/execle.S | 46 +++++++++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/i386/execlp.S |  4 +++
 3 files changed, 89 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/i386/execl.S
 create mode 100644 sysdeps/unix/sysv/linux/i386/execle.S
 create mode 100644 sysdeps/unix/sysv/linux/i386/execlp.S

diff --git a/sysdeps/unix/sysv/linux/i386/execl.S b/sysdeps/unix/sysv/linux/i386/execl.S
new file mode 100644
index 0000000..047ef48
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/i386/execl.S
@@ -0,0 +1,39 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execl)
+	mov	4(%esp), %edx
+	lea	8(%esp), %eax
+
+	push	%eax			/* alignment padding */
+	cfi_adjust_cfa_offset(4)
+	push	%eax			/* create argv argument */
+	cfi_adjust_cfa_offset(4)
+	push	%edx			/* create path argument */
+	cfi_adjust_cfa_offset(4)
+
+	/* Let execv deal with pic vs non-pic loading of __environ.  */
+	call	HIDDEN_JUMPTARGET (execv)
+
+	add	$12, %esp
+	cfi_adjust_cfa_offset(-12)
+	ret
+END(execl)
+
+libc_hidden_def (execl)
diff --git a/sysdeps/unix/sysv/linux/i386/execle.S b/sysdeps/unix/sysv/linux/i386/execle.S
new file mode 100644
index 0000000..34cc7bc
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/i386/execle.S
@@ -0,0 +1,46 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execle)
+	lea	8(%esp), %eax	/* find argv array */
+
+	/* Find the env argument.  It is the array element after the argv
+	   NULL terminator, which cannot be located before argv[1].  */
+	mov	4(%eax), %edx
+	xor	%ecx, %ecx
+1:	inc	%ecx
+	test	%edx, %edx
+	mov	4(%eax, %ecx, 4), %edx
+	jnz	1b
+
+	push	%edx		/* create env argument */
+	cfi_adjust_cfa_offset(4)
+	push	%eax		/* create argv argument */
+	cfi_adjust_cfa_offset(4)
+	push	12(%esp)	/* copy path argument */
+	cfi_adjust_cfa_offset(4)
+
+	call	HIDDEN_JUMPTARGET (execve)
+
+	add	$12, %esp
+	cfi_adjust_cfa_offset(-12)
+	ret
+END(execle)
+
+libc_hidden_def (execle)
diff --git a/sysdeps/unix/sysv/linux/i386/execlp.S b/sysdeps/unix/sysv/linux/i386/execlp.S
new file mode 100644
index 0000000..d1c8806
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/i386/execlp.S
@@ -0,0 +1,4 @@
+#define execl		execlp
+#define execv		execvp
+#define __GI_execv	__GI_execvp
+#include "execl.S"
-- 
2.5.0


[-- Attachment #5: 0004-x32-Implement-execl-e-p-without-double-stack-allocat.patch --]
[-- Type: text/x-patch, Size: 8124 bytes --]

From 4577d8457637adac28ed34bc3f9ce0a936dc786e Mon Sep 17 00:00:00 2001
From: Richard Henderson <rth@twiddle.net>
Date: Tue, 9 Feb 2016 15:19:32 +1100
Subject: [PATCH 4/5] x32: Implement execl{,e,p} without double stack
 allocation

---
 sysdeps/unix/sysv/linux/x86_64/x32/execl.S  | 88 +++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/x32/execle.S | 87 ++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/x32/execlp.S | 76 +++++++++++++++++++++++++
 3 files changed, 251 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/x32/execl.S
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/x32/execle.S
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/x32/execlp.S

diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/execl.S b/sysdeps/unix/sysv/linux/x86_64/x32/execl.S
new file mode 100644
index 0000000..9139bad
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/execl.S
@@ -0,0 +1,88 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execl)
+	/* Move return address into a register.  */
+	pop	%rax
+	cfi_adjust_cfa_offset(-8)
+	cfi_register(%rip, %rax)
+
+	/* Save the arguments in registers.  Stop as soon as we detect
+	   the NULL terminator, as if we find one, we do not want to fall
+	   into the on-stack conversion loop.  */
+	sub	$24, %esp
+	cfi_adjust_cfa_offset(24)
+
+	mov	%edi, 4(%rsp)	/* argv[0] must be non-null.  */
+
+	mov	%edx, 8(%rsp)
+	test	%edx, %edx
+	jz	9f
+
+	mov	%ecx, 12(%rsp)
+	test	%ecx, %ecx
+	jz	9f
+
+	mov	%r8d, 16(%rsp)
+	test	%r8d, %r8d
+	jz	9f
+
+	mov	%r9d, 20(%rsp)
+	test	%r9d, %r9d
+	jz	9f
+
+	/* Convert the on-stack pointer arguments to in-place
+	   from a 64-bit padded array into a 32-bit packed array.
+           Note that this is memory is callee owned.  */
+	xor	%ecx, %ecx
+1:	mov	24(%rsp, %rcx, 8), %edx
+	mov	%edx, 24(%rsp, %rcx, 4)
+	inc	%ecx
+	test	%edx, %edx
+	jnz	1b
+
+9:
+	/* Restore return address to the stack.  */
+	push	%rax
+	cfi_adjust_cfa_offset(8)
+	cfi_rel_offset(%rip, 0)
+
+	/* Load __environ for the env parameter.  */
+#ifdef PIC
+	mov	__environ@GOTPCREL(%rip), %edx
+	mov	(%rdx), %edx
+#else
+	mov	__environ(%rip), %edx
+#endif
+
+	/* Load argv parameter.  Note that path (esi) is already loaded.  */
+	lea	12(%rsp), %edi
+
+	DO_CALL (execve, 3)
+
+	/* All returns are errors.  */
+	SYSCALL_SET_ERRNO
+	or	$-1, %rax
+
+	/* Pop all of the extra stack space in one go.  */
+	ret	$24
+
+END(execl)
+
+libc_hidden_def (execl)
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/execle.S b/sysdeps/unix/sysv/linux/x86_64/x32/execle.S
new file mode 100644
index 0000000..c7de7e0
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/execle.S
@@ -0,0 +1,87 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execle)
+	/* Move return address into a register.  */
+	pop	%rax
+	cfi_adjust_cfa_offset(-8)
+	cfi_register(%rip, %rax)
+
+	/* Save the arguments in registers.  Stop as soon as we detect
+	   the NULL terminator, as if we find one, we do not want to fall
+	   into the on-stack conversion loop.  Move the potential ENV
+	   parameter in place in EDX on each exit path.  */
+	sub	$24, %esp
+	cfi_adjust_cfa_offset(24)
+
+	mov	%edi, 4(%rsp)	/* argv[0] must be non-null.  */
+
+	mov	%edx, 8(%rsp)
+	test	%edx, %edx
+	mov	%ecx, %edx
+	jz	9f
+
+	mov	%ecx, 12(%rsp)
+	test	%ecx, %ecx
+	mov	%r8d, %edx
+	jz	9f
+
+	mov	%r8d, 16(%rsp)
+	test	%r8d, %r8d
+	mov	%r9d, %edx
+	jz	9f
+
+	mov	%r9d, 20(%rsp)
+	test	%r9d, %r9d
+	mov	24(%rsp), %edx
+	jz	9f
+
+	/* Convert the on-stack pointer arguments to in-place
+	   from a 64-bit padded array into a 32-bit packed array.
+           Note that this is memory is callee owned, and that this
+	   loop exits with the ENV parameter loaded in EDX.  */
+	xor	%ecx, %ecx
+	mov	24(%rsp, %rcx, 8), %edx
+1:	mov	%edx, 24(%rsp, %rcx, 4)
+	inc	%ecx
+	test	%edx, %edx
+	mov	24(%rsp, %rcx, 8), %edx
+	jnz	1b
+
+9:
+	/* Restore return address to the stack.  */
+	push	%rax
+	cfi_adjust_cfa_offset(8)
+	cfi_rel_offset(%rip, 0)
+
+	/* Load argv parameter.  Note that path (esi) is already loaded.  */
+	lea	12(%rsp), %edi
+
+	DO_CALL (execve, 3)
+
+	/* All returns are errors.  */
+	SYSCALL_SET_ERRNO
+	or	$-1, %rax
+
+	/* Pop all of the extra stack space in one go.  */
+	ret	$24
+
+END(execle)
+
+libc_hidden_def (execle)
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/execlp.S b/sysdeps/unix/sysv/linux/x86_64/x32/execlp.S
new file mode 100644
index 0000000..cad65f5
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/execlp.S
@@ -0,0 +1,76 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execlp)
+	/* Move return address into a register.  */
+	pop	%rax
+	cfi_adjust_cfa_offset(-8)
+	cfi_register(%rip, %rax)
+
+	/* Save the arguments in registers.  Stop as soon as we detect
+	   the NULL terminator, as if we find one, we do not want to fall
+	   into the on-stack conversion loop.  */
+	sub	$24, %esp
+	cfi_adjust_cfa_offset(24)
+
+	mov	%edi, 4(%rsp)	/* argv[0] must be non-null.  */
+
+	mov	%edx, 8(%rsp)
+	test	%edx, %edx
+	jz	9f
+
+	mov	%ecx, 12(%rsp)
+	test	%ecx, %ecx
+	jz	9f
+
+	mov	%r8d, 16(%rsp)
+	test	%r8d, %r8d
+	jz	9f
+
+	mov	%r9d, 20(%rsp)
+	test	%r9d, %r9d
+	jz	9f
+
+	/* Convert the on-stack pointer arguments to in-place
+	   from a 64-bit padded array into a 32-bit packed array.
+           Note that this is memory is callee owned.  */
+	xor	%ecx, %ecx
+1:	mov	24(%rsp, %rcx, 8), %edx
+	mov	%edx, 24(%rsp, %rcx, 4)
+	inc	%ecx
+	test	%edx, %edx
+	jnz	1b
+
+9:
+	/* Restore return address to the stack.  */
+	push	%rax
+	cfi_adjust_cfa_offset(8)
+	cfi_rel_offset(%rip, 0)
+
+	/* Load argv parameter.  Note that path (esi) is already loaded.  */
+	lea	12(%rsp), %edi
+
+	call	HIDDEN_JUMPTARGET (execvp)
+
+	/* Pop all of the extra stack space in one go.  */
+	ret	$24
+
+END(execlp)
+
+libc_hidden_def (execlp)
-- 
2.5.0


[-- Attachment #6: 0005-alpha-Implement-execl-e-p-without-double-stack-alloc.patch --]
[-- Type: text/x-patch, Size: 5898 bytes --]

From 5b78856069a21550d4b67b4c0a269915f37fce0f Mon Sep 17 00:00:00 2001
From: Richard Henderson <rth@twiddle.net>
Date: Tue, 9 Feb 2016 13:43:08 +1100
Subject: [PATCH 5/5] alpha: Implement execl{,e,p} without double stack
 allocation

---
 sysdeps/unix/sysv/linux/alpha/execl.S  | 52 ++++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/alpha/execle.S | 58 ++++++++++++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/alpha/execlp.S | 53 +++++++++++++++++++++++++++++++
 3 files changed, 163 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/alpha/execl.S
 create mode 100644 sysdeps/unix/sysv/linux/alpha/execle.S
 create mode 100644 sysdeps/unix/sysv/linux/alpha/execlp.S

diff --git a/sysdeps/unix/sysv/linux/alpha/execl.S b/sysdeps/unix/sysv/linux/alpha/execl.S
new file mode 100644
index 0000000..11f4307
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/alpha/execl.S
@@ -0,0 +1,52 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execl)
+	cfi_startproc
+	ldgp	gp, 0(pv)
+	lda	sp, -48(sp)
+	cfi_adjust_cfa_offset(48)
+	.frame	sp, 48, ra
+	.prologue 1
+
+	/* Save the portions of the argv argument list in registers.  */
+	stq	a5, 40(sp)
+	stq	a4, 32(sp)
+	stq	a3, 24(sp)
+	stq	a2, 16(sp)
+	stq	a1,  8(sp)
+
+	/* Load the argv and envp arguments; path is already in place.  */
+	lda	a1, 8(sp)
+	ldq	a2, __environ
+
+	lda	v0, SYS_ify(execve)
+	call_pal PAL_callsys
+
+	/* Discard the stack frame now.  */
+	lda	sp, 48(sp)
+	cfi_adjust_cfa_offset(-48)
+
+	/* All returns are errors.  */
+	br	SYSCALL_ERROR_LABEL
+
+PSEUDO_END (execle)
+	cfi_endproc
+
+libc_hidden_def (execl)
diff --git a/sysdeps/unix/sysv/linux/alpha/execle.S b/sysdeps/unix/sysv/linux/alpha/execle.S
new file mode 100644
index 0000000..ce75ce3
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/alpha/execle.S
@@ -0,0 +1,58 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execle)
+	cfi_startproc
+	lda	sp, -48(sp)
+	cfi_adjust_cfa_offset(48)
+	.frame  sp, 48, ra
+	.prologue 0
+
+	/* Save the portions of the argv argument list in registers.  */
+	stq	a5, 40(sp)
+	stq	a4, 32(sp)
+	stq	a3, 24(sp)
+	stq	a2, 16(sp)
+	stq	a1,  8(sp)
+
+	/* Find the env argument.  It is the array element after the argv
+	   NULL terminator, which cannot be located before argv[1].  */
+	lda	t0, 16(sp)
+1:	ldq	t1, 0(t0)
+	addq	t0, 8, t0
+	bne	t1, 1b
+
+	/* Load the argv and envp arguments; path is already in place.  */
+	lda	a1, 8(sp)
+	ldq	a2, 0(t0)
+
+	lda	v0, SYS_ify(execve)
+	call_pal PAL_callsys
+
+	/* Discard the stack frame now.  */
+	lda	sp, 48(sp)
+	cfi_adjust_cfa_offset(-48)
+
+	/* All returns are errors.  */
+	br	SYSCALL_ERROR_LABEL
+
+PSEUDO_END (execle)
+	cfi_endproc
+
+libc_hidden_def (execle)
diff --git a/sysdeps/unix/sysv/linux/alpha/execlp.S b/sysdeps/unix/sysv/linux/alpha/execlp.S
new file mode 100644
index 0000000..b0ef76d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/alpha/execlp.S
@@ -0,0 +1,53 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ENTRY(execlp)
+	cfi_startproc
+	ldgp	gp, 0(pv)
+	lda	sp, -48(sp)
+	cfi_adjust_cfa_offset(48)
+	stq	ra, 0(sp)
+	cfi_rel_offset(ra, 0)
+	.prologue 1
+
+	/* Save the portions of the argv argument list in registers.  */
+	stq	a5, 40(sp)
+	stq	a4, 32(sp)
+	stq	a3, 24(sp)
+	stq	a2, 16(sp)
+	stq	a1,  8(sp)
+
+	/* Load the argv and envp arguments; path is already in place.  */
+	lda	a1, 8(sp)
+	ldq	a2, __environ
+#ifdef PIC
+	bsr	ra, __execvpe		!samegp
+#else
+	jsr	ra, __execvpe
+	ldgp	gp, 0(ra)
+#endif
+
+	lda	sp, 48(sp)
+	cfi_adjust_cfa_offset(-48)
+	ret
+
+END (execlp)
+	cfi_endproc
+
+libc_hidden_def (execlp)
-- 
2.5.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-09 11:36         ` Richard Henderson
@ 2016-02-09 13:30           ` Szabolcs Nagy
  2016-02-10 16:36             ` Adhemerval Zanella
  0 siblings, 1 reply; 45+ messages in thread
From: Szabolcs Nagy @ 2016-02-09 13:30 UTC (permalink / raw)
  To: Richard Henderson, Rasmus Villemoes, libc-alpha; +Cc: nd

On 09/02/16 11:35, Richard Henderson wrote:
> On 02/08/2016 08:28 AM, Rasmus Villemoes wrote:
>> On Tue, Feb 02 2016, Rich Felker <dalias@libc.org> wrote:
>>
>>> On Mon, Feb 01, 2016 at 04:52:15PM +0000, Joseph Myers wrote:
>>>> On Mon, 1 Feb 2016, Adhemerval Zanella wrote:
>>>>
>>>>> +  char *argv[argc+1];
>>>>> +  va_start (ap, arg);
>>>>> +  argv[0] = (char*) arg;
>>>>> +  for (i = 1; i < argc; i++)
>>>>> +     argv[i] = va_arg (ap, char *);
>>>>> +  argv[i] = NULL;
>>>>
>>>> I don't see how you're ensuring this stack allocation is safe (i.e. if
>>>> it's too big, it doesn't corrupt memory that's in use by other threads).
>>>
>>> There's no obligation to. If you pass something like a million
>>> arguments to a variadic function, the compiler will generate code in
>>> the caller that overflows the stack before the callee is even reached.
>>> The size of the vla used in execl is exactly the same size as the
>>> argument block on the stack used for passing arguments to execl from
>>> its caller, and it's nobody's fault but the programmer's if this is
>>> way too big. It's not a runtime variable.
>>
>> This is true, and maybe it's not worth the extra complication, but if
>> we're willing to make arch-specific versions of execl and execle we can
>> avoid the double stack use and the time spent copying the argv
>> array. That won't remove the possible stack overflow, of course, but
>> then it'll in all likelihood happen in the user code and not glibc.
> 
> I like the idea.  It's a quality of implementation issue, wherein by re-using the data that's (mostly) on the
> stack already we don't need twice again the amount of stack space for any given call.
> 
> I think that Adhemerval's patch should go in as a default implementation, and various targets can implement the
> assembly as desired.
> 
> I've tested the following on x86_64 and i686.  I've compile-tested for x32 (but need a more complete x32
> installation for testing), and alpha (testing is just slow).
> 
> Thoughts?
> 

i think it is a hard to maintain, nasty hack

is execl stack usage the bottleneck somewhere?

to improve the stack usage in glibc, the extreme
wasteful cases should be fixed first.
(crypt uses >100k, strtod uses ???, etc)

e.g. musl guarantees hard limits on libc stack usage
and execl is one of the (two) exceptions where it just
seems useless to do so (you cannot do it portably anyway).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p}
  2016-02-09 13:30           ` Szabolcs Nagy
@ 2016-02-10 16:36             ` Adhemerval Zanella
  0 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-02-10 16:36 UTC (permalink / raw)
  To: libc-alpha



On 09-02-2016 11:30, Szabolcs Nagy wrote:
> On 09/02/16 11:35, Richard Henderson wrote:
>> On 02/08/2016 08:28 AM, Rasmus Villemoes wrote:
>>> On Tue, Feb 02 2016, Rich Felker <dalias@libc.org> wrote:
>>>
>>>> On Mon, Feb 01, 2016 at 04:52:15PM +0000, Joseph Myers wrote:
>>>>> On Mon, 1 Feb 2016, Adhemerval Zanella wrote:
>>>>>
>>>>>> +  char *argv[argc+1];
>>>>>> +  va_start (ap, arg);
>>>>>> +  argv[0] = (char*) arg;
>>>>>> +  for (i = 1; i < argc; i++)
>>>>>> +     argv[i] = va_arg (ap, char *);
>>>>>> +  argv[i] = NULL;
>>>>>
>>>>> I don't see how you're ensuring this stack allocation is safe (i.e. if
>>>>> it's too big, it doesn't corrupt memory that's in use by other threads).
>>>>
>>>> There's no obligation to. If you pass something like a million
>>>> arguments to a variadic function, the compiler will generate code in
>>>> the caller that overflows the stack before the callee is even reached.
>>>> The size of the vla used in execl is exactly the same size as the
>>>> argument block on the stack used for passing arguments to execl from
>>>> its caller, and it's nobody's fault but the programmer's if this is
>>>> way too big. It's not a runtime variable.
>>>
>>> This is true, and maybe it's not worth the extra complication, but if
>>> we're willing to make arch-specific versions of execl and execle we can
>>> avoid the double stack use and the time spent copying the argv
>>> array. That won't remove the possible stack overflow, of course, but
>>> then it'll in all likelihood happen in the user code and not glibc.
>>
>> I like the idea.  It's a quality of implementation issue, wherein by re-using the data that's (mostly) on the
>> stack already we don't need twice again the amount of stack space for any given call.
>>
>> I think that Adhemerval's patch should go in as a default implementation, and various targets can implement the
>> assembly as desired.
>>
>> I've tested the following on x86_64 and i686.  I've compile-tested for x32 (but need a more complete x32
>> installation for testing), and alpha (testing is just slow).
>>
>> Thoughts?
>>
> 
> i think it is a hard to maintain, nasty hack
> 
> is execl stack usage the bottleneck somewhere?
> 
> to improve the stack usage in glibc, the extreme
> wasteful cases should be fixed first.
> (crypt uses >100k, strtod uses ???, etc)
> 
> e.g. musl guarantees hard limits on libc stack usage
> and execl is one of the (two) exceptions where it just
> seems useless to do so (you cannot do it portably anyway).
> 


I agree with Szabolcs and I also see these kind of asm specific
implementation also adds platform different behaviour which I also
think we should avoid to add within GLIBC.

The only doubt with my patch I have now is if it is worth to add
the -fstack-check or not. Florian has raised some questioning about
its efficacy and I am not sure how well it is supported on all 
architectures.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-02-01 16:21 ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Adhemerval Zanella
  2016-02-02 13:06   ` Florian Weimer
@ 2016-08-31 21:13   ` Rasmus Villemoes
  2016-08-31 22:08     ` Joseph Myers
                       ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Rasmus Villemoes @ 2016-08-31 21:13 UTC (permalink / raw)
  To: libc-alpha; +Cc: Adhemerval Zanella

On Mon, Feb 01 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:


> +      for (cnt = 0; cnt < file_actions->__used; ++cnt)
> +	{
> +	  struct __spawn_action *action = &file_actions->__actions[cnt];
> +
> +	  /* Dup the pipe fd onto an unoccupied one to avoid any file
> +	     operation to clobber it.  */
> +	  if ((action->action.close_action.fd == p)
> +	      || (action->action.open_action.fd == p)
> +	      || (action->action.dup2_action.fd == p))
> +	    {
> +	      if ((ret = __dup (p)) < 0)
> +		goto fail;
> +	      p = ret;
> +	    }
> +

Rather late to the party, but I think there's a few bugs here. Most
importantly, dup() doesn't preserve the CLOEXEC flag, so if we do move
the write end around like this, the fd will not automatically be closed
during the exec, and hence the parent won't receive EOF and will block
in read() call until the child finally exits. That's easily fixable with
fcntl(p, F_DUPFD_CLOEXEC, 0). Pretty annoying to add a test for, though.

Then there's the condition. First of all I think it's a little ugly
reading the fd member from the "irrelevant" union members, even if gcc
probably optimizes this to a single comparison. But more importantly,
for the purpose of preventing clobbering our back channel to our parent,
in the case of dup2_action the target aka newfd is much more relevant.

That doesn't mean we don't also have to do some checking of the source
of a dup2 action: If we allow dup2'ing it, we'd end up with the same
problem as above; some copy of the write end of the pipe not being
closed by exec(). Arguably, from the application's POV, p is not an open
file descriptor, so we could fail the action (and hence the entire
spawn) with EBADF. In fact, I can't really see what else we could
do. For the close action, we can get by with just dupping the fd away
and letting the action close the original (effectively making the action
a no-op), but I also think that's an application bug.

So I think the pipe clobber checking should be done in each action
individually, since there's different logic that needs to be applied in
each case.

> +	  switch (action->tag)
> +	    {
> +	    case spawn_do_close:
> +	      if ((ret =
> +		   close_not_cancel (action->action.close_action.fd)) != 0)
> +		{
> +		  if (!have_fdlimit)
> +		    {
> +		      __getrlimit64 (RLIMIT_NOFILE, &fdlimit);
> +		      have_fdlimit = true;
> +		    }
> +
> +		  /* Signal errors only for file descriptors out of range.  */
> +		  if (action->action.close_action.fd < 0
> +		      || action->action.close_action.fd >= fdlimit.rlim_cur)
> +		    goto fail;
> +		}

I may have missed it in the original discussion, but what is the
rationale for this? POSIX says

  If the file_actions argument is not NULL, and specifies any close,
  dup2, or open actions to be performed, and if posix_spawn() or
  posix_spawnp() fails for any of the reasons that would cause close(),
  dup2(), or open() to fail, an error value shall be returned as
  described by close(), dup2(), and open(), respectively
  
> +	      break;
> +
> +	    case spawn_do_open:
> +	      {
> +		ret = open_not_cancel (action->action.open_action.path,
> +				       action->action.
> +				       open_action.oflag | O_LARGEFILE,
> +				       action->action.open_action.mode);
> +
> +		if (ret == -1)
> +		  goto fail;
> +
> +		int new_fd = ret;
> +
> +		/* Make sure the desired file descriptor is used.  */
> +		if (ret != action->action.open_action.fd)
> +		  {
> +		    if ((ret = __dup2 (new_fd, action->action.open_action.fd))
> +			!= action->action.open_action.fd)
> +		      goto fail;
> +
> +		    if ((ret = close_not_cancel (new_fd)) != 0)
> +		      goto fail;
> +		  }
> +	      }

This is also how I'd have implemented it, but POSIX explicitly says

  If fildes was already an open file descriptor, it shall be closed
  before the new file is opened.

Some, if slightly pathological, examples of how the difference could be
observed:

* ENFILE/EMFILE

* Some single-open special device; following POSIX,
  'action_open("/dev/watchdog", 47); action_open("/dev/watchdog", 47);'
  should both succeed if the first does, whereas the second will fail
  with the current code.

> +	      break;
> +
> +	    case spawn_do_dup2:
> +	      if ((ret = __dup2 (action->action.dup2_action.fd,
> +				 action->action.dup2_action.newfd))
> +		  != action->action.dup2_action.newfd)
> +		goto fail;
> +	      break;
> +	    }
> +	}
> +    }
> +
> +  /* Set the initial signal mask of the child if POSIX_SPAWN_SETSIGMASK
> +     is set, otherwise restore the previous one.  */
> +  __sigprocmask (SIG_SETMASK, (attr->__flags & POSIX_SPAWN_SETSIGMASK)
> +		 ? &attr->__ss : &args->oldmask, 0);
> +
> +  args->exec (args->file, args->argv, args->envp);
> +
> +  /* This is compatibility function required to enable posix_spawn run
> +     script without shebang definition for older posix_spawn versions
> +     (2.15).  */
> +  maybe_script_execute (args);
> +
> +  ret = -errno;
> +
> +fail:
> +  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
> +  ret = -ret;
> +  if (ret)
> +    while (write_not_cancel (p, &ret, sizeof ret) < 0)
> +      continue;

This also seems wrong, at least if the intention is to write the proper
errno value back. That only happens if we reach fail: after failing to
exec - in most or all other cases, ret has the value -1, so we'd be
writing EPERM to our parent. So why not just pull the value from errno
in all cases?

Rasmus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-08-31 21:13   ` Rasmus Villemoes
@ 2016-08-31 22:08     ` Joseph Myers
  2016-09-01  9:28       ` Rasmus Villemoes
  2016-09-06 20:43     ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Rasmus Villemoes
  2016-09-14 19:37     ` Adhemerval Zanella
  2 siblings, 1 reply; 45+ messages in thread
From: Joseph Myers @ 2016-08-31 22:08 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: libc-alpha, Adhemerval Zanella

On Wed, 31 Aug 2016, Rasmus Villemoes wrote:

> Rather late to the party, but I think there's a few bugs here. Most
> importantly, dup() doesn't preserve the CLOEXEC flag, so if we do move
> the write end around like this, the fd will not automatically be closed
> during the exec, and hence the parent won't receive EOF and will block
> in read() call until the child finally exits. That's easily fixable with
> fcntl(p, F_DUPFD_CLOEXEC, 0). Pretty annoying to add a test for, though.

In Linux-specific code we can assume the presence of dup3 (which needs to 
be called by the name __dup3 in implementations of POSIX functions).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-08-31 22:08     ` Joseph Myers
@ 2016-09-01  9:28       ` Rasmus Villemoes
  2016-09-14 13:13         ` Adhemerval Zanella
  2016-09-20 20:25         ` Florian Weimer
  0 siblings, 2 replies; 45+ messages in thread
From: Rasmus Villemoes @ 2016-09-01  9:28 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha, Adhemerval Zanella

On 1 September 2016 at 00:08, Joseph Myers <joseph@codesourcery.com> wrote:
> On Wed, 31 Aug 2016, Rasmus Villemoes wrote:
>
>> Rather late to the party, but I think there's a few bugs here. Most
>> importantly, dup() doesn't preserve the CLOEXEC flag, so if we do move
>> the write end around like this, the fd will not automatically be closed
>> during the exec, and hence the parent won't receive EOF and will block
>> in read() call until the child finally exits. That's easily fixable with
>> fcntl(p, F_DUPFD_CLOEXEC, 0). Pretty annoying to add a test for, though.
>
> In Linux-specific code we can assume the presence of dup3 (which needs to
> be called by the name __dup3 in implementations of POSIX functions).

What I meant was that it is a little hard to write a regression test
for this bug, since we don't know beforehand what fds the pipe2() call
will give us, making it hard to create actions that is guaranteed to
exercise this code.

But, thinking a bit more about this, why do we even need a pipe to
ensure the child is gone, when we already set CLONE_VFORK? Can't we
just exploit the fact that we run in the same VM as the parent and
make the child write a non-zero error code to the spawn_args
structure? That would eliminate this problem entirely. Something like
below (sorry if gmail has whitespace-damaged it).

Rasmus



From: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Date: Thu, 1 Sep 2016 11:03:43 +0200
Subject: [PATCH] linux: spawni.c: simplify error reporting to parent

Using VFORK already ensures that the parent does not run until the
child has either exec'ed succesfully or called _exit. Hence we don't
need to read from a CLOEXEC pipe to ensure proper synchronization - we
just make explicit use of the fact the the child and parent run in the
same VM, so the child can write an error code to a field of the
posix_spawn_args struct instead of sending it through a pipe.

This eliminates some annoying bookkeeping that is necessary to avoid
the file actions from clobbering the write end of the pipe, and
getting rid of the pipe creation in the first place means fewer system
calls and fewer chanches for the spawn to fail (e.g. if we're close to
EMFILE).

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 sysdeps/unix/sysv/linux/spawni.c | 63 ++++++++++++----------------------------
 1 file changed, 18 insertions(+), 45 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index bb3eecf..a3c4175 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -44,11 +44,12 @@
    3. Child must synchronize with parent to enforce 2. and to possible
       return execv issues.

-   The first issue is solved by blocking all signals in child, even the
-   NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and third issue
-   is done by a stack allocation in parent and a synchronization with using
-   a pipe or waitpid (in case or error).  The pipe has the advantage of
-   allowing the child the communicate an exec error.  */
+   The first issue is solved by blocking all signals in child, even
+   the NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and
+   third issue is done by a stack allocation in parent, and by using a
+   field in struct spawn_args where the child can write an error
+   code. CLONE_VFORK ensures that the parent does not run until the
+   child has either exec'ed successfully or exited.  */


 /* The Unix standard contains a long explanation of the way to signal
@@ -79,7 +80,6 @@

 struct posix_spawn_args
 {
-  int pipe[2];
   sigset_t oldmask;
   const char *file;
   int (*exec) (const char *, char *const *, char *const *);
@@ -89,6 +89,7 @@ struct posix_spawn_args
   ptrdiff_t argc;
   char *const *envp;
   int xflags;
+  int err;
 };

 /* Older version requires that shell script without shebang definition
@@ -125,11 +126,8 @@ __spawni_child (void *arguments)
   struct posix_spawn_args *args = arguments;
   const posix_spawnattr_t *restrict attr = args->attr;
   const posix_spawn_file_actions_t *file_actions = args->fa;
-  int p = args->pipe[1];
   int ret;

-  close_not_cancel (args->pipe[0]);
-
   /* The child must ensure that no signal handler are enabled because it shared
      memory with parent, so the signal disposition must be either SIG_DFL or
      SIG_IGN.  It does by iterating over all signals and although it could
@@ -203,17 +201,6 @@ __spawni_child (void *arguments)
     {
       struct __spawn_action *action = &file_actions->__actions[cnt];

-      /* Dup the pipe fd onto an unoccupied one to avoid any file
-         operation to clobber it.  */
-      if ((action->action.close_action.fd == p)
-          || (action->action.open_action.fd == p)
-          || (action->action.dup2_action.fd == p))
-        {
-          if ((ret = __dup (p)) < 0)
-        goto fail;
-          p = ret;
-        }
-
       switch (action->tag)
         {
         case spawn_do_close:
@@ -280,14 +267,12 @@ __spawni_child (void *arguments)
      (2.15).  */
   maybe_script_execute (args);

-  ret = -errno;
-
 fail:
-  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
-  ret = -ret;
-  if (ret)
-    while (write_not_cancel (p, &ret, sizeof ret) < 0)
-      continue;
+  /* errno should have an appropriate non-zero value, but make sure
+     that's the case so that our parent knows we failed to
+     exec. There's no EUNKNOWN or EINTERNALBUG, so we use a value
+     which is clearly bogus.  */
+  args->err = errno ? : EHOSTDOWN;
   _exit (SPAWN_ERROR);
 }

@@ -304,9 +289,6 @@ __spawnix (pid_t * pid, const char *file,
   struct posix_spawn_args args;
   int ec;

-  if (__pipe2 (args.pipe, O_CLOEXEC))
-    return errno;
-
   /* To avoid imposing hard limits on posix_spawn{p} the total number of
      arguments is first calculated to allocate a mmap to hold all possible
      values.  */
@@ -333,15 +315,12 @@ __spawnix (pid_t * pid, const char *file,
   void *stack = __mmap (NULL, stack_size, prot,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
   if (__glibc_unlikely (stack == MAP_FAILED))
-    {
-      close_not_cancel (args.pipe[0]);
-      close_not_cancel (args.pipe[1]);
-      return errno;
-    }
+    return errno;

   /* Disable asynchronous cancellation.  */
   int cs = LIBC_CANCEL_ASYNC ();

+  args.err = 0;
   args.file = file;
   args.exec = exec;
   args.fa = file_actions;
@@ -355,9 +334,8 @@ __spawnix (pid_t * pid, const char *file,

   /* The clone flags used will create a new child that will run in the same
      memory space (CLONE_VM) and the execution of calling thread will be
-     suspend until the child calls execve or _exit.  These condition as
-     signal below either by pipe write (_exit with SPAWN_ERROR) or
-     a successful execve.
+     suspend until the child calls execve or _exit.
+
      Also since the calling thread execution will be suspend, there is not
      need for CLONE_SETTLS.  Although parent and child share the same TLS
      namespace, there will be no concurrent access for TLS variables (errno
@@ -365,13 +343,10 @@ __spawnix (pid_t * pid, const char *file,
   new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
            CLONE_VM | CLONE_VFORK | SIGCHLD, &args);

-  close_not_cancel (args.pipe[1]);
-
   if (new_pid > 0)
     {
-      if (__read (args.pipe[0], &ec, sizeof ec) != sizeof ec)
-    ec = 0;
-      else
+      ec = args.err;
+      if (ec != 0)
     __waitpid (new_pid, NULL, 0);
     }
   else
@@ -379,8 +354,6 @@ __spawnix (pid_t * pid, const char *file,

   __munmap (stack, stack_size);

-  close_not_cancel (args.pipe[0]);
-
   if ((ec == 0) && (pid != NULL))
     *pid = new_pid;

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-08-31 21:13   ` Rasmus Villemoes
  2016-08-31 22:08     ` Joseph Myers
@ 2016-09-06 20:43     ` Rasmus Villemoes
  2016-09-14 19:37     ` Adhemerval Zanella
  2 siblings, 0 replies; 45+ messages in thread
From: Rasmus Villemoes @ 2016-09-06 20:43 UTC (permalink / raw)
  To: libc-alpha; +Cc: Adhemerval Zanella

One more thing: The signal blocking also seems to be flawed; sigprocmask
silently removes SIGCANCEL and SIGSETXID before doing the syscall.

Rasmus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-09-01  9:28       ` Rasmus Villemoes
@ 2016-09-14 13:13         ` Adhemerval Zanella
  2016-09-14 18:58           ` Rasmus Villemoes
  2016-09-20 20:25         ` Florian Weimer
  1 sibling, 1 reply; 45+ messages in thread
From: Adhemerval Zanella @ 2016-09-14 13:13 UTC (permalink / raw)
  To: Rasmus Villemoes, Joseph Myers; +Cc: libc-alpha



On 01/09/2016 06:28, Rasmus Villemoes wrote:
> On 1 September 2016 at 00:08, Joseph Myers <joseph@codesourcery.com> wrote:
>> On Wed, 31 Aug 2016, Rasmus Villemoes wrote:
>>
>>> Rather late to the party, but I think there's a few bugs here. Most
>>> importantly, dup() doesn't preserve the CLOEXEC flag, so if we do move
>>> the write end around like this, the fd will not automatically be closed
>>> during the exec, and hence the parent won't receive EOF and will block
>>> in read() call until the child finally exits. That's easily fixable with
>>> fcntl(p, F_DUPFD_CLOEXEC, 0). Pretty annoying to add a test for, though.
>>
>> In Linux-specific code we can assume the presence of dup3 (which needs to
>> be called by the name __dup3 in implementations of POSIX functions).
> 
> What I meant was that it is a little hard to write a regression test
> for this bug, since we don't know beforehand what fds the pipe2() call
> will give us, making it hard to create actions that is guaranteed to
> exercise this code.
> 
> But, thinking a bit more about this, why do we even need a pipe to
> ensure the child is gone, when we already set CLONE_VFORK? Can't we
> just exploit the fact that we run in the same VM as the parent and
> make the child write a non-zero error code to the spawn_args
> structure? That would eliminate this problem entirely. Something like
> below (sorry if gmail has whitespace-damaged it).

I think patch is ok and fixes the issues you noted about using the pipe2
call to signal the execv issue.  It just have one remark about it below.


> @@ -280,14 +267,12 @@ __spawni_child (void *arguments)
>       (2.15).  */
>    maybe_script_execute (args);
> 
> -  ret = -errno;
> -
>  fail:
> -  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
> -  ret = -ret;
> -  if (ret)
> -    while (write_not_cancel (p, &ret, sizeof ret) < 0)
> -      continue;
> +  /* errno should have an appropriate non-zero value, but make sure
> +     that's the case so that our parent knows we failed to
> +     exec. There's no EUNKNOWN or EINTERNALBUG, so we use a value
> +     which is clearly bogus.  */
> +  args->err = errno ? : EHOSTDOWN;
>    _exit (SPAWN_ERROR);
>  }

I would prefer an assert call here to ensure errno is non zero for
failure case instead of reporting a bogus errno to program.  Since
this unexpected issue is either something wrong being reported from
kernel or an underlying bug it would be better to fail at once than
instead to document on manuals that this is potentially an unknown
issue.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-09-14 13:13         ` Adhemerval Zanella
@ 2016-09-14 18:58           ` Rasmus Villemoes
  2016-09-14 19:59             ` Adhemerval Zanella
  0 siblings, 1 reply; 45+ messages in thread
From: Rasmus Villemoes @ 2016-09-14 18:58 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Joseph Myers, libc-alpha

On Wed, Sep 14 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

> I think patch is ok and fixes the issues you noted about using the pipe2
> call to signal the execv issue.  It just have one remark about it below.
>
>
>> @@ -280,14 +267,12 @@ __spawni_child (void *arguments)
>>       (2.15).  */
>>    maybe_script_execute (args);
>> 
>> -  ret = -errno;
>> -
>>  fail:
>> -  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
>> -  ret = -ret;
>> -  if (ret)
>> -    while (write_not_cancel (p, &ret, sizeof ret) < 0)
>> -      continue;
>> +  /* errno should have an appropriate non-zero value, but make sure
>> +     that's the case so that our parent knows we failed to
>> +     exec. There's no EUNKNOWN or EINTERNALBUG, so we use a value
>> +     which is clearly bogus.  */
>> +  args->err = errno ? : EHOSTDOWN;
>>    _exit (SPAWN_ERROR);
>>  }
>
> I would prefer an assert call here to ensure errno is non zero for
> failure case instead of reporting a bogus errno to program.  Since
> this unexpected issue is either something wrong being reported from
> kernel or an underlying bug it would be better to fail at once than
> instead to document on manuals that this is potentially an unknown
> issue.

But asserting/aborting in the child doesn't really solve the problem; we
still need to write some non-zero value for the parent to pick up once
we're gone. We could of course write -1 to indicate this really
exceptional situation, but that still leaves deciding how to handle that
in the parent. IMO an assert/abort is a little too harsh, but then the
parent has to return _some_ error code to its caller.

Rasmus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-08-31 21:13   ` Rasmus Villemoes
  2016-08-31 22:08     ` Joseph Myers
  2016-09-06 20:43     ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Rasmus Villemoes
@ 2016-09-14 19:37     ` Adhemerval Zanella
  2 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-09-14 19:37 UTC (permalink / raw)
  To: Rasmus Villemoes, libc-alpha



On 31/08/2016 18:13, Rasmus Villemoes wrote:
>> +	  switch (action->tag)
>> +	    {
>> +	    case spawn_do_close:
>> +	      if ((ret =
>> +		   close_not_cancel (action->action.close_action.fd)) != 0)
>> +		{
>> +		  if (!have_fdlimit)
>> +		    {
>> +		      __getrlimit64 (RLIMIT_NOFILE, &fdlimit);
>> +		      have_fdlimit = true;
>> +		    }
>> +
>> +		  /* Signal errors only for file descriptors out of range.  */
>> +		  if (action->action.close_action.fd < 0
>> +		      || action->action.close_action.fd >= fdlimit.rlim_cur)
>> +		    goto fail;
>> +		}
> 
> I may have missed it in the original discussion, but what is the
> rationale for this? POSIX says
> 
>   If the file_actions argument is not NULL, and specifies any close,
>   dup2, or open actions to be performed, and if posix_spawn() or
>   posix_spawnp() fails for any of the reasons that would cause close(),
>   dup2(), or open() to fail, an error value shall be returned as
>   described by close(), dup2(), and open(), respectively

I haven't joined the original discussion also (if any) for old implementation
behaviour, but I think this conforms to austin group issue #370 [1] where it
changed:

fails for any of the reasons that would cause close( ), dup2( ), or open( ) to
fail, an error value shall be returned

to:

fails for any of the reasons that would cause close( ), dup2( ), or open( ) to
fail, other than attempting a close( ) on a file descriptor that is in range
but already closed, an error value shall be returned

The documentation at [2] seems to not have this updates issues description.

[1] http://austingroupbugs.net/view.php?id=370
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html

>   
>> +	      break;
>> +
>> +	    case spawn_do_open:
>> +	      {
>> +		ret = open_not_cancel (action->action.open_action.path,
>> +				       action->action.
>> +				       open_action.oflag | O_LARGEFILE,
>> +				       action->action.open_action.mode);
>> +
>> +		if (ret == -1)
>> +		  goto fail;
>> +
>> +		int new_fd = ret;
>> +
>> +		/* Make sure the desired file descriptor is used.  */
>> +		if (ret != action->action.open_action.fd)
>> +		  {
>> +		    if ((ret = __dup2 (new_fd, action->action.open_action.fd))
>> +			!= action->action.open_action.fd)
>> +		      goto fail;
>> +
>> +		    if ((ret = close_not_cancel (new_fd)) != 0)
>> +		      goto fail;
>> +		  }
>> +	      }
> 
> This is also how I'd have implemented it, but POSIX explicitly says
> 
>   If fildes was already an open file descriptor, it shall be closed
>   before the new file is opened.
> 
> Some, if slightly pathological, examples of how the difference could be
> observed:
> 
> * ENFILE/EMFILE

I think this should be an issue iff the program tries call posix_spawn 
with a total number of file descriptors equal to RLIMIT_NOFILE.

> 
> * Some single-open special device; following POSIX,
>   'action_open("/dev/watchdog", 47); action_open("/dev/watchdog", 47);'
>   should both succeed if the first does, whereas the second will fail
>   with the current code.

Right, this should more problematic to handle.  I think an option is to
add a 'fcntl(fd, F_GETFD) != -1 || errno != EBADF' check before the open
syscall and close the file descriptor in this case.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-09-14 18:58           ` Rasmus Villemoes
@ 2016-09-14 19:59             ` Adhemerval Zanella
  0 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-09-14 19:59 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Joseph Myers, libc-alpha



On 14/09/2016 15:58, Rasmus Villemoes wrote:
> On Wed, Sep 14 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>> I think patch is ok and fixes the issues you noted about using the pipe2
>> call to signal the execv issue.  It just have one remark about it below.
>>
>>
>>> @@ -280,14 +267,12 @@ __spawni_child (void *arguments)
>>>       (2.15).  */
>>>    maybe_script_execute (args);
>>>
>>> -  ret = -errno;
>>> -
>>>  fail:
>>> -  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
>>> -  ret = -ret;
>>> -  if (ret)
>>> -    while (write_not_cancel (p, &ret, sizeof ret) < 0)
>>> -      continue;
>>> +  /* errno should have an appropriate non-zero value, but make sure
>>> +     that's the case so that our parent knows we failed to
>>> +     exec. There's no EUNKNOWN or EINTERNALBUG, so we use a value
>>> +     which is clearly bogus.  */
>>> +  args->err = errno ? : EHOSTDOWN;
>>>    _exit (SPAWN_ERROR);
>>>  }
>>
>> I would prefer an assert call here to ensure errno is non zero for
>> failure case instead of reporting a bogus errno to program.  Since
>> this unexpected issue is either something wrong being reported from
>> kernel or an underlying bug it would be better to fail at once than
>> instead to document on manuals that this is potentially an unknown
>> issue.
> 
> But asserting/aborting in the child doesn't really solve the problem; we
> still need to write some non-zero value for the parent to pick up once
> we're gone. We could of course write -1 to indicate this really
> exceptional situation, but that still leaves deciding how to handle that
> in the parent. IMO an assert/abort is a little too harsh, but then the
> parent has to return _some_ error code to its caller.

My idea is to in fact not return to parent, but rather terminate program
execution in face of an unknown issue.  However, I do not have a strong
opinion if it should be really the desirable behaviour and thinking twice
it does seems that aborting program is too harsh.  I think -1 would be
suffice.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-09-01  9:28       ` Rasmus Villemoes
  2016-09-14 13:13         ` Adhemerval Zanella
@ 2016-09-20 20:25         ` Florian Weimer
  2016-09-20 20:54           ` Rasmus Villemoes
  2016-09-20 21:01           ` [PATCH] linux: spawni.c: simplify error reporting to parent Rasmus Villemoes
  1 sibling, 2 replies; 45+ messages in thread
From: Florian Weimer @ 2016-09-20 20:25 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Joseph Myers, libc-alpha, Adhemerval Zanella

* Rasmus Villemoes:

> +  /* errno should have an appropriate non-zero value, but make sure
> +     that's the case so that our parent knows we failed to
> +     exec. There's no EUNKNOWN or EINTERNALBUG, so we use a value
> +     which is clearly bogus.  */
> +  args->err = errno ? : EHOSTDOWN;
>    _exit (SPAWN_ERROR);
>  }

I think ECHILD is probably a better fake error code.

You should set args->err to 0 on success ...

> +  args.err = 0;

... and initialize it to -1.

>    if (new_pid > 0)
>      {
> -      if (__read (args.pipe[0], &ec, sizeof ec) != sizeof ec)
> -    ec = 0;
> -      else
> +      ec = args.err;
> +      if (ec != 0)
>      __waitpid (new_pid, NULL, 0);
>      }

You should check (assert?) here that args.err is not -1.  Otherwise we
will never notice if the page is not shared between parent and child,
and the error reporting mechanism does not work.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation
  2016-09-20 20:25         ` Florian Weimer
@ 2016-09-20 20:54           ` Rasmus Villemoes
  2016-09-20 21:01           ` [PATCH] linux: spawni.c: simplify error reporting to parent Rasmus Villemoes
  1 sibling, 0 replies; 45+ messages in thread
From: Rasmus Villemoes @ 2016-09-20 20:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Joseph Myers, libc-alpha, Adhemerval Zanella

On Tue, Sep 20 2016, Florian Weimer <fw@deneb.enyo.de> wrote:

> * Rasmus Villemoes:
>
>> +  /* errno should have an appropriate non-zero value, but make sure
>> +     that's the case so that our parent knows we failed to
>> +     exec. There's no EUNKNOWN or EINTERNALBUG, so we use a value
>> +     which is clearly bogus.  */
>> +  args->err = errno ? : EHOSTDOWN;
>>    _exit (SPAWN_ERROR);
>>  }
>
> I think ECHILD is probably a better fake error code.

Yeah, that's probably ok. It's consistent with its use in pclose() where
the posix description reads "The status of the child process could not
be obtained". We just use it in the sense "something went wrong, we just
don't know what".

I'd really wish EINTERNALBUG existed.

> You should set args->err to 0 on success ...
>
>> +  args.err = 0;
>
> ... and initialize it to -1.
>
>>    if (new_pid > 0)
>>      {
>> -      if (__read (args.pipe[0], &ec, sizeof ec) != sizeof ec)
>> -    ec = 0;
>> -      else
>> +      ec = args.err;
>> +      if (ec != 0)
>>      __waitpid (new_pid, NULL, 0);
>>      }
>
> You should check (assert?) here that args.err is not -1.  Otherwise we
> will never notice if the page is not shared between parent and child,
> and the error reporting mechanism does not work.

Good point. I'll send an updated patch in a moment.

Rasmus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-20 20:25         ` Florian Weimer
  2016-09-20 20:54           ` Rasmus Villemoes
@ 2016-09-20 21:01           ` Rasmus Villemoes
  2016-09-22 20:54             ` Adhemerval Zanella
                               ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Rasmus Villemoes @ 2016-09-20 21:01 UTC (permalink / raw)
  To: libc-alpha
  Cc: Florian Weimer, Adhemerval Zanella, Joseph Myers, Rasmus Villemoes

Using VFORK already ensures that the parent does not run until the
child has either exec'ed succesfully or called _exit. Hence we don't
need to read from a CLOEXEC pipe to ensure proper synchronization - we
just make explicit use of the fact the the child and parent run in the
same VM, so the child can write an error code to a field of the
posix_spawn_args struct instead of sending it through a pipe.

To ensure that this mechanism really works, the parent initializes the
field to -1 and the child writes 0 before execing.

This eliminates some annoying bookkeeping that is necessary to avoid
the file actions from clobbering the write end of the pipe, and
getting rid of the pipe creation in the first place means fewer system
calls (four in the parent, usually one in the child) and fewer
chanches for the spawn to fail (e.g. if we're close to EMFILE).

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 sysdeps/unix/sysv/linux/spawni.c | 71 ++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 46 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index bb3eecf..c0e15ac 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -17,6 +17,7 @@
    <http://www.gnu.org/licenses/>.  */
 
 #include <spawn.h>
+#include <assert.h>
 #include <fcntl.h>
 #include <paths.h>
 #include <string.h>
@@ -44,11 +45,12 @@
    3. Child must synchronize with parent to enforce 2. and to possible
       return execv issues.
 
-   The first issue is solved by blocking all signals in child, even the
-   NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and third issue
-   is done by a stack allocation in parent and a synchronization with using
-   a pipe or waitpid (in case or error).  The pipe has the advantage of
-   allowing the child the communicate an exec error.  */
+   The first issue is solved by blocking all signals in child, even
+   the NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and
+   third issue is done by a stack allocation in parent, and by using a
+   field in struct spawn_args where the child can write an error
+   code. CLONE_VFORK ensures that the parent does not run until the
+   child has either exec'ed successfully or exited.  */
 
 
 /* The Unix standard contains a long explanation of the way to signal
@@ -79,7 +81,6 @@
 
 struct posix_spawn_args
 {
-  int pipe[2];
   sigset_t oldmask;
   const char *file;
   int (*exec) (const char *, char *const *, char *const *);
@@ -89,6 +90,7 @@ struct posix_spawn_args
   ptrdiff_t argc;
   char *const *envp;
   int xflags;
+  int err;
 };
 
 /* Older version requires that shell script without shebang definition
@@ -125,11 +127,8 @@ __spawni_child (void *arguments)
   struct posix_spawn_args *args = arguments;
   const posix_spawnattr_t *restrict attr = args->attr;
   const posix_spawn_file_actions_t *file_actions = args->fa;
-  int p = args->pipe[1];
   int ret;
 
-  close_not_cancel (args->pipe[0]);
-
   /* The child must ensure that no signal handler are enabled because it shared
      memory with parent, so the signal disposition must be either SIG_DFL or
      SIG_IGN.  It does by iterating over all signals and although it could
@@ -203,17 +202,6 @@ __spawni_child (void *arguments)
 	{
 	  struct __spawn_action *action = &file_actions->__actions[cnt];
 
-	  /* Dup the pipe fd onto an unoccupied one to avoid any file
-	     operation to clobber it.  */
-	  if ((action->action.close_action.fd == p)
-	      || (action->action.open_action.fd == p)
-	      || (action->action.dup2_action.fd == p))
-	    {
-	      if ((ret = __dup (p)) < 0)
-		goto fail;
-	      p = ret;
-	    }
-
 	  switch (action->tag)
 	    {
 	    case spawn_do_close:
@@ -273,6 +261,7 @@ __spawni_child (void *arguments)
   __sigprocmask (SIG_SETMASK, (attr->__flags & POSIX_SPAWN_SETSIGMASK)
 		 ? &attr->__ss : &args->oldmask, 0);
 
+  args->err = 0;
   args->exec (args->file, args->argv, args->envp);
 
   /* This is compatibility function required to enable posix_spawn run
@@ -280,14 +269,13 @@ __spawni_child (void *arguments)
      (2.15).  */
   maybe_script_execute (args);
 
-  ret = -errno;
-
 fail:
-  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
-  ret = -ret;
-  if (ret)
-    while (write_not_cancel (p, &ret, sizeof ret) < 0)
-      continue;
+  /* errno should have an appropriate non-zero value; otherwise,
+     there's a bug in glibc or the kernel.  For lack of an error code
+     (EINTERNALBUG) describing that, use ECHILD.  Another option would
+     be to set args->err to some negative sentinel and have the parent
+     abort(), but that seems needlessly harsh.  */
+  args->err = errno ? : ECHILD;
   _exit (SPAWN_ERROR);
 }
 
@@ -304,9 +292,6 @@ __spawnix (pid_t * pid, const char *file,
   struct posix_spawn_args args;
   int ec;
 
-  if (__pipe2 (args.pipe, O_CLOEXEC))
-    return errno;
-
   /* To avoid imposing hard limits on posix_spawn{p} the total number of
      arguments is first calculated to allocate a mmap to hold all possible
      values.  */
@@ -333,15 +318,14 @@ __spawnix (pid_t * pid, const char *file,
   void *stack = __mmap (NULL, stack_size, prot,
 			MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
   if (__glibc_unlikely (stack == MAP_FAILED))
-    {
-      close_not_cancel (args.pipe[0]);
-      close_not_cancel (args.pipe[1]);
-      return errno;
-    }
+    return errno;
 
   /* Disable asynchronous cancellation.  */
   int cs = LIBC_CANCEL_ASYNC ();
 
+  /* Child must set args.err to something non-negative - we rely on
+     the parent and child sharing VM.  */
+  args.err = -1;
   args.file = file;
   args.exec = exec;
   args.fa = file_actions;
@@ -355,9 +339,8 @@ __spawnix (pid_t * pid, const char *file,
 
   /* The clone flags used will create a new child that will run in the same
      memory space (CLONE_VM) and the execution of calling thread will be
-     suspend until the child calls execve or _exit.  These condition as
-     signal below either by pipe write (_exit with SPAWN_ERROR) or
-     a successful execve.
+     suspend until the child calls execve or _exit.
+
      Also since the calling thread execution will be suspend, there is not
      need for CLONE_SETTLS.  Although parent and child share the same TLS
      namespace, there will be no concurrent access for TLS variables (errno
@@ -365,22 +348,18 @@ __spawnix (pid_t * pid, const char *file,
   new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
 		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
 
-  close_not_cancel (args.pipe[1]);
-
   if (new_pid > 0)
     {
-      if (__read (args.pipe[0], &ec, sizeof ec) != sizeof ec)
-	ec = 0;
-      else
-	__waitpid (new_pid, NULL, 0);
+      ec = args.err;
+      assert (ec >= 0);
+      if (ec != 0)
+	  __waitpid (new_pid, NULL, 0);
     }
   else
     ec = -new_pid;
 
   __munmap (stack, stack_size);
 
-  close_not_cancel (args.pipe[0]);
-
   if ((ec == 0) && (pid != NULL))
     *pid = new_pid;
 
-- 
2.1.4

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-20 21:01           ` [PATCH] linux: spawni.c: simplify error reporting to parent Rasmus Villemoes
@ 2016-09-22 20:54             ` Adhemerval Zanella
  2016-09-23  5:21             ` Florian Weimer
  2016-09-28 14:14             ` Rich Felker
  2 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-09-22 20:54 UTC (permalink / raw)
  To: libc-alpha

LGTM, I think it addressed all the comments.

On 20/09/2016 18:01, Rasmus Villemoes wrote:
> Using VFORK already ensures that the parent does not run until the
> child has either exec'ed succesfully or called _exit. Hence we don't
> need to read from a CLOEXEC pipe to ensure proper synchronization - we
> just make explicit use of the fact the the child and parent run in the
> same VM, so the child can write an error code to a field of the
> posix_spawn_args struct instead of sending it through a pipe.
> 
> To ensure that this mechanism really works, the parent initializes the
> field to -1 and the child writes 0 before execing.
> 
> This eliminates some annoying bookkeeping that is necessary to avoid
> the file actions from clobbering the write end of the pipe, and
> getting rid of the pipe creation in the first place means fewer system
> calls (four in the parent, usually one in the child) and fewer
> chanches for the spawn to fail (e.g. if we're close to EMFILE).
> 
> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> ---
>  sysdeps/unix/sysv/linux/spawni.c | 71 ++++++++++++++--------------------------
>  1 file changed, 25 insertions(+), 46 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index bb3eecf..c0e15ac 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c
> @@ -17,6 +17,7 @@
>     <http://www.gnu.org/licenses/>.  */
>  
>  #include <spawn.h>
> +#include <assert.h>
>  #include <fcntl.h>
>  #include <paths.h>
>  #include <string.h>
> @@ -44,11 +45,12 @@
>     3. Child must synchronize with parent to enforce 2. and to possible
>        return execv issues.
>  
> -   The first issue is solved by blocking all signals in child, even the
> -   NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and third issue
> -   is done by a stack allocation in parent and a synchronization with using
> -   a pipe or waitpid (in case or error).  The pipe has the advantage of
> -   allowing the child the communicate an exec error.  */
> +   The first issue is solved by blocking all signals in child, even
> +   the NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and
> +   third issue is done by a stack allocation in parent, and by using a
> +   field in struct spawn_args where the child can write an error
> +   code. CLONE_VFORK ensures that the parent does not run until the
> +   child has either exec'ed successfully or exited.  */
>  
>  
>  /* The Unix standard contains a long explanation of the way to signal
> @@ -79,7 +81,6 @@
>  
>  struct posix_spawn_args
>  {
> -  int pipe[2];
>    sigset_t oldmask;
>    const char *file;
>    int (*exec) (const char *, char *const *, char *const *);
> @@ -89,6 +90,7 @@ struct posix_spawn_args
>    ptrdiff_t argc;
>    char *const *envp;
>    int xflags;
> +  int err;
>  };
>  
>  /* Older version requires that shell script without shebang definition
> @@ -125,11 +127,8 @@ __spawni_child (void *arguments)
>    struct posix_spawn_args *args = arguments;
>    const posix_spawnattr_t *restrict attr = args->attr;
>    const posix_spawn_file_actions_t *file_actions = args->fa;
> -  int p = args->pipe[1];
>    int ret;
>  
> -  close_not_cancel (args->pipe[0]);
> -
>    /* The child must ensure that no signal handler are enabled because it shared
>       memory with parent, so the signal disposition must be either SIG_DFL or
>       SIG_IGN.  It does by iterating over all signals and although it could
> @@ -203,17 +202,6 @@ __spawni_child (void *arguments)
>  	{
>  	  struct __spawn_action *action = &file_actions->__actions[cnt];
>  
> -	  /* Dup the pipe fd onto an unoccupied one to avoid any file
> -	     operation to clobber it.  */
> -	  if ((action->action.close_action.fd == p)
> -	      || (action->action.open_action.fd == p)
> -	      || (action->action.dup2_action.fd == p))
> -	    {
> -	      if ((ret = __dup (p)) < 0)
> -		goto fail;
> -	      p = ret;
> -	    }
> -
>  	  switch (action->tag)
>  	    {
>  	    case spawn_do_close:
> @@ -273,6 +261,7 @@ __spawni_child (void *arguments)
>    __sigprocmask (SIG_SETMASK, (attr->__flags & POSIX_SPAWN_SETSIGMASK)
>  		 ? &attr->__ss : &args->oldmask, 0);
>  
> +  args->err = 0;
>    args->exec (args->file, args->argv, args->envp);
>  
>    /* This is compatibility function required to enable posix_spawn run
> @@ -280,14 +269,13 @@ __spawni_child (void *arguments)
>       (2.15).  */
>    maybe_script_execute (args);
>  
> -  ret = -errno;
> -
>  fail:
> -  /* Since sizeof errno < PIPE_BUF, the write is atomic. */
> -  ret = -ret;
> -  if (ret)
> -    while (write_not_cancel (p, &ret, sizeof ret) < 0)
> -      continue;
> +  /* errno should have an appropriate non-zero value; otherwise,
> +     there's a bug in glibc or the kernel.  For lack of an error code
> +     (EINTERNALBUG) describing that, use ECHILD.  Another option would
> +     be to set args->err to some negative sentinel and have the parent
> +     abort(), but that seems needlessly harsh.  */
> +  args->err = errno ? : ECHILD;
>    _exit (SPAWN_ERROR);
>  }
>  
> @@ -304,9 +292,6 @@ __spawnix (pid_t * pid, const char *file,
>    struct posix_spawn_args args;
>    int ec;
>  
> -  if (__pipe2 (args.pipe, O_CLOEXEC))
> -    return errno;
> -
>    /* To avoid imposing hard limits on posix_spawn{p} the total number of
>       arguments is first calculated to allocate a mmap to hold all possible
>       values.  */
> @@ -333,15 +318,14 @@ __spawnix (pid_t * pid, const char *file,
>    void *stack = __mmap (NULL, stack_size, prot,
>  			MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>    if (__glibc_unlikely (stack == MAP_FAILED))
> -    {
> -      close_not_cancel (args.pipe[0]);
> -      close_not_cancel (args.pipe[1]);
> -      return errno;
> -    }
> +    return errno;
>  
>    /* Disable asynchronous cancellation.  */
>    int cs = LIBC_CANCEL_ASYNC ();
>  
> +  /* Child must set args.err to something non-negative - we rely on
> +     the parent and child sharing VM.  */
> +  args.err = -1;
>    args.file = file;
>    args.exec = exec;
>    args.fa = file_actions;
> @@ -355,9 +339,8 @@ __spawnix (pid_t * pid, const char *file,
>  
>    /* The clone flags used will create a new child that will run in the same
>       memory space (CLONE_VM) and the execution of calling thread will be
> -     suspend until the child calls execve or _exit.  These condition as
> -     signal below either by pipe write (_exit with SPAWN_ERROR) or
> -     a successful execve.
> +     suspend until the child calls execve or _exit.
> +
>       Also since the calling thread execution will be suspend, there is not
>       need for CLONE_SETTLS.  Although parent and child share the same TLS
>       namespace, there will be no concurrent access for TLS variables (errno
> @@ -365,22 +348,18 @@ __spawnix (pid_t * pid, const char *file,
>    new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
>  		   CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
>  
> -  close_not_cancel (args.pipe[1]);
> -
>    if (new_pid > 0)
>      {
> -      if (__read (args.pipe[0], &ec, sizeof ec) != sizeof ec)
> -	ec = 0;
> -      else
> -	__waitpid (new_pid, NULL, 0);
> +      ec = args.err;
> +      assert (ec >= 0);
> +      if (ec != 0)
> +	  __waitpid (new_pid, NULL, 0);
>      }
>    else
>      ec = -new_pid;
>  
>    __munmap (stack, stack_size);
>  
> -  close_not_cancel (args.pipe[0]);
> -
>    if ((ec == 0) && (pid != NULL))
>      *pid = new_pid;
>  
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-20 21:01           ` [PATCH] linux: spawni.c: simplify error reporting to parent Rasmus Villemoes
  2016-09-22 20:54             ` Adhemerval Zanella
@ 2016-09-23  5:21             ` Florian Weimer
  2016-09-23 19:09               ` Rasmus Villemoes
  2016-09-28 14:14             ` Rich Felker
  2 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer @ 2016-09-23  5:21 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: libc-alpha, Adhemerval Zanella, Joseph Myers

* Rasmus Villemoes:

> +      ec = args.err;
> +      assert (ec >= 0);
> +      if (ec != 0)
> +	  __waitpid (new_pid, NULL, 0);

One minor issue: Now that the variable name “ec” appears in an
assertion, it is a good idea to rename it to “error_code_from_child“
or something similar, so that the assertion message is more
meaningful.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-23  5:21             ` Florian Weimer
@ 2016-09-23 19:09               ` Rasmus Villemoes
  2016-09-23 19:28                 ` Adhemerval Zanella
  0 siblings, 1 reply; 45+ messages in thread
From: Rasmus Villemoes @ 2016-09-23 19:09 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Adhemerval Zanella, Joseph Myers

On Fri, Sep 23 2016, Florian Weimer <fw@deneb.enyo.de> wrote:

> * Rasmus Villemoes:
>
>> +      ec = args.err;
>> +      assert (ec >= 0);
>> +      if (ec != 0)
>> +	  __waitpid (new_pid, NULL, 0);
>
> One minor issue: Now that the variable name “ec” appears in an
> assertion, it is a good idea to rename it to “error_code_from_child“
> or something similar, so that the assertion message is more
> meaningful.

IMO, that's already addressed by the comments above the initialization
in the parent and exit path in the child. An assert message would have
to be overly verbose to make sense without the context of the code it
appears in. In any case, error_code_from_child is a bad name if we hit
the "ec = -new_pid" branch.

Rasmus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-23 19:09               ` Rasmus Villemoes
@ 2016-09-23 19:28                 ` Adhemerval Zanella
  2016-09-23 20:37                   ` Florian Weimer
  0 siblings, 1 reply; 45+ messages in thread
From: Adhemerval Zanella @ 2016-09-23 19:28 UTC (permalink / raw)
  To: Rasmus Villemoes, Florian Weimer; +Cc: libc-alpha, Joseph Myers



On 23/09/2016 16:09, Rasmus Villemoes wrote:
> On Fri, Sep 23 2016, Florian Weimer <fw@deneb.enyo.de> wrote:
> 
>> * Rasmus Villemoes:
>>
>>> +      ec = args.err;
>>> +      assert (ec >= 0);
>>> +      if (ec != 0)
>>> +	  __waitpid (new_pid, NULL, 0);
>>
>> One minor issue: Now that the variable name “ec” appears in an
>> assertion, it is a good idea to rename it to “error_code_from_child“
>> or something similar, so that the assertion message is more
>> meaningful.
> 
> IMO, that's already addressed by the comments above the initialization
> in the parent and exit path in the child. An assert message would have
> to be overly verbose to make sense without the context of the code it
> appears in. In any case, error_code_from_child is a bad name if we hit
> the "ec = -new_pid" branch.

Also, if CLONE_VM is not really creating a thread with shared VM it means
something really broken in the system.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-23 19:28                 ` Adhemerval Zanella
@ 2016-09-23 20:37                   ` Florian Weimer
  2016-09-27 20:26                     ` Rasmus Villemoes
  0 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer @ 2016-09-23 20:37 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Rasmus Villemoes, libc-alpha, Joseph Myers

* Adhemerval Zanella:

> On 23/09/2016 16:09, Rasmus Villemoes wrote:
>> On Fri, Sep 23 2016, Florian Weimer <fw@deneb.enyo.de> wrote:
>> 
>>> * Rasmus Villemoes:
>>>
>>>> +      ec = args.err;
>>>> +      assert (ec >= 0);
>>>> +      if (ec != 0)
>>>> +	  __waitpid (new_pid, NULL, 0);
>>>
>>> One minor issue: Now that the variable name “ec” appears in an
>>> assertion, it is a good idea to rename it to “error_code_from_child“
>>> or something similar, so that the assertion message is more
>>> meaningful.
>> 
>> IMO, that's already addressed by the comments above the initialization
>> in the parent and exit path in the child. An assert message would have
>> to be overly verbose to make sense without the context of the code it
>> appears in. In any case, error_code_from_child is a bad name if we hit
>> the "ec = -new_pid" branch.
>
> Also, if CLONE_VM is not really creating a thread with shared VM it means
> something really broken in the system.

Agreed.  I have no objections anymore to the patch as-is.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-23 20:37                   ` Florian Weimer
@ 2016-09-27 20:26                     ` Rasmus Villemoes
  2016-09-27 21:14                       ` Adhemerval Zanella
  0 siblings, 1 reply; 45+ messages in thread
From: Rasmus Villemoes @ 2016-09-27 20:26 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Adhemerval Zanella, libc-alpha, Joseph Myers

On Fri, Sep 23 2016, Florian Weimer <fw@deneb.enyo.de> wrote:

> * Adhemerval Zanella:
>
>> On 23/09/2016 16:09, Rasmus Villemoes wrote:
>>> On Fri, Sep 23 2016, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> 
>>>> * Rasmus Villemoes:
>>>>
>>>>> +      ec = args.err;
>>>>> +      assert (ec >= 0);
>>>>> +      if (ec != 0)
>>>>> +	  __waitpid (new_pid, NULL, 0);
>>>>
>>>> One minor issue: Now that the variable name “ec” appears in an
>>>> assertion, it is a good idea to rename it to “error_code_from_child“
>>>> or something similar, so that the assertion message is more
>>>> meaningful.
>>> 
>>> IMO, that's already addressed by the comments above the initialization
>>> in the parent and exit path in the child. An assert message would have
>>> to be overly verbose to make sense without the context of the code it
>>> appears in. In any case, error_code_from_child is a bad name if we hit
>>> the "ec = -new_pid" branch.
>>
>> Also, if CLONE_VM is not really creating a thread with shared VM it means
>> something really broken in the system.
>

I'd be more worried about CLONE_VFORK not being absolutely honoured in
all scenarios (emulation, ptrace, ...).  But that's what we now have the
assert for.

> Agreed.  I have no objections anymore to the patch as-is.

Thanks. Could someone with commit access take this?

Rasmus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-27 20:26                     ` Rasmus Villemoes
@ 2016-09-27 21:14                       ` Adhemerval Zanella
  0 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella @ 2016-09-27 21:14 UTC (permalink / raw)
  To: Rasmus Villemoes, Florian Weimer; +Cc: libc-alpha, Joseph Myers



On 27/09/2016 13:26, Rasmus Villemoes wrote:
> On Fri, Sep 23 2016, Florian Weimer <fw@deneb.enyo.de> wrote:
> 
>> * Adhemerval Zanella:
>>
>>> On 23/09/2016 16:09, Rasmus Villemoes wrote:
>>>> On Fri, Sep 23 2016, Florian Weimer <fw@deneb.enyo.de> wrote:
>>>>
>>>>> * Rasmus Villemoes:
>>>>>
>>>>>> +      ec = args.err;
>>>>>> +      assert (ec >= 0);
>>>>>> +      if (ec != 0)
>>>>>> +	  __waitpid (new_pid, NULL, 0);
>>>>>
>>>>> One minor issue: Now that the variable name “ec” appears in an
>>>>> assertion, it is a good idea to rename it to “error_code_from_child“
>>>>> or something similar, so that the assertion message is more
>>>>> meaningful.
>>>>
>>>> IMO, that's already addressed by the comments above the initialization
>>>> in the parent and exit path in the child. An assert message would have
>>>> to be overly verbose to make sense without the context of the code it
>>>> appears in. In any case, error_code_from_child is a bad name if we hit
>>>> the "ec = -new_pid" branch.
>>>
>>> Also, if CLONE_VM is not really creating a thread with shared VM it means
>>> something really broken in the system.
>>
> 
> I'd be more worried about CLONE_VFORK not being absolutely honoured in
> all scenarios (emulation, ptrace, ...).  But that's what we now have the
> assert for.
> 
>> Agreed.  I have no objections anymore to the patch as-is.
> 
> Thanks. Could someone with commit access take this?
> 
> Rasmus
> 

I will commit for you.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-20 21:01           ` [PATCH] linux: spawni.c: simplify error reporting to parent Rasmus Villemoes
  2016-09-22 20:54             ` Adhemerval Zanella
  2016-09-23  5:21             ` Florian Weimer
@ 2016-09-28 14:14             ` Rich Felker
  2016-09-28 15:03               ` Andreas Schwab
  2 siblings, 1 reply; 45+ messages in thread
From: Rich Felker @ 2016-09-28 14:14 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: libc-alpha, Florian Weimer, Adhemerval Zanella, Joseph Myers

On Tue, Sep 20, 2016 at 11:01:00PM +0200, Rasmus Villemoes wrote:
> Using VFORK already ensures that the parent does not run until the
> child has either exec'ed succesfully or called _exit. Hence we don't
> need to read from a CLOEXEC pipe to ensure proper synchronization - we
> just make explicit use of the fact the the child and parent run in the
> same VM, so the child can write an error code to a field of the
> posix_spawn_args struct instead of sending it through a pipe.
> 
> To ensure that this mechanism really works, the parent initializes the
> field to -1 and the child writes 0 before execing.
> 
> This eliminates some annoying bookkeeping that is necessary to avoid
> the file actions from clobbering the write end of the pipe, and
> getting rid of the pipe creation in the first place means fewer system
> calls (four in the parent, usually one in the child) and fewer
> chanches for the spawn to fail (e.g. if we're close to EMFILE).

This is a bad idea for at least one reason: running under strace seems
to cause vfork _not_ to wait in the parent, leading to stack
clobbering and runaway wrong code execution. I have not tested lately
so I don't have a recipe to reproduce it, but I know this was one of
the motivations for musl's use of a pipe.

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-28 14:14             ` Rich Felker
@ 2016-09-28 15:03               ` Andreas Schwab
  2016-09-28 15:22                 ` Rich Felker
  0 siblings, 1 reply; 45+ messages in thread
From: Andreas Schwab @ 2016-09-28 15:03 UTC (permalink / raw)
  To: Rich Felker
  Cc: Rasmus Villemoes, libc-alpha, Florian Weimer, Adhemerval Zanella,
	Joseph Myers

On Sep 28 2016, Rich Felker <dalias@libc.org> wrote:

> This is a bad idea for at least one reason: running under strace seems
> to cause vfork _not_ to wait in the parent, leading to stack
> clobbering and runaway wrong code execution.

This is not true.  With modern kernels implementing ptrace events a
tracer can correctly handle all clone variants including vfork.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-28 15:03               ` Andreas Schwab
@ 2016-09-28 15:22                 ` Rich Felker
  2016-09-28 18:13                   ` Adhemerval Zanella
  0 siblings, 1 reply; 45+ messages in thread
From: Rich Felker @ 2016-09-28 15:22 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Rasmus Villemoes, libc-alpha, Florian Weimer, Adhemerval Zanella,
	Joseph Myers

On Wed, Sep 28, 2016 at 05:03:41PM +0200, Andreas Schwab wrote:
> On Sep 28 2016, Rich Felker <dalias@libc.org> wrote:
> 
> > This is a bad idea for at least one reason: running under strace seems
> > to cause vfork _not_ to wait in the parent, leading to stack
> > clobbering and runaway wrong code execution.
> 
> This is not true.  With modern kernels implementing ptrace events a
> tracer can correctly handle all clone variants including vfork.

The problem might very well be limited to older (but still supported,
I think) kernels or older versions of strace; I'm not sure. That's
what I meant by saying I don't have a formula to reproduce the issue
at the moment.

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-28 15:22                 ` Rich Felker
@ 2016-09-28 18:13                   ` Adhemerval Zanella
  2016-10-06 12:52                     ` Florian Weimer
  0 siblings, 1 reply; 45+ messages in thread
From: Adhemerval Zanella @ 2016-09-28 18:13 UTC (permalink / raw)
  To: Rich Felker, Andreas Schwab
  Cc: Rasmus Villemoes, libc-alpha, Florian Weimer, Joseph Myers



On 28/09/2016 08:22, Rich Felker wrote:
> On Wed, Sep 28, 2016 at 05:03:41PM +0200, Andreas Schwab wrote:
>> On Sep 28 2016, Rich Felker <dalias@libc.org> wrote:
>>
>>> This is a bad idea for at least one reason: running under strace seems
>>> to cause vfork _not_ to wait in the parent, leading to stack
>>> clobbering and runaway wrong code execution.
>>
>> This is not true.  With modern kernels implementing ptrace events a
>> tracer can correctly handle all clone variants including vfork.
> 
> The problem might very well be limited to older (but still supported,
> I think) kernels or older versions of strace; I'm not sure. That's
> what I meant by saying I don't have a formula to reproduce the issue
> at the moment.
> 
> Rich
> 

I am not convinced this specific issue should be a blocker for this
patch.  It seems quite limited in scope (running the process in
strace) on older/not supported kernel. And the behaviour is clearly
a kernel issue.

Also I checked with CentOS 6 (2.6.32-642.el6.x86_64, strace 4.8) and
it seems to follow the expected behaviour.  I would also consider
any deviation in more recent Linux releases as a kernel regression 
and I see no point and trying to support it.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] linux: spawni.c: simplify error reporting to parent
  2016-09-28 18:13                   ` Adhemerval Zanella
@ 2016-10-06 12:52                     ` Florian Weimer
  0 siblings, 0 replies; 45+ messages in thread
From: Florian Weimer @ 2016-10-06 12:52 UTC (permalink / raw)
  To: Adhemerval Zanella, Rich Felker, Andreas Schwab
  Cc: Rasmus Villemoes, libc-alpha, Florian Weimer, Joseph Myers

On 09/28/2016 08:12 PM, Adhemerval Zanella wrote:
> On 28/09/2016 08:22, Rich Felker wrote:
>> On Wed, Sep 28, 2016 at 05:03:41PM +0200, Andreas Schwab wrote:
>>> On Sep 28 2016, Rich Felker <dalias@libc.org> wrote:
>>>
>>>> This is a bad idea for at least one reason: running under strace seems
>>>> to cause vfork _not_ to wait in the parent, leading to stack
>>>> clobbering and runaway wrong code execution.
>>>
>>> This is not true.  With modern kernels implementing ptrace events a
>>> tracer can correctly handle all clone variants including vfork.
>>
>> The problem might very well be limited to older (but still supported,
>> I think) kernels or older versions of strace; I'm not sure. That's
>> what I meant by saying I don't have a formula to reproduce the issue
>> at the moment.

Rich, I value your comments, I really do, but this isn't actionable 
information, I'm afraid.

> I am not convinced this specific issue should be a blocker for this
> patch.  It seems quite limited in scope (running the process in
> strace) on older/not supported kernel. And the behaviour is clearly
> a kernel issue.
>
> Also I checked with CentOS 6 (2.6.32-642.el6.x86_64, strace 4.8) and
> it seems to follow the expected behaviour.  I would also consider
> any deviation in more recent Linux releases as a kernel regression
> and I see no point and trying to support it.

Agreed, seems reasonable.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-10-06 12:52 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-01 16:21 [PATCH 0/3] posix: Execute file function fixes Adhemerval Zanella
2016-02-01 16:21 ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Adhemerval Zanella
2016-02-02 13:06   ` Florian Weimer
2016-02-02 13:31     ` Adhemerval Zanella
2016-02-03 11:07       ` Torvald Riegel
2016-02-03 12:05         ` Adhemerval Zanella
2016-02-03 12:06           ` Torvald Riegel
2016-02-03 12:14             ` Adhemerval Zanella
2016-08-31 21:13   ` Rasmus Villemoes
2016-08-31 22:08     ` Joseph Myers
2016-09-01  9:28       ` Rasmus Villemoes
2016-09-14 13:13         ` Adhemerval Zanella
2016-09-14 18:58           ` Rasmus Villemoes
2016-09-14 19:59             ` Adhemerval Zanella
2016-09-20 20:25         ` Florian Weimer
2016-09-20 20:54           ` Rasmus Villemoes
2016-09-20 21:01           ` [PATCH] linux: spawni.c: simplify error reporting to parent Rasmus Villemoes
2016-09-22 20:54             ` Adhemerval Zanella
2016-09-23  5:21             ` Florian Weimer
2016-09-23 19:09               ` Rasmus Villemoes
2016-09-23 19:28                 ` Adhemerval Zanella
2016-09-23 20:37                   ` Florian Weimer
2016-09-27 20:26                     ` Rasmus Villemoes
2016-09-27 21:14                       ` Adhemerval Zanella
2016-09-28 14:14             ` Rich Felker
2016-09-28 15:03               ` Andreas Schwab
2016-09-28 15:22                 ` Rich Felker
2016-09-28 18:13                   ` Adhemerval Zanella
2016-10-06 12:52                     ` Florian Weimer
2016-09-06 20:43     ` [PATCH v2 3/3] posix: New Linux posix_spawn{p} implementation Rasmus Villemoes
2016-09-14 19:37     ` Adhemerval Zanella
2016-02-01 16:21 ` [PATCH v4 2/3] posix: execvpe cleanup Adhemerval Zanella
2016-02-01 16:21 ` [PATCH v2 1/3] posix: Remove dynamic memory allocation from execl{e,p} Adhemerval Zanella
2016-02-01 16:52   ` Joseph Myers
2016-02-01 17:18     ` Adhemerval Zanella
2016-02-01 17:54       ` Joseph Myers
2016-02-01 18:09         ` Adhemerval Zanella
2016-02-02 11:24       ` Florian Weimer
2016-02-02 12:46         ` Szabolcs Nagy
2016-02-02 12:47         ` Adhemerval Zanella
2016-02-02 16:33     ` Rich Felker
2016-02-07 21:28       ` Rasmus Villemoes
2016-02-09 11:36         ` Richard Henderson
2016-02-09 13:30           ` Szabolcs Nagy
2016-02-10 16:36             ` Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).