From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by sourceware.org (Postfix) with ESMTPS id B73F738515CF for ; Sun, 29 May 2022 14:33:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B73F738515CF Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E922F60F19; Sun, 29 May 2022 14:33:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 81D27C385A9; Sun, 29 May 2022 14:33:16 +0000 (UTC) Date: Sun, 29 May 2022 16:33:13 +0200 From: Christian Brauner To: Andrei Vagin Cc: Florian Weimer , Adhemerval Zanella , libc-alpha@sourceware.org, Alexey Izbyshev , Carlos O'Donell , Dmitry Safonov <0x7f454c46@gmail.com> Subject: Re: [PATCH v4 0/3] Linux: Fix posix_spawn when user with time namespaces Message-ID: <20220529143313.xgvqkr7qe35hwcz7@wittgenstein> References: <20220510191155.1998575-1-adhemerval.zanella@linaro.org> <877d6tb3hl.fsf@oldenburg.str.redhat.com> <20220511092119.ke4zlm2dkazasmva@wittgenstein> <87h75dyf3p.fsf@oldenburg.str.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 May 2022 14:33:21 -0000 On Fri, May 27, 2022 at 05:03:14PM -0700, Andrei Vagin wrote: > On Fri, May 27, 2022 at 08:53:54AM -0700, Andrei Vagin wrote: > > On Wed, May 25, 2022 at 5:24 AM Florian Weimer wrote: > > > > > > * Christian Brauner: > > > > > > > On Tue, May 10, 2022 at 09:18:46PM +0200, Florian Weimer wrote: > > > >> * Adhemerval Zanella: > > > >> > > > >> > The patchset adds some support to tests the fallback code to > > > >> > use only use CLONE_VFORK. It uses unshare directly because > > > >> > it simpler than add container support. > > > >> > > > > >> > Adhemerval Zanella (3): > > > >> > linux: Add CLONE_NEWTIME from Linux 5.6 to bits/sched.h > > > >> > support: Add support_enter_time_namespace > > > >> > linux: Add fallback for clone failure on posix_spawn (BZ #29115) > > > >> > > > >> Christan, how likely is it that we'd get another time namespace variant > > > >> that would only become effective after execve (when the DSO is remapped > > > >> anyway)? > > > > > > > > Not unlikely if it helps you avoid a lot of complexity. I will need some > > > > time to track down Andrei and others to discuss though. > > > > > > Any progress with that? (I hope I guessed the right Andrei.) > > > > I think this is the right me. Have I missed something? > > > > > > > > Breaking vfork is really a bit of a hassle for us, and the workaround > > > code is quite non-trivial and will have to implemented across many > > > projects (not just glibc). An unshare request that takes effect on > > > execve only would really help. > > > > Is the problem that vfork fails if a process has half-entered a time namespace? > > I like the idea of entering a time namespace on exec and don't fail > vfork. The only worry is that the behavior becomes more complicated and > less obvious. > > Here is the untested patch: > > diff --git a/fs/exec.c b/fs/exec.c > index 14b4b3755580..96f650ec7533 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -65,6 +65,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -1266,6 +1267,7 @@ int begin_new_exec(struct linux_binprm * bprm) > if (retval) > goto out; > > + timens_on_fork(me->nsproxy, me); > /* > * Cancel any io_uring activity across execve > */ > diff --git a/kernel/fork.c b/kernel/fork.c > index 124829ed0163..653b80524a54 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -2030,15 +2030,6 @@ static __latent_entropy struct task_struct > *copy_process( > return ERR_PTR(-EINVAL); > } > > - /* > - * If the new process will be in a different time namespace > - * do not allow it to share VM or a thread group with the > forking task. > - */ > - if (clone_flags & (CLONE_THREAD | CLONE_VM)) { > - if (nsp->time_ns != nsp->time_ns_for_children) > - return ERR_PTR(-EINVAL); > - } > - > if (clone_flags & CLONE_PIDFD) { > /* > * - CLONE_DETACHED is blocked so that we can > * potentially > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c > index eec72ca962e2..b4cbb406bc28 100644 > --- a/kernel/nsproxy.c > +++ b/kernel/nsproxy.c > @@ -179,7 +179,8 @@ int copy_namespaces(unsigned long flags, struct > task_struct *tsk) > if (IS_ERR(new_ns)) > return PTR_ERR(new_ns); > > - timens_on_fork(new_ns, tsk); > + if ((flags & CLONE_VM) == 0) > + timens_on_fork(new_ns, tsk); > > tsk->nsproxy = new_ns; > return 0; > > If this is what we want, I can prepare a final patch. But I will be on > vacation next week, so it will happen after it. No problem. It's the merge window anyway and it's May which means bank holiday galore in parts of Europe.