From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 69D403858D1E for ; Mon, 4 Apr 2022 14:18:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 69D403858D1E Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-197-HKKvBNahOV21nhlhtC_sCQ-1; Mon, 04 Apr 2022 10:17:59 -0400 X-MC-Unique: HKKvBNahOV21nhlhtC_sCQ-1 Received: by mail-qv1-f69.google.com with SMTP id im8-20020a056214246800b00443d3d3956dso3099015qvb.22 for ; Mon, 04 Apr 2022 07:17:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:organization:in-reply-to :content-transfer-encoding; bh=KqmEzE9whAJshMQ5l1M0UTOJKVQ3iFFknRcRs6uSM0k=; b=gZdBu67HabyyW8isQajIkK9LM+2mk+jTkWfftNOoEvWcbZg/Tzq+qcGtBfj8oY1lx5 cO/dTom0wr0/oRTd4gi3i916peWL5f2hQDA8Bl+8AsWDDNQw6HDahipUG9hgJJfGKg29 eb3b9XsNwLdbgoutoOQ5DtcNlScFZf+0D3ORZcOpptW5ahbOsxrlCzX2dZ31Lq2EXmOE k13yFeqyWnH92lR29W9w/oc14C36n7AvJjBWHy+BiR2HsfddJKnHJEGqdgm2z4EsRFSl wCxHDzZ4bEMTs4HuWEn61wdiXTZLwUVve1ovH8KezQcIcS8+6544GtHSKDMQNIQOm/28 ba2A== X-Gm-Message-State: AOAM5328dTnRzxdzeqkvWhkEH+bUbYWsFZfQxIMXMR1aCgENJpmllapF 6hZxxxxP3IUg+QcbYxAhGpFigDuV+FVKBkbXQA1uMQUs5C12/hCull96kma2+rE4SsCqMyylCOx 3trzH6uwJotfrkiYbFAC9 X-Received: by 2002:a05:6214:2a82:b0:443:e2fc:c209 with SMTP id jr2-20020a0562142a8200b00443e2fcc209mr45951qvb.59.1649081878261; Mon, 04 Apr 2022 07:17:58 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz4trHT5rNkTo5QLOdDXbbNLEkisaYnPQDhD0vo49hUWuUji3h81gY2d4/cWZSn/QkKlwoAAw== X-Received: by 2002:a05:6214:2a82:b0:443:e2fc:c209 with SMTP id jr2-20020a0562142a8200b00443e2fcc209mr45921qvb.59.1649081877800; Mon, 04 Apr 2022 07:17:57 -0700 (PDT) Received: from [192.168.0.241] (135-23-175-80.cpe.pppoe.ca. [135.23.175.80]) by smtp.gmail.com with ESMTPSA id j1-20020a05620a288100b0067b1be3201bsm6311288qkp.112.2022.04.04.07.17.56 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 04 Apr 2022 07:17:57 -0700 (PDT) Message-ID: Date: Mon, 4 Apr 2022 10:17:55 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [patch v6] Allow for unpriviledged nested containers To: DJ Delorie , libc-alpha@sourceware.org References: From: Carlos O'Donell Organization: Red Hat In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-29.6 required=5.0 tests=BAYES_00, DKIM_INVALID, DKIM_SIGNED, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Apr 2022 14:18:03 -0000 On 3/28/22 23:53, DJ Delorie wrote: > > [changes since v5: whether or not to isolate the pid namespace is now a > test-specific "pidns" option, support_needs_proc() takes a "why" message > for future readers] > > If the build itself is run in a container, we may not be able to > fully set up a nested container for test-container testing. > Notably is the mounting of /proc, since it's critical that it > be mounted from within the same PID namespace as its users, and > thus cannot be bind mounted from outside the container like other > mounts. > > This patch defaults to using the parent's PID namespace instead of > creating a new one, as this is more likely to be allowed. > > If the test needs an isolated PID namespace, it should add the "pidns" > command to its init script. LGTM. Reviewed-by: Carlos O'Donell > diff --git a/elf/tst-pldd.c b/elf/tst-pldd.c > index f31f9956faa..378262504a5 100644 > --- a/elf/tst-pldd.c > +++ b/elf/tst-pldd.c > @@ -85,6 +85,8 @@ in_str_list (const char *libname, const char *const strlist[]) > static int > do_test (void) > { > + support_need_proc ("needs /proc/sys/kernel/yama/ptrace_scope and /proc/$child"); > + OK. Nice use of a message to print. > /* Check if our subprocess can be debugged with ptrace. */ > { > int ptrace_scope = support_ptrace_scope (); > diff --git a/nptl/tst-pthread-getattr.c b/nptl/tst-pthread-getattr.c > index d2ebf308ae7..3f6f76ea83b 100644 > --- a/nptl/tst-pthread-getattr.c > +++ b/nptl/tst-pthread-getattr.c > @@ -28,6 +28,8 @@ > #include > #include > > +#include > + > /* There is an obscure bug in the kernel due to which RLIMIT_STACK is sometimes > returned as unlimited when it is not, which may cause this test to fail. > There is also the other case where RLIMIT_STACK is intentionally set as > @@ -153,6 +155,8 @@ check_stack_top (void) > static int > do_test (void) > { > + support_need_proc ("Reads /proc/self/maps to get stack size."); OK. > + > pagesize = sysconf (_SC_PAGESIZE); > return check_stack_top (); > } > diff --git a/nss/tst-reload2.c b/nss/tst-reload2.c > index fb3b94a1fab..7df0ca740b6 100644 > --- a/nss/tst-reload2.c > +++ b/nss/tst-reload2.c > @@ -95,6 +95,8 @@ do_test (void) > char buf1[PATH_MAX]; > char buf2[PATH_MAX]; > > + support_need_proc ("Our xmkdirp fails if we can't map our uid, which requires /proc."); > + OK. > sprintf (buf1, "/subdir%s", support_slibdir_prefix); > xmkdirp (buf1, 0777); > > diff --git a/support/Makefile b/support/Makefile > index 5ddcb8d1581..f036a813048 100644 > --- a/support/Makefile > +++ b/support/Makefile > @@ -64,6 +64,7 @@ libsupport-routines = \ > support_format_netent \ > support_isolate_in_subprocess \ > support_mutex_pi_monotonic \ > + support_need_proc \ OK. > support_path_support_time64 \ > support_process_state \ > support_ptrace \ > diff --git a/support/support.h b/support/support.h > index 73b9fc48f01..d20051da4d4 100644 > --- a/support/support.h > +++ b/support/support.h > @@ -91,6 +91,11 @@ char *support_quote_string (const char *); > regular file open for writing, and initially empty. */ > int support_descriptor_supports_holes (int fd); > > +/* Predicates that a test requires a working /proc filesystem. This > + call will exit with UNSUPPORTED if /proc is not available, printing > + WHY_MSG as part of the diagnostic. */ > +void support_need_proc (const char *why_msg); OK. > + > /* Error-checking wrapper functions which terminate the process on > error. */ > > diff --git a/support/support_need_proc.c b/support/support_need_proc.c > new file mode 100644 > index 00000000000..9b4eab7539b > --- /dev/null > +++ b/support/support_need_proc.c > @@ -0,0 +1,35 @@ > +/* Indicate that a test requires a working /proc. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > +#include > +#include > + > +/* We test for /proc/self/maps since that's one of the files that one > + of our tests actually uses, but the general idea is if Linux's > + /proc/ (procfs) filesystem is mounted. If not, the process exits > + with an UNSUPPORTED result code. */ > + > +void > +support_need_proc (const char *why_msg) > +{ > +#ifdef __linux__ > + if (access ("/proc/self/maps", R_OK)) > + FAIL_UNSUPPORTED ("/proc is not available, %s", why_msg); > +#endif OK. Only does anything on linux (fixed from v5). > +} > diff --git a/support/test-container.c b/support/test-container.c > index 25e7f142193..c837c4d7580 100644 > --- a/support/test-container.c > +++ b/support/test-container.c > @@ -97,6 +97,7 @@ int verbose = 0; > * mytest.root/mytest.script has a list of "commands" to run: > syntax: > # comment > + pidns Perfect. This is a great way to avoid bitrot of this code while still making it functional. > su > mv FILE FILE > cp FILE FILE > @@ -122,6 +123,8 @@ int verbose = 0; > > details: > - '#': A comment. > + - 'pidns': Require a separate PID namespace, prints comment if it can't > + (default is a shared pid namespace) OK. > - 'su': Enables running test as root in the container. > - 'mv': A minimal move files command. > - 'cp': A minimal copy files command. > @@ -148,7 +151,7 @@ int verbose = 0; > * Simple, easy to review code (i.e. prefer simple naive code over > complex efficient code) > > - * The current implementation ist parallel-make-safe, but only in > + * The current implementation is parallel-make-safe, but only in > that it uses a lock to prevent parallel access to the testroot. */ > > > @@ -227,11 +230,37 @@ concat (const char *str, ...) > return bufs[n]; > } > > +/* Like the above, but put spaces between words. Caller frees. */ > +static char * > +concat_words (char **words, int num_words) > +{ > + int len = 0; > + int i; > + char *rv, *p; > + > + for (i = 0; i < num_words; i ++) > + { > + len += strlen (words[i]); > + len ++; > + } > + > + p = rv = (char *) xmalloc (len); > + > + for (i = 0; i < num_words; i ++) > + { > + if (i > 0) > + p = stpcpy (p, " "); > + p = stpcpy (p, words[i]); > + } > + > + return rv; > +} > + > /* Try to mount SRC onto DEST. */ > static void > trymount (const char *src, const char *dest) > { > - if (mount (src, dest, "", MS_BIND, NULL) < 0) > + if (mount (src, dest, "", MS_BIND | MS_REC, NULL) < 0) > FAIL_EXIT1 ("can't mount %s onto %s\n", src, dest); > } > > @@ -726,6 +755,9 @@ main (int argc, char **argv) > gid_t original_gid; > /* If set, the test runs as root instead of the user running the testsuite. */ > int be_su = 0; > + int require_pidns = 0; > + const char *pidns_comment = NULL; > + int do_proc_mounts = 0; > int UMAP; > int GMAP; > /* Used for "%lld %lld 1" so need not be large. */ > @@ -1011,6 +1043,12 @@ main (int argc, char **argv) > { > be_su = 1; > } > + else if (nt >= 1 && strcmp (the_words[0], "pidns") == 0) > + { > + require_pidns = 1; > + if (nt > 1) > + pidns_comment = concat_words (the_words + 1, nt - 1); OK. > + } > else if (nt == 3 && strcmp (the_words[0], "mkdirp") == 0) > { > long int m; > @@ -1068,7 +1106,8 @@ main (int argc, char **argv) > > #ifdef CLONE_NEWNS > /* The unshare here gives us our own spaces and capabilities. */ > - if (unshare (CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS) < 0) > + if (unshare (CLONE_NEWUSER | CLONE_NEWNS > + | (require_pidns ? CLONE_NEWPID : 0)) < 0) > { > /* Older kernels may not support all the options, or security > policy may block this call. */ > @@ -1079,6 +1118,11 @@ main (int argc, char **argv) > check_for_unshare_hints (); > FAIL_UNSUPPORTED ("unable to unshare user/fs: %s", strerror (saved_errno)); > } > + /* We're about to exit anyway, it's "safe" to call unshare again > + just to see if the CLONE_NEWPID caused the error. */ > + else if (require_pidns && unshare (CLONE_NEWUSER | CLONE_NEWNS) >= 0) > + FAIL_EXIT1 ("unable to unshare pid ns: %s : %s", strerror (errno), > + pidns_comment ? pidns_comment : "required by test"); > else > FAIL_EXIT1 ("unable to unshare user/fs: %s", strerror (errno)); > } > @@ -1094,6 +1138,15 @@ main (int argc, char **argv) > trymount (support_srcdir_root, new_srcdir_path); > trymount (support_objdir_root, new_objdir_path); > > + /* It may not be possible to mount /proc directly. */ > + if (! require_pidns) > + { > + char *new_proc = concat (new_root_path, "/proc", NULL); > + xmkdirp (new_proc, 0755); > + trymount ("/proc", new_proc); > + do_proc_mounts = 1; > + } > + > xmkdirp (concat (new_root_path, "/dev", NULL), 0755); > devmount (new_root_path, "null"); > devmount (new_root_path, "zero"); > @@ -1163,42 +1216,60 @@ main (int argc, char **argv) > > maybe_xmkdir ("/tmp", 0755); > > - /* Now that we're pid 1 (effectively "root") we can mount /proc */ > - maybe_xmkdir ("/proc", 0777); > - if (mount ("proc", "/proc", "proc", 0, NULL) < 0) > - FAIL_EXIT1 ("Unable to mount /proc: "); > - > - /* We map our original UID to the same UID in the container so we > - can own our own files normally. */ > - UMAP = open ("/proc/self/uid_map", O_WRONLY); > - if (UMAP < 0) > - FAIL_EXIT1 ("can't write to /proc/self/uid_map\n"); > - > - sprintf (tmp, "%lld %lld 1\n", > - (long long) (be_su ? 0 : original_uid), (long long) original_uid); > - write (UMAP, tmp, strlen (tmp)); > - xclose (UMAP); > - > - /* We must disable setgroups () before we can map our groups, else we > - get EPERM. */ > - GMAP = open ("/proc/self/setgroups", O_WRONLY); > - if (GMAP >= 0) > + if (require_pidns) > { > - /* We support kernels old enough to not have this. */ > - write (GMAP, "deny\n", 5); > - xclose (GMAP); > + /* Now that we're pid 1 (effectively "root") we can mount /proc */ > + maybe_xmkdir ("/proc", 0777); > + if (mount ("proc", "/proc", "proc", 0, NULL) != 0) > + { > + /* This happens if we're trying to create a nested container, > + like if the build is running under podman, and we lack > + priviledges. > + > + Ideally we would WARN here, but that would just add noise to > + *every* test-container test, and the ones that care should > + have their own relevent diagnostics. > + > + FAIL_EXIT1 ("Unable to mount /proc: "); */ > + } > + else > + do_proc_mounts = 1; > } > > - /* We map our original GID to the same GID in the container so we > - can own our own files normally. */ > - GMAP = open ("/proc/self/gid_map", O_WRONLY); > - if (GMAP < 0) > - FAIL_EXIT1 ("can't write to /proc/self/gid_map\n"); > + if (do_proc_mounts) > + { > + /* We map our original UID to the same UID in the container so we > + can own our own files normally. */ > + UMAP = open ("/proc/self/uid_map", O_WRONLY); > + if (UMAP < 0) > + FAIL_EXIT1 ("can't write to /proc/self/uid_map\n"); > + > + sprintf (tmp, "%lld %lld 1\n", > + (long long) (be_su ? 0 : original_uid), (long long) original_uid); > + write (UMAP, tmp, strlen (tmp)); > + xclose (UMAP); > + > + /* We must disable setgroups () before we can map our groups, else we > + get EPERM. */ > + GMAP = open ("/proc/self/setgroups", O_WRONLY); > + if (GMAP >= 0) > + { > + /* We support kernels old enough to not have this. */ > + write (GMAP, "deny\n", 5); > + xclose (GMAP); > + } > > - sprintf (tmp, "%lld %lld 1\n", > - (long long) (be_su ? 0 : original_gid), (long long) original_gid); > - write (GMAP, tmp, strlen (tmp)); > - xclose (GMAP); > + /* We map our original GID to the same GID in the container so we > + can own our own files normally. */ > + GMAP = open ("/proc/self/gid_map", O_WRONLY); > + if (GMAP < 0) > + FAIL_EXIT1 ("can't write to /proc/self/gid_map\n"); > + > + sprintf (tmp, "%lld %lld 1\n", > + (long long) (be_su ? 0 : original_gid), (long long) original_gid); > + write (GMAP, tmp, strlen (tmp)); > + xclose (GMAP); > + } > > if (change_cwd) > { > -- Cheers, Carlos.