Subject: Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
From: Topi Miettinen
To: Szabolcs Nagy, Jeremy Linton
Cc: "linux-arm-kernel@lists.infradead.org", libc-alpha@sourceware.org, systemd-devel@lists.freedesktop.org, "linux-kernel@vger.kernel.org", Mark Rutland, Mark Brown, Dave Martin, Kees Cook, Catalin Marinas, Will Deacon
Date: Thu, 22 Oct 2020 13:39:07 +0300
Message-ID: <78464155-f459-773f-d0ee-c5bdbeb39e5d@gmail.com>
In-Reply-To: <20201022075447.GO3819@arm.com>
References: <8584c14f-5c28-9d70-c054-7c78127d84ea@arm.com> <20201022075447.GO3819@arm.com>

On 22.10.2020 10.54, Szabolcs Nagy wrote:
> The 10/21/2020 22:44, Jeremy Linton wrote:
>> There is a problem with glibc+systemd on BTI enabled systems. Systemd
>> has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
>> PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
>> being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI.
>> That call is caught by the seccomp filter, resulting in service
>> failures.
>>
>> So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
>> This is obviously not desirable.
>>
>> Various changes have been suggested, replacing the mprotect with mmap calls
>> having PROT_BTI set on the original mapping, re-mmapping the segments,
>> implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
>> and various modification to seccomp to allow particular mprotect cases to
>> bypass the filters. In each case there seems to be an undesirable attribute
>> to the solution.
>>
>> So, whats the best solution?
>
> the easy fix in glibc is to ignore mprotect(PROT_BTI|PROT_EXEC)
> failures, so programs work with seccomp filters, but bti gets
> disabled (it's unreasonable to expect bti protection if mprotect
> is filtered). it will be a nasty silent failure though.

Some may also want their seccomp filters to kill the process
immediately, and in that case ignoring the failure in glibc would not
help.

> and i'm also considering a fix that re-mmaps the executable
> segment with PROT_BTI instead of mprotect since that is not
> filtered. unfortunately the main exe is mmaped by the kernel
> without PROT_BTI and the libc does not have the fd to re-mmap.
> (bti can be left off for the main exe if mprotect fails and
> later we can teach the kernel to add bti there.) currently
> this is not a complete fix so i'm a bit hesitant about it.
>
> as for a kernel side fix: if there is a way to only filter
> PROT_EXEC mprotect on mappings that are not yet PROT_EXEC
> that would solve this problem (but likely needs new syscall
> or seccomp capability).

The problem with seccomp MDWX is that malicious programs can still
circumvent the filter: they can call memfd_create(), fill the memory
with the desired content and then use mmap(,,PROT_EXEC) to make it
executable without triggering seccomp.
This can be mitigated by also filtering memfd_create(), but some
programs want to use it. The protection can also be bypassed if the
program can write to a file system which isn't mounted with "noexec".
This can be mitigated with private mount namespaces and global mount
options, but again some programs are written to expect W & X.

But I think SELinux has a more complete solution (execmem), which can
track the pages better than the seccomp solution can with its very
narrow field of view. Maybe this facility could be made available to
non-SELinux systems, for example with prctl()? Then the in-kernel MDWX
could allow mprotect(PROT_EXEC | PROT_BTI) in case the backing file
hasn't been modified, the source filesystem isn't writable for the
calling process and the file descriptor isn't created with
memfd_create().

-Topi
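
To make the difference between the two approaches a bit more concrete, here is a rough Python model of the decision logic. It is only a sketch under stated assumptions, not kernel or systemd code: the `Mapping` record and both function names are hypothetical, the extra rejection of PROT_WRITE is my own assumption in the spirit of W^X, and the PROT_* values are the ones I believe the Linux uapi headers use (PROT_BTI being arm64-only).

```python
from dataclasses import dataclass

# Protection bits, assumed to match the Linux uapi headers
# (PROT_BTI exists only on arm64).
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4
PROT_BTI = 0x10


@dataclass
class Mapping:
    """Hypothetical summary of what an in-kernel check could know."""
    file_modified: bool          # backing file changed since it was mapped
    fs_writable_by_caller: bool  # caller can write to the source filesystem
    from_memfd: bool             # descriptor was created with memfd_create()


def seccomp_mdwx_allows(prot: int) -> bool:
    """Argument-only view: any mprotect() request with PROT_EXEC is denied."""
    return not (prot & PROT_EXEC)


def proposed_mdwx_allows(prot: int, mapping: Mapping) -> bool:
    """Mapping-aware view: PROT_EXEC is fine if the backing is trustworthy."""
    if not (prot & PROT_EXEC):
        return True   # nothing executable is being requested
    if prot & PROT_WRITE:
        return False  # assumption: never allow W and X together
    return not (mapping.file_modified
                or mapping.fs_writable_by_caller
                or mapping.from_memfd)


# The request glibc makes when enabling BTI on an executable segment.
bti_request = PROT_READ | PROT_EXEC | PROT_BTI
library = Mapping(file_modified=False, fs_writable_by_caller=False,
                  from_memfd=False)
memfd = Mapping(file_modified=False, fs_writable_by_caller=False,
                from_memfd=True)

print(seccomp_mdwx_allows(bti_request))            # seccomp model
print(proposed_mdwx_allows(bti_request, library))  # unmodified library
print(proposed_mdwx_allows(bti_request, memfd))    # memfd-backed mapping
```

The point of the model is only that the seccomp check sees the syscall arguments and nothing else, while the second check needs per-mapping state that only the kernel can track.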