From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 38072 invoked by alias); 7 Nov 2017 19:59:36 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 36513 invoked by uid 89); 7 Nov 2017 19:59:35 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-12.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_1,GIT_PATCH_2,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=ham version=3.3.2 spammy=replied, solving, Based, HTo:U*fw X-HELO: mail-qk0-f193.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=2FT3Lvw4GDn7eklCn849ZKTYwJYnXlmm7ZOrWLKGI4c=; b=jimCzCIywUmEQcA0cd4AHo1+ywzGeNspphqtxobm2JLgKMzoqNgyU9F9qx0J5NPXVc 01E0WX8dagLzUDjoeFdbzmRnKRQCiEPR2zspdvsuQb/k/YBzb86B1pjacInI+U1tk866 tP0FeQx3wHWr7g3QKQiQqLcehIDhxEJVcoXIMjFG6003YeMMPI6xXbeK5aN345BS+USy RsFmArqdSbiDyeuID9LSIuhL8WQs+Hxt06GZidhVY/A4flzBNUYEazcGgFeMV8XuLI3K IGCJQnTnTFMSfDwPgM/FXDWflXdgz7n19BR2YYUzj3vpJF2dRVc39xYDS6krNZHQHUKA Bz7g== X-Gm-Message-State: AJaThX5PuYNz3PKyMPNmf5RBZ5L2sN95ZwbW1WVg/YeSDkmSjLMyEwcz p2RoorZbLoJRrxYeVjw/qOWL+cnX+iE= X-Google-Smtp-Source: ABhQp+QNDPp6gTKVhzzKkG9Fx7Z4l/Ny4h3c9ZW4OTGpXjhDwxmtGgwBWrIdRg7eXtA/YWWR5bxM9Q== X-Received: by 10.233.216.6 with SMTP id u6mr28020367qkf.357.1510084771297; Tue, 07 Nov 2017 11:59:31 -0800 (PST) Subject: Re: [PATCH] Linux: Implement interfaces for memory protection keys To: Florian Weimer Cc: libc-alpha@sourceware.org References: <20171104174948.A79F6400EA2DF@oldenburg.str.redhat.com> <5f7602dd-daf1-36ec-3040-19cfd17f8fc2@redhat.com> <521ff046-f859-e8f4-3e75-f2a6e0d527b0@linaro.org> <878tfjm5ds.fsf@mid.deneb.enyo.de> From: Adhemerval Zanella Message-ID: <8c486d33-25ea-ee3e-113e-e41a777727e0@linaro.org> Date: Tue, 07 Nov 2017 19:59:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <878tfjm5ds.fsf@mid.deneb.enyo.de> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-SW-Source: 2017-11/txt/msg00245.txt.bz2 On 06/11/2017 20:03, Florian Weimer wrote: > * Adhemerval Zanella: > >>> Oh, I was worried that these arguments were expected to be true >>> 64-bit values eventually.  Not today, but perhaps in the future.   >>> Then using unsigned long would be no good on x32.  But this does not >>> seem to be the case, so we can just use unsigned int, to make clear >>> that these arguments are actually 32 bits only. >> >> Even they were 64 bits argument, current {INLINE,INTERNAL}_SYSCALL on >> x32 pass them correctly in syscall arguments. > > But there would only be 32 bits to pass. > >>>> So I think there is no reason to use a different symbol signature than >>>> kernel. >>> >>> I still think using unsigned long should be avoided. >> >> Is your objection because it will have different argument types for 32 >> and 64 bits? > > Yes, if 64 bits were needed, then it would have to be unsigned long > long on x32. But we need only 32 actually (0 and 3 bits for the > upcoming POWER support), so unsigned int is the right type, I think. Right, I would like to try deviate as little as possible from the kernel but I think we can have a different signature for these function. > >> Currently for x86 indeed it does not make difference since for >> 'wrpkru' instruction on x86_64 the high-order 32-bits of RCX, RDX >> and RAX are ignored. > > The size of the PKRU register does not matter. pkey_alloc does not > use a bitmap of keys, it returns a key number (which happens to be an > index into PKRU on x86-64, but this is not exposed in the API). > >> But if we aim to make it a generic API I think we should stick to >> what kernel provides for main reason we can not foresee what kind of >> usage future architecture might do with this value. > > Well, it's a generic interface, so if there's an expectation that the > generic code might eventually use a flag beyond 32 bits, we'd better > using uint64_t today. But that seems quite an overkill. Ok, seems a reasonable approach. > >>>>> +/* Compare the two numbers LEFT and RIGHT and report failure if they >>>>> +   are different.  */ >>>>> +#define TEST_COMPARE(left, right)                                       \ >>>>> +  ({                                                                    \ >>>>> +    __typeof__ (left) __left_value = (left);                            \ >>>>> +    __typeof__ (right) __right_value = (right);                         \ >>>>> +    _Static_assert (sizeof (__left_value) <= sizeof (long long),        \ >>>>> +                    "left value fits into long long");                  \ >>>>> +    _Static_assert (sizeof (__right_value) <= sizeof (long long),       \ >>>>> +                    "right value fits into long long");                 \ >>>>> +    if (__left_value != __right_value                                   \ >>>>> +        || ((__left_value > 0) != (__right_value > 0)))                 \ >>>>> +      support_test_compare_failure                                      \ >>>>> +        (__FILE__, __LINE__,                                            \ >>>>> +         #left, __left_value, __left_value > 0,                         \ >>>>> +         #right, __right_value, __right_value > 0);                     \ >>>>> +  }) >>>>> + >>>> >>>> I think the macro name should be more explicit since it is not only >>>> comparing two number but also their size. >>> >>> Huh?  No, it checks that conversion to long long plus the sign bit >>> is not lossy and that the signs match.  So I think the name is >>> accurate. >> >> Yeah, but it ties some type (long long) and also some sign (since it >> is comparing if both sides are positive) to a quite generic name. >> Maybe TEST_COMPARE_LL_INT_POSITIVE? > > It uses long long as a proxy for intmax_t. I don't want to include > due to namespace pollution. > > The compile-time check is there to prevent the accidental use of the > macro with 128-bit integer types. Ok, would you mind adding this comment on macro implementation? > >>> Sorry, I disagree with that.  Premature generalization usually does >>> not work out because you don't know which parts to generalize. >>> >>> If we get key revocation on pkey_alloc, pkey_set would have to turn >>> into a vsyscall anyway (where the code sequence is known to the >>> kernel, and the revocation code can override the PKRU value the >>> pkey_set implementation has read, thus closing the race). >> >> Alright, that is fair point. I am thinking more about the sense of >> what should be generic such a maximum number of bits for the mask, >> the possible mask value and to set them on the internal flag. But >> I think if it were the case in the future we can work a more generic >> implementation. > > I just saw POWER patches fly by. There, the executable permission > seems to be in a separate register which cannot be updated directly > from userspace. 8-/ It is at v9 now [1] (and you seems to be already aware since you replied about key reuse issue). And they are solving the executable permission userspace issue it by adding a new syscall (__NR_pkey_modify). Reading powerpc patchset it seems to me that it will be worth add some generalization on arch-pkey.h and pkey_{set,get} - Based on current patchset, powerpc seems to define different value for PKEY_DISABLE_ACCESS, so I think it would be worth to parametrize by moving it to arch-defined header. - What about adding this on arch-pkey.h --- #define HAVE_ARCH_PKEY 1 #define NR_PKEYS 16 #define PKEY_BITS_PER_PKEY 2 static inline uint32_t pkey_shift (uint32_t pkey) { return pkey * PKEY_BITS_PER_PKEY; } --- And define as pkey_get: --- uint32_t pkey_get (int pkey, unsigned long flags) { #if HAVE_ARCH_PKEY uint32_t mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE); pkey_reg_t pkey_reg = pkey_read (); pkey_reg_t pkey_shifted = pkey_reg >> pkey_shift (pkey); return pkey_shifted & mask; #else return -1; #endif } --- With this a possible future powerpc patch would just need to add code on arch-pkey.h instead of reimplement pkey_set.c as well. - I would expect something similar to pkey_set, but also issuing the syscall for executable permission masking. [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-November/165602.html > >> What about new pkey_mprotect (..., 0) after pkey_set (0, ...)? Will >> it follow the new key setting for key 0 or kernel will use a default >> value? > > It's important that the keys are basically additional tag bits on a > page. In isolation they do not grant or reject access. These tag > bits only select which access rights associated with the current > thread should be applied to the page. So pkey_mprotect and pkey_set > are completely indepedent: pkey_mprotect changes the tag bits on the > page (but nothing else) and does not depend on the current thread's > access rights, and pkey_set changes the current thread's access rights > only (without changing the page tag bits set by pkey_mprotect before > or after). Crucially, these keys do not any additional protection > flags to pages. The tag bits gain meaning only in combination with > the access rights of a running thread. My understanding from documentation it key 0 is default one and it is assigned to any memory regions (through mmap) which has not been explicitly assigned through pkey_mprotect. My questioning whether we should filter out key 0 is to: 1. Issue an error when application tries to use a key which has not been previously allocated by pkey_alloc (since key 0 is pre-allocated by the kernel). 2. Avoid changing default masking bits for pages which has not been explicitly assigned through a pkey_mprotect. However this does not prevent uses from issuing a RDPKRU/WRPKRU directly. > >> My question is of you think we need to filter and/or document key 0 >> (and 1 as you indicate) as being special? > > No, we shouldn't do that. The kernel will do it for us as part of > pkey_alloc, and that should be sufficient. >