Message-ID: <54c6018f-3b1d-84e9-04e5-55c0eca66a4c@linaro.org>
Date: Wed, 21 Sep 2022 17:58:46 -0300
Subject: Re: RFC PATCH: Don't use /proc/self/maps to calculate size of initial thread stack
To: libc-alpha@sourceware.org
References: <79dae81f-8e33-4499-a47a-93cc0903be6a@www.fastmail.com> <87fsgvvbwq.fsf@oldenburg.str.redhat.com> <9d232b1b-f123-4189-bf09-dd29aab6486a@www.fastmail.com>
From: Adhemerval Zanella Netto
Organization: Linaro
In-Reply-To: <9d232b1b-f123-4189-bf09-dd29aab6486a@www.fastmail.com>

On 15/09/22 13:09, Zack Weinberg via Libc-alpha wrote:
> On Tue, Sep 13, 2022, at 5:52 AM, Florian Weimer wrote:
>> * Zack Weinberg via Libc-alpha:
>>> For many years, the NPTL implementation has said that the stack
>>> starts at __libc_stack_end, rounded in the opposite direction from
>>> stack growth to the nearest page boundary, and extends for
>>> getrlimit(RLIMIT_STACK).rlim_cur bytes, *minus the size of the
>>> information block*, which is beyond __libc_stack_end.  The
>>> rationale is that the resource limit is enforced against the entire
>>> memory area, so if we don't subtract the size of the information
>>> block, the program will run out of stack a few pages before
>>> pthread_attr_getstack says it will.
>>
>> Do we actually have to subtract the size of the information block?
>> One could argue that this is just part of the arguments passed to
>> main, so sort-of-but-not-quite part of main's stack frame.
>
> We could make that change, but we'd need to make other changes as
> well to keep everything consistent, and I'm not sure _how_ to make
> that change without having the information that pthread_getattr_np
> is probing for.
>
> Suppose 'stackaddr' and 'stacksize' are the values reported by
> pthread_attr_getstack when applied to the initial thread.  Then the
> invariants I think we need to preserve are:
>
>   stacksize <= getrlimit(RLIMIT_STACK).rlim_cur
>   stackaddr % getpagesize() == 0
>   if the stack grows downward in memory, it must be OK to grow the
>     stack down to, but not necessarily beyond, stackaddr
>   conversely, if the stack grows upward, it must be OK to grow the
>     stack up to, but not necessarily beyond, stackaddr + stacksize
>
> Now, the entire headache here is that __libc_stack_end is *not*
> necessarily page-aligned, and (on an architecture where the stack
> grows downward in memory)
>
>   __libc_stack_end - getrlimit(RLIMIT_STACK).rlim_cur
>
> will be a pointer to somewhere *beyond* the lowest address that the
> kernel will enlarge the stack to, even if you round __libc_stack_end
> up to the next page boundary before the subtraction.  The function
> of the code changed by my patch -- before and after -- is to
> determine the actual boundaries of the lazy-allocation region for
> the initial thread's stack.
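
To restate the arithmetic above in rough C (purely illustrative, for a
grows-down architecture; the function name is mine, and error handling
and RLIM_INFINITY are ignored):

  #include <stdint.h>
  #include <sys/resource.h>
  #include <unistd.h>

  extern void *__libc_stack_end;

  /* The naive computation described above.  It satisfies the first
     two invariants, but violates the third: the kernel enforces
     rlim_cur against the whole region, including the information
     block *above* __libc_stack_end, so 'top - rl.rlim_cur' lies below
     the lowest address the stack can actually grow to.  */
  static void
  naive_initial_stack (void **stackaddr, size_t *stacksize)
  {
    uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
    /* Round up, i.e. away from the direction of stack growth.  */
    uintptr_t top = ((uintptr_t) __libc_stack_end + pagesize - 1)
                    & ~(pagesize - 1);

    struct rlimit rl;
    getrlimit (RLIMIT_STACK, &rl);

    *stackaddr = (void *) (top - rl.rlim_cur);
    *stacksize = rl.rlim_cur;
  }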
>
> If we changed __libc_stack_end to point to the "bottom" (opposite
> the direction of stack growth) of the entire stack region, then we
> could simply subtract the rlimit size from it and have stackaddr.
> But that's exactly the challenge: how do we know where that "bottom"
> is?
>
> I don't know where __libc_stack_end is set.  Early startup code
> should be able to do things that pthread_attr_t can't, like "find
> the end-most address among all the pointers in argv, envp, and auxv,
> then round end-wards to a page boundary" (where "end-most" and
> "end-wards" mean "in the direction opposite to stack growth"), but
> that might not always give the right answer.  I also don't know
> whether there's any existing code in libc that depends on
> __libc_stack_end _not_ pointing past the information block (of
> course, we could always add a new __libc_info_block_end, or just
> fill in the initial thread's pthread_t more thoroughly).
>
>> process_vm_readv seems quite likely to get blocked by seccomp
>> filters.
>
> I was worried about that too :-/
>
>> Maybe we can get the kernel to pass the end of the stack in the
>> auxiliary vector?
>
> Sure, but then what do we do on older kernels?  I'm reluctant to say
> "keep the old code" because we know this is breaking for people
> right now (although honestly "mount /proc earlier" isn't a terrible
> suggestion for a workaround).
>
> zw

I wonder if we could use an in-place mremap (which should be a no-op)
to obtain a closer approximation of the stack bounds (the code below
only handles grows-down architectures):

  uintptr_t pagesize = GLRO (dl_pagesize);
  /* Round __libc_stack_end up, away from the direction of growth, to
     the page boundary above the known stack top.  */
  char *stack_end_page = (char *) ALIGN_UP ((uintptr_t) __libc_stack_end,
                                            pagesize);

  /* Probe downwards one page at a time: while the page just below the
     current estimate is mapped, an in-place mremap of it cannot
     expand (the page above it is occupied) and fails with ENOMEM.
     Once the probe walks off the bottom of the mapping, mremap fails
     with a different error and the loop stops.  */
  size_t stacksize = pagesize;
  while (mremap (stack_end_page - stacksize - pagesize, pagesize,
                 2 * pagesize, 0) == MAP_FAILED
         && errno == ENOMEM)
    stacksize += pagesize;

  iattr->stackaddr = (void *) stack_end_page;
  iattr->stacksize = stacksize;

On x86_64 it gives a value much closer to what the [stack] segment in
/proc/self/maps reports.  For instance, with:

  7ffffffdd000-7ffffffff000 rw-p 00000000 00:00 0   [stack]

it returns stackaddr as 0x7ffffffdd000 with stacksize as 0x21000.  It
does not return the same values as the current implementation, but I
also see that the current implementation returns both an address and a
size far larger than what /proc/self/maps actually shows mapped for
the process.
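
For anyone who wants to experiment outside glibc, a standalone version
of the probe might look roughly like this (a sketch: sysconf stands in
for GLRO (dl_pagesize), ALIGN_UP is open-coded, and __libc_stack_end
is the symbol ld.so exports; grows-down architectures only):

  #define _GNU_SOURCE
  #include <errno.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  extern void *__libc_stack_end;

  int
  main (void)
  {
    uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
    char *stack_end_page
      = (char *) (((uintptr_t) __libc_stack_end + pagesize - 1)
                  & ~(pagesize - 1));

    /* Same probe loop as above.  Note that if mremap ever succeeds it
       really does grow a mapping; only the failing (ENOMEM/EFAULT)
       probes are no-ops.  */
    size_t stacksize = pagesize;
    while (mremap (stack_end_page - stacksize - pagesize, pagesize,
                   2 * pagesize, 0) == MAP_FAILED
           && errno == ENOMEM)
      stacksize += pagesize;

    printf ("stack top (aligned) = %p\n", (void *) stack_end_page);
    printf ("probed stacksize    = %#zx\n", stacksize);
    printf ("implied bottom      = %p\n",
            (void *) (stack_end_page - stacksize));
    return 0;
  }

Built with plain gcc, the output can be compared directly against the
[stack] line of the process's /proc/self/maps.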
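
As an aside on the "end-most pointer" idea quoted above, the early
startup scan would look something like the following (again just a
sketch for a grows-down architecture; among other things it ignores
the lengths of the strings the pointers refer to, which is one reason
it can under-estimate the true top):

  #include <link.h>    /* ElfW, Elf*_auxv_t, AT_NULL */
  #include <stdint.h>

  static uintptr_t
  guess_stack_top (char **argv, uintptr_t pagesize)
  {
    uintptr_t end = (uintptr_t) argv;
    char **p = argv;

    /* argv strings; the envp array starts after argv's NULL.  */
    for (; *p != NULL; ++p)
      if ((uintptr_t) *p > end)
        end = (uintptr_t) *p;
    ++p;

    /* envp strings; auxv starts after envp's NULL.  */
    for (; *p != NULL; ++p)
      if ((uintptr_t) *p > end)
        end = (uintptr_t) *p;
    ++p;

    /* auxv entries; AT_NULL terminates the vector.  Pointer-valued
       entries (AT_EXECFN etc.) are not chased here.  */
    for (ElfW(auxv_t) *av = (ElfW(auxv_t) *) p;
         av->a_type != AT_NULL; ++av)
      if ((uintptr_t) av > end)
        end = (uintptr_t) av;

    /* Round away from the direction of growth.  */
    return (end + pagesize - 1) & ~(pagesize - 1);
  }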