Message-Id: <9d232b1b-f123-4189-bf09-dd29aab6486a@www.fastmail.com>
In-Reply-To: <87fsgvvbwq.fsf@oldenburg.str.redhat.com>
References: <79dae81f-8e33-4499-a47a-93cc0903be6a@www.fastmail.com> <87fsgvvbwq.fsf@oldenburg.str.redhat.com>
Date: Thu, 15 Sep 2022 12:09:36 -0400
From: "Zack Weinberg"
To: "Florian Weimer" , "GNU libc development"
Subject: Re: RFC PATCH: Don't use /proc/self/maps to calculate size of initial thread stack

On Tue, Sep 13, 2022, at 5:52 AM, Florian Weimer wrote:
> * Zack Weinberg via Libc-alpha:
>> for many years, the NPTL implementation has said that
>> the stack starts at __libc_stack_end, rounded in the opposite
>> direction from stack growth to the nearest page boundary, and extends
>> for getrlimit(RLIMIT_STACK).rlim_cur bytes, *minus the size of the
>> information block*, which is beyond __libc_stack_end.  The rationale
>> is that the resource limit is enforced against the entire memory
>> area, so if we don't subtract the size of the information block, then
>> the program will run out of stack a few pages before
>> pthread_attr_getstack says it will.
>
> Do we actually have to subtract the size of the information block?
> One could argue that this is just part of the arguments passed to main,
> so sort-of-but-not-quite part of main's stack frame.

We could make that change, but we'd need to make other changes as well
to keep everything consistent, and I'm not sure _how_ to make that
change without having the information that pthread_getattr_np is
probing for.

Suppose 'stackaddr' and 'stacksize' are the values reported by
pthread_attr_getstack when applied to the initial thread.  Then the
invariants I think we need to preserve are:

 - stacksize <= getrlimit(RLIMIT_STACK).rlim_cur
 - stackaddr % getpagesize() == 0
 - if the stack grows downward in memory, it must be OK to grow the
   stack down to, but not necessarily beyond, stackaddr
 - conversely, if the stack grows upward, it must be OK to grow the
   stack up to, but not necessarily beyond, stackaddr + stacksize

Now, the entire headache here is that __libc_stack_end is *not*
necessarily page aligned, and (on an architecture where the stack
grows downward in memory) __libc_stack_end -
getrlimit(RLIMIT_STACK).rlim_cur will point somewhere *beyond* the
lowest address to which the kernel will enlarge the stack, even if you
round __libc_stack_end up to the next page boundary before the
subtraction.  The function of the code changed by my patch -- before
and after -- is to determine the actual boundaries of the
lazy-allocation region for the initial thread's stack.
If we changed __libc_stack_end to point to the "bottom" (opposite the
direction of stack growth) of the entire stack region, then we could
simply subtract the rlimit size from it and have stackaddr.  But
that's exactly the challenge: how do we know where that "bottom" is?
I don't know where __libc_stack_end is set.  Early startup code should
be able to do things that pthread_getattr_np can't, like "find the
end-most address among all the pointers in argv, envp, and auxv, then
round end-wards to a page boundary" (where "end-most" and "end-wards"
mean "in the direction opposite to stack growth"), but that might not
always give the right answer.  I also don't know whether there's any
existing code in libc that depends on __libc_stack_end _not_ pointing
past the information block (of course we could always add a new
__libc_info_block_end, or just fill in the initial thread's pthread_t
more thoroughly).

> process_vm_readv seems quite likely to get blocked by seccomp filters.

I was worried about that too :-/

> Maybe we can get the kernel to pass the end of the stack in the
> auxiliary vector?

Sure, but then what do we do on older kernels?  I'm reluctant to say
"keep the old code", because we know this is breaking for people right
now (although honestly "mount /proc earlier" isn't a terrible
suggestion for a workaround).

zw