From: Alexandre Oliva
To: Qing Zhao
Cc: Qing Zhao via Gcc-patches, Jeremy Bennett, Craig Blackmore,
    Graham Markall, Martin Jambor, Jan Hubicka, Richard Biener, Jim Wilson
Subject: Re: [PATCH v3] Introduce strub: machine-independent stack scrubbing
Organization: Free thinker, does not speak for AdaCore
Date: Wed, 28 Jun 2023 05:20:32 -0300
In-Reply-To: (Qing Zhao's message of "Tue, 27 Jun 2023 21:28:41 +0000")

Hello, Qing,

On Jun 27, 2023, Qing Zhao wrote:

> I am wondering why stack
> scrubbing, proposed in this patch series, cannot do the stack
> scrubbing in the routine’s epilogue similar as
> register scrubbing?

There were multiple considerations that led to this design decision:

- Stack scrubbing in epilogues would be highly target-dependent

An epilogue expected to scrub the stack of its containing function
would not usually be able to call memset; there might not even be
registers available to do the cleaning, let alone to do it
efficiently.  Since epilogues are output after register allocation,
the epilogue code generator would have to allocate registers itself to
do the job, avoiding call-saved registers (that would have to be
restored before scrubbing the stack holding them) and those holding
return values, and taking care of any machine- or ABI-specific
conventions that apply to epilogues.

- Exception Handling

Raising or propagating an exception requires a function's stack frame
to be active.  It wouldn't be possible for e.g. a cleanup handler to
clean up the stack frame holding it and then propagate the exception:
either the scrubbing would have to leave much of the stack frame alone
for propagation to work, or it would scrub too much and propagation
would fail.  So we had to devise a way for stack frames to be scrubbed,
protecting the sensitive data in them, even if an exception is raised
or propagated out of the sensitive frame.

- Variable frame size

Though many functions have static frame sizes, there are cases in
which a function dynamically allocates and releases stack space, and
that extra space should be scrubbed as well.  So the gains from a
known frame size are not a given, and we may need a watermark to
handle the general case.

Now consider that this watermark needs to survive past the point at
which the epilogue restores call-saved registers, so that the save
area can be scrubbed.  Call-clobbered registers might not be
available, or might need scrubbing themselves.

A caller-owned watermark relieves the callee of these contradictory
requirements.  It enables:

  * the register pointing to the watermark to be reused by the callee
    as soon as it's no longer needed;

  * aggregation of scrubbing, by passing on the watermark when
    tail-calling another scrubbed subprogram;

  * caller and callee to be compiled separately, circumstances in
    which the caller (in the strub("at-calls") mode) wouldn't
    otherwise know how much of the stack space used by the callee is
    to be scrubbed.

- Watermark as in/out argument

Thus, watermarks and caller-based scrubbing were required for
exceptional exits, so we might as well use the same strategy for
non-exceptional exit paths, which keeps it portable.

We've explored various possibilities of watermark passing to reduce
the impact on the ABI:

-- a single global variable wouldn't do in multi-threaded programs; we
need per-thread stack information.  TLS is not available on every
target, it's emulated with high overhead on some, and even when it
doesn't use part of the thread's stack for static thread-local
storage, each caller of a scrubbing function would have to preserve
that variable somehow (presumably in its own stack frame) before
reusing it to communicate with its callee.

-- a thread-local pointer to a heap-allocated parallel stack of
stack-scrubbing ranges might avoid holding the watermarks in the
stack, or passing pointers to them as arguments, leaving the entire
scrub range management in the library.  However, that would make the
__strub_* library components heavy enough that inlining them would not
be viable.  Furthermore, making such low-level APIs depend on heap
allocation normally raises async-signal-safety problems, and prevents
heap implementations from relying on those APIs.

-- using the static chain machinery to convey to scrubbed callees
access to the callee's watermark seems viable, if onerous, but the
chained records live in the stack anyway, and there are targets that
do not support static chains.

-- an out parameter might do for "amount of stack used", but making it
an in/out watermark enabled aggregation and tail-calling; early
set-and-forget on fixed-size stack frames; and assured initialization,
even in case of an early asynchronous exception.

- Internal scrubbing

Though we have implemented strub("internal") through wrappers that
call the actual function and then scrub its stack space, we have
envisioned an alternate implementation that, through machine-specific
support, performs actual internal scrubbing, arranging the stack frame
in such a way that epilogues and EH cleanups can scrub most, if not
all, of the stack frame (analogous to how the wrapper only scrubs the
wrapped frame, not its own), and taking advantage of constant frame
sizes where possible.

At least with variable frame sizes, the amount of stack space to be
scrubbed in the epilogue (or in an EH cleanup) will have to be held in
a local variable or some such, and at least for nonleaf functions,
that surely will end up in the stack one way or another.

> 2. I have concerns on the runtime performance overhead, do you have any
> data on this for your current implementation?

Though one could conceivably build entire applications with the
testing option -fstrub=all, and that works AFAICT, the expected use
case is marking sensitive functions or variables for strubbing, and
there aren't benchmarks for this use case.

> 3. You mentioned that there are several “modes” for this feature,
> could you please provide more details on the modes and their
> description?

There's strict vs relaxed, and there's internal vs at-calls.  The
documentation for these modes included in the patch is quite
extensive.  Rather than duplicating it here in other words, I suppose
it would be a better "test" for the documentation to have others go
through it, try to make sense of it, and point out passages that are
unclear or hard to understand.  WDYT?

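To make the in/out watermark argument above a bit more concrete, here
is a toy model in Python.  It is only a sketch under stated
assumptions, not how the patch implements anything: the stack is a
plain list that grows towards lower indices, the watermark is a
one-element list standing in for a pointer passed by the caller, and
the names ToyStack, strub_enter, strub_update, strub_leave, helper,
sensitive and caller are all made up for the illustration (loosely
echoing the __strub_* library components, with invented signatures).

```python
STACK_SIZE = 64  # words in the toy stack (arbitrary, for illustration)


class ToyStack:
    """Downward-growing stack model: lower index means deeper."""

    def __init__(self):
        self.mem = [0] * STACK_SIZE
        self.sp = STACK_SIZE  # empty stack: sp sits at the base

    def alloc(self, nwords, fill):
        """Allocate nwords below sp and store (sensitive) data there."""
        self.sp -= nwords
        for i in range(self.sp, self.sp + nwords):
            self.mem[i] = fill

    def release(self, nwords):
        """Pop nwords; the stale data stays in memory, as on a real stack."""
        self.sp += nwords


def strub_enter(stack, watermark):
    """Caller side: initialize the watermark before the call."""
    watermark[0] = stack.sp


def strub_update(stack, watermark):
    """Callee side: record the deepest stack use seen so far."""
    if stack.sp < watermark[0]:
        watermark[0] = stack.sp


def strub_leave(stack, watermark):
    """Caller side: scrub everything between the watermark and sp."""
    for i in range(watermark[0], stack.sp):
        stack.mem[i] = 0


def helper(stack, watermark, secret):
    """Another scrubbed callee; it reuses the watermark it was given."""
    stack.alloc(8, secret)
    strub_update(stack, watermark)
    stack.release(8)


def sensitive(stack, watermark, secret, dynamic_words):
    """Callee with a fixed part and an alloca-like variable part."""
    stack.alloc(4, secret)
    strub_update(stack, watermark)
    stack.alloc(dynamic_words, secret)
    strub_update(stack, watermark)
    helper(stack, watermark, secret)  # aggregation: same watermark passed on
    stack.release(dynamic_words)
    stack.release(4)


def caller(stack, secret, dynamic_words):
    """The caller owns the watermark and scrubs once, after the call."""
    watermark = [0]
    strub_enter(stack, watermark)
    sensitive(stack, watermark, secret, dynamic_words)
    deepest = watermark[0]
    strub_leave(stack, watermark)
    return deepest
```

Calling caller(ToyStack(), 0xAA, 5) leaves no 0xAA words behind, even
though neither callee knows its total stack use ahead of time, and a
single scrub in the caller covers both sensitive and helper.  That is
the aggregation the in/out form buys over a plain out parameter.
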
Thanks,

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about