From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexandre Oliva <oliva@adacore.com>
To: Qing Zhao <qing.zhao@oracle.com>
Cc: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>,
        Jeremy Bennett
 <jeremy.bennett@embecosm.com>,
        Craig Blackmore
 <craig.blackmore@embecosm.com>,
        Graham Markall
 <graham.markall@embecosm.com>,
        Martin Jambor <mjambor@suse.cz>, Jan
 Hubicka <hubicka@ucw.cz>,
        Richard Biener <richard.guenther@gmail.com>,
        Jim Wilson <wilson@tuliptree.org>
Subject: Re: [PATCH v3] Introduce strub: machine-independent stack scrubbing
Organization: Free thinker, does not speak for AdaCore
References: <ormtqpsbuc.fsf@lxoliva.fsfla.org>
	<orpmti5ikv.fsf@lxoliva.fsfla.org>
	<or35eko33q.fsf_-_@lxoliva.fsfla.org>
	<or5y7ox7jb.fsf_-_@lxoliva.fsfla.org>
	<A553C465-A6D7-4BA7-BF44-FF86E34D14D1@oracle.com>
Errors-To: aoliva@lxoliva.fsfla.org
Date: Wed, 28 Jun 2023 05:20:32 -0300
In-Reply-To: <A553C465-A6D7-4BA7-BF44-FF86E34D14D1@oracle.com> (Qing Zhao's
	message of "Tue, 27 Jun 2023 21:28:41 +0000")
Message-ID: <orh6qsf167.fsf@lxoliva.fsfla.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-patches.gcc.gnu.org>

Hello, Qing,

On Jun 27, 2023, Qing Zhao <qing.zhao@oracle.com> wrote:

> I am wondering why stack scrubbing, proposed in this patch series, cannot
> do the stack scrubbing in the routine’s epilogue similar as
> register scrubbing?

There were multiple considerations that led to this design decision:

- Stack scrubbing in epilogues would be highly target-dependent

An epilogue expected to scrub the stack of its containing function would
not usually be able to call memset; there might not even be registers
available to do the cleaning, let alone to do it efficiently.  Since
epilogues are output after register allocation, the epilogue code
generator would have to allocate registers itself to do the job,
avoiding call-saved registers (that would have to be restored before
scrubbing the stack holding them), those holding return values, and
taking care of any machine- or ABI-specific conventions that apply to
epilogues.

- Exception Handling

Raising or propagating an exception requires a function's stack frame to
be active.  It wouldn't be possible for e.g. a cleanup handler to clean
up the stack frame holding it and then propagate the exception: either
the scrubbing would have to leave much of the stack frame alone for
propagation to work, or it would scrub too much and propagation would
fail.

So we had to devise a way for stack frames to be scrubbed and protect
the sensitive data in them even if an exception is raised or propagated
out of the sensitive frame.
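
As a loose analogy in plain C, which has no exceptions, longjmp can
stand in for an exception leaving a frame: the code the callee would
have run on its way out never executes, yet a caller that owns the
watermark can still clean up.  This is a toy model only, not what the
patch does: the byte array standing in for the stack, model_sp,
sensitive and call_then_scrub are all made-up names, and a real
unwinder is far more involved.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a byte array stands in for the stack, growing downward. */
enum { STACK_SIZE = 128, FRAME = 32 };
static unsigned char model_stack[STACK_SIZE];
static size_t model_sp = STACK_SIZE;
static jmp_buf handler;

/* Hypothetical sensitive callee.  With fail != 0 it leaves through
   longjmp, so its own scrubbing at the end never runs. */
static void sensitive(volatile size_t *watermark, int fail)
{
    assert(model_sp >= FRAME);
    model_sp -= FRAME;
    memset(&model_stack[model_sp], 0x5A, FRAME);   /* "secret" data */
    if (model_sp < *watermark)
        *watermark = model_sp;                     /* record the depth */
    if (fail)
        longjmp(handler, 1);                       /* abnormal exit */
    memset(&model_stack[model_sp], 0, FRAME);      /* epilogue-style scrub */
    model_sp += FRAME;
}

/* Caller: owns the watermark and scrubs whichever way the callee exits.
   Returns how many secret bytes the callee had left behind. */
size_t call_then_scrub(int fail)
{
    size_t base = model_sp;
    volatile size_t watermark = base;   /* volatile: read after longjmp */

    if (setjmp(handler) == 0)
        sensitive(&watermark, fail);
    else
        model_sp = base;                /* an unwinder would reset this */

    size_t left = 0;
    for (size_t i = watermark; i < base; i++)
        left += (model_stack[i] != 0);
    memset(&model_stack[watermark], 0, base - watermark);
    return left;
}

/* Helper for checking: number of nonzero bytes in the model stack. */
size_t model_dirty_bytes(void)
{
    size_t dirty = 0;
    for (size_t i = 0; i < STACK_SIZE; i++)
        dirty += (model_stack[i] != 0);
    return dirty;
}
```

On the normal path the callee's own scrubbing already ran, so the
caller finds nothing left; on the longjmp path the whole frame is still
dirty, and only the caller-side scrubbing cleans it.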

- Variable frame size

Though many functions have static frame sizes, there are cases in which
a function dynamically allocates and releases stack space, and that
extra space should be scrubbed as well.  So the improvements out of a
known frame size are not a given, and we may need a watermark to handle
the general case.

Now consider that this watermark needs to survive past the point at
which the epilogue restores call-saved registers, so that the save area
can be scrubbed.  Call-clobbered registers might not be available, or
might need scrubbing themselves.

A caller-owned watermark relieves the callee from these contradictory
requirements.  It enables the register pointing to the watermark to be
reused by the callee as soon as it's no longer needed.  It enables
aggregation of scrubbing, and passing on the watermark when tail-calling
another scrubbed subprogram.  And it enables caller and callee to be
compiled separately, circumstances in which the caller (in the
strub("at-calls") mode) wouldn't know how much of the stack space used
by the callee is to be scrubbed.
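
To make the division of labor concrete, here is a toy model of a
caller-owned watermark.  It is a sketch under heavy assumptions, not
the patch's implementation: a byte array stands in for the stack, and
model_update, callee and call_and_scrub are names invented for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a byte array stands in for the machine stack, growing
   downward from index STACK_SIZE towards 0. */
enum { STACK_SIZE = 256 };
static unsigned char model_stack[STACK_SIZE];
static size_t model_sp = STACK_SIZE;    /* lowest index currently in use */

/* Callee side: lower the caller-owned watermark if we went deeper. */
static void model_update(size_t *watermark)
{
    if (model_sp < *watermark)
        *watermark = model_sp;
}

/* Hypothetical sensitive callee: takes n bytes of stack dynamically,
   fills them with "secret" data, records the depth, and releases the
   space without scrubbing anything itself. */
static void callee(size_t *watermark, size_t n)
{
    assert(n <= model_sp);
    model_sp -= n;                           /* dynamic allocation */
    memset(&model_stack[model_sp], 0xAA, n);
    model_update(watermark);
    model_sp += n;                           /* release, as an epilogue would */
}

/* Caller side: own the watermark, make the call, then scrub whatever
   the callee used.  Returns the number of bytes scrubbed. */
size_t call_and_scrub(size_t n)
{
    size_t watermark = model_sp;             /* nothing below our frame yet */
    callee(&watermark, n);
    size_t used = model_sp - watermark;
    memset(&model_stack[watermark], 0, used);
    return used;
}

/* Helper for checking: is the whole model stack zeroed? */
int model_stack_is_clean(void)
{
    for (size_t i = 0; i < STACK_SIZE; i++)
        if (model_stack[i] != 0)
            return 0;
    return 1;
}
```

The caller never needs to know n in advance: it learns the extent to
scrub from the watermark alone, which is what allows the two sides to
be compiled separately.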

- Watermark as in/out argument

Thus watermarks and caller-based scrubbing were required anyway, so we
might as well use the same strategy for non-exceptional exit paths,
which keeps it portable.

We've explored various possibilities of watermark passing to reduce the
impact on the ABI:

-- a single global variable wouldn't do in multi-threaded programs;
we need per-thread stack information.  TLS is not available on every
target, it's emulated with high overhead on some, and even when it
doesn't use part of the thread's stack for static thread-local
storage, each caller of a scrubbing function would have to preserve
that variable somehow (presumably in its own stack frame) before
reusing it to communicate with its callee.

-- a thread-local pointer to a heap-allocated parallel stack of
stack-scrubbing ranges might avoid holding the watermarks in the
stack, or passing pointers to them as arguments, leaving the entire
scrub range management in the library.  That would make the __strub_*
library components heavy enough that inlining them would not be
viable.  Furthermore, having such low-level APIs allocate from the heap
normally raises async-signal-safety problems, and prevents heap
implementations from relying on such low-level APIs.

-- using the static chain machinery to convey to scrubbed callees
access to the caller's watermark seems viable, if onerous, but the
chained records live in the stack anyway, and there are targets that
do not support static chains.

-- an out parameter might do for "amount of stack used", but
making it an in/out watermark enabled aggregation and tail-calling;
early set-and-forget on fixed-size stack frames; and assured
initialization, even in case of an early asynchronous exception.
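
A small numeric model may help show why the in/out form aggregates.
It is only a sketch: stack depths are plain numbers, a lower value
means a deeper stack, and watermark_update, shallow, deep and
aggregate_scrub_range are hypothetical names, not the library's.

```c
#include <assert.h>
#include <stddef.h>

/* In/out update: keep whichever is deeper, the incoming watermark or
   the callee's own lowest stack address. */
static void watermark_update(size_t *watermark, size_t lowest_sp)
{
    if (lowest_sp < *watermark)
        *watermark = lowest_sp;
}

/* A scrubbed callee that uses 16 bytes of stack. */
static void shallow(size_t *watermark, size_t sp)
{
    watermark_update(watermark, sp - 16);
}

/* A scrubbed callee that uses 96 bytes and then tail-calls shallow. */
static void deep(size_t *watermark, size_t sp)
{
    watermark_update(watermark, sp - 96);
    /* "Tail call": our frame is released first, so the callee starts at
       sp again.  The forwarded watermark already remembers how deep we
       went, so nothing of ours has to outlive the call. */
    shallow(watermark, sp);
}

/* Caller: initialize once, call several scrubbed functions, and scrub
   a single range afterwards.  Returns the size of that range. */
size_t aggregate_scrub_range(size_t sp)
{
    assert(sp >= 96);
    size_t watermark = sp;      /* set before any call can be interrupted */
    shallow(&watermark, sp);
    deep(&watermark, sp);
    shallow(&watermark, sp);    /* cannot raise the watermark back up */
    return sp - watermark;      /* bytes the caller would scrub, once */
}
```

With an out-only "amount used" result, the caller would have had to
combine three separate answers itself, and deep would have had to wait
for shallow to return before reporting its own.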

- Internal scrubbing

Though we have implemented strub("internal") through wrappers that call
the actual function and then scrub its stack space, we have envisioned
an alternate implementation that, through machine-specific support,
performs actual internal scrubbing, arranging the stack frame in such a
way that epilogues and EH cleanups can scrub most, if not all of the
stack frame (analogous to how the wrapper only scrubs the wrapped frame,
not its own), and taking advantage of constant frame sizes where
possible.  At least with variable frame sizes, the amount of stack space
to be scrubbed in the epilogue (or in an EH cleanup) will have to be
held in a local variable or somesuch, and at least for nonleaf
functions, that surely will end up in the stack one way or another.

> 2.  I have concerns on the runtime performance overhead, do you have any
> data on this for your current implementation?

Though one could conceivably build entire applications with the testing
option -fstrub=all, and that works AFAICT, the expected use case is
marking sensitive functions or variables for strubbing, and there aren't
benchmarks for this use case.


> 3. You mentioned that there are several “modes” for this feature,
> could you please provide more details on the modes and their
> description?

There's strict vs relaxed, and there's internal vs at-calls.  The
documentation for these modes included in the patch is quite extensive.
Rather than duplicating it here in other words, I suppose it would be a
better "test" for the documentation to have others go through it, try to
make sense of it, and point out passages that are unclear or hard to
understand.  WDYT?
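
As a starting point for that reading, here is roughly how the two
attribute spellings mentioned in this message would appear in source.
This is only a sketch: it assumes a compiler carrying the strub patch,
falls back to no attribute at all elsewhere, and the macro and function
names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Use the strub attribute only where the compiler knows it. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#if __has_attribute(strub)
# define STRUB_AT_CALLS __attribute__((strub("at-calls")))
# define STRUB_INTERNAL __attribute__((strub("internal")))
#else
# define STRUB_AT_CALLS
# define STRUB_INTERNAL
#endif

/* at-calls: the caller holds the watermark and does the scrubbing, so
   the watermark travels as an extra in/out argument. */
STRUB_AT_CALLS
int secret_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}

/* internal: a wrapper calls the actual function and scrubs its stack
   space afterwards. */
STRUB_INTERNAL
unsigned fold_key(const unsigned char *key, size_t n)
{
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 31u + key[i];
    return acc;
}

/* Ordinary caller: nothing special is written at the call site. */
int check_password(const unsigned char *given,
                   const unsigned char *stored, size_t n)
{
    assert(given != NULL && stored != NULL);
    return secret_equal(given, stored, n);
}
```

Without the patch the macros expand to nothing and the functions behave
as usual, which makes it easy to compare builds with and without
scrubbing.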

Thanks,

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>