From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by sourceware.org (Postfix) with ESMTPS id 99F7A3857C4D for ; Thu, 23 Nov 2023 10:56:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 99F7A3857C4D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 99F7A3857C4D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::102b ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700736991; cv=none; b=hgRuuRUyKJRgTtOweRWesxdz2OuDyBn/LMS8GnCj7L5aF9IaX/eY2W6cLHVDNBsLFbvSbUYuElKREVcY0Ghs2eq9a0cPm9LJ2cVzsNgmVOvNuKYU6SzFQGdC7V0ozpwOiUDSpZ4jhEAnFAjQOpCBugjfLULTW5RBsQxp88ZfPNI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700736991; c=relaxed/simple; bh=TOjNcNdQ/Ghj9nAA6eS82DXS9fsXgU3JRyY0guF4i/s=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=CaszPVH4U9K4XNAONkVmSyOICz2ZAmtcTjKcNFsWEjYi3MX9ZJM2K5CUsU2eqEUDaho3wgM9IV2LrXgPojfajnuHFyVcjJmMMsA1pJFNIit/bq+SMDCpbdX6edffoT/nRNmaBsNtegaPMqbMINp3vRONDGsamxE1k2yiDIvXU5k= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pj1-x102b.google.com with SMTP id 98e67ed59e1d1-285196556fcso633909a91.0 for ; Thu, 23 Nov 2023 02:56:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adacore.com; s=google; t=1700736987; x=1701341787; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:user-agent:message-id :in-reply-to:date:references:organization:subject:cc:to:from:from:to :cc:subject:date:message-id:reply-to; bh=fw7NN1GaHPJv+bf1nEz6I8oq5NTIZlvPtyBvmuq+ZMw=; b=DBeu2YtjkvTT52ZVDNBndFtErey9EwQWF28uEb/TkHaix6PH5P6YnR/0B0tgRqTt72 w3EegQSPc/Jprzal6+hYcOgnAESJEbYe//WVJYKR2K+geiHpte+gV/FbDvAuNlL4rvE7 zA7Hee7F9UhmYRcLBPK+D08a2PqTgerNq9kMM3Ek7iHP2b4VrY5m8Gy3RgJVI3iHJue7 i723uxuo6whyo45R/UKy8njQn+iVq29OlZDVC+tYgXV0TQpivyX9kXWxiPMG19BsKm1x 4Jj6Jkn/+kCNGv5YmM9DbhH8tfWTL+hkeZf13jfeswYSLpCqjguRAq7usnyhbLF1EqOF HhBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700736987; x=1701341787; h=content-transfer-encoding:mime-version:user-agent:message-id :in-reply-to:date:references:organization:subject:cc:to:from :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=fw7NN1GaHPJv+bf1nEz6I8oq5NTIZlvPtyBvmuq+ZMw=; b=wklq85AwN1CaeSE03peNdbWxyFOG8/jVUbeaOBwJTWqIKY++Z9b9KadgAP10pA3GKD UK8l/6q7XoPJap5v+O2DPTCpsyTytKPnrLFQDjb4dzQ2n7CzAFHaS+xZCJr4XnrDcdno d57Fq5w28JRIKhwXzLPLfSp2G+RCcxl6Fh6aa33fdF+bnVTcrRN1sXYaQuZKHVimwj2V wJVPyne1axa1m8EGc4lJ6eC07kjNFSP23NMhyklLfkKCMRo/DhpBKSP26CSwTNl+P2ZY WeWccQjSeO0Y5Tz4n+1TJSNHKHpxB8UfkOKPjP52PV9C8UCPOE4Ip2Nxd7wv4Bo4q0q0 bSeg== X-Gm-Message-State: AOJu0Yxm3bswWHddiCiPfia4PJXyrMR4xOhBthAWYN3agZgAq9unuURc eblj1WB3ayoHaOKgg/zO1SKtmQ== X-Google-Smtp-Source: AGHT+IHFfkJ2cqfmPLs9pW9Pbw28kwkRvos1lHSE2wWK8YjAmCPxezP/b41X1V3nmYlgWHRtDIiSXQ== X-Received: by 2002:a17:90a:354:b0:280:18ba:f016 with SMTP id 20-20020a17090a035400b0028018baf016mr5840814pjf.47.1700736987101; Thu, 23 Nov 2023 02:56:27 -0800 (PST) Received: from free.home ([2804:7f1:2080:6cc5:d323:4080:d880:1a26]) by smtp.gmail.com with ESMTPSA id 5-20020a17090a198500b00280fcbbe774sm1131115pji.10.2023.11.23.02.56.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Nov 2023 02:56:26 -0800 (PST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 3ANAu7rf248357 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Thu, 23 Nov 2023 07:56:07 -0300 From: Alexandre Oliva To: Richard Biener Cc: gcc-patches@gcc.gnu.org, Jeremy Bennett , Craig Blackmore , Graham Markall , Martin Jambor , Jan Hubicka , Jim Wilson , Jeff Law , Jakub Jelinek Subject: Re: [PATCH v4] Introduce strub: machine-independent stack scrubbing Organization: Free thinker, does not speak for AdaCore References: Date: Thu, 23 Nov 2023 07:56:07 -0300 In-Reply-To: (Richard Biener's message of "Wed, 22 Nov 2023 15:14:23 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,WEIRD_QUOTING autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello, Richi, Thanks for the extensive review! On Nov 22, 2023, Richard Biener wrote: > On Mon, Nov 20, 2023 at 1:40=E2=80=AFPM Alexandre Oliva wrote: >>=20 >> On Oct 26, 2023, Alexandre Oliva wrote: >>=20 >> >> This is a refreshed and improved version of the version posted back in >> >> June. https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621936.html >>=20 >> > Ping? https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633675.ht= ml >> > I'm combining the gcc/ipa-strub.cc bits from >> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633526.html >>=20 >> Ping? >> Retested on x86_64-linux-gnu, with and without -fstrub=3Dall. > @@ -898,7 +899,24 @@ decl_attributes (tree *node, tree attributes, int fl= ags, > TYPE_NAME (tt) =3D *node; > } > - *anode =3D cur_and_last_decl[0]; > + if (*anode !=3D cur_and_last_decl[0]) > + { > + /* Even if !spec->function_type_required, allow the attribute > + handler to request the attribute to be applied to the function > + type, rather than to the function pointer type, by setting > + cur_and_last_decl[0] to the function type. */ > + if (!fn_ptr_tmp > + && POINTER_TYPE_P (*anode) > + && TREE_TYPE (*anode) =3D=3D cur_and_last_decl[0] > + && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (*anode))) > + { > + fn_ptr_tmp =3D TREE_TYPE (*anode); > + fn_ptr_quals =3D TYPE_QUALS (*anode); > + anode =3D &fn_ptr_tmp; > + } > + *anode =3D cur_and_last_decl[0]; > + } > + > what is this a workaround for? For the fact that the strub attribute attaches to types, whether data or function types, so we can't have fn_type_req, but when it's a function or pointer-to-function type, we want to affect the function type, rather than the pointer type, when the attribute has an argument. The argument names the strub mode for a function; that only applies to function types, never to data types. The hunk above introduces the means for the attribute handler to choose what to attach the attribute t. > Isn't there a suitable parsing position for placing the attribute? It's been a while, but IIRC the need for this first came up in Ada, where attributes can't just go anywhere, and it was further complicated by the fact that Ada doesn't have first-class function or procedure types, only access-to-them, but we needed some means for the attributes to apply to the function type. > +#ifndef STACK_GROWS_DOWNWARD > +# define STACK_TOPS GT > +#else > +# define STACK_TOPS LT > +#endif > according to docs this is defined to 0 or 1 so the above looks wrong > (it's always defined). Ugh. Thanks, will fix. (I'm pretty sure I had notes somewhere stating that stack-grows-upwards hadn't been tested, and that was for the sheer lack of platforms making that choice, but I hoped it wasn't that broken :-( > + if (optimize < 2 || optimize_size || flag_no_inline) > + return NULL_RTX; > I'm wondering about these checks in the expansions of the builtins, > I think this is about inline expanding or emitting a libcall, right? Yeah. > I wonder if you should use optimize_function_for_speed (cfun) instead? > Usually -fno-inline shouldn't affect such calls, but -fno-builtin-FOO wou= ld. > I have no strong opinion here though. I've occasionally wondered whether builtins were the best framework for these semi-internal calls. > The new builtins seem undocumented - usually those are documented > within extend.texi Erhm... Weird. I had documentation for them. (checks) No, it's there, in extend.texi, right after __builtin_stack_address. It's admittedly a big patch :-/ > I guess placing __builtin___strub_enter calls in the code manually > will break in interesting ways - if that's not supposed to happen the > trick is to embed a space in the name of the built-in. Yeah, I was a little torn between the choices here. On the one hand, I needed visible symbols for the out-of-line implementations, so I figured that trying to hide the builtins wouldn't bring any advantage. However, I've also designed the builtins with interfaces that would avoid disruption even with explicit calls. __strub_enter and __strub_update only initialize or adjust a pointer handed to them. __strub_leave will erase things from the top of the stack to the pointer, so if the watermark is "active stack", nothing happens, and things only get cleared if it points to "unused stack space". There's potential for disruption if one passes a statically-allocated pointer to it, but nothing much different from memsetting that memory range, core wars-style. > -symtab_node::reset (void) > +symtab_node::reset (bool preserve_comdat_group) > not sure what for, I'll leave Honza to comment. This restores the possibility of getting the pre-PR107897 behavior, that the strub wrapper/wrapped splitting relied on. Conceptually, the original function becomes the wrapped one, and the wrapper that calls it is kind of an implementation detail to preserve the exposed API/ABI while introducing strubbing around the body, so preserving the comdat group makes sense. ISTR getting strub regressions when the patch for PR107897 was put in, but my notes don't detail what broke, alas. > +/* Create a distinct copy of the type of NODE's function, and change > + the fntype of all calls to it with the same main type to the new > + type. */ > + > +static void > +distinctify_node_type (cgraph_node *node) > +{ > + tree old_type =3D TREE_TYPE (node->decl); > + tree new_type =3D build_distinct_type_copy (old_type); > + tree new_ptr_type =3D NULL_TREE; > + > + /* Remap any calls to node->decl that use old_type, or a variant > + thereof, to new_type as well. We don't look for aliases, their > + declarations will have their types changed independently, and > + we'll adjust their fntypes then. */ > + for (cgraph_edge *e =3D node->callers; e; e =3D e->next_caller) > + { > + if (!e->call_stmt) > + continue; > + tree fnaddr =3D gimple_call_fn (e->call_stmt); > + gcc_checking_assert (TREE_CODE (fnaddr) =3D=3D ADDR_EXPR > + && TREE_OPERAND (fnaddr, 0) =3D=3D node->decl); > + if (strub_call_fntype_override_p (e->call_stmt)) > + continue; > + if (!new_ptr_type) > + new_ptr_type =3D build_pointer_type (new_type); > + TREE_TYPE (fnaddr) =3D new_ptr_type; > + gimple_call_set_fntype (e->call_stmt, new_type); > + } > + > + TREE_TYPE (node->decl) =3D new_type; > it does feel like there's IPA mechanisms to deal with what you are trying= to do > here (or in the caller(s)). If there is, I couldn't find it. There are some vaguely similar operations, but nothing that will modify the ABI of exported functions like what strub-at-calls purports to do. A synthetic argument gets added to the function's interface, but since that type could conceivably have been associated/unified with non-strub types, or with similar types with different strub modes that don't undergo such transformations, we have to build a new type for the function, and adjust the type in the call graph. IPA can do this as part of replacing calls with specialized clones of a function, but here the attribute calls for an adjustment to the function type, vaguely resembling the introduction of the implicit 'this' in non-static member functions in C++, but without much help from the front-end, and without exposing the formal to the front-end. It doesn't seem to me that IPA is willing to help with that, and even if it were, it would require strub to become an actual IPA pass, which would be a huge undertaking without clear benefit. > +unsigned int > +pass_ipa_strub_mode::execute (function *) > +{ > + last_cgraph_order =3D 0; > + ipa_strub_set_mode_for_new_functions (); > + > + /* Verify before any inlining or other transformations. */ > + verify_strub (); > if (flag_checking) verify_strub (); > please. No, no, verify_strub is not an internal consistency check, it's part of the implementation that verifies that strictness requirements have been met WRT calls requested by the user. In strict mode, a strub function can only call other strub functions or functions explicitly marked as callable from strub contexts. This security requirement is to be enforced, and diagnostics are to be issued in case of violation, regardless of the self-checking mode of the compiler. > I guess we talked about this last year - what's the reason to have both > an IPA pass and a simple IPA pass? We did. I didn't recall all the reasons on the spot when you asked, but I followed up by email: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603255.html The earlier pass, that must be placed before any inlining, is the simple IPA one, and it only assigns modes. It must be that early because strub-enabled functions must never be inlined into strub-disabled ones. So we have to determine modes before any sort of inlining. But we can and want to inline strub-enabled functions into other strub-enabled functions. But we don't want to inline a wrapped function back into its wrapper, so we want to (early) inline functions before they're split, and we also want to inline wrappers after they're split if possible. So the assignment pass has to be before any inlining, and the wrapping/splitting (for internal strub) must be after some inlining but before the last attempt at inlining. > IIRC the simple IPA pass is a simple one because it wants to see > inlined bodies and "fixes" those up? Some toplevel comments > explaining both passes in the ipa-strub.cc pass would be nice to have. ACK, will do. > I guess I also asked before - did you try it with -flto? Yeah, I've tested it extensively with -flto during development. I've run the GCC testsuite with a patch equivalent to -fstrub=3Dall enabled by default during development, and I still do so occasionally. I didn't before the ping, but prompted by you, I did again. Torture tests and specific lto tests don't seem to face any trouble. And there's no reason why they should. The spots in which the strub passes were introduced involved exploring where they fit best, and adding calls such as analyzing nodes. There are even commented-out vestigial calls of inline_analyze_function, from a time when the pass was inserted at a different spot and that call was required. > + /* Decide which of the wrapped function's parms we want to turn into > + references to the argument passed to the wrapper. In general, > we want to > + copy small arguments, and avoid copying large ones. > Variable-sized array > + lengths given by other arguments, as in 20020210-1.c, would lead = to > + problems if passed by value, after resetting the original functio= n and > + dropping the length computation; passing them by reference works. > + DECL_BY_REFERENCE is *not* a substitute for this: it involves cop= ying > + anyway, but performed at the caller. */ > + indirect_parms_t indirect_nparms (3, false); > ... > IMHO it's a bad idea, at least initially, to mess with the ABI in a heuri= stical > way this way. Do you realize that this is only about the wrapped ABI, used exclusively by the wrapper, to avoid pointlessly copying large arguments twice? The wrapper already needs a custom ABI, because of the added watermark pointer, and va_list and whatnot when needed, so a little further customization to avoid a quite significant overhead seemed desirable. > I see you need all sorts of "fixup", including re-gimplification > of stmts, etc., double-ugh. *nod*, I've considered going the nested-function way, enabling the wrapped function to access wrapper's variables directly, but (i) support for nested functions isn't available everywhere, and (ii) strub functions could already be nested functions, and the wrapped body would need access to both enclosing contexts, so adjustments would have to be made anyway. > Ideally IPA-SRA can already do parts of those > transforms which would mean ipa-param-manipulation has generic code > to encode them? You'd think so. I tried that, run into various issues, still have pending patches I submitted separately to address them, but... at some point I had to stop waiting for feedback and had make it work without that. I figured what I was doing wasn't a good fit and moved on. Now, it would be great if IPA could enable me to split the entire body of a function *while* adding custom arguments, such as the watermark, va_list, etc. It didn't look like it could or even wanted to, but maybe the presence of a useful feature like stack scrubbing could motivate the introduction of such infrastructure. > I realize you can't tail-call, but I wonder if some clever > machine-specific tricks would work, I guess it would need the wrapper > to be target generated of course. I'm not sure what you have in mind here, but one of the goals was to introduce stack scrubbing without machine-dependent code. We are considering an improvement for strub(internal) to do away with wrappers given machine-specific prologue/epilogue support, but that's not even in the roadmap, and it's not clear how that would interact with exceptions, which was a primary driver for the currently-proposed approaches. > As it's only for optimization, how much of the code would go away when > you avoid changing the parameters of the wrapped function? We can't avoid changing them entirely, and IIRC some of the infrastructure for regimplification was also used for va_list and apply_args handling, so it wouldn't save all that much. It is an important optimization, to avoid doubling the stack requirements for parameters with -fstrub=3Dinternal (because wrapper passes arguments on to wrapped body), and the optimization is already implemented, so it would be a pity to take it out, and you wouldn't even get rid of any significant amount of code. > Could one easily disable the optimization via a --param maybe? The param is not implemented, but guarding the loop you commented on by it would accomplish it. You probably won't want to bootstrap GCC with -fstrub=3Dall along with it, though ;-) > There's still much commented code and FIXMEs in, a lot of cgraph > related code is used which I can't review very well. Now that you mentioned it, I realize I hadn't yet dropped some of the preprocessed-out bits when I posted v4. I did so afterwards. I was trying to avoid posting such a monster patch again, but I guess I will have to post at least another version :-( But yeah, there are some wishlist items that remain, and some bits that could probably go away. > I fear that -fstrub is going to be actually used by people - are you up > to the task fixing things? I, for one, would welcome that, and I am (professionally) on the hook to fix it, so I'd better be up to it ;-) Building it on top of infrastructure I'm closely familiar with, rather than e.g. on IPA, that I'm not so much, is a plus in this regard. That said, I've fixed bugs in GCC most everywhere, from front-ends to back-ends, so I'm pretty sure I've got it covered in terms of knowledge. As for time, if it gets too much interest (is there such a thing? :-) I might have trouble keeping up (there are only so many hours in the day), and it would likely be wise then for others to become familiar with it. I'd be happy to share knowledge, and I know I have support from my "employer" (in quotes because I'm not formally employed, I'm a consultant) to maintain the code I contribute to the community on its behalf, even having other work and non-work commitments. So, bring them on, I say ;-) > +void ATTRIBUTE_STRUB_CALLABLE > +__strub_enter (void **watermark) > +{ > + *watermark =3D __builtin_frame_address (0); > in your experience, do these (or the inline version) work reliably > when optimizations like omitting the frame pointer or shrink-wrapping > happen from a security point of view? Yeah. We're talking about the frame address for the current function here, and the compiler knows what it is, even if it optimizes the heck out of the function. In the builtin expansion, it's the top of the stack instead, and the compiler also has to have a good grasp of where that is. Red zones are a small complication, but not really much trouble here: it's a watermark we're talking about, how high the stack may have gotten, so we know we have to scrub no further than that. If it goes too far (as it often does with red zones), we just scrub a little too much. What shouldn't happen is not going far enough, but since we rely on the stack frame address for __strub_update to update the watermark, and the caller can't really misrepresent its own stack use without risking the callee to corrupt it, this works really well. If strub_update gets builtin-expanded inline, it uses the adjusted (for red zone) stack pointer instead, after any internal stack allocation (prologue and alloca), but it won't necessarily capture non-accumulated outgoing args. The arguments passed to a callee are not regarded as part of the caller stack frame, so that should be fine. In order to cover them, one must have a strub-enabled caller and a strub(at-calls) callee. > You are adding symbols to libgcc but I don't see you amending > libgcc/libgcc-std.ver.in? That's a good catch, thanks. hardcfr symbols are missing too. Will fix. > Neither the documentation in extend.texi nor the one in invoke.texi > mentions the target audience of -fstrub ACK, good point, despite all the detail, that is too implicit. Will fix. > The patches mention the red zone, documenting the guarantees and > caveats would be nice to have. I think the documentation belongs to > the attribute documentation. It is kind of there, but since the details vary depending on the strub mode, they're given where the modes are detailed, rather than as if applying to all strub modes. (some of the modes are disabled and callable, that don't do any strubbing whatsoever) > I hope Honza can have a quick look over the IPA passes. Likewise. I shall wait for further feedback before posting a huge v5, but I'll be happy to refresh the users/aoliva/heads/strub branch in the git repo as I adjust the patch if that helps. Currently the patch is in users/aoliva/heads/testme. > a better way than taking what is percieved as a huge ugly part > of the patch. ISTM you're misperceiving the amount of code involved in that optimization. There is a significant amount of wrapper/wrapped parameter manipulation that is *not* involved with the optimization, and that was already there before the optimization was implemented. The optimization was a tiny addition over it. Thanks again, --=20 Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive