public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Qing Zhao <QING.ZHAO@ORACLE.COM>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
	gcc Patches <gcc-patches@gcc.gnu.org>,
	kees Cook <keescook@chromium.org>
Subject: Re: How to traverse all the local variables that declared in the current routine?
Date: Thu, 3 Dec 2020 10:07:28 -0600	[thread overview]
Message-ID: <8A595141-563A-4CDC-AA09-BAB76E5113AA@ORACLE.COM> (raw)
In-Reply-To: <CAFiYyc033-ZA4pHvQEEhDDyL3471HAxdoXELHRXeTJvydheRYw@mail.gmail.com>



> On Dec 3, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> 
>> 
>> On Dec 2, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> Hi, Richard,
>> 
>> Could you please comment on the following approach:
>> 
>> Instead of adding the zero-initializer quite late at the pass “pass_expand”, we can add it as early as during gimplification.
>> However, we will mark these new added zero-initializers as “artificial”. And passing this “artificial” information to
>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”, in these two uninitialized variable analysis passes,
>> (i.e., in tree-sea-uninit.c) We will update the checking on “ssa_undefined_value_p”  to consider “artificial” zero-initializers.
>> (i.e, if the def_stmt is marked with “artificial”, then it’s a undefined value).
>> 
>> With such approach, we should be able to address all those conflicts.
>> 
>> Do you see any obvious issue with this approach?
>> 
>> 
>> Yes, DSE will happily elide an explicit zero-init following the
>> artificial one leading to false uninit diagnostics.
>> 
>> 
>> Indeed.  This is a big issue. And other optimizations might also be impacted by the new zero-init, resulting changed behavior
>> of uninitialized analysis in the later stage.
> 
> I don't see how the issue can be resolved, you can't get both, uninit
> warnings and no uninitialized memory.
> People can compile twice, once without -fzero-init to get uninit
> warnings and once with -fzero-init to get
> the extra "security".

So, for GCC, you think that it’s okay to get rid of the following requirement:

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language”.

Then, we can add explanation in the user documentation of the new -fzero-init and also 
that of the -Wuninitialized to inform users that -fzero-init will change the behavior of -Wuninitialized.
In order to get the warnings, -fzero-init should not be added at the same time?

With this requirement being eliminated, implementation will be much easier. 

We can add the new initialization during simplification phase. Then this new option will work
for all languages.  Is this reasonable?

thanks.

Qing



> 
> Richard.
> 
>> 
>> What's the intended purpose of the zero-init?
>> 
>> 
>> 
>> The purpose of this new option is: (from the original LLVM patch submission):
>> 
>> "Add an option to initialize automatic variables with either a pattern or with
>> zeroes. The default is still that automatic variables are uninitialized. Also
>> add attributes to request uninitialized on a per-variable basis, mainly to disable
>> initialization of large stack arrays when deemed too expensive.
>> 
>> This isn't meant to change the semantics of C and C++. Rather, it's meant to be
>> a last-resort when programmers inadvertently have some undefined behavior in
>> their code. This patch aims to make undefined behavior hurt less, which
>> security-minded people will be very happy about. Notably, this means that
>> there's no inadvertent information leak when:
>> 
>> • The compiler re-uses stack slots, and a value is used uninitialized.
>> • The compiler re-uses a register, and a value is used uninitialized.
>> • Stack structs / arrays / unions with padding are copied.
>> This patch only addresses stack and register information leaks. There's many
>> more infoleaks that we could address, and much more undefined behavior that
>> could be tamed. Let's keep this patch focused, and I'm happy to address related
>> issues elsewhere."
>> 
>> For more details, please refer to the LLVM code review discussion on this patch:
>> https://reviews.llvm.org/D54604
>> 
>> 
>> I also wrote a simple writeup for this task based on my study and discussion with
>> Kees Cook (cc’ing him) as following:
>> 
>> 
>> thanks.
>> 
>> Qing
>> 
>> Support stack variables auto-initialization in GCC
>> 
>> 11/19/2020
>> 
>> Qing Zhao
>> 
>> =======================================================
>> 
>> 
>> ** Background of the task:
>> 
>> The correponding GCC bugzilla RFE was created on 9/3/2018:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210
>> 
>> A similar option for LLVM (around Nov, 2018)
>> https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
>> had invoked a lot of discussion before committed.
>> 
>> (The following are quoted from the comments of Alexander Potapenko in
>> GCC bug 87210):
>> 
>> Finally, on Oct, 2019, upstream Clang supports force initialization
>> of stack variables under the -ftrivial-auto-var-init flag.
>> 
>> -ftrivial-auto-var-init=pattern initializes local variables with a 0xAA pattern
>> (actually it's more complicated, see https://reviews.llvm.org/D54604)
>> 
>> -ftrivial-auto-var-init=zero provides zero-initialization of locals.
>> This mode isn't officially supported yet and is hidden behind an additional
>> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang flag.
>> This is done to avoid creating a C++ dialect where all variables are
>> zero-initialized.
>> 
>> Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that performs
>> the build  with -ftrivial-auto-var-init=pattern. This one isn't widely adopted
>> yet, partially because initializing locals with 0xAA isn't fast enough.
>> 
>> Linus Torvalds is quite positive about zero-initializing the locals though,
>> see https://lkml.org/lkml/2019/7/30/1303:
>> 
>> "when a compiler has an option to initialize stack variables, it
>> would probably _also_ be a very good idea for that compiler to then
>> support a variable attribute that says "don't initialize _this_
>> variable, I will do that manually".
>> I also think that the "initialize with poison" is
>> pointless and wrong. Yes, it can find bugs, but it doesn't really help
>> improve the general situation, and people see it as a debugging tool,
>> not a "improve code quality and improve the life of kernel developers"
>> tool.
>> 
>> So having a flag similar to -ftrivial-auto-var-init=zero in GCC will be
>> appreciated by the Linux kernel community.
>> 
>> currently, kernel is using a gcc plugin to support stack variables
>> auto-initialization:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c
>> 
>> ** Current situation:
>> 
>> A. Both Microsoft compiler and CLANG (APPLE AND GOOGLE) support pattern init and
>> zero init already;
>> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
>> https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
>> Pattern init is used in development build for debugging purpose, zero init is
>> used in production build for security purpose.
>> 
>> B. for CLANG, even though zero init is controlled by
>> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang",
>> many end users have used it for production build.
>> this functionality cannot be removed anymore.
>> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
>> might be changed to more meaningful name later in CLANG.
>> 
>> 
>> ** My proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute((uninitialized)
>> the marked variable is uninitialized intentionaly for performance purpose.
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> 
>> 
>> On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> 
>> 
>> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is following:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute((uninitialized)
>> the marked variable is uninitialized intentionaly for performance purpose.
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>> 
>> 
>> I think both as long as they are source-level auto-variables. Then which one is better?
>> 
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag-Wmaybe-uninitialized.
>> 
>> 
>> 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine “gimpley_decl_expr” (gimplify.c) as following:
>> 
>> if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>  {
>>    tree init = DECL_INITIAL (decl);
>> ...
>>    if (init && init != error_mark_node)
>>      {
>>        if (!TREE_STATIC (decl))
>>  {
>>    DECL_IS_INITIALIZED(decl) = 1;
>>  }
>> 
>> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
>> 
>> 
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?
>> 
>> 
>> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
>> If GCC adds this phase in FE, then the following design requirement
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
>> 
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>> 
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see expand_stack_vars)
>> 
>> 
>> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>> 
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> 
>> This is a really good point…
>> 
>> In order to avoid optimization  to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best
>> for this issue). However, if we have to met the following requirement:
>> 
>> 
>> So is optimization supposed to pick up zero or is it supposed to act
>> as if the initializer
>> is unknown?
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>> 
>> So, this is a problem that is not easy to resolve.
>> 
>> 
>> Indeed, those are conflicting goals.
>> 
>> Do you have suggestion on this?
>> 
>> 
>> No, not any easy ones.  Doing more of the uninit analysis early (there
>> is already an early
>> uninit pass) which would mean doing IPA analysis turing GCC into more
>> of a static analysis
>> tool.  Theres the analyzer now, not sure if that can employ an early
>> LTO phase for example.
>> 
>> 
>> 
>> 
>> Richard.


  reply	other threads:[~2020-12-03 16:09 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-23 23:05 Qing Zhao
2020-11-24  7:32 ` Richard Biener
2020-11-24 15:47   ` Qing Zhao
2020-11-24 15:55     ` Richard Biener
2020-11-24 16:54       ` Qing Zhao
2020-11-25  9:11         ` Richard Biener
2020-11-25 17:41           ` Qing Zhao
2020-12-01 19:47           ` Qing Zhao
2020-12-02  8:45             ` Richard Biener
2020-12-02 15:36               ` Qing Zhao
2020-12-03  8:45                 ` Richard Biener
2020-12-03 16:07                   ` Qing Zhao [this message]
2020-12-03 16:36                     ` Richard Biener
2020-12-03 16:40                       ` Qing Zhao
2020-12-03 16:56                       ` Richard Sandiford
2020-11-26  0:08         ` Martin Sebor
2020-11-30 16:23           ` Qing Zhao
2020-11-30 17:18             ` Martin Sebor
2020-11-30 23:05               ` Qing Zhao
2020-12-03 17:32       ` Richard Sandiford
2020-12-03 23:04         ` Qing Zhao
2020-12-04  8:50         ` Richard Biener
2020-12-04 16:19           ` Qing Zhao
2020-12-07  7:12             ` Richard Biener
2020-12-07 16:20               ` Qing Zhao
2020-12-07 17:10                 ` Richard Sandiford
2020-12-07 17:36                   ` Qing Zhao
2020-12-07 18:05                     ` Richard Sandiford
2020-12-07 18:34                       ` Qing Zhao
2020-12-08  7:35                         ` Richard Biener
2020-12-08  7:40                 ` Richard Biener
2020-12-08 19:54                   ` Qing Zhao
2020-12-09  8:23                     ` Richard Biener
2020-12-09 15:04                       ` Qing Zhao
2020-12-09 15:12                         ` Richard Biener
2020-12-09 16:18                           ` Qing Zhao
2021-01-05 19:05                             ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao
2021-01-05 19:10                               ` Qing Zhao
2021-01-12 20:34                               ` Qing Zhao
2021-01-13  7:39                                 ` Richard Biener
2021-01-13 15:06                                   ` Qing Zhao
2021-01-13 15:10                                     ` Richard Biener
2021-01-13 15:35                                       ` Qing Zhao
2021-01-13 15:40                                         ` Richard Biener
2021-01-14 21:16                                   ` Qing Zhao
2021-01-15  8:11                                     ` Richard Biener
2021-01-15 16:16                                       ` Qing Zhao
2021-01-15 17:22                                         ` Richard Biener
2021-01-15 17:57                                           ` Qing Zhao
2021-01-18 13:09                                             ` Richard Sandiford
2021-01-18 16:12                                               ` Qing Zhao
2021-02-01 19:12                                                 ` Qing Zhao
2021-02-02  7:43                                                   ` Richard Biener
2021-02-02 15:17                                                     ` Qing Zhao
2021-02-02 23:32                                                       ` Qing Zhao
2020-12-07 17:21           ` How to traverse all the local variables that declared in the current routine? Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8A595141-563A-4CDC-AA09-BAB76E5113AA@ORACLE.COM \
    --to=qing.zhao@oracle.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=keescook@chromium.org \
    --cc=richard.guenther@gmail.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).