How to traverse all the local variables that declared in the current routine?

public inbox for gcc-patches@gcc.gnu.org

* How to traverse all the local variables that declared in the current routine?
@ 2020-11-23 23:05 Qing Zhao
  2020-11-24  7:32 ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-11-23 23:05 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc Patches

Hi, 

Does gcc provide an iterator to traverse all the local variables that are declared in the current routine? 

If not, what’s the best way to traverse the local variables?

Thanks.

Qing


* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-23 23:05 How to traverse all the local variables that declared in the current routine? Qing Zhao
@ 2020-11-24  7:32 ` Richard Biener
  2020-11-24 15:47   ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-11-24  7:32 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>
> If not, what’s the best way to traverse the local variables?

Depends on what for.  There's the source-level view you get by walking
BLOCK_VARS of the scope tree, there's cfun->local_decls
(FOR_EACH_LOCAL_DECL) and there are SSA names (FOR_EACH_SSA_NAME).
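
The source-level view can be pictured with a small stand-alone model.  This
is only a sketch: the toy_* types and functions below are made up for
illustration and are not GCC's tree types.  In GCC itself the corresponding
accessors are BLOCK_VARS and DECL_CHAIN for the variables of one scope, and
BLOCK_SUBBLOCKS and BLOCK_CHAIN for nested and sibling scopes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, NOT GCC's tree types.  */
struct toy_decl {
  const char *name;
  struct toy_decl *chain;       /* next variable in the same scope */
};

struct toy_block {
  struct toy_decl *vars;        /* variables declared directly in this scope */
  struct toy_block *subblocks;  /* first nested scope */
  struct toy_block *chain;      /* next sibling scope */
};

/* Visit every variable of BLOCK, then recurse into its nested scopes.
   Returns the number of variables visited.  */
size_t
toy_walk_scope_vars (const struct toy_block *block,
                     void (*visit) (const struct toy_decl *, void *),
                     void *data)
{
  size_t count = 0;

  assert (block != NULL);
  for (const struct toy_decl *d = block->vars; d; d = d->chain)
    {
      visit (d, data);
      count++;
    }
  for (const struct toy_block *sub = block->subblocks; sub; sub = sub->chain)
    count += toy_walk_scope_vars (sub, visit, data);
  return count;
}

struct toy_name_buf { char text[64]; };

/* Visitor: append the variable name to a space-separated list.  */
static void
toy_append_name (const struct toy_decl *d, void *data)
{
  struct toy_name_buf *nb = data;

  if (nb->text[0] != '\0')
    strncat (nb->text, " ", sizeof nb->text - strlen (nb->text) - 1);
  strncat (nb->text, d->name, sizeof nb->text - strlen (nb->text) - 1);
}

/* Models the scopes of:  { int a; { int b; } { int c; } }  */
const char *
toy_demo_names (void)
{
  static struct toy_name_buf nb;
  struct toy_decl a = { "a", NULL }, b = { "b", NULL }, c = { "c", NULL };
  struct toy_block inner2 = { &c, NULL, NULL };
  struct toy_block inner1 = { &b, NULL, &inner2 };
  struct toy_block outer = { &a, &inner1, NULL };

  nb.text[0] = '\0';
  toy_walk_scope_vars (&outer, toy_append_name, &nb);
  return nb.text;
}
```

Calling toy_demo_names () yields the names in declaration order, outer scope
first.  A flat list such as the one FOR_EACH_LOCAL_DECL iterates over needs
no recursion, but it also no longer says which scope a variable came from.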

Richard.

>
> Thanks.
>
> Qing


* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24  7:32 ` Richard Biener
@ 2020-11-24 15:47   ` Qing Zhao
  2020-11-24 15:55     ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-11-24 15:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches

> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> 
>> Hi,
>> 
>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>> 
>> If not, what’s the best way to traverse the local variables?
> 
> Depends on what for.  There's the source level view you get by walking
> BLOCK_VARS of the
> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
> there's SSA names
> (FOR_EACH_SSA_NAME).

I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
not explicitly initialized in the declaration. The basic idea is as follows:

** The proposal:

A. add a new GCC option: (same name and meaning as CLANG)
-ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;

B. add a new attribute for variables:
__attribute__((uninitialized))
the marked variable is intentionally left uninitialized for performance reasons.

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".
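
From the user's side, A and B might look like the sketch below.  It assumes a
compiler that implements the option and the attribute as proposed here; the
macro name TOY_UNINITIALIZED and the function sum_squares are made up for
this example.  The macro expands to nothing on compilers that do not know the
attribute, so the file builds either way.

```c
#include <assert.h>

/* Use the proposed attribute where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(uninitialized)
#  define TOY_UNINITIALIZED __attribute__((uninitialized))
# endif
#endif
#ifndef TOY_UNINITIALIZED
# define TOY_UNINITIALIZED
#endif

/* Sum of the squares 0^2 .. (n-1)^2, for n <= 16, via a scratch array.  */
unsigned
sum_squares (unsigned n)
{
  /* Every element that is read below is written first, so opting out of
     the automatic initialization is safe for this array.  */
  unsigned scratch[16] TOY_UNINITIALIZED;
  /* Explicit initializer: the option would leave this variable alone.  */
  unsigned total = 0;

  assert (n <= 16);
  for (unsigned i = 0; i < n; i++)
    scratch[i] = i * i;
  for (unsigned i = 0; i < n; i++)
    total += scratch[i];
  return total;
}
```

Built with -ftrivial-auto-var-init=zero (spelled as in A above), the option
would only have to touch locals that have neither an initializer nor the
attribute; this function has none, so its code should be unaffected.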

** The implementation:

There are two major requirements for the implementation:

1. all auto-variables that do not have an explicit initializer should be initialized to
zero by this option.  (Same behavior as CLANG)

2. keep the current static warning on uninitialized variables untouched.

In order to satisfy 1, we should check whether an auto-variable has an initializer
or not;
In order to satisfy 2, we should add this new transformation after
"pass_late_warn_uninitialized".

So, we should be able to check whether an auto-variable has an initializer or not after “pass_late_warn_uninitialized”,
and if not, insert an initialization for it.

For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?

Another issue: in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common”:
  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
  unsigned decl_is_initialized :1;

/* In a VAR_DECL, set when the decl is initialized at the declaration.  */
#define DECL_IS_INITIALIZED(NODE) \
  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)

The FE would set this bit when it sets DECL_INITIAL for a variable, and the bit would stay set
even if DECL_INITIAL is later set to NULL.
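
The point of the extra bit is that it outlives the initializer expression.
A stand-alone sketch of that behaviour follows; the toy_* names are invented
for illustration and this is not GCC's decl_common.

```c
#include <assert.h>
#include <stddef.h>

/* Toy declaration record, NOT GCC's tree_decl_common.  */
struct toy_var_decl {
  const char *name;
  const void *initial;          /* stands in for DECL_INITIAL */
  unsigned is_static : 1;       /* stands in for TREE_STATIC */
  unsigned is_initialized : 1;  /* the proposed sticky bit */
};

#define TOY_DECL_IS_INITIALIZED(D) ((D)->is_initialized)

/* Record an initializer and remember that one was present.  */
void
toy_set_initial (struct toy_var_decl *d, const void *init)
{
  assert (d != NULL);
  d->initial = init;
  if (init != NULL)
    d->is_initialized = 1;
}

/* Models a later stage dropping the initializer; the bit is not touched.  */
void
toy_clear_initial (struct toy_var_decl *d)
{
  assert (d != NULL);
  d->initial = NULL;
}

/* Would the proposed phase have to add an initialization for D?  */
int
toy_needs_auto_init (const struct toy_var_decl *d)
{
  return !d->is_static && !TOY_DECL_IS_INITIALIZED (d);
}

/* x once had an initializer that was later dropped; y never had one.
   Returns 1 when only y is selected for automatic initialization.  */
int
toy_demo_sticky_bit (void)
{
  static const int forty_two = 42;
  struct toy_var_decl x = { "x", NULL, 0, 0 };
  struct toy_var_decl y = { "y", NULL, 0, 0 };

  toy_set_initial (&x, &forty_two);
  toy_clear_initial (&x);
  return !toy_needs_auto_init (&x) && toy_needs_auto_init (&y);
}
```

Testing the initializer pointer alone would select x as well, because it is
NULL again by the time the check runs; that is the case the bit is meant to
cover.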

Do you have any comments or suggestions?

Thanks a lot for the help.

Qing

> Richard.
> 
>> 
>> Thanks.
>> 
>> Qing


* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 15:47   ` Qing Zhao
@ 2020-11-24 15:55     ` Richard Biener
  2020-11-24 16:54       ` Qing Zhao
  2020-12-03 17:32       ` Richard Sandiford
  0 siblings, 2 replies; 56+ messages in thread
From: Richard Biener @ 2020-11-24 15:55 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> > On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> Hi,
> >>
> >> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
> >>
> >> If not, what’s the best way to traverse the local variables?
> >
> > Depends on what for.  There's the source level view you get by walking
> > BLOCK_VARS of the
> > scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
> > there's SSA names
> > (FOR_EACH_SSA_NAME).
>
> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
> not explicitly initialized in the declaration, the basic idea is following:
>
> ** The proposal:
>
> A. add a new GCC option: (same name and meaning as CLANG)
> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>
> B. add a new attribute for variable:
> __attribute((uninitialized)
> the marked variable is uninitialized intentionaly for performance purpose.
>
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language".
>
>
> ** The implementation:
>
> There are two major requirements for the implementation:
>
> 1. all auto-variables that do not have an explicit initializer should be initialized to
> zero by this option.  (Same behavior as CLANG)
>
> 2. keep the current static warning on uninitialized variables untouched.
>
> In order to satisfy 1, we should check whether an auto-variable has initializer
> or not;
> In order to satisfy 2, we should add this new transformation after
> "pass_late_warn_uninitialized".
>
> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
> If Not, then insert an initialization for it.
>
> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?

Yes, but do you want to catch variables promoted to registers as well,
or just variables on the stack?

> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>   unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
>   (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.

For locals it would be more reliable to set this flag during gimplification.

> Do you have any comment and suggestions?

As said above - do you want to cover registers as well as locals?  I'd do
the actual zeroing during RTL expansion instead, since otherwise you
have to figure out yourself whether a local is actually used (see expand_stack_vars).

Note that optimization will already have made use of the "uninitialized" state
of locals, so depending on what the actual goal is here, "late" may be too late.
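
A small example of what "too late" can mean.  This is a sketch with made-up
function names; what a given compiler actually does with maybe_uninit depends
on version and flags.

```c
#include <assert.h>

/* x is read without having been written when flag is 0.  An optimizer
   may assume that path never matters and reduce the whole function to
   "return 1" long before any late initialization phase runs.  */
int
maybe_uninit (int flag)
{
  int x;

  if (flag)
    x = 1;
  return x;   /* undefined for flag == 0 */
}

/* What zero initialization would mean at the source level: the
   flag == 0 path now has a defined result, 0.  */
int
zero_init (int flag)
{
  int x = 0;

  if (flag)
    x = 1;
  return x;
}

/* Only the defined cases can be checked portably.  */
void
check_defined_cases (void)
{
  assert (maybe_uninit (1) == 1);
  assert (zero_init (1) == 1);
  assert (zero_init (0) == 0);
}
```

If maybe_uninit has already been folded to a constant, storing zero into x
at RTL expansion no longer changes what the function returns for flag == 0.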

Richard.

>
> Thanks a lot for the help.
>
> Qing
>
> > Richard.
> >
> >>
> >> Thanks.
> >>
> >> Qing
>


* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 15:55     ` Richard Biener
@ 2020-11-24 16:54       ` Qing Zhao
  2020-11-25  9:11         ` Richard Biener
  2020-11-26  0:08         ` Martin Sebor
  2020-12-03 17:32       ` Richard Sandiford
  1 sibling, 2 replies; 56+ messages in thread
From: Qing Zhao @ 2020-11-24 16:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches



> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>>> 
>>>> If not, what’s the best way to traverse the local variables?
>>> 
>>> Depends on what for.  There's the source level view you get by walking
>>> BLOCK_VARS of the
>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>> there's SSA names
>>> (FOR_EACH_SSA_NAME).
>> 
>> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is following:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute((uninitialized)
>> the marked variable is uninitialized intentionaly for performance purpose.
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
> 
> Yes, but do you want to catch variables promoted to register as well
> or just variables
> on the stack?

I think both as long as they are source-level auto-variables. Then which one is better?

> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>  unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
> 
> For locals it would be more reliable to set this flag during gimplification.

You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:

  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
    {
      tree init = DECL_INITIAL (decl);
...
      if (init && init != error_mark_node)
        {
          if (!TREE_STATIC (decl))
            {
              /* Proposed addition: remember that DECL had an initializer.  */
              DECL_IS_INITIALIZED (decl) = 1;
            }

Is this enough for all Frontends? Are there other places that I need to maintain this bit? 


> 
>> Do you have any comment and suggestions?
> 
> As said above - do you want to cover registers as well as locals?

All the locals from the source-code point of view should be covered.  (From my study so far, it looks like Clang adds that phase in the FE.)
If GCC adds this phase in the FE, then the following design requirement

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".

cannot be satisfied, since GCC’s uninitialized-variable analysis is applied quite late.

So, we have to add this new phase after “pass_late_warn_uninitialized”.

>  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure youself whether a local is actually used (see expand_stack_vars)

Adding this new transformation during RTL expansion is okay.  I will look into this in more detail to see how to add it to the RTL expansion phase.
> 
> Note that optimization will already made have use of "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.

This is a really good point… 

To keep optimizations from using the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
for this issue). However, we also have to meet the following requirement:

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".

That means the new phase has to run after all the uninitialized analysis is done.

So, this is a problem that is not easy to resolve.

Do you have any suggestions on this?

Qing

> 
> Richard.
> 
>> 
>> Thanks a lot for the help.
>> 
>> Qing
>> 
>>> Richard.
>>> 
>>>> 
>>>> Thanks.
>>>> 
>>>> Qing



* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 16:54       ` Qing Zhao
@ 2020-11-25  9:11         ` Richard Biener
  2020-11-25 17:41           ` Qing Zhao
  2020-12-01 19:47           ` Qing Zhao
  2020-11-26  0:08         ` Martin Sebor
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Biener @ 2020-11-25  9:11 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Nov 24, 2020 at 5:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
>
> Hi,
>
> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>
> If not, what’s the best way to traverse the local variables?
>
>
> Depends on what for.  There's the source level view you get by walking
> BLOCK_VARS of the
> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
> there's SSA names
> (FOR_EACH_SSA_NAME).
>
>
> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
> not explicitly initialized in the declaration, the basic idea is following:
>
> ** The proposal:
>
> A. add a new GCC option: (same name and meaning as CLANG)
> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>
> B. add a new attribute for variable:
> __attribute((uninitialized)
> the marked variable is uninitialized intentionaly for performance purpose.
>
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language".
>
>
> ** The implementation:
>
> There are two major requirements for the implementation:
>
> 1. all auto-variables that do not have an explicit initializer should be initialized to
> zero by this option.  (Same behavior as CLANG)
>
> 2. keep the current static warning on uninitialized variables untouched.
>
> In order to satisfy 1, we should check whether an auto-variable has initializer
> or not;
> In order to satisfy 2, we should add this new transformation after
> "pass_late_warn_uninitialized".
>
> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
> If Not, then insert an initialization for it.
>
> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>
>
> Yes, but do you want to catch variables promoted to register as well
> or just variables
> on the stack?
>
>
> I think both as long as they are source-level auto-variables. Then which one is better?
>
>
> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>  unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
>
> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine “gimpley_decl_expr” (gimplify.c) as following:
>
>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>     {
>       tree init = DECL_INITIAL (decl);
> ...
>       if (init && init != error_mark_node)
>         {
>           if (!TREE_STATIC (decl))
>     {
>       DECL_IS_INITIALIZED(decl) = 1;
>     }
>
> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
>
>
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?
>
>
> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
> If GCC adds this phase in FE, then the following design requirement
>
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language”.
>
> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
>
> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>
>  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure youself whether a local is actually used (see expand_stack_vars)
>
>
> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>
>
> Note that optimization will already made have use of "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.
>
>
> This is a really good point…
>
> In order to avoid optimization  to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best
> for this issue). However, if we have to met the following requirement:

So is optimization supposed to pick up zero, or is it supposed to act
as if the initializer is unknown?

> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language”.
>
> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>
> So, this is a problem that is not easy to resolve.

Indeed, those are conflicting goals.

> Do you have suggestion on this?

No, not any easy ones.  One option is doing more of the uninit analysis early
(there is already an early uninit pass), which would mean doing IPA analysis
and turning GCC into more of a static analysis tool.  There's the analyzer
now; not sure if that can employ an early LTO phase, for example.

Richard.

> Qing
>
>
> Richard.
>
>
> Thanks a lot for the help.
>
> Qing
>
> Richard.
>
>
> Thanks.
>
> Qing
>
>


* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-25  9:11         ` Richard Biener
@ 2020-11-25 17:41           ` Qing Zhao
  2020-12-01 19:47           ` Qing Zhao
  1 sibling, 0 replies; 56+ messages in thread
From: Qing Zhao @ 2020-11-25 17:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches



> On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> 
>> Hi,
>> 
>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>> 
>> If not, what’s the best way to traverse the local variables?
>> 
>> 
>> Depends on what for.  There's the source level view you get by walking
>> BLOCK_VARS of the
>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>> there's SSA names
>> (FOR_EACH_SSA_NAME).
>> 
>> 
>> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is following:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute((uninitialized)
>> the marked variable is uninitialized intentionaly for performance purpose.
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>> 
>> 
>> Yes, but do you want to catch variables promoted to register as well
>> or just variables
>> on the stack?
>> 
>> 
>> I think both as long as they are source-level auto-variables. Then which one is better?
>> 
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine “gimpley_decl_expr” (gimplify.c) as following:
>> 
>>  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>    {
>>      tree init = DECL_INITIAL (decl);
>> ...
>>      if (init && init != error_mark_node)
>>        {
>>          if (!TREE_STATIC (decl))
>>    {
>>      DECL_IS_INITIALIZED(decl) = 1;
>>    }
>> 
>> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
>> 
>> 
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?
>> 
>> 
>> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
>> If GCC adds this phase in FE, then the following design requirement
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
>> 
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>> 
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see expand_stack_vars)
>> 
>> 
>> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>> 
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> 
>> This is a really good point…
>> 
>> In order to avoid optimization  to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best
>> for this issue). However, if we have to met the following requirement:
> 
> So is optimization supposed to pick up zero or is it supposed to act
> as if the initializer
> is unknown?

Good question!

Theoretically,  the new option -ftrivial-auto-var-init=zero is supposed to add zero initialization to auto-variables 
that are not explicitly initialized in order to avoid the possible undefined behavior. 

So, I think that with the new option specified, compiler optimization should pick up zero initialization. 
Therefore, ideally, zero initializations should  be inserted before optimizations. 

However, this will conflict with the requirement to "keep the current static warning on uninitialized
variables untouched in order to avoid 'forking the language'".

>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>> 
>> So, this is a problem that is not easy to resolve.
> 
> Indeed, those are conflicting goals.

Yes, this is the most difficult part of this task.
I am not sure how CLANG resolved this issue.

> 
>> Do you have suggestion on this?
> 
> No, not any easy ones.  Doing more of the uninit analysis early (there
> is already an early
> uninit pass) which would mean doing IPA analysis turing GCC into more
> of a static analysis
> tool.  Theres the analyzer now, not sure if that can employ an early
> LTO phase for example.

You mean to enhance “pass_early_warn_uninitialized” or “pass_analyzer” to catch
more uninitialized cases, then add the new “zero initialization” after these passes?

However, both “pass_early_warn_uninitialized” and “pass_analyzer” still utilize
some early IPA optimizations. These early optimizations still act as if the initializers are unknown.

So, it looks like the conflict cannot be completely resolved.


Another thought: if we still add the initializations at “pass_expand”, as you suggested in the previous email,
GCC will be split into two parts. The earlier part, before “pass_expand”, acts without the zero initialization
and reports the uninitialized warnings based on this.
The later part, after “pass_expand”, picks up the zero initializations, so all the RTL optimizations are applied
to the program with all the new zero initializations.
Would such an approach have any big potential issues?

Qing

> 
> Richard.
> 
>> Qing
>> 
>> 
>> Richard.
>> 
>> 
>> Thanks a lot for the help.
>> 
>> Qing
>> 
>> Richard.
>> 
>> 
>> Thanks.
>> 
>> Qing



* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 16:54       ` Qing Zhao
  2020-11-25  9:11         ` Richard Biener
@ 2020-11-26  0:08         ` Martin Sebor
  2020-11-30 16:23           ` Qing Zhao
  1 sibling, 1 reply; 56+ messages in thread
From: Martin Sebor @ 2020-11-26  0:08 UTC (permalink / raw)
  To: Qing Zhao, Richard Biener; +Cc: Richard Sandiford, gcc Patches

On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote:
> 
> 
>> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>
>>>
>>>
>>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>
>>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>>>>
>>>>> If not, what’s the best way to traverse the local variables?
>>>>
>>>> Depends on what for.  There's the source level view you get by walking
>>>> BLOCK_VARS of the
>>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>>> there's SSA names
>>>> (FOR_EACH_SSA_NAME).
>>>
>>> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>>> not explicitly initialized in the declaration, the basic idea is following:
>>>
>>> ** The proposal:
>>>
>>> A. add a new GCC option: (same name and meaning as CLANG)
>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>>>
>>> B. add a new attribute for variable:
>>> __attribute((uninitialized)
>>> the marked variable is uninitialized intentionaly for performance purpose.
>>>
>>> C. The implementation needs to keep the current static warning on uninitialized
>>> variables untouched in order to avoid "forking the language".
>>>
>>>
>>> ** The implementation:
>>>
>>> There are two major requirements for the implementation:
>>>
>>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>>> zero by this option.  (Same behavior as CLANG)
>>>
>>> 2. keep the current static warning on uninitialized variables untouched.
>>>
>>> In order to satisfy 1, we should check whether an auto-variable has initializer
>>> or not;
>>> In order to satisfy 2, we should add this new transformation after
>>> "pass_late_warn_uninitialized".
>>>
>>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>>> If Not, then insert an initialization for it.
>>>
>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>
>> Yes, but do you want to catch variables promoted to register as well
>> or just variables
>> on the stack?
> 
> I think both as long as they are source-level auto-variables. Then which one is better?
> 
>>
>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>   unsigned decl_is_initialized :1;
>>>
>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>> #define DECL_IS_INITIALIZED(NODE) \
>>>   (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>
>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>> even though DECL_INITIAL might be NULLed.
>>
>> For locals it would be more reliable to set this flag during gimplification.
> 
> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine “gimpley_decl_expr” (gimplify.c) as following:
> 
>    if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>      {
>        tree init = DECL_INITIAL (decl);
> ...
>        if (init && init != error_mark_node)
>          {
>            if (!TREE_STATIC (decl))
> 	    {
> 	      DECL_IS_INITIALIZED(decl) = 1;
> 	    }
> 
> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
> 
> 
>>
>>> Do you have any comment and suggestions?
>>
>> As said above - do you want to cover registers as well as locals?
> 
> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
> If GCC adds this phase in FE, then the following design requirement
> 
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language”.
> 
> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
> 
> So, we have to add this new phase after “pass_late_warn_uninitialized”.
> 
>>   I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
> 
> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>>
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
> 
> This is a really good point…
> 
> In order to avoid optimization  to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best
> for this issue). However, if we have to meet the following requirement:
> 
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language”.
> 
> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
> 
> So, this is a problem that is not easy to resolve.
> 
> Do you have suggestion on this?

Not having thought about it very long or hard I'd be tempted to do
it the other way around.  For each use of an uninitialized variable
found, first either issue or queue up a -Wuninitialized for it and
then initialize it.  Then (if queued) at some later point, issue
the queued up -Wuninitialized.  The last part would be done in
tree-ssa-uninit.c where the remaining uses of uninitialized
variables would trigger warnings and induce their initialization
(if there were any left).

Martin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-26  0:08         ` Martin Sebor
@ 2020-11-30 16:23           ` Qing Zhao
  2020-11-30 17:18             ` Martin Sebor
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-11-30 16:23 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Richard Biener, Richard Sandiford, gcc Patches

Hi, Martin,

Thanks a lot for your suggestion.

> On Nov 25, 2020, at 6:08 PM, Martin Sebor <msebor@gmail.com> wrote:
> 
> On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote:
>>> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>> 
>>>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>>>>> 
>>>>>> If not, what’s the best way to traverse the local variables?
>>>>> 
>>>>> Depends on what for.  There's the source level view you get by walking
>>>>> BLOCK_VARS of the
>>>>> scope tree, there's cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>>>> there's SSA names
>>>>> (FOR_EACH_SSA_NAME).
>>>> 
>>>> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>>>> not explicitly initialized in the declaration, the basic idea is following:
>>>> 
>>>> ** The proposal:
>>>> 
>>>> A. add a new GCC option: (same name and meaning as CLANG)
>>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>>>> 
>>>> B. add a new attribute for variable:
>>>> __attribute__((uninitialized))
>>>> the marked variable is intentionally left uninitialized for performance reasons.
>>>> 
>>>> C. The implementation needs to keep the current static warning on uninitialized
>>>> variables untouched in order to avoid "forking the language".
>>>> 
>>>> 
>>>> ** The implementation:
>>>> 
>>>> There are two major requirements for the implementation:
>>>> 
>>>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>>>> zero by this option.  (Same behavior as CLANG)
>>>> 
>>>> 2. keep the current static warning on uninitialized variables untouched.
>>>> 
>>>> In order to satisfy 1, we should check whether an auto-variable has initializer
>>>> or not;
>>>> In order to satisfy 2, we should add this new transformation after
>>>> "pass_late_warn_uninitialized".
>>>> 
>>>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>>>> If Not, then insert an initialization for it.
>>>> 
>>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>> 
>>> Yes, but do you want to catch variables promoted to register as well
>>> or just variables
>>> on the stack?
>> I think both as long as they are source-level auto-variables. Then which one is better?
>>> 
>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>  unsigned decl_is_initialized :1;
>>>> 
>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>> 
>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> For locals it would be more reliable to set this flag during gimplification.
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:
>>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>     {
>>       tree init = DECL_INITIAL (decl);
>> ...
>>       if (init && init != error_mark_node)
>>         {
>>           if (!TREE_STATIC (decl))
>> 	    {
>> 	      DECL_IS_INITIALIZED(decl) = 1;
>> 	    }
>> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
>>> 
>>>> Do you have any comment and suggestions?
>>> 
>>> As said above - do you want to cover registers as well as locals?
>> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
>> If GCC adds this phase in FE, then the following design requirement
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>>>  I'd do
>>> the actual zeroing during RTL expansion instead since otherwise you
>>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>>> 
>>> Note that optimization will already have made use of the "uninitialized" state
>>> of locals so depending on what the actual goal is here "late" may be too late.
>> This is a really good point…
>> In order to avoid optimization  to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best
>> for this issue). However, if we have to meet the following requirement:
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>> So, this is a problem that is not easy to resolve.
>> Do you have suggestion on this?
> 
> Not having thought about it very long or hard I'd be tempted to do
> it the other way around.  For each use of an uninitialized variable
> found, first either issue or queue up a -Wuninitialized for it and
> then initialize it.  Then (if queued) at some later point, issue
> the queued up -Wuninitialized.  The last part would be done in
> tree-ssa-uninit.c where the remaining uses of uninitialized
> variables would trigger warnings and induce their initialization
> (if there were any left).


The major issue with this approach is:

There are two passes for uninitialized variable analysis:
pass_early_warn_uninitialized
pass_late_warn_uninitialized

The early pass runs at the very beginning of the tree optimizer, but the late pass runs at a very late stage of the tree optimizer.
If we add the initializations in the early pass, the newly added initializations will change the result of the late pass. This does not meet
the requirement.
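
To make the ordering problem concrete, here is a toy model in plain C. It is only a sketch and does not use GCC internals: toy_var, toy_warn_pass, toy_auto_init and toy_late_warnings are all made-up names, and the "passes" are ordinary functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a local variable; not a GCC data structure.  */
struct toy_var
{
  const char *name;
  bool initialized;
};

/* Toy stand-in for a warning pass: count the variables that are still
   uninitialized at the time the pass runs.  */
static int
toy_warn_pass (const struct toy_var *vars, size_t n)
{
  int warnings = 0;
  for (size_t i = 0; i < n; i++)
    if (!vars[i].initialized)
      warnings++;
  return warnings;
}

/* Toy stand-in for the new transformation: initialize every variable.  */
static void
toy_auto_init (struct toy_var *vars, size_t n)
{
  for (size_t i = 0; i < n; i++)
    vars[i].initialized = true;
}

/* Return how many warnings the "late" pass reports, depending on whether
   the initialization runs before or after it.  */
int
toy_late_warnings (bool init_before_late_pass)
{
  struct toy_var vars[] = { { "v", false }, { "w", true } };
  size_t n = sizeof vars / sizeof vars[0];
  int warnings;

  if (init_before_late_pass)
    toy_auto_init (vars, n);
  warnings = toy_warn_pass (vars, n);
  if (!init_before_late_pass)
    toy_auto_init (vars, n);

  /* In both orderings every variable ends up initialized.  */
  for (size_t i = 0; i < n; i++)
    assert (vars[i].initialized);
  return warnings;
}
```

In this model, toy_late_warnings (true) reports no warning while toy_late_warnings (false) reports one for "v", which is the change in late-pass results that requirement C forbids.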

Am I missing anything here?

Qing


> 
> Martin


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-30 16:23           ` Qing Zhao
@ 2020-11-30 17:18             ` Martin Sebor
  2020-11-30 23:05               ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Martin Sebor @ 2020-11-30 17:18 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Biener, Richard Sandiford, gcc Patches

On 11/30/20 9:23 AM, Qing Zhao wrote:
> Hi, Martin,
> 
> Thanks a lot for your suggestion.
> 
>> On Nov 25, 2020, at 6:08 PM, Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote:
>>>> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>
>>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>>
>>>>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Does gcc provide an iterator to traverse all the local variables 
>>>>>>> that are declared in the current routine?
>>>>>>>
>>>>>>> If not, what’s the best way to traverse the local variables?
>>>>>>
>>>>>> Depends on what for.  There's the source level view you get by walking
>>>>>> BLOCK_VARS of the
>>>>>> scope tree, there's cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>>>>> there's SSA names
>>>>>> (FOR_EACH_SSA_NAME).
>>>>>
>>>>> I am planning to add a new phase immediately after 
>>>>> “pass_late_warn_uninitialized” to initialize all auto-variables 
>>>>> that are
>>>>> not explicitly initialized in the declaration, the basic idea is 
>>>>> following:
>>>>>
>>>>> ** The proposal:
>>>>>
>>>>> A. add a new GCC option: (same name and meaning as CLANG)
>>>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>>>>>
>>>>> B. add a new attribute for variable:
>>>>> __attribute__((uninitialized))
>>>>> the marked variable is intentionally left uninitialized for performance 
>>>>> reasons.
>>>>>
>>>>> C. The implementation needs to keep the current static warning on 
>>>>> uninitialized
>>>>> variables untouched in order to avoid "forking the language".
>>>>>
>>>>>
>>>>> ** The implementation:
>>>>>
>>>>> There are two major requirements for the implementation:
>>>>>
>>>>> 1. all auto-variables that do not have an explicit initializer 
>>>>> should be initialized to
>>>>> zero by this option.  (Same behavior as CLANG)
>>>>>
>>>>> 2. keep the current static warning on uninitialized variables 
>>>>> untouched.
>>>>>
>>>>> In order to satisfy 1, we should check whether an auto-variable has 
>>>>> initializer
>>>>> or not;
>>>>> In order to satisfy 2, we should add this new transformation after
>>>>> "pass_late_warn_uninitialized".
>>>>>
>>>>> So, we should be able to check whether an auto-variable has 
>>>>> initializer or not after “pass_late_warn_uninitialized”,
>>>>> If Not, then insert an initialization for it.
>>>>>
>>>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>>>
>>>> Yes, but do you want to catch variables promoted to register as well
>>>> or just variables
>>>> on the stack?
>>> I think both as long as they are source-level auto-variables. Then 
>>> which one is better?
>>>>
>>>>> Another issue is, in order to check whether an auto-variable has 
>>>>> initializer, I plan to add a new bit in “decl_common” as:
>>>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>>  unsigned decl_is_initialized :1;
>>>>>
>>>>> /* IN VAR_DECL, set when the decl is initialized at the 
>>>>> declaration.  */
>>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>>>
>>>>> set this bit when setting DECL_INITIAL for the variables in FE. 
>>>>> then keep it
>>>>> even though DECL_INITIAL might be NULLed.
>>>>
>>>> For locals it would be more reliable to set this flag during 
>>>> gimplification.
>>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the 
>>> routine “gimplify_decl_expr” (gimplify.c) as follows:
>>>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>>     {
>>>       tree init = DECL_INITIAL (decl);
>>> ...
>>>       if (init && init != error_mark_node)
>>>         {
>>>           if (!TREE_STATIC (decl))
>>>             {
>>>               DECL_IS_INITIALIZED (decl) = 1;
>>>             }
>>> Is this enough for all Frontends? Are there other places that I need 
>>> to maintain this bit?
>>>>
>>>>> Do you have any comment and suggestions?
>>>>
>>>> As said above - do you want to cover registers as well as locals?
>>> All the locals from the source-code point of view should be covered. 
>>>   (From my study so far,  looks like that Clang adds that phase in FE).
>>> If GCC adds this phase in FE, then the following design requirement
>>> C. The implementation needs to keep the current static warning on 
>>> uninitialized
>>> variables untouched in order to avoid "forking the language”.
>>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is 
>>> applied quite late.
>>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>>>>  I'd do
>>>> the actual zeroing during RTL expansion instead since otherwise you
>>>> have to figure out yourself whether a local is actually used (see 
>>>> expand_stack_vars)
>>> Adding  this new transformation during RTL expansion is okay.  I will 
>>> check on this in more details to see how to add it to RTL expansion 
>>> phase.
>>>>
>>>> Note that optimization will already have made use of the "uninitialized" 
>>>> state
>>>> of locals so depending on what the actual goal is here "late" may be 
>>>> too late.
>>> This is a really good point…
>>> In order to avoid optimization  to use the “uninitialized” state of 
>>> locals, we should add the zeroing phase as early as possible (adding 
>>> it in FE might be best
>>> for this issue). However, if we have to meet the following requirement:
>>> C. The implementation needs to keep the current static warning on 
>>> uninitialized
>>> variables untouched in order to avoid "forking the language”.
>>> We have to move the new phase after all the uninitialized analysis is 
>>> done in order to avoid “forking the language”.
>>> So, this is a problem that is not easy to resolve.
>>> Do you have suggestion on this?
>>
>> Not having thought about it very long or hard I'd be tempted to do
>> it the other way around.  For each use of an uninitialized variable
>> found, first either issue or queue up a -Wuninitialized for it and
>> then initialize it.  Then (if queued) at some later point, issue
>> the queued up -Wuninitialized.  The last part would be done in
>> tree-ssa-uninit.c where the remaining uses of uninitialized
>> variables would trigger warnings and induce their initialization
>> (if there were any left).
> 
> 
> The major issue with this approach is:
> 
> There are two passes for uninitialized variable analysis:
> pass_early_warn_uninitialized
> pass_late_warn_uninitialized
> 
> The early pass is placed at the very beginning of the tree optimizer. 
> But the late pass is placed at the very late stage of the tree optimizer.
> If we add the initializations at the early pass, the result of the late 
> pass will be changed by the new added initializations. This does not meet
> the requirement.
> 
> Do I miss anything here?

I'm not sure.  As I said, I'd consider issuing (or queuing up for
issuing later) -Wuninitialized at the same time as initializing
the uninitialized variables.  With that approach I'd expect to
diagnose all the same instances of uninitialized uses as the two
passes do today (actually, I'd expect to diagnose more of them,
including those Richard referred to above whose uninitialized
state may have been made use of for optimization decisions(*)).
Also with this approach the two existing warning passes would
cease to serve their current purpose of hunting down uninitialized
variables because by the time they ran all their uses would have
been initialized (and warnings issued).
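
A toy model of this ordering, written in plain C rather than against GCC internals, might look like the sketch below. Every name in it (toy_var, toy_diag_queue, toy_warn_then_init, toy_process) is hypothetical; the only point is that the diagnostic is recorded before the variable is initialized, so it cannot be lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_QUEUED 16

/* Toy stand-in for a local variable; not a GCC data structure.  */
struct toy_var
{
  const char *name;
  bool initialized;
};

/* Toy queue of pending -Wuninitialized diagnostics, by variable name.  */
struct toy_diag_queue
{
  const char *names[TOY_MAX_QUEUED];
  size_t count;
};

/* If VAR is uninitialized, first queue a warning for it and only then
   initialize it.  Already initialized variables are left alone.  */
void
toy_warn_then_init (struct toy_var *var, struct toy_diag_queue *queue)
{
  if (var->initialized)
    return;
  assert (queue->count < TOY_MAX_QUEUED);
  queue->names[queue->count++] = var->name;
  var->initialized = true;
}

/* Process all N variables and return the number of queued warnings,
   which a later step could then issue.  */
size_t
toy_process (struct toy_var *vars, size_t n, struct toy_diag_queue *queue)
{
  queue->count = 0;
  for (size_t i = 0; i < n; i++)
    toy_warn_then_init (&vars[i], queue);
  return queue->count;
}
```

With three variables of which the first and third are uninitialized, toy_process queues two warnings and leaves all three initialized.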

One question in my mind is what to do with -Wmaybe-uninitialized.
Should those also be initialized, even though they're not necessarily
used?  Or are you only hoping to tackle -Wuninitialized?

Martin

[*] With the initialization approach I'd expect concerns about
the cost of losing those optimization opportunities.  Although
those could be addressed by making the initialization optional
(i.e., opt-in).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-30 17:18             ` Martin Sebor
@ 2020-11-30 23:05               ` Qing Zhao
  0 siblings, 0 replies; 56+ messages in thread
From: Qing Zhao @ 2020-11-30 23:05 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Richard Biener, Richard Sandiford, gcc Patches

On Nov 30, 2020, at 11:18 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>>>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>>>>>>> 
>>>>>>>> If not, what’s the best way to traverse the local variables?
>>>>>>> 
>>>>>>> Depends on what for.  There's the source level view you get by walking
>>>>>>> BLOCK_VARS of the
>>>>>>> scope tree, there's cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>>>>>> there's SSA names
>>>>>>> (FOR_EACH_SSA_NAME).
>>>>>> 
>>>>>> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>>>>>> not explicitly initialized in the declaration, the basic idea is following:
>>>>>> 
>>>>>> ** The proposal:
>>>>>> 
>>>>>> A. add a new GCC option: (same name and meaning as CLANG)
>>>>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>>>>>> 
>>>>>> B. add a new attribute for variable:
>>>>>> __attribute__((uninitialized))
>>>>>> the marked variable is intentionally left uninitialized for performance reasons.
>>>>>> 
>>>>>> C. The implementation needs to keep the current static warning on uninitialized
>>>>>> variables untouched in order to avoid "forking the language".
>>>>>> 
>>>>>> 
>>>>>> ** The implementation:
>>>>>> 
>>>>>> There are two major requirements for the implementation:
>>>>>> 
>>>>>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>>>>>> zero by this option.  (Same behavior as CLANG)
>>>>>> 
>>>>>> 2. keep the current static warning on uninitialized variables untouched.
>>>>>> 
>>>>>> In order to satisfy 1, we should check whether an auto-variable has initializer
>>>>>> or not;
>>>>>> In order to satisfy 2, we should add this new transformation after
>>>>>> "pass_late_warn_uninitialized".
>>>>>> 
>>>>>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>>>>>> If Not, then insert an initialization for it.
>>>>>> 
>>>>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>>>> 
>>>>> Yes, but do you want to catch variables promoted to register as well
>>>>> or just variables
>>>>> on the stack?
>>>> I think both as long as they are source-level auto-variables. Then which one is better?
>>>>> 
>>>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>>>  unsigned decl_is_initialized :1;
>>>>>> 
>>>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>>>> 
>>>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>>>>> even though DECL_INITIAL might be NULLed.
>>>>> 
>>>>> For locals it would be more reliable to set this flag during gimplification.
>>>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:
>>>>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>>>     {
>>>>       tree init = DECL_INITIAL (decl);
>>>> ...
>>>>       if (init && init != error_mark_node)
>>>>         {
>>>>           if (!TREE_STATIC (decl))
>>>>             {
>>>>               DECL_IS_INITIALIZED (decl) = 1;
>>>>             }
>>>> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
>>>>> 
>>>>>> Do you have any comment and suggestions?
>>>>> 
>>>>> As said above - do you want to cover registers as well as locals?
>>>> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
>>>> If GCC adds this phase in FE, then the following design requirement
>>>> C. The implementation needs to keep the current static warning on uninitialized
>>>> variables untouched in order to avoid "forking the language”.
>>>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
>>>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>>>>>  I'd do
>>>>> the actual zeroing during RTL expansion instead since otherwise you
>>>>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>>>> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>>>>> 
>>>>> Note that optimization will already have made use of the "uninitialized" state
>>>>> of locals so depending on what the actual goal is here "late" may be too late.
>>>> This is a really good point…
>>>> In order to avoid optimization  to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best
>>>> for this issue). However, if we have to meet the following requirement:
>>>> C. The implementation needs to keep the current static warning on uninitialized
>>>> variables untouched in order to avoid "forking the language”.
>>>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>>>> So, this is a problem that is not easy to resolve.
>>>> Do you have suggestion on this?
>>> 
>>> Not having thought about it very long or hard I'd be tempted to do
>>> it the other way around.  For each use of an uninitialized variable
>>> found, first either issue or queue up a -Wuninitialized for it and
>>> then initialize it.  Then (if queued) at some later point, issue
>>> the queued up -Wuninitialized.  The last part would be done in
>>> tree-ssa-uninit.c where the remaining uses of uninitialized
>>> variables would trigger warnings and induce their initialization
>>> (if there were any left).
>> The major issue with this approach is:
>> There are two passes for uninitialized variable analysis:
>> pass_early_warn_uninitialized
>> pass_late_warn_uninitialized
>> The early pass is placed at the very beginning of the tree optimizer. But the late pass is placed at the very late stage of the tree optimizer.
>> If we add the initializations at the early pass, the result of the late pass will be changed by the new added initializations. This does not meet
>> the requirement.
>> Do I miss anything here?
> 
> I'm not sure.  As I said, I'd consider issuing (or queuing up for
> issuing later) -Wuninitialized at the same time as initializing
> the uninitialized variables.

I considered this approach at the very beginning of my study, but later realized that it would not work.

Consider the following small example:
qinzhao@gcc10:~/Bugs/auto-init$ cat t1.c
void blah(int);

int foo_2 (int n, int l, int m, int r)
{
  int v;

  if ( (n < 10) && (m != 100)  && (r < 20) )
    v = r;

  if (l > 100)
    if ( (n <= 8) &&  (m < 102)  && (r < 19) )
      blah(v); /* { dg-warning "uninitialized" "real warning" } */

  return 0;
}

With the latest gcc and the following options:
qinzhao@gcc10:~/Bugs/auto-init$ /home/qinzhao/Install/latest_write/bin/gcc -Wuninitialized -Wmaybe-uninitialized -S t1.c
qinzhao@gcc10:~/Bugs/auto-init$ 

We can see that the latest gcc issues no uninitialized warning at all when no optimization is specified. But in this case,
it’s clear that we should insert a zero initializer for the auto-variable “v”, even though the current uninitialized variable analysis pass
is not able to determine that “v” is uninitialized on some execution paths.

The above is just a simple example to show that we cannot rely on the result of the uninitialized variable analysis pass to decide
which variable should be initialized. 

For security purposes, we should conservatively initialize all auto-variables that might not be initialized, i.e., for every auto-variable that
does not have an explicit initializer at the source code level, we should insert an initializer.

This is the current behavior of LLVM with  -ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang. 

I believe that GCC should do the same thing for the security benefit. 
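
As a rough illustration of the intended effect on t1.c, the sketch below writes out by hand the initializer that the option would conceptually insert. This is an assumption about the observable behavior only, not about how the compiler would implement it; foo_2_zero_init, toy_demo and last_blah_arg are made-up names, and blah is given a body here just so the result can be observed.

```c
#include <assert.h>

/* Last value passed to blah, recorded so the effect is observable.
   This helper state is hypothetical and not part of t1.c.  */
static int last_blah_arg = -1;

static void
blah (int x)
{
  last_blah_arg = x;
}

/* Same control flow as foo_2 in t1.c, with the zero initializer that
   -ftrivial-auto-var-init=zero would conceptually add written out.  */
int
foo_2_zero_init (int n, int l, int m, int r)
{
  int v = 0;  /* assumed effect of the option; absent in t1.c */

  if ((n < 10) && (m != 100) && (r < 20))
    v = r;

  if (l > 100)
    if ((n <= 8) && (m < 102) && (r < 19))
      blah (v);

  return 0;
}

/* Call with m == 100: the first condition fails, so v is never assigned,
   but the second condition holds, so blah (v) is still reached.
   Return the value blah received.  */
int
toy_demo (void)
{
  int ret = foo_2_zero_init (5, 200, 100, 3);
  assert (ret == 0);
  return last_blah_arg;
}
```

In the original foo_2 the same call passes an indeterminate “v” to blah; with the initializer written out, blah receives 0.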

>  With that approach I'd expect to
> diagnose all the same instances of uninitialized uses as the two
> passes do today (actually, I'd expect to diagnose more of them,
> including those Richard referred to above whose uninitialized
> state may have been made use of for optimization decisions(*)).
> Also with this approach the two existing warning passes would
> cease to serve their current purpose of hunting down uninitialized
> variables because by the time they ran all their uses would have
> been initialized (and warnings issued).

Inserting the zero-initializer before pass_late_warn_uninitialized will invalidate the current uninitialized variable analysis, which is unacceptable
based on my current understanding.

> One question in my mind is what to do with -Wmaybe-uninitialized.
> Should those also be initialized, even though they're not necessarily
> used?  Or are you only hoping to tackle -Wuninitialized?

All the auto-variables that might not be initialized should be initialized with the new option. 
The decision on which auto-variables should be initialized should be based on the source code level initializer:

If an auto-variable does not have a source code level initializer, the compiler should add a zero-initializer 
for it. 
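
The rule can be sketched as a small predicate in plain C. This is a toy model, not GCC code: toy_decl and toy_needs_auto_init are hypothetical names, and the three flags only loosely mirror TREE_STATIC, the proposed DECL_IS_INITIALIZED bit and the proposed attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of a variable declaration; not a GCC tree node.  */
struct toy_decl
{
  bool is_static;           /* static storage duration */
  bool has_source_init;     /* explicit initializer in the source */
  bool attr_uninitialized;  /* opted out via the proposed attribute */
};

/* Return true if the new option should add a zero-initializer for DECL.  */
bool
toy_needs_auto_init (const struct toy_decl *decl)
{
  assert (decl != NULL);
  if (decl->is_static)
    return false;  /* the language already zero-initializes statics */
  if (decl->has_source_init)
    return false;  /* the user supplied an initializer */
  if (decl->attr_uninitialized)
    return false;  /* the user asked to leave it uninitialized */
  return true;
}
```

Note that the predicate never consults the result of the uninitialized analysis; it depends only on what was written in the source.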

Qing
> 
> Martin
> 
> [*] With the initialization approach I'd expect concerns about
> the cost of losing those optimization opportunities.  Although
> those could be addressed by making the initialization optional
> (i.e., opt-in).


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-25  9:11         ` Richard Biener
  2020-11-25 17:41           ` Qing Zhao
@ 2020-12-01 19:47           ` Qing Zhao
  2020-12-02  8:45             ` Richard Biener
  1 sibling, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-01 19:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches

Hi, Richard, 

Could you please comment on the following approach:

Instead of adding the zero-initializer quite late, in “pass_expand”, we can add it as early as gimplification.
However, we will mark these newly added zero-initializers as “artificial” and pass this “artificial” information to
“pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these two uninitialized variable analysis passes
(i.e., in tree-ssa-uninit.c), we will update the check in “ssa_undefined_value_p” to take “artificial” zero-initializers into account
(i.e., if the def_stmt is marked “artificial”, then it’s an undefined value).

With such an approach, we should be able to address all those conflicts.
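
A toy model of the idea, in plain C and with made-up names (toy_def_kind, toy_value, toy_undefined_for_warning_p, toy_has_runtime_value_p; none of these exist in GCC), would keep two separate views of the same definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How a value got its definition in this toy model.  */
enum toy_def_kind
{
  TOY_DEF_NONE,            /* no definition at all */
  TOY_DEF_USER,            /* defined by user-written code */
  TOY_DEF_ARTIFICIAL_ZERO  /* defined only by the inserted zero-initializer */
};

struct toy_value
{
  enum toy_def_kind def;
};

/* View used by the warning passes: an artificial zero-initializer does
   not count as a real definition, so the value is still "undefined".  */
bool
toy_undefined_for_warning_p (const struct toy_value *val)
{
  assert (val != NULL);
  return val->def == TOY_DEF_NONE || val->def == TOY_DEF_ARTIFICIAL_ZERO;
}

/* View used by everything else: any definition, artificial or not,
   gives the value well-defined contents at run time.  */
bool
toy_has_runtime_value_p (const struct toy_value *val)
{
  assert (val != NULL);
  return val->def != TOY_DEF_NONE;
}
```

A value defined only by an artificial zero-initializer is then still reported by the warning view, while the rest of the compiler sees it as initialized.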

Do you see any obvious issue with this approach?

Thanks a lot for your help.

Qing


> On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> 
>> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is following:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is intentionally left uninitialized for performance reasons.
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>> 
>> 
>> I think both as long as they are source-level auto-variables. Then which one is better?
>> 
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:
>> 
>>  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>    {
>>      tree init = DECL_INITIAL (decl);
>> ...
>>      if (init && init != error_mark_node)
>>        {
>>          if (!TREE_STATIC (decl))
>>            {
>>              DECL_IS_INITIALIZED (decl) = 1;
>>            }
>> 
>> Is this enough for all front ends? Are there other places where I need to maintain this bit?
>> 
>> 
>> 
>> Do you have any comments or suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?
>> 
>> 
>> All the locals from the source-code point of view should be covered.  (From my study so far, it looks like Clang adds that phase in the FE.)
>> If GCC adds this phase in the FE, then the following design requirement
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> cannot be satisfied, since GCC's uninitialized-variable analysis is applied quite late.
>> 
>> So we have to add this new phase after “pass_late_warn_uninitialized”.
>> 
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> 
>> 
>> Adding this new transformation during RTL expansion is okay.  I will look into this in more detail to see how to add it to the RTL expansion phase.
>> 
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals, so depending on what the actual goal is here, "late" may be too late.
>> 
>> 
>> This is a really good point…
>> 
>> To keep optimizations from exploiting the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
>> for this issue). However, if we have to meet the following requirement:
> 
> So is optimization supposed to pick up zero or is it supposed to act
> as if the initializer is unknown?
> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>> 
>> So this is a problem that is not easy to resolve.
> 
> Indeed, those are conflicting goals.
> 
>> Do you have suggestion on this?
> 
> No, not any easy ones.  Doing more of the uninit analysis early (there
> is already an early uninit pass) would mean doing IPA analysis, turning
> GCC into more of a static analysis tool.  There's the analyzer now; not
> sure if that can employ an early LTO phase, for example.


> 
> Richard.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-01 19:47           ` Qing Zhao
@ 2020-12-02  8:45             ` Richard Biener
  2020-12-02 15:36               ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-12-02  8:45 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Richard,
>
> Could you please comment on the following approach:
>
> Instead of adding the zero-initializer quite late, in the pass “pass_expand”, we can add it as early as during gimplification.
> However, we will mark these newly added zero-initializers as “artificial” and pass this “artificial” information to
> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these two uninitialized-variable analysis passes
> (i.e., in tree-ssa-uninit.c), we will update the check in “ssa_undefined_value_p” to consider “artificial” zero-initializers
> (i.e., if the def_stmt is marked “artificial”, then it’s an undefined value).
>
> With such an approach, we should be able to address all those conflicts.
>
> Do you see any obvious issue with this approach?

Yes, DSE will happily elide an explicit zero-init following the
artificial one leading to false uninit diagnostics.
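
A minimal sketch of that situation, not taken from the thread: the first comment marks where the proposed artificial zero-init would be inserted (an assumption about the scheme quoted above), and `struct packet` and `first_field_sum` are made-up names. Once the artificial store exists, the explicit memset is redundant and may be elided; if the remaining artificial store is then treated as "undefined", the read at the end would draw a false uninit warning even though the source clearly initializes `p`.

```c
#include <string.h>

struct packet {
    int  len;
    char buf[16];
};

int first_field_sum(void)
{
    struct packet p;          /* the artificial zero-init would be emitted here */

    memset(&p, 0, sizeof p);  /* explicit zero-init written by the programmer;
                                 redundant after the artificial one, so a
                                 store-elimination pass may drop it */

    return p.len + p.buf[0];  /* initialized in the source, so any uninit
                                 warning reported here would be a false one */
}
```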

What's the intended purpose of the zero-init?

Richard.

> Thanks a lot for your help.
>
> Qing

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-02  8:45             ` Richard Biener
@ 2020-12-02 15:36               ` Qing Zhao
  2020-12-03  8:45                 ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-02 15:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches, kees Cook



> On Dec 2, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Hi, Richard,
>> 
>> Could you please comment on the following approach:
>> 
>> Instead of adding the zero-initializer quite late, in the pass “pass_expand”, we can add it as early as during gimplification.
>> However, we will mark these newly added zero-initializers as “artificial” and pass this “artificial” information to
>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these two uninitialized-variable analysis passes
>> (i.e., in tree-ssa-uninit.c), we will update the check in “ssa_undefined_value_p” to consider “artificial” zero-initializers
>> (i.e., if the def_stmt is marked “artificial”, then it’s an undefined value).
>> 
>> With such an approach, we should be able to address all those conflicts.
>> 
>> Do you see any obvious issue with this approach?
> 
> Yes, DSE will happily elide an explicit zero-init following the
> artificial one leading to false uninit diagnostics.

Indeed.  This is a big issue. Other optimizations might also be affected by the new zero-init, changing the behavior
of the uninitialized analysis in later stages.

> 
> What's the intended purpose of the zero-init?


The purpose of this new option is (from the original LLVM patch submission):

"Add an option to initialize automatic variables with either a pattern or with
zeroes. The default is still that automatic variables are uninitialized. Also
add attributes to request uninitialized on a per-variable basis, mainly to disable
initialization of large stack arrays when deemed too expensive.

This isn't meant to change the semantics of C and C++. Rather, it's meant to be
a last-resort when programmers inadvertently have some undefined behavior in
their code. This patch aims to make undefined behavior hurt less, which
security-minded people will be very happy about. Notably, this means that
there's no inadvertent information leak when:

	• The compiler re-uses stack slots, and a value is used uninitialized.
	• The compiler re-uses a register, and a value is used uninitialized.
	• Stack structs / arrays / unions with padding are copied.

This patch only addresses stack and register information leaks. There's many
more infoleaks that we could address, and much more undefined behavior that
could be tamed. Let's keep this patch focused, and I'm happy to address related
issues elsewhere."
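
The padding case in the last bullet can be sketched as follows. This is an illustration only, not part of the LLVM patch: `struct reply`, `reply_padding_bytes` and `fill_reply` are made-up names, and the explicit memset merely stands in for what -ftrivial-auto-var-init=zero would do automatically. Without it, the bytes between `tag` and `value` would carry whatever was previously on the stack when the struct is copied out.

```c
#include <stddef.h>
#include <string.h>

struct reply {
    char tag;    /* usually followed by padding so that 'value' is aligned */
    int  value;
};

/* Number of bytes in struct reply not occupied by any member (ABI dependent). */
size_t reply_padding_bytes(void)
{
    return sizeof(struct reply) - sizeof(char) - sizeof(int);
}

/* Build a reply on the stack and copy its raw bytes, padding included, to 'out'.
   'out' must have room for sizeof(struct reply) bytes. */
void fill_reply(unsigned char *out)
{
    struct reply r;

    memset(&r, 0, sizeof r);    /* stands in for the automatic zero-init */
    r.tag = 'A';
    r.value = 42;
    memcpy(out, &r, sizeof r);  /* the copy carries the padding bytes along */
}
```

Typical compilers leave the zeroed padding alone after the member stores, though the C standard does not promise that.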

For more details, please refer to the LLVM code review discussion on this patch:
https://reviews.llvm.org/D54604


I also wrote a simple writeup for this task based on my study and discussion with
Kees Cook (cc’ing him), as follows:


thanks.

Qing

Support stack variables auto-initialization in GCC

11/19/2020

Qing Zhao

=======================================================


** Background of the task:

The corresponding GCC bugzilla RFE was created on 9/3/2018:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210

A similar option for LLVM (around November 2018)
https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
prompted a lot of discussion before it was committed.

(The following is quoted from the comments of Alexander Potapenko in
GCC bug 87210):

Finally, in October 2019, upstream Clang gained support for forced initialization
of stack variables under the -ftrivial-auto-var-init flag.

-ftrivial-auto-var-init=pattern initializes local variables with a 0xAA pattern
(actually it's more complicated, see https://reviews.llvm.org/D54604)

-ftrivial-auto-var-init=zero provides zero-initialization of locals.
This mode isn't officially supported yet and is hidden behind an additional
-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang flag.
This is done to avoid creating a C++ dialect where all variables are
zero-initialized.

Starting with v5.2, the Linux kernel has a CONFIG_INIT_STACK_ALL config that performs
the build with -ftrivial-auto-var-init=pattern. This one isn't widely adopted
yet, partially because initializing locals with 0xAA isn't fast enough.

Linus Torvalds is quite positive about zero-initializing the locals though,
see https://lkml.org/lkml/2019/7/30/1303:

"when a compiler has an option to initialize stack variables, it
would probably _also_ be a very good idea for that compiler to then
support a variable attribute that says "don't initialize _this_
variable, I will do that manually".
I also think that the "initialize with poison" is
pointless and wrong. Yes, it can find bugs, but it doesn't really help
improve the general situation, and people see it as a debugging tool,
not a "improve code quality and improve the life of kernel developers"
tool."

So having a flag similar to -ftrivial-auto-var-init=zero in GCC will be
appreciated by the Linux kernel community.

Currently, the kernel uses a GCC plugin to support stack variable
auto-initialization:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c

** Current situation:

A. Both the Microsoft compiler and Clang (Apple and Google) already support pattern init
and zero init:
http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
Pattern init is used in development builds for debugging purposes; zero init is
used in production builds for security purposes.

B. For Clang, even though zero init is guarded by
"-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang",
many end users already use it for production builds, so
this functionality can no longer be removed.
"-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
might be renamed to something more meaningful later in Clang.


** My proposal:

A. Add a new GCC option (same name and meaning as in Clang):
-ftrivial-auto-var-init=[pattern|zero], with pattern init similar to Clang's;

B. Add a new variable attribute:
__attribute__((uninitialized))
The marked variable is intentionally left uninitialized for performance reasons.

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".
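
For illustration, usage under this proposal might look like the sketch below. It assumes the attribute ends up spelled as in Clang; `NO_AUTO_INIT` and `sum_scratch` are hypothetical names, and the `__has_attribute` guard only keeps the example compiling on compilers that do not know the attribute. The file would be built with the proposed -ftrivial-auto-var-init=zero.

```c
#include <stddef.h>

/* Hypothetical helper macro: expands to the proposed attribute where the
   compiler supports it, and to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(uninitialized)
#    define NO_AUTO_INIT __attribute__((uninitialized))
#  endif
#endif
#ifndef NO_AUTO_INIT
#  define NO_AUTO_INIT
#endif

/* Sum the first n bytes of a scratch buffer that the function fills itself. */
unsigned sum_scratch(size_t n)
{
    /* Large array opted out of automatic initialization: every element that
       is read below is written first, so zeroing 4096 bytes is wasted work. */
    unsigned char scratch[4096] NO_AUTO_INIT;
    unsigned total = 0;   /* explicitly initialized, unaffected by the option */
    size_t i;

    if (n > sizeof scratch)
        n = sizeof scratch;

    for (i = 0; i < n; i++)
        scratch[i] = (unsigned char) i;
    for (i = 0; i < n; i++)
        total += scratch[i];

    return total;
}
```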




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-02 15:36               ` Qing Zhao
@ 2020-12-03  8:45                 ` Richard Biener
  2020-12-03 16:07                   ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-12-03  8:45 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches, kees Cook

On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Dec 2, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
> Hi, Richard,
>
> Could you please comment on the following approach:
>
> Instead of adding the zero-initializer quite late, in the pass “pass_expand”, we can add it as early as during gimplification.
> However, we will mark these newly added zero-initializers as “artificial” and pass this “artificial” information to
> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these two uninitialized-variable analysis passes
> (i.e., in tree-ssa-uninit.c), we will update the check in “ssa_undefined_value_p” to consider “artificial” zero-initializers
> (i.e., if the def_stmt is marked “artificial”, then it’s an undefined value).
>
> With such an approach, we should be able to address all those conflicts.
>
> Do you see any obvious issue with this approach?
>
>
> Yes, DSE will happily elide an explicit zero-init following the
> artificial one leading to false uninit diagnostics.
>
>
> Indeed.  This is a big issue. Other optimizations might also be affected by the new zero-init, changing the behavior
> of the uninitialized analysis in later stages.

I don't see how the issue can be resolved: you can't get both uninit
warnings and no uninitialized memory.
People can compile twice, once without -fzero-init to get uninit
warnings and once with -fzero-init to get the extra "security".

Richard.

>
> What's the intended purpose of the zero-init?
>
>
>
> The purpose of this new option is: (from the original LLVM patch submission):
>
> "Add an option to initialize automatic variables with either a pattern or with
> zeroes. The default is still that automatic variables are uninitialized. Also
> add attributes to request uninitialized on a per-variable basis, mainly to disable
> initialization of large stack arrays when deemed too expensive.
>
> This isn't meant to change the semantics of C and C++. Rather, it's meant to be
> a last-resort when programmers inadvertently have some undefined behavior in
> their code. This patch aims to make undefined behavior hurt less, which
> security-minded people will be very happy about. Notably, this means that
> there's no inadvertent information leak when:
>
> • The compiler re-uses stack slots, and a value is used uninitialized.
> • The compiler re-uses a register, and a value is used uninitialized.
> • Stack structs / arrays / unions with padding are copied.
> This patch only addresses stack and register information leaks. There's many
> more infoleaks that we could address, and much more undefined behavior that
> could be tamed. Let's keep this patch focused, and I'm happy to address related
> issues elsewhere."
>
> For more details, please refer to the LLVM code review discussion on this patch:
> https://reviews.llvm.org/D54604
>
>
> I also wrote a simple writeup for this task based on my study and discussion with
> Kees Cook (cc’ing him) as following:
>
>
> thanks.
>
> Qing
>
> Support stack variables auto-initialization in GCC
>
> 11/19/2020
>
> Qing Zhao
>
> =======================================================
>
>
> ** Background of the task:
>
> The correponding GCC bugzilla RFE was created on 9/3/2018:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210
>
> A similar option for LLVM (around Nov, 2018)
> https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
> had invoked a lot of discussion before committed.
>
> (The following are quoted from the comments of Alexander Potapenko in
> GCC bug 87210):
>
> Finally, on Oct, 2019, upstream Clang supports force initialization
> of stack variables under the -ftrivial-auto-var-init flag.
>
> -ftrivial-auto-var-init=pattern initializes local variables with a 0xAA pattern
> (actually it's more complicated, see https://reviews.llvm.org/D54604)
>
> -ftrivial-auto-var-init=zero provides zero-initialization of locals.
> This mode isn't officially supported yet and is hidden behind an additional
> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang flag.
> This is done to avoid creating a C++ dialect where all variables are
> zero-initialized.
>
> Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that performs
> the build  with -ftrivial-auto-var-init=pattern. This one isn't widely adopted
> yet, partially because initializing locals with 0xAA isn't fast enough.
>
> Linus Torvalds is quite positive about zero-initializing the locals though,
> see https://lkml.org/lkml/2019/7/30/1303:
>
> "when a compiler has an option to initialize stack variables, it
> would probably _also_ be a very good idea for that compiler to then
> support a variable attribute that says "don't initialize _this_
> variable, I will do that manually".
> I also think that the "initialize with poison" is
> pointless and wrong. Yes, it can find bugs, but it doesn't really help
> improve the general situation, and people see it as a debugging tool,
> not a "improve code quality and improve the life of kernel developers"
> tool.
>
> So having a flag similar to -ftrivial-auto-var-init=zero in GCC will be
> appreciated by the Linux kernel community.
>
> currently, kernel is using a gcc plugin to support stack variables
> auto-initialization:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c
>
> ** Current situation:
>
> A. Both Microsoft compiler and CLANG (APPLE AND GOOGLE) support pattern init and
>  zero init already;
> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
> https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
> Pattern init is used in development build for debugging purpose, zero init is
> used in production build for security purpose.
>
> B. for CLANG, even though zero init is controlled by
> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang",
> many end users have used it for production build.
> this functionality cannot be removed anymore.
> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
> might be changed to more meaningful name later in CLANG.
>
>
> ** My proposal:
>
> A. add a new GCC option: (same name and meaning as CLANG)
> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>
> B. add a new attribute for variable:
> __attribute((uninitialized)
> the marked variable is uninitialized intentionaly for performance purpose.
>
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language”.
>
>
>
> On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
>
>
> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
> not explicitly initialized in the declaration, the basic idea is following:
>
> ** The proposal:
>
> A. add a new GCC option: (same name and meaning as CLANG)
> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>
> B. add a new attribute for variable:
> __attribute((uninitialized)
> the marked variable is uninitialized intentionaly for performance purpose.
>
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language".
>
>
> ** The implementation:
>
> There are two major requirements for the implementation:
>
> 1. all auto-variables that do not have an explicit initializer should be initialized to
> zero by this option.  (Same behavior as CLANG)
>
> 2. keep the current static warning on uninitialized variables untouched.
>
> In order to satisfy 1, we should check whether an auto-variable has initializer
> or not;
> In order to satisfy 2, we should add this new transformation after
> "pass_late_warn_uninitialized".
>
> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
> If Not, then insert an initialization for it.
>
> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>
>
> I think both as long as they are source-level auto-variables. Then which one is better?
>
>
> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag-Wmaybe-uninitialized.
>
>
>
> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine “gimpley_decl_expr” (gimplify.c) as following:
>
> if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>   {
>     tree init = DECL_INITIAL (decl);
> ...
>     if (init && init != error_mark_node)
>       {
>         if (!TREE_STATIC (decl))
>   {
>     DECL_IS_INITIALIZED(decl) = 1;
>   }
>
> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
>
>
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?
>
>
> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
> If GCC adds this phase in FE, then the following design requirement
>
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language”.
>
> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
>
> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>
> I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure youself whether a local is actually used (see expand_stack_vars)
>
>
> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>
>
> Note that optimization will already made have use of "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.
>
>
> This is a really good point…
>
> In order to keep optimizations from using the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
> for this issue). However, if we have to meet the following requirement:
>
>
> So is optimization supposed to pick up zero or is it supposed to act
> as if the initializer
> is unknown?
>
> C. The implementation needs to keep the current static warning on uninitialized
> variables untouched in order to avoid "forking the language”.
>
> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>
> So, this is a problem that is not easy to resolve.
>
>
> Indeed, those are conflicting goals.
>
> Do you have suggestion on this?
>
>
> No, not any easy ones.  Doing more of the uninit analysis early (there
> is already an early
> uninit pass) which would mean doing IPA analysis, turning GCC into more
> of a static analysis
> tool.  There's the analyzer now, not sure if that can employ an early
> LTO phase for example.
>
>
>
>
> Richard.
>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-03  8:45                 ` Richard Biener
@ 2020-12-03 16:07                   ` Qing Zhao
  2020-12-03 16:36                     ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-03 16:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches, kees Cook



> On Dec 3, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 2, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> Hi, Richard,
>> 
>> Could you please comment on the following approach:
>> 
>> Instead of adding the zero-initializer quite late at the pass “pass_expand”, we can add it as early as during gimplification.
>> However, we will mark these new added zero-initializers as “artificial”. And passing this “artificial” information to
>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”, in these two uninitialized variable analysis passes,
>> (i.e., in tree-ssa-uninit.c) We will update the checking on “ssa_undefined_value_p”  to consider “artificial” zero-initializers.
>> (i.e., if the def_stmt is marked with “artificial”, then it’s an undefined value).
>> 
>> With such approach, we should be able to address all those conflicts.
>> 
>> Do you see any obvious issue with this approach?
>> 
>> 
>> Yes, DSE will happily elide an explicit zero-init following the
>> artificial one leading to false uninit diagnostics.
>> 
>> 
>> Indeed.  This is a big issue. And other optimizations might also be impacted by the new zero-init, resulting in changed behavior
>> of uninitialized analysis in the later stage.
> 
> I don't see how the issue can be resolved, you can't get both, uninit
> warnings and no uninitialized memory.
> People can compile twice, once without -fzero-init to get uninit
> warnings and once with -fzero-init to get
> the extra "security".

So, for GCC, you think that it’s okay to get rid of the following requirement:

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language”.

Then we can add an explanation to the user documentation of the new -fzero-init, and also
to that of -Wuninitialized, to inform users that -fzero-init will change the behavior of -Wuninitialized.
That is, in order to get the warnings, -fzero-init should not be used at the same time?

With this requirement being eliminated, implementation will be much easier. 

We can add the new initialization during the gimplification phase. Then this new option will work
for all languages.  Is this reasonable?
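
To recall why requirement C is hard to keep once the initialization is added this early, the DSE concern quoted above can be pictured with a small C sketch. This is only a hand-written model and not compiler output: the struct "msg", the three function names, and the use of memset as a stand-in for the artificial zero-initializer are all assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical aggregate used only for this illustration.  */
struct msg
{
  int len;
  char body[8];
};

/* What the user wrote: an explicit zero-init of the field.  */
int
as_written (void)
{
  struct msg m;
  m.len = 0;			/* user's explicit store */
  return m.len;
}

/* Model of the function after an artificial zero-init has been
   added in front of the user's code (memset is a stand-in).  */
int
with_artificial_init (void)
{
  struct msg m;
  memset (&m, 0, sizeof m);	/* artificial store */
  m.len = 0;			/* user's store, now redundant */
  return m.len;
}

/* Model of what is left if the redundant user store is elided:
   only the artificial store reaches the read.  A warning pass that
   treats artificial stores as "undefined" could then complain about
   m.len even though the user did initialize it.  */
int
after_redundant_store_removal (void)
{
  struct msg m;
  memset (&m, 0, sizeof m);	/* artificial store only */
  return m.len;
}
```

All three functions return 0; what differs is which store is still visible to a later uninitialized-variable analysis.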

thanks.

Qing



> 
> Richard.
> 
>> 
>> What's the intended purpose of the zero-init?
>> 
>> 
>> 
>> The purpose of this new option is: (from the original LLVM patch submission):
>> 
>> "Add an option to initialize automatic variables with either a pattern or with
>> zeroes. The default is still that automatic variables are uninitialized. Also
>> add attributes to request uninitialized on a per-variable basis, mainly to disable
>> initialization of large stack arrays when deemed too expensive.
>> 
>> This isn't meant to change the semantics of C and C++. Rather, it's meant to be
>> a last-resort when programmers inadvertently have some undefined behavior in
>> their code. This patch aims to make undefined behavior hurt less, which
>> security-minded people will be very happy about. Notably, this means that
>> there's no inadvertent information leak when:
>> 
>> • The compiler re-uses stack slots, and a value is used uninitialized.
>> • The compiler re-uses a register, and a value is used uninitialized.
>> • Stack structs / arrays / unions with padding are copied.
>> This patch only addresses stack and register information leaks. There's many
>> more infoleaks that we could address, and much more undefined behavior that
>> could be tamed. Let's keep this patch focused, and I'm happy to address related
>> issues elsewhere."
>> 
>> For more details, please refer to the LLVM code review discussion on this patch:
>> https://reviews.llvm.org/D54604
>> 
>> 
>> I also wrote a simple writeup for this task based on my study and discussion with
>> Kees Cook (cc’ing him) as following:
>> 
>> 
>> thanks.
>> 
>> Qing
>> 
>> Support stack variables auto-initialization in GCC
>> 
>> 11/19/2020
>> 
>> Qing Zhao
>> 
>> =======================================================
>> 
>> 
>> ** Background of the task:
>> 
>> The corresponding GCC bugzilla RFE was created on 9/3/2018:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210
>> 
>> A similar option for LLVM (around Nov, 2018)
>> https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
>> had invoked a lot of discussion before committed.
>> 
>> (The following are quoted from the comments of Alexander Potapenko in
>> GCC bug 87210):
>> 
>> Finally, on Oct, 2019, upstream Clang supports force initialization
>> of stack variables under the -ftrivial-auto-var-init flag.
>> 
>> -ftrivial-auto-var-init=pattern initializes local variables with a 0xAA pattern
>> (actually it's more complicated, see https://reviews.llvm.org/D54604)
>> 
>> -ftrivial-auto-var-init=zero provides zero-initialization of locals.
>> This mode isn't officially supported yet and is hidden behind an additional
>> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang flag.
>> This is done to avoid creating a C++ dialect where all variables are
>> zero-initialized.
>> 
>> Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that performs
>> the build  with -ftrivial-auto-var-init=pattern. This one isn't widely adopted
>> yet, partially because initializing locals with 0xAA isn't fast enough.
>> 
>> Linus Torvalds is quite positive about zero-initializing the locals though,
>> see https://lkml.org/lkml/2019/7/30/1303:
>> 
>> "when a compiler has an option to initialize stack variables, it
>> would probably _also_ be a very good idea for that compiler to then
>> support a variable attribute that says "don't initialize _this_
>> variable, I will do that manually".
>> I also think that the "initialize with poison" is
>> pointless and wrong. Yes, it can find bugs, but it doesn't really help
>> improve the general situation, and people see it as a debugging tool,
>> not a "improve code quality and improve the life of kernel developers"
>> tool.
>> 
>> So having a flag similar to -ftrivial-auto-var-init=zero in GCC will be
>> appreciated by the Linux kernel community.
>> 
>> currently, kernel is using a gcc plugin to support stack variables
>> auto-initialization:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c
>> 
>> ** Current situation:
>> 
>> A. Both Microsoft compiler and CLANG (APPLE AND GOOGLE) support pattern init and
>> zero init already;
>> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
>> https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
>> Pattern init is used in development build for debugging purpose, zero init is
>> used in production build for security purpose.
>> 
>> B. for CLANG, even though zero init is controlled by
>> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang",
>> many end users have used it for production build.
>> this functionality cannot be removed anymore.
>> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
>> might be changed to more meaningful name later in CLANG.
>> 
>> 
>> ** My proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is intentionally left uninitialized for performance reasons.
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> 
>> 
>> On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> 
>> 
>> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is following:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is intentionally left uninitialized for performance reasons.
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>> 
>> 
>> I think both as long as they are source-level auto-variables. Then which one is better?
>> 
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag-Wmaybe-uninitialized.
>> 
>> 
>> 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:
>> 
>> if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>   {
>>     tree init = DECL_INITIAL (decl);
>> ...
>>     if (init && init != error_mark_node)
>>       {
>>         if (!TREE_STATIC (decl))
>>           {
>>             DECL_IS_INITIALIZED (decl) = 1;
>>           }
>> 
>> Is this enough for all Frontends? Are there other places that I need to maintain this bit?
>> 
>> 
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?
>> 
>> 
>> All the locals from the source-code point of view should be covered.   (From my study so far,  looks like that Clang adds that phase in FE).
>> If GCC adds this phase in FE, then the following design requirement
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is applied quite late.
>> 
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>> 
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> 
>> 
>> Adding  this new transformation during RTL expansion is okay.  I will check on this in more details to see how to add it to RTL expansion phase.
>> 
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> 
>> This is a really good point…
>> 
>> In order to keep optimizations from using the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
>> for this issue). However, if we have to meet the following requirement:
>> 
>> 
>> So is optimization supposed to pick up zero or is it supposed to act
>> as if the initializer
>> is unknown?
>> 
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>> 
>> So, this is a problem that is not easy to resolve.
>> 
>> 
>> Indeed, those are conflicting goals.
>> 
>> Do you have suggestion on this?
>> 
>> 
>> No, not any easy ones.  Doing more of the uninit analysis early (there
>> is already an early
>> uninit pass) which would mean doing IPA analysis, turning GCC into more
>> of a static analysis
>> tool.  There's the analyzer now, not sure if that can employ an early
>> LTO phase for example.
>> 
>> 
>> 
>> 
>> Richard.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-03 16:07                   ` Qing Zhao
@ 2020-12-03 16:36                     ` Richard Biener
  2020-12-03 16:40                       ` Qing Zhao
  2020-12-03 16:56                       ` Richard Sandiford
  0 siblings, 2 replies; 56+ messages in thread
From: Richard Biener @ 2020-12-03 16:36 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches, kees Cook

On December 3, 2020 5:07:28 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote:
>
>
>> On Dec 3, 2020, at 2:45 AM, Richard Biener
><richard.guenther@gmail.com> wrote:
>> 
>> On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>> 
>>> 
>>> 
>>> On Dec 2, 2020, at 2:45 AM, Richard Biener
><richard.guenther@gmail.com> wrote:
>>> 
>>> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com>
>wrote:
>>> 
>>> 
>>> Hi, Richard,
>>> 
>>> Could you please comment on the following approach:
>>> 
>>> Instead of adding the zero-initializer quite late at the pass
>“pass_expand”, we can add it as early as during gimplification.
>>> However, we will mark these new added zero-initializers as
>“artificial”. And passing this “artificial” information to
>>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”,
>in these two uninitialized variable analysis passes,
>>> (i.e., in tree-ssa-uninit.c) We will update the checking on
>“ssa_undefined_value_p”  to consider “artificial” zero-initializers.
>>> (i.e., if the def_stmt is marked with “artificial”, then it’s an
>undefined value).
>>> 
>>> With such approach, we should be able to address all those
>conflicts.
>>> 
>>> Do you see any obvious issue with this approach?
>>> 
>>> 
>>> Yes, DSE will happily elide an explicit zero-init following the
>>> artificial one leading to false uninit diagnostics.
>>> 
>>> 
>>> Indeed.  This is a big issue. And other optimizations might also be
>impacted by the new zero-init, resulting in changed behavior
>>> of uninitialized analysis in the later stage.
>> 
>> I don't see how the issue can be resolved, you can't get both, uninit
>> warnings and no uninitialized memory.
>> People can compile twice, once without -fzero-init to get uninit
>> warnings and once with -fzero-init to get
>> the extra "security".
>
>So, for GCC, you think that it’s okay to get rid of the following
>requirement:
>
>C. The implementation needs to keep the current static warning on
>uninitialized
>variables untouched in order to avoid "forking the language”.
>
>Then, we can add explanation in the user documentation of the new
>-fzero-init and also 
>that of the -Wuninitialized to inform users that -fzero-init will
>change the behavior of -Wuninitialized.
>In order to get the warnings, -fzero-init should not be added at the
>same time?
>
>With this requirement being eliminated, implementation will be much
>easier. 
>
>We can add the new initialization during the gimplification phase. Then
>this new option will work
>for all languages.  Is this reasonable?

I think that's reasonable indeed. Eventually doing the init after the early uninit pass is possible as well.
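
As a rough illustration of the "compile twice" workflow agreed on here, the sketch below puts the two views of one function side by side. It is a hand-written model under stated assumptions: the function names are hypothetical, "-fzero-init" is only the working name used in this thread for the new option, and the second function imitates what such an option would do rather than showing real compiler output.

```c
/* Build 1 (diagnostics): compile with warnings enabled and without
   the new option, for example
       gcc -O2 -Wall -c demo.c
   GCC may then report that 'x' may be used uninitialized in
   read_flagged ().

   Build 2 (hardening): compile with the new option (called
   -fzero-init in this thread).  The uninit warning for 'x' can no
   longer be relied on in that build, as discussed above.  */

/* As written by the programmer: 'x' has no initializer.  */
int
read_flagged (int flag)
{
  int x;
  if (flag)
    x = 1;
  return x;			/* indeterminate when flag == 0 */
}

/* Model of build 2: the same function as if the compiler had
   supplied "= 0" for the missing initializer.  */
int
read_flagged_zeroed (int flag)
{
  int x = 0;
  if (flag)
    x = 1;
  return x;			/* 0 when flag == 0, 1 otherwise */
}
```

Only read_flagged_zeroed () has a defined result for flag == 0; read_flagged () is well-defined only when flag is nonzero.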

Richard. 

>thanks.
>
>Qing
>
>
>
>> 
>> Richard.
>> 
>>> 
>>> What's the intended purpose of the zero-init?
>>> 
>>> 
>>> 
>>> The purpose of this new option is: (from the original LLVM patch
>submission):
>>> 
>>> "Add an option to initialize automatic variables with either a
>pattern or with
>>> zeroes. The default is still that automatic variables are
>uninitialized. Also
>>> add attributes to request uninitialized on a per-variable basis,
>mainly to disable
>>> initialization of large stack arrays when deemed too expensive.
>>> 
>>> This isn't meant to change the semantics of C and C++. Rather, it's
>meant to be
>>> a last-resort when programmers inadvertently have some undefined
>behavior in
>>> their code. This patch aims to make undefined behavior hurt less,
>which
>>> security-minded people will be very happy about. Notably, this means
>that
>>> there's no inadvertent information leak when:
>>> 
>>> • The compiler re-uses stack slots, and a value is used
>uninitialized.
>>> • The compiler re-uses a register, and a value is used
>uninitialized.
>>> • Stack structs / arrays / unions with padding are copied.
>>> This patch only addresses stack and register information leaks.
>There's many
>>> more infoleaks that we could address, and much more undefined
>behavior that
>>> could be tamed. Let's keep this patch focused, and I'm happy to
>address related
>>> issues elsewhere."
>>> 
>>> For more details, please refer to the LLVM code review discussion on
>this patch:
>>> https://reviews.llvm.org/D54604
>>> 
>>> 
>>> I also wrote a simple writeup for this task based on my study and
>discussion with
>>> Kees Cook (cc’ing him) as following:
>>> 
>>> 
>>> thanks.
>>> 
>>> Qing
>>> 
>>> Support stack variables auto-initialization in GCC
>>> 
>>> 11/19/2020
>>> 
>>> Qing Zhao
>>> 
>>> =======================================================
>>> 
>>> 
>>> ** Background of the task:
>>> 
>>> The corresponding GCC bugzilla RFE was created on 9/3/2018:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210
>>> 
>>> A similar option for LLVM (around Nov, 2018)
>>> https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
>>> had invoked a lot of discussion before committed.
>>> 
>>> (The following are quoted from the comments of Alexander Potapenko
>in
>>> GCC bug 87210):
>>> 
>>> Finally, on Oct, 2019, upstream Clang supports force initialization
>>> of stack variables under the -ftrivial-auto-var-init flag.
>>> 
>>> -ftrivial-auto-var-init=pattern initializes local variables with a
>0xAA pattern
>>> (actually it's more complicated, see
>https://reviews.llvm.org/D54604)
>>> 
>>> -ftrivial-auto-var-init=zero provides zero-initialization of locals.
>>> This mode isn't officially supported yet and is hidden behind an
>additional
>>>
>-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
>flag.
>>> This is done to avoid creating a C++ dialect where all variables are
>>> zero-initialized.
>>> 
>>> Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that
>performs
>>> the build  with -ftrivial-auto-var-init=pattern. This one isn't
>widely adopted
>>> yet, partially because initializing locals with 0xAA isn't fast
>enough.
>>> 
>>> Linus Torvalds is quite positive about zero-initializing the locals
>though,
>>> see https://lkml.org/lkml/2019/7/30/1303:
>>> 
>>> "when a compiler has an option to initialize stack variables, it
>>> would probably _also_ be a very good idea for that compiler to then
>>> support a variable attribute that says "don't initialize _this_
>>> variable, I will do that manually".
>>> I also think that the "initialize with poison" is
>>> pointless and wrong. Yes, it can find bugs, but it doesn't really
>help
>>> improve the general situation, and people see it as a debugging
>tool,
>>> not a "improve code quality and improve the life of kernel
>developers"
>>> tool.
>>> 
>>> So having a flag similar to -ftrivial-auto-var-init=zero in GCC will
>be
>>> appreciated by the Linux kernel community.
>>> 
>>> currently, kernel is using a gcc plugin to support stack variables
>>> auto-initialization:
>>>
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c
>>> 
>>> ** Current situation:
>>> 
>>> A. Both Microsoft compiler and CLANG (APPLE AND GOOGLE) support
>pattern init and
>>> zero init already;
>>> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
>>>
>https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
>>> Pattern init is used in development build for debugging purpose,
>zero init is
>>> used in production build for security purpose.
>>> 
>>> B. for CLANG, even though zero init is controlled by
>>>
>"-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang",
>>> many end users have used it for production build.
>>> this functionality cannot be removed anymore.
>>>
>"-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
>>> might be changed to more meaningful name later in CLANG.
>>> 
>>> 
>>> ** My proposal:
>>> 
>>> A. add a new GCC option: (same name and meaning as CLANG)
>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as
>CLANG;
>>> 
>>> B. add a new attribute for variable:
>>> __attribute__((uninitialized))
>>> the marked variable is intentionally left uninitialized for performance
>reasons.
>>> 
>>> C. The implementation needs to keep the current static warning on
>uninitialized
>>> variables untouched in order to avoid "forking the language”.
>>> 
>>> 
>>> 
>>> On Nov 25, 2020, at 3:11 AM, Richard Biener
><richard.guenther@gmail.com> wrote:
>>> 
>>> 
>>> 
>>> I am planning to add a new phase immediately after
>“pass_late_warn_uninitialized” to initialize all auto-variables that
>are
>>> not explicitly initialized in the declaration, the basic idea is
>following:
>>> 
>>> ** The proposal:
>>> 
>>> A. add a new GCC option: (same name and meaning as CLANG)
>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as
>CLANG;
>>> 
>>> B. add a new attribute for variable:
>>> __attribute__((uninitialized))
>>> the marked variable is intentionally left uninitialized for performance
>reasons.
>>> 
>>> C. The implementation needs to keep the current static warning on
>uninitialized
>>> variables untouched in order to avoid "forking the language".
>>> 
>>> 
>>> ** The implementation:
>>> 
>>> There are two major requirements for the implementation:
>>> 
>>> 1. all auto-variables that do not have an explicit initializer
>should be initialized to
>>> zero by this option.  (Same behavior as CLANG)
>>> 
>>> 2. keep the current static warning on uninitialized variables
>untouched.
>>> 
>>> In order to satisfy 1, we should check whether an auto-variable has
>initializer
>>> or not;
>>> In order to satisfy 2, we should add this new transformation after
>>> "pass_late_warn_uninitialized".
>>> 
>>> So, we should be able to check whether an auto-variable has
>initializer or not after “pass_late_warn_uninitialized”,
>>> If Not, then insert an initialization for it.
>>> 
>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be
>better?
>>> 
>>> 
>>> I think both as long as they are source-level auto-variables. Then
>which one is better?
>>> 
>>> 
>>> Another issue is, in order to check whether an auto-variable has
>initializer, I plan to add a new bit in “decl_common” as:
>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>> unsigned decl_is_initialized :1;
>>> 
>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.
> */
>>> #define DECL_IS_INITIALIZED(NODE) \
>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>> 
>>> set this bit when setting DECL_INITIAL for the variables in FE. then
>keep it
>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> 
>>> For locals it would be more reliable to set this
>flag-Wmaybe-uninitialized.
>>> 
>>> 
>>> 
>>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the
>routine “gimplify_decl_expr” (gimplify.c) as follows:
>>> 
>>> if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>>   {
>>>     tree init = DECL_INITIAL (decl);
>>> ...
>>>     if (init && init != error_mark_node)
>>>       {
>>>         if (!TREE_STATIC (decl))
>>>           {
>>>             DECL_IS_INITIALIZED (decl) = 1;
>>>           }
>>> 
>>> Is this enough for all Frontends? Are there other places that I need
>to maintain this bit?
>>> 
>>> 
>>> 
>>> Do you have any comment and suggestions?
>>> 
>>> 
>>> As said above - do you want to cover registers as well as locals?
>>> 
>>> 
>>> All the locals from the source-code point of view should be covered.
>  (From my study so far,  looks like that Clang adds that phase in FE).
>>> If GCC adds this phase in FE, then the following design requirement
>>> 
>>> C. The implementation needs to keep the current static warning on
>uninitialized
>>> variables untouched in order to avoid "forking the language”.
>>> 
>>> cannot be satisfied.  Since gcc’s uninitialized variables analysis
>is applied quite late.
>>> 
>>> So, we have to add this new phase after
>“pass_late_warn_uninitialized”.
>>> 
>>> I'd do
>>> the actual zeroing during RTL expansion instead since otherwise you
>>> have to figure out yourself whether a local is actually used (see
>expand_stack_vars)
>>> 
>>> 
>>> Adding  this new transformation during RTL expansion is okay.  I
>will check on this in more details to see how to add it to RTL
>expansion phase.
>>> 
>>> 
>>> Note that optimization will already have made use of the "uninitialized"
>state
>>> of locals so depending on what the actual goal is here "late" may be
>too late.
>>> 
>>> 
>>> This is a really good point…
>>> 
>>> In order to keep optimizations from using the “uninitialized” state of
>locals, we should add the zeroing phase as early as possible (adding it
>in the FE might be best
>>> for this issue). However, if we have to meet the following
>requirement:
>>> 
>>> 
>>> So is optimization supposed to pick up zero or is it supposed to act
>>> as if the initializer
>>> is unknown?
>>> 
>>> C. The implementation needs to keep the current static warning on
>uninitialized
>>> variables untouched in order to avoid "forking the language”.
>>> 
>>> We have to move the new phase after all the uninitialized analysis
>is done in order to avoid “forking the language”.
>>> 
>>> So, this is a problem that is not easy to resolve.
>>> 
>>> 
>>> Indeed, those are conflicting goals.
>>> 
>>> Do you have suggestion on this?
>>> 
>>> 
>>> No, not any easy ones.  Doing more of the uninit analysis early
>(there
>>> is already an early
>>> uninit pass) which would mean doing IPA analysis, turning GCC into
>more
>>> of a static analysis
>>> tool.  There's the analyzer now, not sure if that can employ an early
>>> LTO phase for example.
>>> 
>>> 
>>> 
>>> 
>>> Richard.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-03 16:36                     ` Richard Biener
@ 2020-12-03 16:40                       ` Qing Zhao
  2020-12-03 16:56                       ` Richard Sandiford
  1 sibling, 0 replies; 56+ messages in thread
From: Qing Zhao @ 2020-12-03 16:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches, kees Cook



> On Dec 3, 2020, at 10:36 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On December 3, 2020 5:07:28 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote:
>> 
>> 
>>>> of uninitialized analysis in the later stage.
>>> 
>>> I don't see how the issue can be resolved, you can't get both, uninit
>>> warnings and no uninitialized memory.
>>> People can compile twice, once without -fzero-init to get uninit
>>> warnings and once with -fzero-init to get
>>> the extra "security".
>> 
>> So, for GCC, you think that it’s okay to get rid of the following
>> requirement:
>> 
>> C. The implementation needs to keep the current static warning on
>> uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> Then, we can add explanation in the user documentation of the new
>> -fzero-init and also 
>> that of the -Wuninitialized to inform users that -fzero-init will
>> change the behavior of -Wuninitialized.
>> In order to get the warnings, -fzero-init should not be added at the
>> same time?
>> 
>> With this requirement being eliminated, implementation will be much
>> easier. 
>> 
>> We can add the new initialization during the gimplification phase. Then
>> this new option will work
>> for all languages.  Is this reasonable?
> 
> I think that's reasonable indeed. Eventually doing the init after the early uninit pass is possible as well.

Are you suggesting putting the new pass after the early uninit pass? Why?

Qing
> 
> Richard. 
> 
>> thanks.
>> 
>> Qing
>> 
>> 
>> 
>>> 
>>> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-03 16:36                     ` Richard Biener
  2020-12-03 16:40                       ` Qing Zhao
@ 2020-12-03 16:56                       ` Richard Sandiford
  1 sibling, 0 replies; 56+ messages in thread
From: Richard Sandiford @ 2020-12-03 16:56 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches; +Cc: Qing Zhao, Richard Biener, kees Cook

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On December 3, 2020 5:07:28 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote:
>>
>>
>>> On Dec 3, 2020, at 2:45 AM, Richard Biener
>><richard.guenther@gmail.com> wrote:
>>> 
>>> On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Dec 2, 2020, at 2:45 AM, Richard Biener
>><richard.guenther@gmail.com> wrote:
>>>> 
>>>> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com>
>>wrote:
>>>> 
>>>> 
>>>> Hi, Richard,
>>>> 
>>>> Could you please comment on the following approach:
>>>> 
>>>> Instead of adding the zero-initializer quite late at the pass
>>“pass_expand”, we can add it as early as during gimplification.
>>>> However, we will mark these new added zero-initializers as
>>“artificial”. And passing this “artificial” information to
>>>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”,
>>in these two uninitialized variable analysis passes,
>>>> (i.e., in tree-ssa-uninit.c) We will update the checking on
>>“ssa_undefined_value_p”  to consider “artificial” zero-initializers.
>>>> (i.e, if the def_stmt is marked with “artificial”, then it’s a
>>undefined value).
>>>> 
>>>> With such approach, we should be able to address all those
>>conflicts.
>>>> 
>>>> Do you see any obvious issue with this approach?
>>>> 
>>>> 
>>>> Yes, DSE will happily elide an explicit zero-init following the
>>>> artificial one leading to false uninit diagnostics.
>>>> 
>>>> 
>>>> Indeed.  This is a big issue. And other optimizations might also be
>>impacted by the new zero-init, resulting changed behavior
>>>> of uninitialized analysis in the later stage.
>>> 
>>> I don't see how the issue can be resolved, you can't get both, uninit
>>> warnings and no uninitialized memory.
>>> People can compile twice, once without -fzero-init to get uninit
>>> warnings and once with -fzero-init to get
>>> the extra "security".
>>
>>So, for GCC, you think that it’s okay to get rid of the following
>>requirement:
>>
>>C. The implementation needs to keep the current static warning on
>>uninitialized
>>variables untouched in order to avoid "forking the language”.
>>
>>Then, we can add explanation in the user documentation of the new
>>-fzero-init and also 
>>that of the -Wuninitialized to inform users that -fzero-init will
>>change the behavior of -Wuninitialized.
>>In order to get the warnings, -fzero-init should not be added at the
>>same time?
>>
>>With this requirement being eliminated, implementation will be much
>>easier. 
>>
>>We can add the new initialization during gimplification phase. Then
>>this new option will work
>>for all languages.  Is this reasonable?
>
> I think that's reasonable indeed. Eventually doing the init after the early uninit pass is possible as well.

Sorry to be awkward, but I kind-of disagree.  IIRC, clang was able to
give uninit warnings while implementing the initialisation as expected,
so I think this is a GCC restriction rather than a fundamental
incompatibility.

I don't think it's reasonable to expect people to read the documentation
of -ffoo for Clang and separately read the documentation of -ffoo for
GCC.  They'll at best read the documentation for one and (rightly)
expect the other compiler to behave in a compatible way.  I'm also not
sure people would build twice in practice.

I remember the issue of forking the language was discussed at length on
the Clang dev list at the time (but I haven't gone back and re-read the
thread, so I'm relying on memory here).  Not forking the language was an
important goal/requirement of the option and I don't think we should
drop it when implementing the option in GCC.

IMO, if we want to define a dialect of C/C++ in which uninitialised uses
are always well defined rather than UB, we should do that as a separate
option.  If we're implementing the Clang options, we should continue
to treat uninitialised uses as UB that triggers the same warnings as
if the option wasn't passed.

So TBH I'd rather not add the option until it can be implemented in a
way that is compatible with Clang.

Thanks,
Richard


* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 15:55     ` Richard Biener
  2020-11-24 16:54       ` Qing Zhao
@ 2020-12-03 17:32       ` Richard Sandiford
  2020-12-03 23:04         ` Qing Zhao
  2020-12-04  8:50         ` Richard Biener
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Sandiford @ 2020-12-03 17:32 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>   unsigned decl_is_initialized :1;
>>
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>>   (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>
> For locals it would be more reliable to set this flag during gimplification.
>
>> Do you have any comment and suggestions?
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>
> Note that optimization will already have made use of the "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.

Haven't thought about this much, so it might be a daft idea, but would a
compromise be to use a const internal function:

  X1 = .DEFERRED_INIT (X0, INIT)

where the X0 argument is an uninitialised value and the INIT argument
describes the initialisation pattern?  So for a decl we'd have:

  X = .DEFERRED_INIT (X, INIT)

and for an SSA name we'd have:

  X_2 = .DEFERRED_INIT (X_1(D), INIT)

with all other uses of X_1(D) being replaced by X_2.  The idea is that:

* Having the X0 argument would keep the uninitialised use of the
  variable around for the later warning passes.

* Using a const function should still allow the UB to be deleted as dead
  if X1 isn't needed.

* Having a function in the way should stop passes from taking advantage
  of direct uninitialised uses for optimisation.

This means we won't be able to optimise based on the actual init
value at the gimple level, but that seems like a fair trade-off.
AIUI this is really a security feature or anti-UB hardening feature
(in the sense that users are more likely to see predictable behaviour
“in the field” even if the program has UB).

Thanks,
Richard


* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-03 17:32       ` Richard Sandiford
@ 2020-12-03 23:04         ` Qing Zhao
  2020-12-04  8:50         ` Richard Biener
  1 sibling, 0 replies; 56+ messages in thread
From: Qing Zhao @ 2020-12-03 23:04 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener via Gcc-patches, Richard Biener

Hi, Richard,

Thanks a lot for your suggestion.

Actually, I like this idea. 

My understanding of your suggestion is:

1. During gimplification phase:

For each auto-variable that does not have an explicit initializer, insert the following initializer for it:

X = DEFERRED_INIT (X, INIT)

Here, DEFERRED_INIT is a const internal function, which could be defined as:

DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

Its two arguments are:

1st argument:   the uninitialized auto-variable;
2nd argument:  the initialization pattern (zero | pattern);

2.  During tree to SSA phase:  

No change is needed: the existing tree-to-SSA phase should automatically rewrite the newly inserted statement as

X_2 = DEFERRED_INIT (X_1(D), INIT);
with all other uses of X_1(D) replaced by X_2. 

3. During expanding phase:

Expand each call to “DEFERRED_INIT (X, INIT)” into a zero or pattern initialization, depending on “INIT”. 

Is the above understanding correct? Did I miss anything? 
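
To make step 3 a bit more concrete, here is a rough user-level C model of what the expansion would amount to. This is only a sketch and not GCC internals: the names deferred_init, INIT_ZERO and INIT_PATTERN, and the 0xAA fill byte, are made up for illustration, and the real internal function would operate on GIMPLE operands rather than on pointers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding of the INIT argument (not a GCC type). */
enum init_kind { INIT_ZERO, INIT_PATTERN };

/* Fill byte for pattern mode; chosen arbitrarily for this sketch. */
#define PATTERN_BYTE 0xAA

/* User-level stand-in for expanding "X = DEFERRED_INIT (X, INIT)":
   overwrite the object's storage with zeros or with the pattern byte.
   The previous contents of the object are never read. */
static void deferred_init(void *obj, size_t size, enum init_kind kind)
{
    memset(obj, kind == INIT_ZERO ? 0 : PATTERN_BYTE, size);
}

/* Roughly what "int v;" would behave like in zero mode. */
static int zero_mode_value(void)
{
    int v;
    deferred_init(&v, sizeof v, INIT_ZERO);
    return v;
}

/* Roughly what "unsigned char buf[4];" would behave like in pattern mode. */
static unsigned char pattern_mode_first_byte(void)
{
    unsigned char buf[4];
    deferred_init(buf, sizeof buf, INIT_PATTERN);
    return buf[0];
}
```

In this model the old value is simply ignored. In the proposal the X argument is still passed to the call, so that the uninitialized use stays visible to the later warning passes.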

More comments and questions are embedded below:


> On Dec 3, 2020, at 11:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>  unsigned decl_is_initialized :1;
>>> 
>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>> #define DECL_IS_INITIALIZED(NODE) \
>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>> 
>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>> even though DECL_INITIAL might be NULLed.
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>>> Do you have any comment and suggestions?
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
> 
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
> 
>  X1 = .DEFERRED_INIT (X0, INIT)
> 
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
> 
>  X = .DEFERRED_INIT (X, INIT)
> 
> and for an SSA name we'd have:
> 
>  X_2 = .DEFERRED_INIT (X_1(D), INIT)
> 
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
> 
> * Having the X0 argument would keep the uninitialised use of the
>  variable around for the later warning passes.
> 
> * Using a const function should still allow the UB to be deleted as dead
>  if X1 isn't needed.

So, current GCC deletes the UB as dead code when X1 is not needed. With
the new option, should we keep this behavior? 

> 
> * Having a function in the way should stop passes from taking advantage
>  of direct uninitialised uses for optimisation.

This would resolve the issue we raised earlier with adding an “artificial” zero-initializer 
directly during gimplification. 

However, I wonder whether the newly added const internal function calls would affect 
optimization and thereby change the behavior of the uninitialized analysis? 
> 
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.

Yes, with this approach: 

At the gimple level, we will not be able to optimize based on the newly added init values;
At the RTL level, we will be able to optimize based on them;
RTL optimizations will be able to eliminate any redundancy introduced by these new
initializations, reducing the cost of this option. 



> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).

Yes, this option is for security purposes, and it is currently used in production by Microsoft, 
Apple, Google, etc. 

Qing
> 
> Thanks,
> Richard



* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-03 17:32       ` Richard Sandiford
  2020-12-03 23:04         ` Qing Zhao
@ 2020-12-04  8:50         ` Richard Biener
  2020-12-04 16:19           ` Qing Zhao
  2020-12-07 17:21           ` How to traverse all the local variables that declared in the current routine? Richard Sandiford
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Biener @ 2020-12-04  8:50 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, Qing Zhao, Richard Biener,
	Richard Sandiford

On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
> >>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> >>   unsigned decl_is_initialized :1;
> >>
> >> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> >> #define DECL_IS_INITIALIZED(NODE) \
> >>   (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
> >>
> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> >> even though DECL_INITIAL might be NULLed.
> >
> > For locals it would be more reliable to set this flag during gimplification.
> >
> >> Do you have any comment and suggestions?
> >
> > As said above - do you want to cover registers as well as locals?  I'd do
> > the actual zeroing during RTL expansion instead since otherwise you
> > have to figure out yourself whether a local is actually used (see expand_stack_vars)
> >
> > Note that optimization will already have made use of the "uninitialized" state
> > of locals so depending on what the actual goal is here "late" may be too late.
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
>   X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
>   X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
>   variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
>   if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
>   of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).

The question is whether it's in line with people's expectations that
explicitly zero-initialized code behaves differently from
implicitly zero-initialized code with respect to optimization
and secondary side-effects (late diagnostics, latent bugs, etc.).

Introducing a new concept like .DEFERRED_INIT is much more
heavy-weight than an explicit zero initializer.

As for optimization I fear you'll get a load of redundant zero-init
actually emitted if you can just rely on RTL DSE/DCE to remove it.

Btw, I don't think there's any reason to cling onto clang's semantics
for a particular switch.  We'll never be able to emulate 1:1 behavior
and our -Wuninit behavior is probably vastly different already.

Richard.

> Thanks,
> Richard


* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-04  8:50         ` Richard Biener
@ 2020-12-04 16:19           ` Qing Zhao
  2020-12-07  7:12             ` Richard Biener
  2020-12-07 17:21           ` How to traverse all the local variables that declared in the current routine? Richard Sandiford
  1 sibling, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-04 16:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Biener via Gcc-patches, Richard Sandiford

> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> <richard.sandiford@arm.com <mailto:richard.sandiford@arm.com>> wrote:
>> 
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>  unsigned decl_is_initialized :1;
>>>> 
>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>> 
>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> For locals it would be more reliable to set this flag during gimplification.
>>> 
>>>> Do you have any comment and suggestions?
>>> 
>>> As said above - do you want to cover registers as well as locals?  I'd do
>>> the actual zeroing during RTL expansion instead since otherwise you
>>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>>> 
>>> Note that optimization will already have made use of the "uninitialized" state
>>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>>  X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>>  X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>>  X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>>  variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>>  if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>>  of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
> 
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
> 
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.

What exactly do you mean by “heavy-weight”? More difficult to implement, much more run-time overhead, or both? Or something else?

The major benefit of the “.DEFERRED_INIT” approach is that it lets us keep the current -Wuninitialized analysis untouched and also pass
the “uninitialized” info from the source code level to “pass_expand”. 

If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. 

However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding the zero-initializer directly during gimplification should
be much easier and simpler, with smaller run-time overhead.

> 
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.

Run-time overhead for -fauto-init=zero is an important consideration for the whole feature. We should minimize the run-time overhead of zero
initialization since it will be used in production builds. 
We can do some run-time performance evaluation once we have an implementation ready. 

> 
> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably vastly different already.

From my study so far, yes, the current behavior of -Wuninitialized for Clang and GCC is not exactly the same. 

For example, take the following small test case:
void blah(int);

int foo_2 (int n, int l, int m, int r)
{
  int v;

  if ( (n > 10) && (m != 100)  && (r < 20) )
    v = r;

  if (l > 100)
    if ( (n <= 8) &&  (m < 102)  && (r < 19) )
      blah(v); /* { dg-warning "uninitialized" "real warning" } */

  return 0;
}

GCC is able to report a maybe-uninitialized warning, but Clang cannot. 
It looks like GCC’s uninitialized analysis relies on more analysis and optimization information than Clang’s. 
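
As a side note on why the warning in this test case is a real one: the store to v requires n > 10 while the use requires n <= 8, so whenever blah(v) is reached, v has not been assigned. The small brute-force check below is only an illustration of that reasoning. The helper names and the scanned ranges are arbitrary choices of mine and are not part of the test suite.

```c
#include <stdbool.h>

/* Condition guarding "v = r" in foo_2. */
static bool v_is_assigned(int n, int m, int r)
{
    return n > 10 && m != 100 && r < 20;
}

/* Condition guarding "blah(v)" in foo_2. */
static bool v_is_used(int n, int l, int m, int r)
{
    return l > 100 && n <= 8 && m < 102 && r < 19;
}

/* Count inputs in a small, arbitrarily chosen range that reach the use
   of v.  With only_assigned set, count just those inputs for which v
   was assigned beforehand. */
static long count_reaching_uses(bool only_assigned)
{
    long count = 0;

    for (int n = 0; n <= 20; n++)
        for (int l = 95; l <= 105; l++)
            for (int m = 95; m <= 105; m++)
                for (int r = 10; r <= 25; r++)
                    if (v_is_used(n, l, m, r)
                        && (!only_assigned || v_is_assigned(n, m, r)))
                        count++;

    return count;
}
```

Calling count_reaching_uses(true) finds no input where the use sees an assigned v, while count_reaching_uses(false) shows the use itself is reachable.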

I am really curious how Clang implements its uninitialized analysis.

Qing

> 
> Richard.
> 
>> Thanks,
>> Richard


* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-04 16:19           ` Qing Zhao
@ 2020-12-07  7:12             ` Richard Biener
  2020-12-07 16:20               ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-12-07  7:12 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Biener via Gcc-patches, Richard Sandiford

On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>  unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>
> Note that optimization will already have made use of the "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.
>
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
>  X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
>  X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
>  X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
>  variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
>  if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
>  of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
>
>
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
>
>
> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>
> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.

Well, "untouched" is a bit oversimplified.  You do need to handle
.DEFERRED_INIT as not being an initialization, which will definitely
get interesting.

> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>
> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should
> be much easier and simpler, and also smaller run-time overhead.
>
>
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>
>
> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero
> Initialization since it will be used in production build.
> We can do some run-time performance evaluation when we have an implementation ready.

Note there will be other passes "confused" by .DEFERRED_INIT.  There are
also other considerations, namely where to emit the .DEFERRED_INIT.  When
emitting it during gimplification you can emit it at the start of the
block for block-scope variables.  When emitting after gimplification you
have to emit at function start, which will probably make stack slot
sharing inefficient because the deferred init will cause overlapping
lifetimes.  When emitting at block boundaries the .DEFERRED_INIT will act
as a code-motion barrier (and it itself likely cannot be moved), so for
example invariant motion will no longer happen.  Likewise optimizations
like SRA will be confused by .DEFERRED_INIT, which again will lead to
bigger stack usage (and less optimization).
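
To illustrate the stack slot sharing concern with a made-up example: in the sketch below, buf_a and buf_b live in disjoint block scopes, so the compiler is free to place them in the same stack slot. If the automatic initialization of both were emitted at function entry rather than at the start of each block, both buffers would be live at the same time and could no longer overlap. The function and variable names are hypothetical, and the example does not show whether any particular GCC version actually shares the slot.

```c
#include <stdio.h>
#include <string.h>

/* Two block-scope buffers with non-overlapping lifetimes. */
static int sum_two_phases(const char *a, const char *b)
{
    int total = 0;

    {
        char buf_a[64];   /* lifetime: this block only */
        snprintf(buf_a, sizeof buf_a, "%s", a);
        total += (int) strlen(buf_a);
    }

    {
        char buf_b[64];   /* a compiler may reuse buf_a's stack slot here */
        snprintf(buf_b, sizeof buf_b, "%s", b);
        total += (int) strlen(buf_b);
    }

    return total;
}
```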

But sure, you can try implementing a few variants, but .DEFERRED_INIT
will definitely be the most work.

> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably vastly different already.
>
>
> From my study so far, yes, the current behavior of -Wuninitialized for Clang and GCC is not exactly the same.
>
> For example, for the following small testing case:
> void blah(int);
>
> int foo_2 (int n, int l, int m, int r)
> {
>   int v;
>
>   if ( (n > 10) && (m != 100)  && (r < 20) )
>     v = r;
>
>   if (l > 100)
>     if ( (n <= 8) &&  (m < 102)  && (r < 19) )
>       blah(v); /* { dg-warning "uninitialized" "real warning" } */
>
>   return 0;
> }
>
> GCC is able to report maybe uninitialized warning, but Clang cannot.
> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>
> Really curious on how clang implement its uninitialized analysis?
>
> Qing
>
>
>
>
> Richard.
>
> Thanks,
> Richard
>
>


* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-07  7:12             ` Richard Biener
@ 2020-12-07 16:20               ` Qing Zhao
  2020-12-07 17:10                 ` Richard Sandiford
  2020-12-08  7:40                 ` Richard Biener
  0 siblings, 2 replies; 56+ messages in thread
From: Qing Zhao @ 2020-12-07 16:20 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford; +Cc: Richard Biener via Gcc-patches



> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> 
>> 
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line with people's expectations that
>> explicitly zero-initialized code behaves differently from
>> implicitly zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
> 
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.

Yes, during the uninitialized variable analysis pass, we would need to handle defs coming from “.DEFERRED_INIT” specially and treat them as uninitialized.

>> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should
>> be much easier and simpler, and also smaller run-time overhead.
>> 
>> 
>> As for optimization I fear you'll get a load of redundant zero-init
>> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>> 
>> 
>> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero
>> Initialization since it will be used in production build.
>> We can do some run-time performance evaluation when we have an implementation ready.
> 
> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
> that there's going to be other
> considerations - namely where to emit the .DEFERRED_INIT - when
> emitting it during gimplification
> you can emit it at the start of the block of block-scope variables.
> When emitting after gimplification
> you have to emit at function start which will probably make stack slot
> sharing inefficient because
> the deferred init will cause overlapping lifetimes.  With emitting at
> block boundary the .DEFERRED_INIT
> will act as code-motion barrier (and it itself likely cannot be moved)
> so for example invariant motion
> will no longer happen.  Likewise optimizations like SRA will be
> confused by .DEFERRED_INIT which
> again will lead to bigger stack usage (and less optimization).

Yes, it looks like the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
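
To make the placement concern concrete, here is a small hand-written sketch. It is not compiler output: the function name sum_two_phases and the explicit memset calls are only illustrative stand-ins for the initialization the compiler would insert. The two arrays live in sibling blocks, so their lifetimes do not overlap and the compiler is free to give them the same stack slot. Initializing each array at the start of its own block, as below, keeps that property; initializing both at function start would make both arrays live at the same time.

```c
#include <string.h>

/* Two block-scope locals with disjoint lifetimes.  The memset calls
   stand in for compiler-inserted initialization placed at the start
   of each block (an assumption for illustration only).  */
int sum_two_phases (int n)
{
  int total = 0;

  {
    char a[64];
    memset (a, 0, sizeof a);   /* init at the start of block 1 */
    a[0] = (char) n;
    total += a[0];
  }

  {
    char b[64];
    memset (b, 0, sizeof b);   /* init at the start of block 2 */
    b[0] = (char) (n + 1);
    total += b[0];
  }

  return total;
}
```

Whether the two arrays actually share a slot depends on the compiler and options; the sketch only shows the lifetimes that make sharing possible.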
> 
> But sure, you can try implement a few variants but definitely
> .DEFERRED_INIT will be the most
> work.

How about implementing the following two approaches and comparing the run-time cost:

A.  Insert the real initialization during the gimplification phase.
B.  Insert the .DEFERRED_INIT call during the gimplification phase, and then expand this call to the real initialization during the expand phase.

Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC working.

And then decide which approach we will go with?

What’s your opinion on this?
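
As a rough illustration of the difference, Approach A can be written out by hand in plain C, while Approach B only exists inside the compiler. The function name foo_approach_a below is hypothetical, and the GIMPLE line in the comment just restates the form proposed earlier in this thread; it is not output from an actual implementation.

```c
/* Approach A, written by hand: the zero store is an ordinary
   initializer, so later passes (including the uninitialized
   analysis) cannot tell it apart from user-written code.  */
int foo_approach_a (int n, int r)
{
  int v = 0;   /* what the compiler would insert implicitly */

  if (n < 10)
    v = r;

  return v;
}

/* Approach B cannot be expressed in source code.  In GIMPLE the
   declaration of v would instead be followed by something like

     v_2 = .DEFERRED_INIT (v_1(D), 0);

   which keeps the uninitialized use v_1(D) visible to the warning
   passes and is only turned into a real zero store at expand time.
   The value returned at run time would be the same as for
   foo_approach_a.  */
```

The two approaches are meant to differ only in what the optimizers and the warning passes get to see, not in the value the program computes.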

> 
>> Btw, I don't think theres any reason to cling onto clangs semantics
>> for a particular switch.  We'll never be able to emulate 1:1 behavior
>> and our -Wuninit behavior is probably wastly different already.
>> 
>> 
>> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same.
>> 
>> For example, for the following small testing case:
>> void blah(int);
>> 
>> int foo_2 (int n, int l, int m, int r)
>> {
>>  int v;
>> 
>>  if ( (n > 10) && (m != 100)  && (r < 20) )
>>    v = r;
>> 
>>  if (l > 100)
>>    if ( (n <= 8) &&  (m < 102)  && (r < 19) )
>>      blah(v); /* { dg-warning "uninitialized" "real warning" } */
>> 
>>  return 0;
>> }
>> 
>> GCC is able to report maybe uninitialized warning, but Clang cannot.
>> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>> 
>> Really curious on how clang implement its uninitialized analysis?


Actually, I studied a little how clang implements its uninitialized analysis last Friday,
and noticed that clang has a data flow analysis phase based on its AST:
http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html

Clang’s uninitialized analysis is based on this data flow analysis.

Therefore, adding initialization AFTER clang’s uninitialized analysis phase is straightforward.

However, GCC does not have data flow analysis in the FE. The uninitialized variable analysis runs in the tree optimization phase,
so it is much more difficult to implement this feature in GCC than in clang.

Qing


>> 
>> Qing
>> 
>> 
>> 
>> 
>> Richard.
>> 
>> Thanks,
>> Richard



* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-07 16:20               ` Qing Zhao
@ 2020-12-07 17:10                 ` Richard Sandiford
  2020-12-07 17:36                   ` Qing Zhao
  2020-12-08  7:40                 ` Richard Biener
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Sandiford @ 2020-12-07 17:10 UTC (permalink / raw)
  To: Qing Zhao via Gcc-patches

Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>> 
>>> 
>>> 
>>> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>>> <richard.sandiford@arm.com> wrote:
>>> 
>>> 
>>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> 
>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>> 
>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>> unsigned decl_is_initialized :1;
>>> 
>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>> #define DECL_IS_INITIALIZED(NODE) \
>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>> 
>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> 
>>> For locals it would be more reliable to set this flag during gimplification.
>>> 
>>> Do you have any comment and suggestions?
>>> 
>>> 
>>> As said above - do you want to cover registers as well as locals?  I'd do
>>> the actual zeroing during RTL expansion instead since otherwise you
>>> have to figure youself whether a local is actually used (see expand_stack_vars)
>>> 
>>> Note that optimization will already made have use of "uninitialized" state
>>> of locals so depending on what the actual goal is here "late" may be too late.
>>> 
>>> 
>>> Haven't thought about this much, so it might be a daft idea, but would a
>>> compromise be to use a const internal function:
>>> 
>>> X1 = .DEFERRED_INIT (X0, INIT)
>>> 
>>> where the X0 argument is an uninitialised value and the INIT argument
>>> describes the initialisation pattern?  So for a decl we'd have:
>>> 
>>> X = .DEFERRED_INIT (X, INIT)
>>> 
>>> and for an SSA name we'd have:
>>> 
>>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>> 
>>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>> 
>>> * Having the X0 argument would keep the uninitialised use of the
>>> variable around for the later warning passes.
>>> 
>>> * Using a const function should still allow the UB to be deleted as dead
>>> if X1 isn't needed.
>>> 
>>> * Having a function in the way should stop passes from taking advantage
>>> of direct uninitialised uses for optimisation.
>>> 
>>> This means we won't be able to optimise based on the actual init
>>> value at the gimple level, but that seems like a fair trade-off.
>>> AIUI this is really a security feature or anti-UB hardening feature
>>> (in the sense that users are more likely to see predictable behaviour
>>> “in the field” even if the program has UB).
>>> 
>>> 
>>> The question is whether it's in line of peoples expectation that
>>> explicitely zero-initialized code behaves differently from
>>> implicitely zero-initialized code with respect to optimization
>>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>>> 
>>> Introducing a new concept like .DEFERRED_INIT is much more
>>> heavy-weight than an explicit zero initializer.
>>> 
>>> 
>>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>>> 
>>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
>>> the “uninitialized” info from source code level to “pass_expand”.
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>
> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.

Are you sure we need to do that?  The point of having the first argument
to .DEFERRED_INIT was that that argument would still provide an
uninitialised use of the variable.  And the values are passed and
returned by value, so the lack of initialisation is explicit in
the gcall itself, without knowing what the target function does.

The idea is that we can essentially treat .DEFERRED_INIT as a normal
(const) function call.  I'd be surprised if many passes needed to
handle it specially.

Thanks,
Richard


* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-04  8:50         ` Richard Biener
  2020-12-04 16:19           ` Qing Zhao
@ 2020-12-07 17:21           ` Richard Sandiford
  1 sibling, 0 replies; 56+ messages in thread
From: Richard Sandiford @ 2020-12-07 17:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Biener via Gcc-patches, Qing Zhao

Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> >>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> >>   unsigned decl_is_initialized :1;
>> >>
>> >> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> >> #define DECL_IS_INITIALIZED(NODE) \
>> >>   (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> >>
>> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> >> even though DECL_INITIAL might be NULLed.
>> >
>> > For locals it would be more reliable to set this flag during gimplification.
>> >
>> >> Do you have any comment and suggestions?
>> >
>> > As said above - do you want to cover registers as well as locals?  I'd do
>> > the actual zeroing during RTL expansion instead since otherwise you
>> > have to figure youself whether a local is actually used (see expand_stack_vars)
>> >
>> > Note that optimization will already made have use of "uninitialized" state
>> > of locals so depending on what the actual goal is here "late" may be too late.
>>
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>>
>>   X1 = .DEFERRED_INIT (X0, INIT)
>>
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>>
>>   X = .DEFERRED_INIT (X, INIT)
>>
>> and for an SSA name we'd have:
>>
>>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>
>> * Having the X0 argument would keep the uninitialised use of the
>>   variable around for the later warning passes.
>>
>> * Using a const function should still allow the UB to be deleted as dead
>>   if X1 isn't needed.
>>
>> * Having a function in the way should stop passes from taking advantage
>>   of direct uninitialised uses for optimisation.
>>
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>
> The question is whether it's in line of peoples expectation that
> explicitely zero-initialized code behaves differently from
> implicitely zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).

From my understanding, that's OK.  I don't think this option is like -g,
which is supposed to have no observable effect other than adding or
removing debug info.

It's OK for implicit zero initialisation to be slower than explicit
zero initialisation.  After all, if someone actively wants something
to be initialised to zero, they're still expected to do it in the
source code.  The implicit initialisation is just a safety net.

Similarly, I think it's OK that code won't be optimised identically
with and without .DEFERRED_INIT (or whatever other mechanism we use),
and so won't provide identical late warnings.  In both cases we should
just do our best to diagnose what we can.

> Btw, I don't think theres any reason to cling onto clangs semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably wastly different already.

Yeah, this isn't about trying to match compilers diagnostic-for-diagnostic.
It's more about matching them principle-for-principle.

Thanks,
Richard


* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-07 17:10                 ` Richard Sandiford
@ 2020-12-07 17:36                   ` Qing Zhao
  2020-12-07 18:05                     ` Richard Sandiford
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-07 17:36 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Qing Zhao via Gcc-patches, Richard Biener



> On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>> unsigned decl_is_initialized :1;
>>>> 
>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>> 
>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>>> even though DECL_INITIAL might be NULLed.
>>>> 
>>>> 
>>>> For locals it would be more reliable to set this flag during gimplification.
>>>> 
>>>> Do you have any comment and suggestions?
>>>> 
>>>> 
>>>> As said above - do you want to cover registers as well as locals?  I'd do
>>>> the actual zeroing during RTL expansion instead since otherwise you
>>>> have to figure youself whether a local is actually used (see expand_stack_vars)
>>>> 
>>>> Note that optimization will already made have use of "uninitialized" state
>>>> of locals so depending on what the actual goal is here "late" may be too late.
>>>> 
>>>> 
>>>> Haven't thought about this much, so it might be a daft idea, but would a
>>>> compromise be to use a const internal function:
>>>> 
>>>> X1 = .DEFERRED_INIT (X0, INIT)
>>>> 
>>>> where the X0 argument is an uninitialised value and the INIT argument
>>>> describes the initialisation pattern?  So for a decl we'd have:
>>>> 
>>>> X = .DEFERRED_INIT (X, INIT)
>>>> 
>>>> and for an SSA name we'd have:
>>>> 
>>>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>>> 
>>>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>>> 
>>>> * Having the X0 argument would keep the uninitialised use of the
>>>> variable around for the later warning passes.
>>>> 
>>>> * Using a const function should still allow the UB to be deleted as dead
>>>> if X1 isn't needed.
>>>> 
>>>> * Having a function in the way should stop passes from taking advantage
>>>> of direct uninitialised uses for optimisation.
>>>> 
>>>> This means we won't be able to optimise based on the actual init
>>>> value at the gimple level, but that seems like a fair trade-off.
>>>> AIUI this is really a security feature or anti-UB hardening feature
>>>> (in the sense that users are more likely to see predictable behaviour
>>>> “in the field” even if the program has UB).
>>>> 
>>>> 
>>>> The question is whether it's in line of peoples expectation that
>>>> explicitely zero-initialized code behaves differently from
>>>> implicitely zero-initialized code with respect to optimization
>>>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>>>> 
>>>> Introducing a new concept like .DEFERRED_INIT is much more
>>>> heavy-weight than an explicit zero initializer.
>>>> 
>>>> 
>>>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>>>> 
>>>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
>>>> the “uninitialized” info from source code level to “pass_expand”.
>>> 
>>> Well, "untouched" is a bit oversimplified.  You do need to handle
>>> .DEFERRED_INIT as not
>>> being an initialization which will definitely get interesting.
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
> 
> Are you sure we need to do that?  The point of having the first argument
> to .DEFERRED_INIT was that that argument would still provide an
> uninitialised use of the variable.  And the values are passed and
> returned by value, so the lack of initialisation is explicit in
> the gcall itself, without knowing what the target function does.
> 
> The idea is that we can essentially treat .DEFERRED_INIT as a normal
> (const) function call.  I'd be surprised if many passes needed to
> handle it specially.
> 

I just checked with a small test case (to emulate the .DEFERRED_INIT approach):

qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
extern int DEFFERED_INIT (int, int) __attribute__ ((const));

int foo (int n, int r)
{
  int v;

  v = DEFFERED_INIT (v, 0);
  if (n < 10) 
    v = r;

  return v;
}
qinzhao@gcc10:~/Bugs/auto-init$ sh t
/home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c
t.c: In function ‘foo’:
t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
    7 |   v = DEFFERED_INIT (v, 0);
      |       ^~~~~~~~~~~~~~~~~~~~

We can see that the current uninitialized variable analysis treats the newly added artificial initialization as the first use of the uninitialized variable, and therefore reports the warning there.
However, we should report the warning at “return v”.
So, I think that we still need to specifically handle the newly added artificial initialization during the uninitialized analysis phase.

Am I still missing anything?

Qing



> Thanks,
> Richard



* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-07 17:36                   ` Qing Zhao
@ 2020-12-07 18:05                     ` Richard Sandiford
  2020-12-07 18:34                       ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Sandiford @ 2020-12-07 18:05 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Qing Zhao via Gcc-patches, Richard Biener

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>> unsigned decl_is_initialized :1;
>>>>> 
>>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>>> 
>>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>>>> even though DECL_INITIAL might be NULLed.
>>>>> 
>>>>> 
>>>>> For locals it would be more reliable to set this flag during gimplification.
>>>>> 
>>>>> Do you have any comment and suggestions?
>>>>> 
>>>>> 
>>>>> As said above - do you want to cover registers as well as locals?  I'd do
>>>>> the actual zeroing during RTL expansion instead since otherwise you
>>>>> have to figure youself whether a local is actually used (see expand_stack_vars)
>>>>> 
>>>>> Note that optimization will already made have use of "uninitialized" state
>>>>> of locals so depending on what the actual goal is here "late" may be too late.
>>>>> 
>>>>> 
>>>>> Haven't thought about this much, so it might be a daft idea, but would a
>>>>> compromise be to use a const internal function:
>>>>> 
>>>>> X1 = .DEFERRED_INIT (X0, INIT)
>>>>> 
>>>>> where the X0 argument is an uninitialised value and the INIT argument
>>>>> describes the initialisation pattern?  So for a decl we'd have:
>>>>> 
>>>>> X = .DEFERRED_INIT (X, INIT)
>>>>> 
>>>>> and for an SSA name we'd have:
>>>>> 
>>>>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>>>> 
>>>>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>>>> 
>>>>> * Having the X0 argument would keep the uninitialised use of the
>>>>> variable around for the later warning passes.
>>>>> 
>>>>> * Using a const function should still allow the UB to be deleted as dead
>>>>> if X1 isn't needed.
>>>>> 
>>>>> * Having a function in the way should stop passes from taking advantage
>>>>> of direct uninitialised uses for optimisation.
>>>>> 
>>>>> This means we won't be able to optimise based on the actual init
>>>>> value at the gimple level, but that seems like a fair trade-off.
>>>>> AIUI this is really a security feature or anti-UB hardening feature
>>>>> (in the sense that users are more likely to see predictable behaviour
>>>>> “in the field” even if the program has UB).
>>>>> 
>>>>> 
>>>>> The question is whether it's in line of peoples expectation that
>>>>> explicitely zero-initialized code behaves differently from
>>>>> implicitely zero-initialized code with respect to optimization
>>>>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>>>>> 
>>>>> Introducing a new concept like .DEFERRED_INIT is much more
>>>>> heavy-weight than an explicit zero initializer.
>>>>> 
>>>>> 
>>>>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>>>>> 
>>>>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
>>>>> the “uninitialized” info from source code level to “pass_expand”.
>>>> 
>>>> Well, "untouched" is a bit oversimplified.  You do need to handle
>>>> .DEFERRED_INIT as not
>>>> being an initialization which will definitely get interesting.
>>> 
>>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>> 
>> Are you sure we need to do that?  The point of having the first argument
>> to .DEFERRED_INIT was that that argument would still provide an
>> uninitialised use of the variable.  And the values are passed and
>> returned by value, so the lack of initialisation is explicit in
>> the gcall itself, without knowing what the target function does.
>> 
>> The idea is that we can essentially treat .DEFERRED_INIT as a normal
>> (const) function call.  I'd be surprised if many passes needed to
>> handle it specially.
>> 
>
> Just checked with a small testing case (to emulate the .DEFERRED_INIT approach):
>
> qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
> extern int DEFFERED_INIT (int, int) __attribute__ ((const));
>
> int foo (int n, int r)
> {
>   int v;
>
>   v = DEFFERED_INIT (v, 0);
>   if (n < 10) 
>     v = r;
>
>   return v;
> }
> qinzhao@gcc10:~/Bugs/auto-init$ sh t
> /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c
> t.c: In function ‘foo’:
> t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
>     7 |   v = DEFFERED_INIT (v, 0);
>       |       ^~~~~~~~~~~~~~~~~~~~
>
> We can see that the current uninitialized variable analysis treats the new added artificial initialization as the first use of the uninialized variable.  Therefore report the warning there.
> However, we should report warning at “return v”. 

Ah, OK, so this is about the quality of the warning, rather than about
whether we report a warning or not?

> So, I think that we still need to specifically handle the new added artificial initialization during uninitialized analysis phase.

Yeah, that sounds like one approach.  But if we're adding .DEFERRED_INIT
in response to known uninitialised uses, two other approaches might be:

(1) Give the call the same source location as one of the uninitialised uses.

(2) Pass the locations of all uninitialised uses as additional arguments.

The uninit pass would then be picking the source location differently
from normal, but I don't know what effect it would have on the quality
of diagnostics.  One obvious problem is that if there are multiple
uninitialised uses, some of them might get optimised away later.
On the other hand, using early source locations might give better
results in some cases.  I guess it will depend.

Thanks,
Richard


* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-07 18:05                     ` Richard Sandiford
@ 2020-12-07 18:34                       ` Qing Zhao
  2020-12-08  7:35                         ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-07 18:34 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Qing Zhao via Gcc-patches, Richard Biener



> On Dec 7, 2020, at 12:05 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>>> 
>>>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>>>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>>> unsigned decl_is_initialized :1;
>>>>>> 
>>>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>>>> 
>>>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>>>>> even though DECL_INITIAL might be NULLed.
>>>>>> 
>>>>>> 
>>>>>> For locals it would be more reliable to set this flag during gimplification.
>>>>>> 
>>>>>> Do you have any comment and suggestions?
>>>>>> 
>>>>>> 
>>>>>> As said above - do you want to cover registers as well as locals?  I'd do
>>>>>> the actual zeroing during RTL expansion instead since otherwise you
>>>>>> have to figure youself whether a local is actually used (see expand_stack_vars)
>>>>>> 
>>>>>> Note that optimization will already made have use of "uninitialized" state
>>>>>> of locals so depending on what the actual goal is here "late" may be too late.
>>>>>> 
>>>>>> 
>>>>>> Haven't thought about this much, so it might be a daft idea, but would a
>>>>>> compromise be to use a const internal function:
>>>>>> 
>>>>>> X1 = .DEFERRED_INIT (X0, INIT)
>>>>>> 
>>>>>> where the X0 argument is an uninitialised value and the INIT argument
>>>>>> describes the initialisation pattern?  So for a decl we'd have:
>>>>>> 
>>>>>> X = .DEFERRED_INIT (X, INIT)
>>>>>> 
>>>>>> and for an SSA name we'd have:
>>>>>> 
>>>>>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>>>>> 
>>>>>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>>>>> 
>>>>>> * Having the X0 argument would keep the uninitialised use of the
>>>>>> variable around for the later warning passes.
>>>>>> 
>>>>>> * Using a const function should still allow the UB to be deleted as dead
>>>>>> if X1 isn't needed.
>>>>>> 
>>>>>> * Having a function in the way should stop passes from taking advantage
>>>>>> of direct uninitialised uses for optimisation.
>>>>>> 
>>>>>> This means we won't be able to optimise based on the actual init
>>>>>> value at the gimple level, but that seems like a fair trade-off.
>>>>>> AIUI this is really a security feature or anti-UB hardening feature
>>>>>> (in the sense that users are more likely to see predictable behaviour
>>>>>> “in the field” even if the program has UB).
>>>>>> 
>>>>>> 
>>>>>> The question is whether it's in line of peoples expectation that
>>>>>> explicitely zero-initialized code behaves differently from
>>>>>> implicitely zero-initialized code with respect to optimization
>>>>>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>>>>>> 
>>>>>> Introducing a new concept like .DEFERRED_INIT is much more
>>>>>> heavy-weight than an explicit zero initializer.
>>>>>> 
>>>>>> 
>>>>>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>>>>>> 
>>>>>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
>>>>>> the “uninitialized” info from source code level to “pass_expand”.
>>>>> 
>>>>> Well, "untouched" is a bit oversimplified.  You do need to handle
>>>>> .DEFERRED_INIT as not
>>>>> being an initialization which will definitely get interesting.
>>>> 
>>>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>>> 
>>> Are you sure we need to do that?  The point of having the first argument
>>> to .DEFERRED_INIT was that that argument would still provide an
>>> uninitialised use of the variable.  And the values are passed and
>>> returned by value, so the lack of initialisation is explicit in
>>> the gcall itself, without knowing what the target function does.
>>> 
>>> The idea is that we can essentially treat .DEFERRED_INIT as a normal
>>> (const) function call.  I'd be surprised if many passes needed to
>>> handle it specially.
>>> 
>> 
>> Just checked with a small testing case (to emulate the .DEFERRED_INIT approach):
>> 
>> qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
>> extern int DEFFERED_INIT (int, int) __attribute__ ((const));
>> 
>> int foo (int n, int r)
>> {
>>  int v;
>> 
>>  v = DEFFERED_INIT (v, 0);
>>  if (n < 10) 
>>    v = r;
>> 
>>  return v;
>> }
>> qinzhao@gcc10:~/Bugs/auto-init$ sh t
>> /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c
>> t.c: In function ‘foo’:
>> t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
>>    7 |   v = DEFFERED_INIT (v, 0);
>>      |       ^~~~~~~~~~~~~~~~~~~~
>> 
>> We can see that the current uninitialized variable analysis treats the new added artificial initialization as the first use of the uninialized variable.  Therefore report the warning there.
>> However, we should report warning at “return v”. 
> 
> Ah, OK, so this is about the quality of the warning, rather than about
> whether we report a warning or not?
> 
>> So, I think that we still need to specifically handle the new added artificial initialization during uninitialized analysis phase.
> 
> Yeah, that sounds like one approach.  But if we're adding .DEFERRED_INIT
> in response to known uninitialised uses, two other approaches might be:
> 
> (1) Give the call the same source location as one of the uninitialised uses.
> 
> (2) Pass the locations of all uninitialised uses as additional arguments.

If we add .DEFERRED_INIT during the gimplification phase, is the “uninitialized uses” information already available at that time?

Qing
> 
> The uninit pass would then be picking the source location differently
> from normal, but I don't know what effect it would have on the quality
> of diagnostics.  One obvious problem is that if there are multiple
> uninitialised uses, some of them might get optimised away later.
> On the other hand, using early source locations might give better
> results in some cases.  I guess it will depend.
> 
> Thanks,
> Richard



* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-07 18:34                       ` Qing Zhao
@ 2020-12-08  7:35                         ` Richard Biener
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Biener @ 2020-12-08  7:35 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Qing Zhao via Gcc-patches

On Mon, Dec 7, 2020 at 7:34 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Dec 7, 2020, at 12:05 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>
> On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
>
> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>
> Note that optimization will already have made use of the "uninitialized" state
> of locals, so depending on what the actual goal is here, "late" may be too late.
>
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
> X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
> X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
> variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
> if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
> of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
>
>
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
>
>
> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>
> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.
>
>
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.
>
>
> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>
>
> Are you sure we need to do that?  The point of having the first argument
> to .DEFERRED_INIT was that that argument would still provide an
> uninitialised use of the variable.  And the values are passed and
> returned by value, so the lack of initialisation is explicit in
> the gcall itself, without knowing what the target function does.
>
> The idea is that we can essentially treat .DEFERRED_INIT as a normal
> (const) function call.  I'd be surprised if many passes needed to
> handle it specially.
>
>
> Just checked with a small testing case (to emulate the .DEFERRED_INIT approach):
>
> qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
> extern int DEFFERED_INIT (int, int) __attribute__ ((const));
>
> int foo (int n, int r)
> {
>  int v;
>
>  v = DEFFERED_INIT (v, 0);
>  if (n < 10)
>    v = r;
>
>  return v;
> }
> qinzhao@gcc10:~/Bugs/auto-init$ sh t
> /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c
> t.c: In function ‘foo’:
> t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
>    7 |   v = DEFFERED_INIT (v, 0);
>      |       ^~~~~~~~~~~~~~~~~~~~
>
> We can see that the current uninitialized variable analysis treats the newly added artificial initialization as the first use of the uninitialized variable, and therefore reports the warning there.
> However, we should report the warning at “return v”.
>
>
> Ah, OK, so this is about the quality of the warning, rather than about
> whether we report a warning or not?
>
> So, I think that we still need to specifically handle the new added artificial initialization during uninitialized analysis phase.
>
>
> Yeah, that sounds like one approach.  But if we're adding .DEFERRED_INIT
> in response to known uninitialised uses, two other approaches might be:
>
> (1) Give the call the same source location as one of the uninitialised uses.
>
> (2) Pass the locations of all uninitialised uses as additional arguments.
>
>
> If we add .DEFERRED_INIT during gimplification phase, is the “uninitialized uses” information available at that time?

No.

> Qing
>
>
> The uninit pass would then be picking the source location differently
> from normal, but I don't know what effect it would have on the quality
> of diagnostics.  One obvious problem is that if there are multiple
> uninitialised uses, some of them might get optimised away later.
> On the other hand, using early source locations might give better
> results in some cases.  I guess it will depend.
>
> Thanks,
> Richard
>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-07 16:20               ` Qing Zhao
  2020-12-07 17:10                 ` Richard Sandiford
@ 2020-12-08  7:40                 ` Richard Biener
  2020-12-08 19:54                   ` Qing Zhao
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-12-08  7:40 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>
> Note that optimization will already have made use of the "uninitialized" state
> of locals, so depending on what the actual goal is here, "late" may be too late.
>
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
> X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
> X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
> variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
> if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
> of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
>
>
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
>
>
> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>
> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.
>
>
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.
>
>
> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>
> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>
> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should
> be much easier and simpler, and also smaller run-time overhead.
>
>
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>
>
> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero
> Initialization since it will be used in production build.
> We can do some run-time performance evaluation when we have an implementation ready.
>
>
> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
> that there's going to be other
> considerations - namely where to emit the .DEFERRED_INIT - when
> emitting it during gimplification
> you can emit it at the start of the block of block-scope variables.
> When emitting after gimplification
> you have to emit at function start which will probably make stack slot
> sharing inefficient because
> the deferred init will cause overlapping lifetimes.  With emitting at
> block boundary the .DEFERRED_INIT
> will act as code-motion barrier (and it itself likely cannot be moved)
> so for example invariant motion
> will no longer happen.  Likewise optimizations like SRA will be
> confused by .DEFERRED_INIT which
> again will lead to bigger stack usage (and less optimization).
>
>
> Yes, looks like  that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
>
>
> But sure, you can try implement a few variants but definitely
> .DEFERRED_INIT will be the most
> work.
>
>
> How about implement the following two approaches and compare the run-time cost:
>
> A.  Insert the real initialization during gimplification phase.
> B.  Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase.
>
> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>
> And then decide which approach we will go with?
>
> What’s your opinion on this?

Well, in the end you have to try.  Note for the purpose of stack slot
sharing you do want the instrumentation to happen during
gimplification.

Another possibility is to materialize .DEFERRED_INIT earlier than
expand, for example shortly after IPA optimizations, to avoid
pessimizing loop transforms and to allow SRA.  At the point you
materialize the inits you could run the late uninit warning pass
(which would then run earlier than usual but would still see the
.DEFERRED_INIT).

While users may be happy to pay some performance, stack usage is
probably more critical (just thinking of the kernel), so not regressing
there should be as important as preserving uninit warnings (which, for
practical purposes, I see as not important at all - people can do
"debug" builds without -fzero-init).

Richard.
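
To make the "debug build versus hardened build" trade-off above concrete,
here is a minimal sketch that emulates zero auto-initialization by hand on
the foo example from earlier in the thread.  AUTO_INIT_ZERO and LOCAL_INIT
are hypothetical names invented for this illustration; they are not GCC
options or macros, and a real implementation would insert the
initialization inside the compiler rather than in the source.

```c
/* Hypothetical switch for this sketch only: 1 emulates zero auto-init by
   giving the local an explicit initializer, 0 leaves it uninitialized
   (the "debug" configuration in which -Wuninitialized can still warn).  */
#ifndef AUTO_INIT_ZERO
#define AUTO_INIT_ZERO 1
#endif

#if AUTO_INIT_ZERO
#define LOCAL_INIT = 0
#else
#define LOCAL_INIT
#endif

int foo (int n, int r)
{
  int v LOCAL_INIT;   /* expands to "int v = 0;" or plain "int v;" */

  if (n < 10)
    v = r;

  return v;           /* v is also read on the n >= 10 path */
}
```

With the default setting, foo returns r when n < 10 and 0 otherwise, so the
behaviour is predictable, but the explicit initializer also hides the
uninitialized read from the warning.  Building with -DAUTO_INIT_ZERO=0
keeps the read visible to the analysis, at the cost of undefined behaviour
on that path.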

>
> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably vastly different already.
>
>
> From my study so far, yes, the current behavior of -Wuninitialized for Clang and GCC is not exactly the same.
>
> For example, for the following small testing case:
> void blah(int);
>
> int foo_2 (int n, int l, int m, int r)
> {
>  int v;
>
>  if ( (n > 10) && (m != 100)  && (r < 20) )
>    v = r;
>
>  if (l > 100)
>    if ( (n <= 8) &&  (m < 102)  && (r < 19) )
>      blah(v); /* { dg-warning "uninitialized" "real warning" } */
>
>  return 0;
> }
>
> GCC is able to report maybe uninitialized warning, but Clang cannot.
> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>
> Really curious on how clang implement its uninitialized analysis?
>
>
>
> Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday.
> And noticed that CLANG has a data flow analysis phase based on CLANG's AST.
> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>
> And clang’s uninitialized analysis is based on this data flow analysis.
>
> Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward.
>
> However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase,
> Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG.
>
> Qing
>
>
>
> Qing
>
>
>
>
> Richard.
>
> Thanks,
> Richard
>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-08  7:40                 ` Richard Biener
@ 2020-12-08 19:54                   ` Qing Zhao
  2020-12-09  8:23                     ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-08 19:54 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford; +Cc: Richard Biener via Gcc-patches



> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> 
>> 
>> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> 
>> 
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals, so depending on what the actual goal is here, "late" may be too late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line with people's expectations that
>> explicitly zero-initialized code behaves differently from
>> implicitly zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
>> 
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>> 
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>> 
>> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should
>> be much easier and simpler, and also smaller run-time overhead.
>> 
>> 
>> As for optimization I fear you'll get a load of redundant zero-init
>> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>> 
>> 
>> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero
>> Initialization since it will be used in production build.
>> We can do some run-time performance evaluation when we have an implementation ready.
>> 
>> 
>> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
>> that there's going to be other
>> considerations - namely where to emit the .DEFERRED_INIT - when
>> emitting it during gimplification
>> you can emit it at the start of the block of block-scope variables.
>> When emitting after gimplification
>> you have to emit at function start which will probably make stack slot
>> sharing inefficient because
>> the deferred init will cause overlapping lifetimes.  With emitting at
>> block boundary the .DEFERRED_INIT
>> will act as code-motion barrier (and it itself likely cannot be moved)
>> so for example invariant motion
>> will no longer happen.  Likewise optimizations like SRA will be
>> confused by .DEFERRED_INIT which
>> again will lead to bigger stack usage (and less optimization).
>> 
>> 
>> Yes, looks like  that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
>> 
>> 
>> But sure, you can try implement a few variants but definitely
>> .DEFERRED_INIT will be the most
>> work.
>> 
>> 
>> How about implement the following two approaches and compare the run-time cost:
>> 
>> A.  Insert the real initialization during gimplification phase.
>> B.  Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase.
>> 
>> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
>> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>> 
>> And then decide which approach we will go with?
>> 
>> What’s your opinion on this?
> 
> Well, in the end you have to try.  Note for the purpose of stack slot
> sharing you do want the
> instrumentation to happen during gimplification.
> 
> Another possibility is to materialize .DEFERRED_INIT earlier than
> expand, for example shortly
> after IPA optimizations to avoid pessimizing loop transforms and allow
> SRA.  At the point you
> materialize the inits you could run the late uninit warning pass
> (which would then be earlier
> than regular but would still see the .DEFERRED_INIT).

If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above,
the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”.
Then we might miss some opportunities for the late uninitialized warning. I think that this is not what we
really want.

> 
> While users may be happy to pay some performance stack usage is
> probably more critical

So, which pass is for computing the stack usage?

> (just thinking of the kernel) so not regressing there should be as
> important as preserving
> uninit warnings (which I for practical purposes see not important at
> all - people can do
> "debug" builds without -fzero-init).

It looks like the major issue with the “.DEFERRED_INIT” approach is that the newly inserted calls to the internal const function
might inhibit some important tree optimizations.

So, I am reconsidering the following alternative approach, which I raised at the very beginning:

During the gimplification phase, mark the DECL of each auto variable without an initializer as “no_explicit_init”, then maintain this
“no_explicit_init” bit until after pass_late_warn_uninitialized, or until pass_expand, and at that point add zero-initialization for all DECLs that are
marked with “no_explicit_init”.

This approach will not interfere with the tree optimizations; however, I guess that maintaining this “no_explicit_init” bit
might be very difficult?

Do you have any comments on this approach?
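
The two steps of this approach can be sketched with a toy model.  This is
not GCC code: struct toy_decl, toy_mark and toy_zero_marked are names made
up for the illustration, and the "storage" array merely stands in for a
stack slot.  The sketch only shows the mark-early / zero-late split; it
does not model the hard part, which is that every pass running in between
would have to preserve the bit whenever it copies or replaces a decl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a local variable declaration (hypothetical, not a
   GCC tree node).  */
struct toy_decl
{
  const char *name;
  bool no_explicit_init;      /* the proposed marker bit */
  size_t size;                /* bytes of storage in use */
  unsigned char storage[8];   /* stands in for the stack slot */
};

/* Early step (gimplification in the proposal): remember which locals
   were declared without an initializer.  */
void toy_mark (struct toy_decl *decl, bool has_initializer)
{
  decl->no_explicit_init = !has_initializer;
}

/* Late step (after the late uninit warning, or at expand, in the
   proposal): zero every decl that is still marked.  Returns the number
   of decls that were zeroed.  */
size_t toy_zero_marked (struct toy_decl *decls, size_t count)
{
  size_t zeroed = 0;

  for (size_t i = 0; i < count; i++)
    if (decls[i].no_explicit_init)
      {
        assert (decls[i].size <= sizeof decls[i].storage);
        memset (decls[i].storage, 0, decls[i].size);
        zeroed++;
      }

  return zeroed;
}
```

In this model nothing between toy_mark and toy_zero_marked looks at the
flag, which is why the optimizers would be unaffected; it is also why a
dropped flag would go unnoticed until the zeroing silently fails to happen.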

thanks.

Qing


> 
> Richard.
> 
>> 
>> Btw, I don't think there's any reason to cling onto clang's semantics
>> for a particular switch.  We'll never be able to emulate 1:1 behavior
>> and our -Wuninit behavior is probably vastly different already.
>> 
>> 
>> From my study so far, yes, the current behavior of -Wuninitialized for Clang and GCC is not exactly the same.
>> 
>> For example, for the following small testing case:
>> void blah(int);
>> 
>> int foo_2 (int n, int l, int m, int r)
>> {
>> int v;
>> 
>> if ( (n > 10) && (m != 100)  && (r < 20) )
>>   v = r;
>> 
>> if (l > 100)
>>   if ( (n <= 8) &&  (m < 102)  && (r < 19) )
>>     blah(v); /* { dg-warning "uninitialized" "real warning" } */
>> 
>> return 0;
>> }
>> 
>> GCC is able to report maybe uninitialized warning, but Clang cannot.
>> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>> 
>> Really curious on how clang implement its uninitialized analysis?
>> 
>> 
>> 
>> Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday.
>> And noticed that CLANG has a data flow analysis phase based on CLANG's AST.
>> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>> 
>> And clang’s uninitialized analysis is based on this data flow analysis.
>> 
>> Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward.
>> 
>> However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase,
>> Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG.
>> 
>> Qing
>> 
>> 
>> 
>> Qing
>> 
>> 
>> 
>> 
>> Richard.
>> 
>> Thanks,
>> Richard


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-08 19:54                   ` Qing Zhao
@ 2020-12-09  8:23                     ` Richard Biener
  2020-12-09 15:04                       ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-12-09  8:23 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>
> Note that optimization will already have made use of the "uninitialized" state
> of locals, so depending on what the actual goal is here, "late" may be too late.
>
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
> X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
> X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
> variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
> if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
> of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
>
>
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
>
>
> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>
> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.
>
>
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.
>
>
> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>
> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>
> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should
> be much easier and simpler, and also smaller run-time overhead.
>
>
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>
>
> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero
> Initialization since it will be used in production build.
> We can do some run-time performance evaluation when we have an implementation ready.
>
>
> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
> that there's going to be other
> considerations - namely where to emit the .DEFERRED_INIT - when
> emitting it during gimplification
> you can emit it at the start of the block of block-scope variables.
> When emitting after gimplification
> you have to emit at function start which will probably make stack slot
> sharing inefficient because
> the deferred init will cause overlapping lifetimes.  With emitting at
> block boundary the .DEFERRED_INIT
> will act as code-motion barrier (and it itself likely cannot be moved)
> so for example invariant motion
> will no longer happen.  Likewise optimizations like SRA will be
> confused by .DEFERRED_INIT which
> again will lead to bigger stack usage (and less optimization).
>
>
> Yes, looks like  that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
>
>
> But sure, you can try implement a few variants but definitely
> .DEFERRED_INIT will be the most
> work.
>
>
> How about implement the following two approaches and compare the run-time cost:
>
> A.  Insert the real initialization during gimplification phase.
> B.  Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase.
>
> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>
> And then decide which approach we will go with?
>
> What’s your opinion on this?
>
>
> Well, in the end you have to try.  Note for the purpose of stack slot
> sharing you do want the
> instrumentation to happen during gimplification.
>
> Another possibility is to materialize .DEFERRED_INIT earlier than
> expand, for example shortly
> after IPA optimizations to avoid pessimizing loop transforms and allow
> SRA.  At the point you
> materialize the inits you could run the late uninit warning pass
> (which would then be earlier
> than regular but would still see the .DEFERRED_INIT).
>
>
> If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above,
> the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”.
> Then we might miss some opportunities for the late uninitialized warning. I think that this is not we really
> want.
>
>
> While users may be happy to pay some performance stack usage is
> probably more critical
>
>
> So, which pass is for computing the stack usage?

There is no pass doing that; stack slot assignment and sharing (when
lifetimes do not overlap) are done by RTL expansion.
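
To illustrate why the placement of the init matters for slot sharing,
here is a minimal sketch of lifetime-based sharing.  It is a toy model,
not the algorithm used by RTL expansion: `struct lifetime`, `overlaps`
and `count_slots` are hypothetical names, and a lifetime is assumed to
be a half-open range of statement indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the lifetime of a local as a half-open range
   of statement indices [start, end).  */
struct lifetime { int start; int end; };

/* Two locals conflict (cannot share a slot) if their lifetimes overlap.  */
static int overlaps(struct lifetime a, struct lifetime b)
{
    return a.start < b.end && b.start < a.end;
}

#define MAX_VARS 16

/* Greedy assignment: each variable takes the lowest-numbered slot whose
   current occupants do not conflict with it.  Returns the slot count.  */
int count_slots(const struct lifetime *vars, size_t n)
{
    int slot_of[MAX_VARS];
    int nslots = 0;

    assert(n <= MAX_VARS);
    for (size_t i = 0; i < n; i++) {
        int s;
        for (s = 0; s < nslots; s++) {
            int ok = 1;
            for (size_t j = 0; j < i; j++)
                if (slot_of[j] == s && overlaps(vars[i], vars[j]))
                    ok = 0;
            if (ok)
                break;
        }
        if (s == nslots)
            nslots++;            /* no existing slot fits, open a new one */
        slot_of[i] = s;
    }
    return nslots;
}
```

In this model, two block-scope variables with lifetimes {2, 5} and
{6, 9} share one slot, while the same two variables initialized at
function start ({0, 5} and {0, 9}) need two.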

> (just thinking of the kernel) so not regressing there should be as
> important as preserving
> uninit warnings (which I for practical purposes see not important at
> all - people can do
> "debug" builds without -fzero-init).
>
>
> Looks like that the major issue with the “.DERERRED_INIT” approach is:  the new inserted calls to internal const function
> might inhibit some important tree optimizations.
>
> So, I am thinking again the following another approach I raised in the very beginning:
>
> During gimplification phase, mark the DECL for an auto variable without initialization as “no_explicit_init”, then maintain this
> “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, add zero-iniitiazation for all DECLs that are
> marked with “no_explicit_init”.
>
> This approach will not have the issue to interrupt tree optimizations, however, I guess that “maintaining this “no_explicit_init” bit
> might be very difficult?
>
> Do you have any comments on this approach?

As said earlier, you'll still get optimistic propagation bypassing the
still-missing implicit zero init.  Maybe that's OK: you don't get
"garbage", but you'll get some other defined value.
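
A minimal sketch of what "optimistic propagation" means here, in the
style of a constant-propagation lattice.  The names (`struct lattice`,
`meet`) are hypothetical and this is not GCC's implementation; it only
models merging the values reaching a use such as the one in
`int v; if (c) v = 5; use (v);`.

```c
#include <assert.h>

/* Hypothetical three-level lattice: UNDEF (uninitialized), a known
   constant, or VARYING (not a single known value).  */
enum kind { UNDEF, CONST, VARYING };
struct lattice { enum kind kind; int value; };

/* Merge the values arriving from two control-flow paths.  UNDEF is
   treated optimistically: it takes on whatever the other path has.  */
struct lattice meet(struct lattice a, struct lattice b)
{
    struct lattice varying = { VARYING, 0 };

    if (a.kind == UNDEF)
        return b;
    if (b.kind == UNDEF)
        return a;
    if (a.kind == CONST && b.kind == CONST && a.value == b.value)
        return a;
    return varying;
}
```

With the uninitialized path modelled as UNDEF, merging it with the
constant 5 yields the constant 5, so a zero init added only after such
propagation is never observed.  With an explicit zero present before
propagation, merging 0 and 5 yields VARYING instead.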

As said, you have to implement a few options and compare.

Richard.

> thanks.
>
> Qing
>
>
>
> Richard.
>
>
> Btw, I don't think theres any reason to cling onto clangs semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably wastly different already.
>
>
> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same.
>
> For example, for the following small testing case:
> void blah(int);
>
> int foo_2 (int n, int l, int m, int r)
> {
> int v;
>
> if ( (n > 10) && (m != 100)  && (r < 20) )
>   v = r;
>
> if (l > 100)
>   if ( (n <= 8) &&  (m < 102)  && (r < 19) )
>     blah(v); /* { dg-warning "uninitialized" "real warning" } */
>
> return 0;
> }
>
> GCC is able to report maybe uninitialized warning, but Clang cannot.
> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>
> Really curious on how clang implement its uninitialized analysis?
>
>
>
> Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday.
> And noticed that CLANG has a data flow analysis phase based on CLANG's AST.
> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>
> And clang’s uninitialized analysis is based on this data flow analysis.
>
> Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward.
>
> However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase,
> Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG.
>
> Qing
>
>
>
> Qing
>
>
>
>
> Richard.
>
> Thanks,
> Richard
>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-09  8:23                     ` Richard Biener
@ 2020-12-09 15:04                       ` Qing Zhao
  2020-12-09 15:12                         ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-09 15:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



> On Dec 9, 2020, at 2:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> 
>> 
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see expand_stack_vars)
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line of peoples expectation that
>> explicitely zero-initialized code behaves differently from
>> implicitely zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
>> 
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>> 
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>> 
>> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should
>> be much easier and simpler, and also smaller run-time overhead.
>> 
>> 
>> As for optimization I fear you'll get a load of redundant zero-init
>> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>> 
>> 
>> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero
>> Initialization since it will be used in production build.
>> We can do some run-time performance evaluation when we have an implementation ready.
>> 
>> 
>> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
>> that there's going to be other
>> considerations - namely where to emit the .DEFERRED_INIT - when
>> emitting it during gimplification
>> you can emit it at the start of the block of block-scope variables.
>> When emitting after gimplification
>> you have to emit at function start which will probably make stack slot
>> sharing inefficient because
>> the deferred init will cause overlapping lifetimes.  With emitting at
>> block boundary the .DEFERRED_INIT
>> will act as code-motion barrier (and it itself likely cannot be moved)
>> so for example invariant motion
>> will no longer happen.  Likewise optimizations like SRA will be
>> confused by .DEFERRED_INIT which
>> again will lead to bigger stack usage (and less optimization).
>> 
>> 
>> Yes, looks like  that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
>> 
>> 
>> But sure, you can try implement a few variants but definitely
>> .DEFERRED_INIT will be the most
>> work.
>> 
>> 
>> How about implement the following two approaches and compare the run-time cost:
>> 
>> A.  Insert the real initialization during gimplification phase.
>> B.  Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase.
>> 
>> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
>> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>> 
>> And then decide which approach we will go with?
>> 
>> What’s your opinion on this?
>> 
>> 
>> Well, in the end you have to try.  Note for the purpose of stack slot
>> sharing you do want the
>> instrumentation to happen during gimplification.
>> 
>> Another possibility is to materialize .DEFERRED_INIT earlier than
>> expand, for example shortly
>> after IPA optimizations to avoid pessimizing loop transforms and allow
>> SRA.  At the point you
>> materialize the inits you could run the late uninit warning pass
>> (which would then be earlier
>> than regular but would still see the .DEFERRED_INIT).
>> 
>> 
>> If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above,
>> the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”.
>> Then we might miss some opportunities for the late uninitialized warning. I think that this is not we really
>> want.
>> 
>> 
>> While users may be happy to pay some performance stack usage is
>> probably more critical
>> 
>> 
>> So, which pass is for computing the stack usage?
> 
> There is no pass doing that, stack slot assignment and sharing (when
> lifetimes do
> not overlap) is done by RTL expansion.

Okay. I see.
> 
>> (just thinking of the kernel) so not regressing there should be as
>> important as preserving
>> uninit warnings (which I for practical purposes see not important at
>> all - people can do
>> "debug" builds without -fzero-init).
>> 
>> 
>> Looks like that the major issue with the “.DERERRED_INIT” approach is:  the new inserted calls to internal const function
>> might inhibit some important tree optimizations.
>> 
>> So, I am thinking again the following another approach I raised in the very beginning:
>> 
>> During gimplification phase, mark the DECL for an auto variable without initialization as “no_explicit_init”, then maintain this
>> “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, add zero-iniitiazation for all DECLs that are
>> marked with “no_explicit_init”.
>> 
>> This approach will not have the issue to interrupt tree optimizations, however, I guess that “maintaining this “no_explicit_init” bit
>> might be very difficult?
>> 
>> Do you have any comments on this approach?
> 
> As said earlier you'll still get optimistic propagation bypassing the
> still missing
> implicit zero init.  Maybe that's OK - you don't get "garbage" but you'll get
> some other defined value.
> 

There is another approach:

During the gimplification phase, add the real initialization to the uninitialized variables, but mark these initializations as “artificial_init”.
Then update the uninitialized analysis phase to treat initializations marked with “artificial_init” as non-initializations, in order to
keep the uninitialized warnings.

Then we should be able to get the maximum optimization and also keep the uninitialized warning at the same time.
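
A minimal sketch of the idea, assuming a straight-line sequence of events
on a single variable.  The event names and `count_uninit_uses` are
hypothetical and only model the intended behaviour of the warning; they
are not existing GCC interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events on one local variable, in program order.  */
enum event { ARTIFICIAL_INIT, REAL_INIT, USE };

/* Count the uses that are not preceded by a real (user-written)
   initialization.  Artificial inits are skipped, so they never
   silence a warning.  */
int count_uninit_uses(const enum event *ev, size_t n)
{
    int really_initialized = 0;
    int warnings = 0;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case ARTIFICIAL_INIT:
            break;                      /* ignored for warning purposes */
        case REAL_INIT:
            really_initialized = 1;
            break;
        case USE:
            if (!really_initialized)
                warnings++;
            break;
        }
    }
    return warnings;
}
```

In this model, the sequence ARTIFICIAL_INIT, USE still produces one
warning, while ARTIFICIAL_INIT, REAL_INIT, USE produces none.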

The Microsoft compiler seems to use this approach: (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/)

"
Does InitAll Break Static Analysis?

Static analysis is incredibly useful at letting developers know they forgot to initialize before use.

The InitAll feature indicates if a variable assignment was caused by InitAll to both PREfast and the compiler backend (both of which have uninitialized warnings). This allows the analysis tools to ignore InitAll variable assignments for the purposes of these warnings. With InitAll enabled, a developer will still receive static analysis warnings if they forget to initialize a variable even if InitAll forcibly initializes it for them.

"

Any comment on this?

> As said, you have to implement a few options and compare.

Yes, I will do that. I just want to make sure which approaches we should implement and compare first.

Qing
> 
> Richard.
> 
>> thanks.
>> 
>> Qing
>> 
>> 
>> 
>> Richard.
>> 
>> 
>> Btw, I don't think theres any reason to cling onto clangs semantics
>> for a particular switch.  We'll never be able to emulate 1:1 behavior
>> and our -Wuninit behavior is probably wastly different already.
>> 
>> 
>> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same.
>> 
>> For example, for the following small testing case:
>> void blah(int);
>> 
>> int foo_2 (int n, int l, int m, int r)
>> {
>> int v;
>> 
>> if ( (n > 10) && (m != 100)  && (r < 20) )
>>  v = r;
>> 
>> if (l > 100)
>>  if ( (n <= 8) &&  (m < 102)  && (r < 19) )
>>    blah(v); /* { dg-warning "uninitialized" "real warning" } */
>> 
>> return 0;
>> }
>> 
>> GCC is able to report maybe uninitialized warning, but Clang cannot.
>> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>> 
>> Really curious on how clang implement its uninitialized analysis?
>> 
>> 
>> 
>> Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday.
>> And noticed that CLANG has a data flow analysis phase based on CLANG's AST.
>> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>> 
>> And clang’s uninitialized analysis is based on this data flow analysis.
>> 
>> Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward.
>> 
>> However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase,
>> Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG.
>> 
>> Qing
>> 
>> 
>> 
>> Qing
>> 
>> 
>> 
>> 
>> Richard.
>> 
>> Thanks,
>> Richard


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-09 15:04                       ` Qing Zhao
@ 2020-12-09 15:12                         ` Richard Biener
  2020-12-09 16:18                           ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-12-09 15:12 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Wed, Dec 9, 2020 at 4:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Dec 9, 2020, at 2:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> unsigned decl_is_initialized :1;
>
> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> #define DECL_IS_INITIALIZED(NODE) \
> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
> even though DECL_INITIAL might be NULLed.
>
>
> For locals it would be more reliable to set this flag during gimplification.
>
> Do you have any comment and suggestions?
>
>
> As said above - do you want to cover registers as well as locals?  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure youself whether a local is actually used (see expand_stack_vars)
>
> Note that optimization will already made have use of "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too late.
>
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
> X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
> X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
> variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
> if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
> of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).
>
>
> The question is whether it's in line of peoples expectation that
> explicitely zero-initialized code behaves differently from
> implicitely zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
>
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.
>
>
> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else?
>
> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep the current -Wuninitialized analysis untouched and also pass
> the “uninitialized” info from source code level to “pass_expand”.
>
>
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.
>
>
> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>
> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>
> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should
> be much easier and simpler, and also smaller run-time overhead.
>
>
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>
>
> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero
> Initialization since it will be used in production build.
> We can do some run-time performance evaluation when we have an implementation ready.
>
>
> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
> that there's going to be other
> considerations - namely where to emit the .DEFERRED_INIT - when
> emitting it during gimplification
> you can emit it at the start of the block of block-scope variables.
> When emitting after gimplification
> you have to emit at function start which will probably make stack slot
> sharing inefficient because
> the deferred init will cause overlapping lifetimes.  With emitting at
> block boundary the .DEFERRED_INIT
> will act as code-motion barrier (and it itself likely cannot be moved)
> so for example invariant motion
> will no longer happen.  Likewise optimizations like SRA will be
> confused by .DEFERRED_INIT which
> again will lead to bigger stack usage (and less optimization).
>
>
> Yes, looks like  that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
>
>
> But sure, you can try implement a few variants but definitely
> .DEFERRED_INIT will be the most
> work.
>
>
> How about implement the following two approaches and compare the run-time cost:
>
> A.  Insert the real initialization during gimplification phase.
> B.  Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase.
>
> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>
> And then decide which approach we will go with?
>
> What’s your opinion on this?
>
>
> Well, in the end you have to try.  Note for the purpose of stack slot
> sharing you do want the
> instrumentation to happen during gimplification.
>
> Another possibility is to materialize .DEFERRED_INIT earlier than
> expand, for example shortly
> after IPA optimizations to avoid pessimizing loop transforms and allow
> SRA.  At the point you
> materialize the inits you could run the late uninit warning pass
> (which would then be earlier
> than regular but would still see the .DEFERRED_INIT).
>
>
> If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above,
> the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”.
> Then we might miss some opportunities for the late uninitialized warning. I think that this is not we really
> want.
>
>
> While users may be happy to pay some performance stack usage is
> probably more critical
>
>
> So, which pass is for computing the stack usage?
>
>
> There is no pass doing that, stack slot assignment and sharing (when
> lifetimes do
> not overlap) is done by RTL expansion.
>
>
> Okay. I see.
>
>
> (just thinking of the kernel) so not regressing there should be as
> important as preserving
> uninit warnings (which I for practical purposes see not important at
> all - people can do
> "debug" builds without -fzero-init).
>
>
> Looks like that the major issue with the “.DERERRED_INIT” approach is:  the new inserted calls to internal const function
> might inhibit some important tree optimizations.
>
> So, I am thinking again the following another approach I raised in the very beginning:
>
> During gimplification phase, mark the DECL for an auto variable without initialization as “no_explicit_init”, then maintain this
> “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, add zero-iniitiazation for all DECLs that are
> marked with “no_explicit_init”.
>
> This approach will not have the issue to interrupt tree optimizations, however, I guess that “maintaining this “no_explicit_init” bit
> might be very difficult?
>
> Do you have any comments on this approach?
>
>
> As said earlier you'll still get optimistic propagation bypassing the
> still missing
> implicit zero init.  Maybe that's OK - you don't get "garbage" but you'll get
> some other defined value.
>
>
> There is another approach:
>
> During gimplification phase, adding the real initialization to the uninitialized variables, but mark these initializations as “artificial_init”.
> Then update the uninitialized analysis phase to handle these initializations marked with “artificial_init” specially as Non-initialization to
> keep the uninitialized warnings.
>
> Then we should be able to get the maximum optimization and also keep the uninitialized warning at the same time.
>
> Microsoft compiler seems used this approach: (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/)
>
> "
> Does InitAll Break Static Analysis?
>
> Static analysis is incredibly useful at letting developers know they forgot to initialize before use.
>
> The InitAll feature indicates if a variable assignment was caused by InitAll to both PREfast and the compiler backend (both of which have uninitialized warnings). This allows the analysis tools to ignore InitAll variable assignments for the purposes of these warnings. With InitAll enabled, a developer will still receive static analysis warnings if they forget to initialize a variable even if InitAll forcibly initializes it for them.
>
> “
>
> Any comment on this?

You have to try.  The bits to implement are adjusting the uninit pass
and maintaining the annotation, as well as making sure not to elide the
real init because there's a 'fake' init (we have redundant store
elimination, which works in this direction, just to name one example).
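
A minimal sketch of that hazard, assuming a straight-line list of stores
to one variable.  `struct store` and `elide_redundant` are hypothetical
and do not correspond to an existing GCC pass interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical store to one local: the value written and whether the
   store is a compiler-inserted ('fake') init.  */
struct store { int value; int artificial; };

/* Drop every store that writes the value already held by the variable.
   Returns the new number of stores.  When annotation_aware is nonzero
   and the dropped store was a real init, the surviving store is
   promoted to a real init so that the information is not lost.  */
size_t elide_redundant(struct store *s, size_t n, int annotation_aware)
{
    size_t out = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && s[out - 1].value == s[i].value) {
            if (annotation_aware && !s[i].artificial)
                s[out - 1].artificial = 0;
            continue;                   /* redundant store removed */
        }
        s[out++] = s[i];
    }
    return out;
}
```

For a fake `v = 0` followed by a user-written `v = 0`, the unaware
variant leaves only the fake init, so an uninit pass that ignores fake
inits would warn wrongly.  The aware variant keeps a single store that
counts as a real init.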

Richard.

> As said, you have to implement a few options and compare.
>
>
> Yes, I will do that, just make sure which approaches we should implement and compare first.
>
> Qing
>
>
> Richard.
>
> thanks.
>
> Qing
>
>
>
> Richard.
>
>
> Btw, I don't think theres any reason to cling onto clangs semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably wastly different already.
>
>
> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same.
>
> For example, for the following small testing case:
> void blah(int);
>
> int foo_2 (int n, int l, int m, int r)
> {
> int v;
>
> if ( (n > 10) && (m != 100)  && (r < 20) )
>  v = r;
>
> if (l > 100)
>  if ( (n <= 8) &&  (m < 102)  && (r < 19) )
>    blah(v); /* { dg-warning "uninitialized" "real warning" } */
>
> return 0;
> }
>
> GCC is able to report maybe uninitialized warning, but Clang cannot.
> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>
> Really curious on how clang implement its uninitialized analysis?
>
>
>
> Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday.
> And noticed that CLANG has a data flow analysis phase based on CLANG's AST.
> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>
> And clang’s uninitialized analysis is based on this data flow analysis.
>
> Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward.
>
> However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase,
> Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG.
>
> Qing
>
>
>
> Qing
>
>
>
>
> Richard.
>
> Thanks,
> Richard
>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: How to traverse all the local variables that declared in the current routine?
  2020-12-09 15:12                         ` Richard Biener
@ 2020-12-09 16:18                           ` Qing Zhao
  2021-01-05 19:05                             ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-12-09 16:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



> On Dec 9, 2020, at 9:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Dec 9, 2020 at 4:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 9, 2020, at 2:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> 
>> 
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line with people's expectations that
>> explicitly zero-initialized code behaves differently from
>> implicitly zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly do you mean by “heavy-weight”? More difficult to implement, more run-time overhead, or both? Or something else?
>> 
>> The major benefit of the “.DEFERRED_INIT” approach is that it lets us keep the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from the source code level to “pass_expand”.
>> 
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>> 
>> 
>> Yes, during the uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT” and treat them as uninitialized.
>> 
>> If we want to keep the current -Wuninitialized analysis untouched, this is quite a reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding a zero-initializer directly during gimplification should
>> be much easier and simpler, with smaller run-time overhead.
>> 
>> 
>> As for optimization I fear you'll get a load of redundant zero-init
>> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>> 
>> 
>> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature; we should minimize the runtime overhead of zero
>> initialization since it will be used in production builds.
>> We can do some run-time performance evaluation when we have an implementation ready.
>> 
>> 
>> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
>> that there's going to be other
>> considerations - namely where to emit the .DEFERRED_INIT - when
>> emitting it during gimplification
>> you can emit it at the start of the block of block-scope variables.
>> When emitting after gimplification
>> you have to emit at function start which will probably make stack slot
>> sharing inefficient because
>> the deferred init will cause overlapping lifetimes.  With emitting at
>> block boundary the .DEFERRED_INIT
>> will act as code-motion barrier (and it itself likely cannot be moved)
>> so for example invariant motion
>> will no longer happen.  Likewise optimizations like SRA will be
>> confused by .DEFERRED_INIT which
>> again will lead to bigger stack usage (and less optimization).
>> 
>> 
>> Yes, it looks like the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
>> 
>> 
>> But sure, you can try implementing a few variants but definitely
>> .DEFERRED_INIT will be the most
>> work.
>> 
>> 
>> How about implementing the following two approaches and comparing the run-time cost:
>> 
>> A.  Insert the real initialization during the gimplification phase.
>> B.  Insert the .DEFERRED_INIT during the gimplification phase, and then expand this call to a real initialization during the expand phase.
>> 
>> Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
>> Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>> 
>> And then decide which approach we will go with?
>> 
>> What’s your opinion on this?
>> 
>> 
>> Well, in the end you have to try.  Note for the purpose of stack slot
>> sharing you do want the
>> instrumentation to happen during gimplification.
>> 
>> Another possibility is to materialize .DEFERRED_INIT earlier than
>> expand, for example shortly
>> after IPA optimizations to avoid pessimizing loop transforms and allow
>> SRA.  At the point you
>> materialize the inits you could run the late uninit warning pass
>> (which would then be earlier
>> than regular but would still see the .DEFERRED_INIT).
>> 
>> 
>> If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above,
>> the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”.
>> Then we might miss some opportunities for the late uninitialized warning. I think that this is not what we really
>> want.
>> 
>> 
>> While users may be happy to pay some performance, stack usage is
>> probably more critical
>> 
>> 
>> So, which pass is for computing the stack usage?
>> 
>> 
>> There is no pass doing that, stack slot assignment and sharing (when
>> lifetimes do
>> not overlap) is done by RTL expansion.
>> 
>> 
>> Okay. I see.
>> 
>> 
>> (just thinking of the kernel) so not regressing there should be as
>> important as preserving
>> uninit warnings (which I for practical purposes see not important at
>> all - people can do
>> "debug" builds without -fzero-init).
>> 
>> 
>> It looks like the major issue with the “.DEFERRED_INIT” approach is that the newly inserted calls to the internal const function
>> might inhibit some important tree optimizations.
>> 
>> So, I am thinking again about another approach that I raised at the very beginning:
>> 
>> During the gimplification phase, mark the DECL for an auto variable without initialization as “no_explicit_init”, then maintain this
>> “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, and add zero-initialization for all DECLs that are
>> marked with “no_explicit_init”.
>> 
>> This approach will not have the issue of interrupting tree optimizations; however, I guess that maintaining this “no_explicit_init” bit
>> might be very difficult?
>> 
>> Do you have any comments on this approach?
>> 
>> 
>> As said earlier you'll still get optimistic propagation bypassing the
>> still missing
>> implicit zero init.  Maybe that's OK - you don't get "garbage" but you'll get
>> some other defined value.
>> 
>> 
>> There is another approach:
>> 
>> During the gimplification phase, add the real initialization to the uninitialized variables, but mark these initializations as “artificial_init”.
>> Then update the uninitialized analysis phase to handle the initializations marked with “artificial_init” specially, as non-initializations, to
>> keep the uninitialized warnings.
>> 
>> Then we should be able to get the maximum optimization and also keep the uninitialized warnings at the same time.
>> 
>> The Microsoft compiler seems to use this approach: (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/)
>> 
>> "
>> Does InitAll Break Static Analysis?
>> 
>> Static analysis is incredibly useful at letting developers know they forgot to initialize before use.
>> 
>> The InitAll feature indicates if a variable assignment was caused by InitAll to both PREfast and the compiler backend (both of which have uninitialized warnings). This allows the analysis tools to ignore InitAll variable assignments for the purposes of these warnings. With InitAll enabled, a developer will still receive static analysis warnings if they forget to initialize a variable even if InitAll forcibly initializes it for them.
>> 
>> "
>> 
>> Any comment on this?
> 
> You have to try.  Bits to implement are adjusting the uninit pass and
> maintaining the annotation
> as well as making sure to not elide the real init because there's a
> 'fake' init (we have redundant
> store elimination which works in this direction for example just to name one).

Okay, I see. 
> 
> Richard.
> 
>> As said, you have to implement a few options and compare.
>> 
>> 
>> Yes, I will do that, just make sure which approaches we should implement and compare first.

The following are the approaches I will implement and compare:

Our final goal is to keep the uninitialized warnings and minimize the run-time performance cost.

A. Add real initialization during gimplification; do not maintain the uninitialized warnings.
B. Add real initialization during gimplification, marking it with “artificial_init”.
     Adjust the uninitialized pass, maintain the annotation, and make sure the real init is not
     deleted because of the fake init.
C. Mark the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
     maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand,
     then add real initialization for all DECLs that are marked with “no_explicit_init”.
D. Add .DEFERRED_INIT during gimplification, and expand the .DEFERRED_INIT during expand to
     real initialization. Adjust the uninitialized pass for the new refs with “.DEFERRED_INIT”.


Of the above, approach A will be the one with the minimum run-time cost, and will be the baseline for the performance
comparison.

I will implement approach D next. It is expected to have the most run-time overhead in the above list, but its
implementation should be the cleanest among B, C, and D. Let’s see how much more performance overhead this approach
adds. If the data is good, maybe we can avoid the effort of implementing B and C.

If the performance of D is not good, I will implement B or C at that time.
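
To make the idea behind approach D concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions: the statement kinds, struct, and function names are hypothetical and have nothing to do with GCC's real GIMPLE data structures, and it handles straight-line code only. It shows the two properties discussed in this thread: a deferred init must not count as a definition for the uninitialized-use check, and once it is expanded into a real store of zero the check can no longer see the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy statement kinds.  Hypothetical names, not GCC data structures.  */
enum toy_kind { TOY_DEFERRED_INIT, TOY_ASSIGN, TOY_USE };

struct toy_stmt
{
  enum toy_kind kind;
  int var;   /* Index of the variable the statement touches.  */
  int value; /* Value stored by TOY_ASSIGN; ignored otherwise.  */
};

#define TOY_MAX_VARS 16

/* Count uses of variables that have no real (user-written) definition yet.
   A TOY_DEFERRED_INIT does not count as a definition, which is how the
   uninitialized-use warning survives the compiler-inserted init.
   Straight-line code only; there is no control flow in this model.  */
int
toy_count_uninit_uses (const struct toy_stmt *stmts, size_t n)
{
  bool really_set[TOY_MAX_VARS] = { false };
  int warnings = 0;

  for (size_t i = 0; i < n; i++)
    {
      int v = stmts[i].var;
      assert (v >= 0 && v < TOY_MAX_VARS);
      switch (stmts[i].kind)
        {
        case TOY_DEFERRED_INIT:
          really_set[v] = false;
          break;
        case TOY_ASSIGN:
          really_set[v] = true;
          break;
        case TOY_USE:
          if (!really_set[v])
            warnings++;
          break;
        }
    }
  return warnings;
}

/* "Expand" step: turn every deferred init into a real store of zero.  */
void
toy_expand (struct toy_stmt *stmts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (stmts[i].kind == TOY_DEFERRED_INIT)
      {
        stmts[i].kind = TOY_ASSIGN;
        stmts[i].value = 0;
      }
}
```

For a sequence such as "deferred-init x; use x", the checker reports one use before `toy_expand` and none after it, which is why the warning pass has to run while the marker is still visible.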

Let me know if you have any comment or suggestions.

Thanks.

Qing



^ permalink raw reply	[flat|nested] 56+ messages in thread

* The performance data for  two different implementation of new security feature -ftrivial-auto-var-init
  2020-12-09 16:18                           ` Qing Zhao
@ 2021-01-05 19:05                             ` Qing Zhao
  2021-01-05 19:10                               ` Qing Zhao
  2021-01-12 20:34                               ` Qing Zhao
  0 siblings, 2 replies; 56+ messages in thread
From: Qing Zhao @ 2021-01-05 19:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

Hi,

This is an update on our previous discussion.

1. I implemented the following two approaches in the latest upstream gcc:

A. Add real initialization during gimplification; do not maintain the uninitialized warnings.

D. Add calls to .DEFERRED_INIT during gimplification, and expand the .DEFERRED_INIT during expand to
 real initialization. Adjust the uninitialized pass for the new refs with “.DEFERRED_INIT”.

Note, in this initial implementation,
	** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of -ftrivial-auto-var-init=pattern
	   is not done yet.  Therefore, the performance data is only about -ftrivial-auto-var-init=zero.

	** I added a temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for
	   the runtime performance study.
	** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected.)

2. I collected runtime data for CPU2017 on an x86 machine with this new gcc for the following 3 cases:

no: default. (-g -O2 -march=native )
A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 

I then computed the slowdown for both A and D as follows:

benchmarks          A / no    D / no

500.perlbench_r      1.25%     1.25%
502.gcc_r            0.68%     1.80%
505.mcf_r            0.68%     0.14%
520.omnetpp_r        4.83%     4.68%
523.xalancbmk_r      0.18%     1.96%
525.x264_r           1.55%     2.07%
531.deepsjeng_r     11.57%    11.85%
541.leela_r          0.64%     0.80%
557.xz_r            -0.41%    -0.41%

507.cactuBSSN_r      0.44%     0.44%
508.namd_r           0.34%     0.34%
510.parest_r         0.17%     0.25%
511.povray_r        56.57%    57.27%
519.lbm_r            0.00%     0.00%
521.wrf_r           -0.28%    -0.37%
526.blender_r       16.96%    17.71%
527.cam4_r           0.70%     0.53%
538.imagick_r        2.40%     2.40%
544.nab_r            0.00%    -0.65%

avg                  5.17%     5.37%

From the above data, we can see that, in general, the runtime slowdowns for
implementations A and D are similar on individual benchmarks.
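
The "avg" row is consistent with a plain arithmetic mean over the 19 per-benchmark slowdowns (an assumption about how it was computed; the message does not say). A short C sketch to re-check it, with the percentages copied from the table above and hypothetical array and function names:

```c
#include <assert.h>
#include <stddef.h>

/* Per-benchmark slowdowns in percent, in table order (A / no and D / no).  */
const double slowdown_a[19] = {
  1.25, 0.68, 0.68, 4.83, 0.18, 1.55, 11.57, 0.64, -0.41,
  0.44, 0.34, 0.17, 56.57, 0.00, -0.28, 16.96, 0.70, 2.40, 0.00
};
const double slowdown_d[19] = {
  1.25, 1.80, 0.14, 4.68, 1.96, 2.07, 11.85, 0.80, -0.41,
  0.44, 0.34, 0.25, 57.27, 0.00, -0.37, 17.71, 0.53, 2.40, -0.65
};

/* Arithmetic mean of N percentages; N must be nonzero.  */
double
mean_slowdown (const double *pct, size_t n)
{
  double sum = 0.0;
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    sum += pct[i];
  return sum / (double) n;
}
```

Note that a single outlier dominates the mean: 511.povray_r alone contributes roughly 3 of the 5.17 percentage points for A.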

Several benchmarks show a significant slowdown from the newly added initialization with both
A and D, for example 511.povray_r, 526.blender_r, and 531.deepsjeng_r. I will study in a little more
detail what kind of new initializations introduced such a slowdown.

From the study so far, I think that approach D should be good enough for our final implementation.
So, I will try to finish approach D with the following remaining work:

      ** complete the implementation of -ftrivial-auto-var-init=pattern;
      ** complete the implementation of the uninitialized warnings maintenance work for D.


Let me know if you have any comments and suggestions on my current and future work.

Thanks a lot for your help.

Qing



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for  two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-05 19:05                             ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao
@ 2021-01-05 19:10                               ` Qing Zhao
  2021-01-12 20:34                               ` Qing Zhao
  1 sibling, 0 replies; 56+ messages in thread
From: Qing Zhao @ 2021-01-05 19:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

I am attaching my current (incomplete) patch to gcc for your reference.

From a71eb73bee5857440c4ff67c4c82be115e0675cb Mon Sep 17 00:00:00 2001
From: qing zhao <qinzhao@gcc.gnu.org>
Date: Sat, 12 Dec 2020 00:02:28 +0100
Subject: [PATCH] First version of -ftrivial-auto-var-init

---
 gcc/common.opt            | 35 ++++++++++++++++++
 gcc/flag-types.h          | 14 ++++++++
 gcc/gimple-pretty-print.c |  2 +-
 gcc/gimplify.c            | 90 +++++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.c         | 20 +++++++++++
 gcc/internal-fn.def       |  5 +++
 gcc/tree-cfg.c            |  3 ++
 gcc/tree-ssa-uninit.c     |  3 ++
 gcc/tree-ssa.c            |  5 +++
 9 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 6645539f5e5..c4c4fc28ef7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3053,6 +3053,41 @@ ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
 Enable copy propagation of scalar-evolution information.
 
+ftrivial-auto-var-init=
+Common Joined RejectNegative Enum(auto_init_type) Var(flag_trivial_auto_var_init) Init(AUTO_INIT_UNINITIALIZED)
+-ftrivial-auto-var-init=[uninitialized|pattern|zero]	Add initializations to automatic variables.	
+
+Enum
+Name(auto_init_type) Type(enum auto_init_type) UnknownError(unrecognized automatic variable initialization type %qs)
+
+EnumValue
+Enum(auto_init_type) String(uninitialized) Value(AUTO_INIT_UNINITIALIZED)
+
+EnumValue
+Enum(auto_init_type) String(pattern) Value(AUTO_INIT_PATTERN)
+
+EnumValue
+Enum(auto_init_type) String(zero) Value(AUTO_INIT_ZERO)
+
+fauto-var-init-approach=
+Common Joined RejectNegative Enum(auto_init_approach) Var(flag_auto_init_approach) Init(AUTO_INIT_A)
+-fauto-var-init-approach=[A|B|C|D]	Choose the approach to initialize automatic variables.	
+
+Enum
+Name(auto_init_approach) Type(enum auto_init_approach) UnknownError(unrecognized automatic variable initialization approach %qs)
+
+EnumValue
+Enum(auto_init_approach) String(A) Value(AUTO_INIT_A)
+
+EnumValue
+Enum(auto_init_approach) String(B) Value(AUTO_INIT_B)
+
+EnumValue
+Enum(auto_init_approach) String(C) Value(AUTO_INIT_C)
+
+EnumValue
+Enum(auto_init_approach) String(D) Value(AUTO_INIT_D)
+
 ; -fverbose-asm causes extra commentary information to be produced in
 ; the generated assembly code (to make it more readable).  This option
 ; is generally only of use to those who actually need to read the
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 9342bd87be3..bfd0692b82c 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -242,6 +242,20 @@ enum vect_cost_model {
   VECT_COST_MODEL_DEFAULT = 1
 };
 
+/* Automatic variable initialization type.  */
+enum auto_init_type {
+  AUTO_INIT_UNINITIALIZED = 0,
+  AUTO_INIT_PATTERN = 1,
+  AUTO_INIT_ZERO = 2
+};
+
+enum auto_init_approach {
+  AUTO_INIT_A = 0,
+  AUTO_INIT_B = 1,
+  AUTO_INIT_C = 2,
+  AUTO_INIT_D = 3
+};
+
 /* Different instrumentation modes.  */
 enum sanitize_code {
   /* AddressSanitizer.  */
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 075d6e5208a..1044d54e8d3 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -81,7 +81,7 @@ newline_and_indent (pretty_printer *buffer, int spc)
 DEBUG_FUNCTION void
 debug_gimple_stmt (gimple *gs)
 {
-  print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
+  print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS|TDF_LINENO|TDF_ALIAS);
 }
 
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 54cb66bd1dd..1eb0747ea2f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1674,6 +1674,16 @@ gimplify_return_expr (tree stmt, gimple_seq *pre_p)
   return GS_ALL_DONE;
 }
 
+/* Return the value that is used to initialize the vla DECL based 
+   on INIT_TYPE.  */
+tree memset_init_node (enum auto_init_type init_type)
+{
+  if (init_type == AUTO_INIT_ZERO)
+    return integer_zero_node;
+  else
+    gcc_assert (0);
+}
+
 /* Gimplify a variable-length array DECL.  */
 
 static void
@@ -1712,6 +1722,19 @@ gimplify_vla_decl (tree decl, gimple_seq *seq_p)
 
   gimplify_and_add (t, seq_p);
 
+  /* Add a call to memset to initialize this vla when the user requested.  */
+  if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+      && !DECL_ARTIFICIAL (decl)
+      && VAR_P (decl) 
+      && !DECL_EXTERNAL (decl) 
+      && !TREE_STATIC (decl))
+  {
+    t = builtin_decl_implicit (BUILT_IN_MEMSET);
+    tree init_node = memset_init_node (flag_trivial_auto_var_init);
+    t = build_call_expr (t, 3, addr, init_node, DECL_SIZE_UNIT (decl)); 
+    gimplify_and_add (t, seq_p);
+  }
+
   /* Record the dynamic allocation associated with DECL if requested.  */
   if (flag_callgraph_info & CALLGRAPH_INFO_DYNAMIC_ALLOC)
     record_dynamic_alloc (decl);
@@ -1734,6 +1757,63 @@ force_labels_r (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
   return NULL_TREE;
 }
 
+
+/* Build a call to internal const function DEFERRED_INIT,
+   1st argument: DECL;
+   2nd argument: INIT_TYPE;
+
+   as DEFERRED_INIT (DECL, INIT_TYPE)
+
+   DEFERRED_INIT is defined as:
+   DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL).  */
+
+static gimple * 
+build_deferred_init (tree decl,
+		     enum auto_init_type init_type)
+{
+   tree init_type_node =
+     build_int_cst (integer_type_node, (int) init_type);
+   return gimple_build_call_internal (IFN_DEFERRED_INIT, 2, decl, init_type_node);
+}
+
+
+/* Generate initialization to automatic variable DECL based on INIT_TYPE.  */
+static void
+gimple_add_init_for_auto_var (tree decl,
+			      enum auto_init_type init_type,
+			      enum auto_init_approach init_approach,
+			      gimple_seq *seq_p)
+{
+  gcc_assert (VAR_P (decl) && !DECL_EXTERNAL (decl) && !TREE_STATIC (decl));
+  switch (init_type)
+  {
+  case AUTO_INIT_UNINITIALIZED:
+  case AUTO_INIT_PATTERN:
+    gcc_assert (0);
+    break;
+  case AUTO_INIT_ZERO:
+    if (init_approach == AUTO_INIT_A)
+    {
+      tree init = build_zero_cst (TREE_TYPE (decl));
+      init = build2 (INIT_EXPR, void_type_node, decl, init);
+      gimplify_and_add (init, seq_p);
+      ggc_free (init);
+    }
+    else if (init_approach == AUTO_INIT_D)
+    {
+      gimple *call = build_deferred_init (decl, AUTO_INIT_ZERO);
+      gimple_call_set_lhs (call, decl);
+      gimplify_seq_add_stmt (seq_p, call);
+    }
+    else 
+      gcc_assert (0);
+    break;
+  default:
+    gcc_unreachable ();
+  }
+}
+
+
 /* Gimplify a DECL_EXPR node *STMT_P by making any necessary allocation
    and initialization explicit.  */
 
@@ -1821,6 +1901,16 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 	       as they may contain a label address.  */
 	    walk_tree (&init, force_labels_r, NULL, NULL);
 	}
+      /* When there is no explicit initializer, if the user requested,
+	 we should insert an artificial initializer for this automatic
+	 variable for non vla variables.  */
+      else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+	       && !TREE_STATIC (decl)
+	       && !is_vla)
+	gimple_add_init_for_auto_var (decl, 
+				      flag_trivial_auto_var_init, 
+				      flag_auto_init_approach,
+				      seq_p);
     }
 
   return GS_ALL_DONE;
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 41223ff7d82..6eef6ddb259 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2971,6 +2971,26 @@ expand_UNIQUE (internal_fn, gcall *stmt)
     emit_insn (pattern);
 }
 
+/* Expand the IFN_DEFERRED_INIT function according to its second argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      gcc_assert (0);
+    case AUTO_INIT_ZERO:
+      tree init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+    }
+}
+
 /* The size of an OpenACC compute dimension.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 91a7bfea3ee..fd077d8b55c 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -347,6 +347,11 @@ DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (PHI, 0, NULL)
 
+/* A function to represent an artificial initialization to an uninitialized
+   automatic variable. The first argument is the variable itself, the
+   second argument is the initialization type.  */
+DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+
 /* DIM_SIZE and DIM_POS return the size of a particular compute
    dimension and the executing thread's position within that
    dimension.  DIM_POS is pure (and not const) so that it isn't
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index f59a0c05200..3717c6d26a5 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3433,6 +3433,9 @@ verify_gimple_call (gcall *stmt)
 	}
     }
 
+  if (gimple_call_internal_p (stmt, IFN_DEFERRED_INIT))
+    return false;
+
   /* ???  The C frontend passes unpromoted arguments in case it
      didn't see a function declaration before the call.  So for now
      leave the call arguments mostly unverified.  Once we gimplify
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 516a7bd2c99..6c0946b0bc5 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -611,6 +611,9 @@ warn_uninitialized_vars (bool wmaybe_uninit)
 	  ssa_op_iter op_iter;
 	  tree use;
 
+	  if (gimple_call_internal_p (stmt, IFN_DEFERRED_INIT))
+	    continue;
+
 	  if (is_gimple_debug (stmt))
 	    continue;
 
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index a575979aa13..319e4150dc4 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1325,6 +1325,11 @@ ssa_undefined_value_p (tree t, bool partial)
   if (gimple_nop_p (def_stmt))
     return true;
 
+  /* The value is undefined iff the definition statement is a call
+     to .DEFERRED_INIT function.  */
+  if (gimple_call_internal_p (def_stmt, IFN_DEFERRED_INIT))
+    return true;
+
   /* Check if the complex was not only partially defined.  */
   if (partial && is_gimple_assign (def_stmt)
       && gimple_assign_rhs_code (def_stmt) == COMPLEX_EXPR)
-- 
2.11.0
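
For readers who want a source-level picture of the gimplify_vla_decl hunk above: with -ftrivial-auto-var-init=zero the patch adds a memset call over the freshly allocated VLA storage. The following C sketch writes that memset by hand, so it behaves the same with any compiler. The function name is hypothetical, and the explicit call only stands in for what the patched compiler would insert.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Allocate a VLA of N ints, zero it the way the patch's inserted call
   would, and return how many nonzero bytes remain.  */
size_t
count_nonzero_bytes_in_zeroed_vla (size_t n)
{
  assert (n > 0);               /* A VLA must have a positive length.  */
  int vla[n];
  /* Stand-in for the compiler-inserted call: memset (addr, 0, size).  */
  memset (vla, 0, sizeof vla);

  const unsigned char *bytes = (const unsigned char *) vla;
  size_t nonzero = 0;
  for (size_t i = 0; i < sizeof vla; i++)
    if (bytes[i] != 0)
      nonzero++;
  return nonzero;
}
```

The cost of this store grows with the array size on every entry to the scope, which is one plausible source of slowdown for code that uses large automatic buffers.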



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for  two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-05 19:05                             ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao
  2021-01-05 19:10                               ` Qing Zhao
@ 2021-01-12 20:34                               ` Qing Zhao
  2021-01-13  7:39                                 ` Richard Biener
  1 sibling, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-01-12 20:34 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford; +Cc: Richard Biener via Gcc-patches

Hi, 

Just checking in to see whether you have any comments or suggestions on this:

FYI, I have continued with the Approach D implementation since last week:

D. Adding calls to .DEFERRED_INIT during gimplification, expanding .DEFERRED_INIT during expand to
real initialization. Adjusting the uninitialized pass to handle the new references to “.DEFERRED_INIT”.
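
To make approach D a bit more concrete, here is a minimal sketch of the kind of code it affects. The function name `pick` is made up for illustration, and the comments describe the intended lowering only in outline; the exact form and arguments of the .DEFERRED_INIT call are not spelled out here, so they are left elided.

```c
/* Hypothetical example: an automatic variable with no explicit initializer. */
int pick(int cond)
{
    int x;          /* no initializer: under approach D, gimplification would
                       add something like  x = .DEFERRED_INIT (...)  here */

    if (cond)
        x = 1;      /* explicit assignment on one path only */

    /* When cond == 0, x is read without an explicit initializer.  The stated
       goal is that the uninitialized pass can still warn about this read,
       while the .DEFERRED_INIT call is turned into a real initialization
       (zero for -ftrivial-auto-var-init=zero) only at expand time.  */
    return x;
}
```

Only the `cond != 0` path is well defined without the new option; the `cond == 0` path is exactly the case the feature is meant to make deterministic.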

For the remaining work of Approach D:

 ** complete the implementation of -ftrivial-auto-var-init=pattern;
 ** complete the implementation of uninitialized warnings maintenance work for D. 

I have completed the uninitialized warnings maintenance work for D
and finished part of the -ftrivial-auto-var-init=pattern implementation.

The following work remains for Approach D:

   ** -ftrivial-auto-var-init=pattern for VLAs;
   ** add a new variable attribute, __attribute__((uninitialized)), to mark
      a variable that is intentionally left uninitialized for performance reasons;
   ** add complete test cases.
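
As an illustration of the proposed attribute, the sketch below marks a scratch buffer that the code fully controls, so automatic initialization would only cost time. The function name `checksum` and the buffer size are assumptions made for this example; compilers that do not know the attribute are expected to ignore it with a warning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sums the bytes of s using a local scratch buffer. */
unsigned checksum(const char *s)
{
    /* Opt this buffer out of -ftrivial-auto-var-init: every byte that is
       read below has been written by memcpy first.  */
    char buf[64] __attribute__((uninitialized));
    size_t n = strlen(s);
    unsigned sum = 0;

    if (n > sizeof buf)
        n = sizeof buf;          /* never copy past the end of buf */
    memcpy(buf, s, n);           /* only the first n bytes are written */

    for (size_t i = 0; i < n; i++)   /* and only those n bytes are read */
        sum += (unsigned char)buf[i];
    return sum;
}
```

For example, `checksum("ab")` adds the byte values of 'a' and 'b'. The attribute is only safe when, as here, no byte of the variable is read before it is written.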
  

Please let me know if you have any objection to my current decision to implement approach D. 

Thanks a lot for your help.

Qing


> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Hi,
> 
> This is an update for our previous discussion. 
> 
> 1. I implemented the following two different implementations in the latest upstream gcc:
> 
> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
> 
> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
> 
> Note, in this initial implementation,
> 	** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern 
> 	   is not done yet.  Therefore, the performance data is only about -ftrivial-auto-var-init=zero. 
> 
> 	** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to choose implementation A or D for 
> 	   runtime performance study.
> 	** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). 
> 
> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases:
> 
> no: default. (-g -O2 -march=native )
> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
> 
> And then compute the slowdown data for both A and D as following:
> 
> benchmarks		A / no	D /no
> 
> 500.perlbench_r	1.25%	1.25%
> 502.gcc_r		0.68%	1.80%
> 505.mcf_r		0.68%	0.14%
> 520.omnetpp_r	4.83%	4.68%
> 523.xalancbmk_r	0.18%	1.96%
> 525.x264_r		1.55%	2.07%
> 531.deepsjeng_	11.57%	11.85%
> 541.leela_r		0.64%	0.80%
> 557.xz_			 -0.41%	-0.41%
> 
> 507.cactuBSSN_r	0.44%	0.44%
> 508.namd_r		0.34%	0.34%
> 510.parest_r		0.17%	0.25%
> 511.povray_r		56.57%	57.27%
> 519.lbm_r		0.00%	0.00%
> 521.wrf_r			 -0.28%	-0.37%
> 526.blender_r		16.96%	17.71%
> 527.cam4_r		0.70%	0.53%
> 538.imagick_r		2.40%	2.40%
> 544.nab_r		0.00%	-0.65%
> 
> avg				5.17%	5.37%
> 
> From the above data, we can see that in general, the runtime performance slowdown for 
> implementation A and D are similar for individual benchmarks.
> 
> There are several benchmarks that have significant slowdown with the new added initialization for both
> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit
> more on what kind of new initializations introduced such slowdown. 
> 
> From the current study so far, I think that approach D should be good enough for our final implementation. 
> So, I will try to finish approach D with the following remaining work
> 
>      ** complete the implementation of -ftrivial-auto-var-init=pattern;
>      ** complete the implementation of uninitialized warnings maintenance work for D. 
> 
> 
> Let me know if you have any comments and suggestions on my current and future work.
> 
> Thanks a lot for your help.
> 
> Qing
> 
>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> The following are the approaches I will implement and compare:
>> 
>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost.
>> 
>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
>> B. Adding real initialization during gimplification, marking them with “artificial_init”. 
>>    Adjusting uninitialized pass, maintaining the annotation, making sure the real init not
>>    Deleted from the fake init. 
>> C.  Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
>>     maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, 
>>     add real initialization for all DECLs that are marked with “no_explicit_init”.
>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
>>    real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
>> 
>> 
>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance
>> comparison. 
>> 
>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but
>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach
>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. 
>> 
>> If the performance of D is not good, I will implement B or C at that time.
>> 
>> Let me know if you have any comment or suggestions.
>> 
>> Thanks.
>> 
>> Qing
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-12 20:34                               ` Qing Zhao
@ 2021-01-13  7:39                                 ` Richard Biener
  2021-01-13 15:06                                   ` Qing Zhao
  2021-01-14 21:16                                   ` Qing Zhao
  0 siblings, 2 replies; 56+ messages in thread
From: Richard Biener @ 2021-01-13  7:39 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Tue, 12 Jan 2021, Qing Zhao wrote:

> Hi, 
> 
> Just check in to see whether you have any comments and suggestions on this:
> 
> FYI, I have been continue with Approach D implementation since last week:
> 
> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
> 
> For the remaining work of Approach D:
> 
>  ** complete the implementation of -ftrivial-auto-var-init=pattern;
>  ** complete the implementation of uninitialized warnings maintenance work for D. 
> 
> I have completed the uninitialized warnings maintenance work for D.
> And finished partial of the -ftrivial-auto-var-init=pattern implementation. 
> 
> The following are remaining work of Approach D:
> 
>    ** -ftrivial-auto-var-init=pattern for VLA;
>    **add a new attribute for variable:
> __attribute((uninitialized)
> the marked variable is uninitialized intentionaly for performance purpose.
>    ** adding complete testing cases;
>   
> 
> Please let me know if you have any objection on my current decision on implementing approach D. 

Did you do any analysis on how stack usage and code size are changed 
with approach D?  How does compile-time behave (we could gobble up
lots of .DEFERRED_INIT calls I guess)?

Richard.

> Thanks a lot for your help.
> 
> Qing
> 
> 
> > On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > 
> > Hi,
> > 
> > This is an update for our previous discussion. 
> > 
> > 1. I implemented the following two different implementations in the latest upstream gcc:
> > 
> > A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
> > 
> > D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
> > real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
> > 
> > Note, in this initial implementation,
> > 	** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern 
> > 	   is not done yet.  Therefore, the performance data is only about -ftrivial-auto-var-init=zero. 
> > 
> > 	** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to choose implementation A or D for 
> > 	   runtime performance study.
> > 	** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). 
> > 
> > 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases:
> > 
> > no: default. (-g -O2 -march=native )
> > A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
> > D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
> > 
> > And then compute the slowdown data for both A and D as following:
> > 
> > benchmarks		A / no	D /no
> > 
> > 500.perlbench_r	1.25%	1.25%
> > 502.gcc_r		0.68%	1.80%
> > 505.mcf_r		0.68%	0.14%
> > 520.omnetpp_r	4.83%	4.68%
> > 523.xalancbmk_r	0.18%	1.96%
> > 525.x264_r		1.55%	2.07%
> > 531.deepsjeng_	11.57%	11.85%
> > 541.leela_r		0.64%	0.80%
> > 557.xz_			 -0.41%	-0.41%
> > 
> > 507.cactuBSSN_r	0.44%	0.44%
> > 508.namd_r		0.34%	0.34%
> > 510.parest_r		0.17%	0.25%
> > 511.povray_r		56.57%	57.27%
> > 519.lbm_r		0.00%	0.00%
> > 521.wrf_r			 -0.28%	-0.37%
> > 526.blender_r		16.96%	17.71%
> > 527.cam4_r		0.70%	0.53%
> > 538.imagick_r		2.40%	2.40%
> > 544.nab_r		0.00%	-0.65%
> > 
> > avg				5.17%	5.37%
> > 
> > From the above data, we can see that in general, the runtime performance slowdown for 
> > implementation A and D are similar for individual benchmarks.
> > 
> > There are several benchmarks that have significant slowdown with the new added initialization for both
> > A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit
> > more on what kind of new initializations introduced such slowdown. 
> > 
> > From the current study so far, I think that approach D should be good enough for our final implementation. 
> > So, I will try to finish approach D with the following remaining work
> > 
> >      ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >      ** complete the implementation of uninitialized warnings maintenance work for D. 
> > 
> > 
> > Let me know if you have any comments and suggestions on my current and future work.
> > 
> > Thanks a lot for your help.
> > 
> > Qing
> > 
> >> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> 
> >> The following are the approaches I will implement and compare:
> >> 
> >> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost.
> >> 
> >> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
> >> B. Adding real initialization during gimplification, marking them with “artificial_init”. 
> >>    Adjusting uninitialized pass, maintaining the annotation, making sure the real init not
> >>    Deleted from the fake init. 
> >> C.  Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
> >>     maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, 
> >>     add real initialization for all DECLs that are marked with “no_explicit_init”.
> >> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
> >>    real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
> >> 
> >> 
> >> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance
> >> comparison. 
> >> 
> >> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but
> >> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach
> >> will be. If the data is good, maybe we can avoid the effort to implement B, and C. 
> >> 
> >> If the performance of D is not good, I will implement B or C at that time.
> >> 
> >> Let me know if you have any comment or suggestions.
> >> 
> >> Thanks.
> >> 
> >> Qing
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-13  7:39                                 ` Richard Biener
@ 2021-01-13 15:06                                   ` Qing Zhao
  2021-01-13 15:10                                     ` Richard Biener
  2021-01-14 21:16                                   ` Qing Zhao
  1 sibling, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-01-13 15:06 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Tue, 12 Jan 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> Just check in to see whether you have any comments and suggestions on this:
>> 
>> FYI, I have been continue with Approach D implementation since last week:
>> 
>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
>> 
>> For the remaining work of Approach D:
>> 
>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>> ** complete the implementation of uninitialized warnings maintenance work for D. 
>> 
>> I have completed the uninitialized warnings maintenance work for D.
>> And finished partial of the -ftrivial-auto-var-init=pattern implementation. 
>> 
>> The following are remaining work of Approach D:
>> 
>>   ** -ftrivial-auto-var-init=pattern for VLA;
>>   **add a new attribute for variable:
>> __attribute((uninitialized)
>> the marked variable is uninitialized intentionaly for performance purpose.
>>   ** adding complete testing cases;
>> 
>> 
>> Please let me know if you have any objection on my current decision on implementing approach D. 
> 
> Did you do any analysis on how stack usage and code size are changed 
> with approach D?

I did the code size comparison (I will provide the data in another email). According to this data, D generally does better than A, which actually surprised me.

I have not looked at stack usage yet, though. I am not sure how to collect the stack usage data; do you have any suggestions?


> How does compile-time behave (we could gobble up
> lots of .DEFERRED_INIT calls I guess)?
I can collect this data too and report it later.

Thanks.

Qing
> 
> Richard.
> 
>> Thanks a lot for your help.
>> 
>> Qing
>> 
>> 
>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> Hi,
>>> 
>>> This is an update for our previous discussion. 
>>> 
>>> 1. I implemented the following two different implementations in the latest upstream gcc:
>>> 
>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
>>> 
>>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
>>> 
>>> Note, in this initial implementation,
>>> 	** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern 
>>> 	   is not done yet.  Therefore, the performance data is only about -ftrivial-auto-var-init=zero. 
>>> 
>>> 	** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to choose implementation A or D for 
>>> 	   runtime performance study.
>>> 	** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). 
>>> 
>>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases:
>>> 
>>> no: default. (-g -O2 -march=native )
>>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
>>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
>>> 
>>> And then compute the slowdown data for both A and D as following:
>>> 
>>> benchmarks		A / no	D /no
>>> 
>>> 500.perlbench_r	1.25%	1.25%
>>> 502.gcc_r		0.68%	1.80%
>>> 505.mcf_r		0.68%	0.14%
>>> 520.omnetpp_r	4.83%	4.68%
>>> 523.xalancbmk_r	0.18%	1.96%
>>> 525.x264_r		1.55%	2.07%
>>> 531.deepsjeng_	11.57%	11.85%
>>> 541.leela_r		0.64%	0.80%
>>> 557.xz_			 -0.41%	-0.41%
>>> 
>>> 507.cactuBSSN_r	0.44%	0.44%
>>> 508.namd_r		0.34%	0.34%
>>> 510.parest_r		0.17%	0.25%
>>> 511.povray_r		56.57%	57.27%
>>> 519.lbm_r		0.00%	0.00%
>>> 521.wrf_r			 -0.28%	-0.37%
>>> 526.blender_r		16.96%	17.71%
>>> 527.cam4_r		0.70%	0.53%
>>> 538.imagick_r		2.40%	2.40%
>>> 544.nab_r		0.00%	-0.65%
>>> 
>>> avg				5.17%	5.37%
>>> 
>>> From the above data, we can see that in general, the runtime performance slowdown for 
>>> implementation A and D are similar for individual benchmarks.
>>> 
>>> There are several benchmarks that have significant slowdown with the new added initialization for both
>>> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit
>>> more on what kind of new initializations introduced such slowdown. 
>>> 
>>> From the current study so far, I think that approach D should be good enough for our final implementation. 
>>> So, I will try to finish approach D with the following remaining work
>>> 
>>>     ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>     ** complete the implementation of uninitialized warnings maintenance work for D. 
>>> 
>>> 
>>> Let me know if you have any comments and suggestions on my current and future work.
>>> 
>>> Thanks a lot for your help.
>>> 
>>> Qing
>>> 
>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>> 
>>>> The following are the approaches I will implement and compare:
>>>> 
>>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost.
>>>> 
>>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
>>>> B. Adding real initialization during gimplification, marking them with “artificial_init”. 
>>>>   Adjusting uninitialized pass, maintaining the annotation, making sure the real init not
>>>>   Deleted from the fake init. 
>>>> C.  Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
>>>>    maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, 
>>>>    add real initialization for all DECLs that are marked with “no_explicit_init”.
>>>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
>>>>   real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
>>>> 
>>>> 
>>>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance
>>>> comparison. 
>>>> 
>>>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but
>>>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach
>>>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. 
>>>> 
>>>> If the performance of D is not good, I will implement B or C at that time.
>>>> 
>>>> Let me know if you have any comment or suggestions.
>>>> 
>>>> Thanks.
>>>> 
>>>> Qing
>>> 
>> 
>> 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-13 15:06                                   ` Qing Zhao
@ 2021-01-13 15:10                                     ` Richard Biener
  2021-01-13 15:35                                       ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2021-01-13 15:10 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Wed, 13 Jan 2021, Qing Zhao wrote:

> 
> 
> > On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote:
> > 
> > On Tue, 12 Jan 2021, Qing Zhao wrote:
> > 
> >> Hi, 
> >> 
> >> Just check in to see whether you have any comments and suggestions on this:
> >> 
> >> FYI, I have been continue with Approach D implementation since last week:
> >> 
> >> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
> >> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
> >> 
> >> For the remaining work of Approach D:
> >> 
> >> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >> ** complete the implementation of uninitialized warnings maintenance work for D. 
> >> 
> >> I have completed the uninitialized warnings maintenance work for D.
> >> And finished partial of the -ftrivial-auto-var-init=pattern implementation. 
> >> 
> >> The following are remaining work of Approach D:
> >> 
> >>   ** -ftrivial-auto-var-init=pattern for VLA;
> >>   **add a new attribute for variable:
> >> __attribute((uninitialized)
> >> the marked variable is uninitialized intentionaly for performance purpose.
> >>   ** adding complete testing cases;
> >> 
> >> 
> >> Please let me know if you have any objection on my current decision on implementing approach D. 
> > 
> > Did you do any analysis on how stack usage and code size are changed 
> > with approach D?
> 
> I did the code size change comparison (I will provide the data in another email). And with this data, D works better than A in general. (This is surprise to me actually).
> 
> But not the stack usage.  Not sure how to collect the stack usage data, 
> do you have any suggestion on this?

There is -fstack-usage you could use, then of course watching
the stack segment at runtime.  I'm mostly concerned about
stack-limited "processes" such as the linux kernel which I think
is a primary target of your work.
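
A minimal sketch of how -fstack-usage could be tried on a small test case follows. The file name demo.c and the function `format_id` are made up for illustration; the 256-byte buffer is only there so the frame size is easy to spot in the report.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical function with a sizable local buffer. */
size_t format_id(unsigned id)
{
    char buf[256];
    snprintf(buf, sizeof buf, "id-%u", id);   /* always NUL-terminates */
    return strlen(buf);
}

/* Build, for example, with:
 *     gcc -O2 -fstack-usage -c demo.c
 * GCC then writes a .su file (demo.su here) with one line per function:
 * the function, its stack size in bytes, and a qualifier such as
 * "static" or "dynamic".  Building a second time with
 * -ftrivial-auto-var-init=zero added and comparing the two .su files
 * would show whether the per-function frame sizes change.
 */
```

This gives per-function static numbers only; it does not replace watching the stack at runtime.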

Richard.

> 
> > How does compile-time behave (we could gobble up
> > lots of .DEFERRED_INIT calls I guess)?
> I can collect this data too and report it later.
> 
> Thanks.
> 
> Qing
> > 
> > Richard.
> > 
> >> Thanks a lot for your help.
> >> 
> >> Qing
> >> 
> >> 
> >>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> This is an update for our previous discussion. 
> >>> 
> >>> 1. I implemented the following two different implementations in the latest upstream gcc:
> >>> 
> >>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
> >>> 
> >>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
> >>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
> >>> 
> >>> Note, in this initial implementation,
> >>> 	** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern 
> >>> 	   is not done yet.  Therefore, the performance data is only about -ftrivial-auto-var-init=zero. 
> >>> 
> >>> 	** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to choose implementation A or D for 
> >>> 	   runtime performance study.
> >>> 	** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). 
> >>> 
> >>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases:
> >>> 
> >>> no: default. (-g -O2 -march=native )
> >>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
> >>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
> >>> 
> >>> And then compute the slowdown data for both A and D as following:
> >>> 
> >>> benchmarks		A / no	D /no
> >>> 
> >>> 500.perlbench_r	1.25%	1.25%
> >>> 502.gcc_r		0.68%	1.80%
> >>> 505.mcf_r		0.68%	0.14%
> >>> 520.omnetpp_r	4.83%	4.68%
> >>> 523.xalancbmk_r	0.18%	1.96%
> >>> 525.x264_r		1.55%	2.07%
> >>> 531.deepsjeng_	11.57%	11.85%
> >>> 541.leela_r		0.64%	0.80%
> >>> 557.xz_			 -0.41%	-0.41%
> >>> 
> >>> 507.cactuBSSN_r	0.44%	0.44%
> >>> 508.namd_r		0.34%	0.34%
> >>> 510.parest_r		0.17%	0.25%
> >>> 511.povray_r		56.57%	57.27%
> >>> 519.lbm_r		0.00%	0.00%
> >>> 521.wrf_r			 -0.28%	-0.37%
> >>> 526.blender_r		16.96%	17.71%
> >>> 527.cam4_r		0.70%	0.53%
> >>> 538.imagick_r		2.40%	2.40%
> >>> 544.nab_r		0.00%	-0.65%
> >>> 
> >>> avg				5.17%	5.37%
> >>> 
> >>> From the above data, we can see that in general, the runtime performance slowdown for 
> >>> implementation A and D are similar for individual benchmarks.
> >>> 
> >>> There are several benchmarks that have significant slowdown with the new added initialization for both
> >>> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit
> >>> more on what kind of new initializations introduced such slowdown. 
> >>> 
> >>> From the current study so far, I think that approach D should be good enough for our final implementation. 
> >>> So, I will try to finish approach D with the following remaining work
> >>> 
> >>>     ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>>     ** complete the implementation of uninitialized warnings maintenance work for D. 
> >>> 
> >>> 
> >>> Let me know if you have any comments and suggestions on my current and future work.
> >>> 
> >>> Thanks a lot for your help.
> >>> 
> >>> Qing
> >>> 
> >>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >>>> 
> >>>> The following are the approaches I will implement and compare:
> >>>> 
> >>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost.
> >>>> 
> >>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
> >>>> B. Adding real initialization during gimplification, marking them with “artificial_init”. 
> >>>>   Adjusting uninitialized pass, maintaining the annotation, making sure the real init not
> >>>>   Deleted from the fake init. 
> >>>> C.  Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
> >>>>    maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, 
> >>>>    add real initialization for all DECLs that are marked with “no_explicit_init”.
> >>>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
> >>>>   real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
> >>>> 
> >>>> 
> >>>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance
> >>>> comparison. 
> >>>> 
> >>>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but
> >>>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach
> >>>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. 
> >>>> 
> >>>> If the performance of D is not good, I will implement B or C at that time.
> >>>> 
> >>>> Let me know if you have any comment or suggestions.
> >>>> 
> >>>> Thanks.
> >>>> 
> >>>> Qing
> >>> 
> >> 
> >> 
> > 
> > -- 
> > > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-13 15:10                                     ` Richard Biener
@ 2021-01-13 15:35                                       ` Qing Zhao
  2021-01-13 15:40                                         ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-01-13 15:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



> On Jan 13, 2021, at 9:10 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Wed, 13 Jan 2021, Qing Zhao wrote:
> 
>> 
>> 
>>> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote:
>>> 
>>> On Tue, 12 Jan 2021, Qing Zhao wrote:
>>> 
>>>> Hi, 
>>>> 
>>>> Just check in to see whether you have any comments and suggestions on this:
>>>> 
>>>> FYI, I have been continue with Approach D implementation since last week:
>>>> 
>>>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to
>>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”.
>>>> 
>>>> For the remaining work of Approach D:
>>>> 
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the implementation of uninitialized warnings maintenance work for D. 
>>>> 
>>>> I have completed the uninitialized warnings maintenance work for D.
>>>> And finished partial of the -ftrivial-auto-var-init=pattern implementation. 
>>>> 
>>>> The following are remaining work of Approach D:
>>>> 
>>>>  ** -ftrivial-auto-var-init=pattern for VLA;
>>>>  **add a new attribute for variable:
>>>> __attribute((uninitialized)
>>>> the marked variable is uninitialized intentionaly for performance purpose.
>>>>  ** adding complete testing cases;
>>>> 
>>>> 
>>>> Please let me know if you have any objection on my current decision on implementing approach D. 
>>> 
>>> Did you do any analysis on how stack usage and code size are changed 
>>> with approach D?
>> 
>> I did the code size change comparison (I will provide the data in another email). And with this data, D works better than A in general. (This is surprise to me actually).
>> 
>> But not the stack usage.  Not sure how to collect the stack usage data, 
>> do you have any suggestion on this?
> 
> There is -fstack-usage you could use, then of course watching
> the stack segment at runtime.

I can do this for CPU2017 to collect the stack usage data and report back.

>  I'm mostly concerned about
> stack-limited "processes" such as the linux kernel which I think
> is a primary target of your work.

I don’t have any experience building the Linux kernel. 
Do we have to collect data for the Linux kernel at this time? Is the CPU2017 data not enough?

Qing
> 
> Richard.
> 
>> 
>>> How does compile-time behave (we could gobble up
>>> lots of .DEFERRED_INIT calls I guess)?
>> I can collect this data too and report it later.
>> 
>> Thanks.
>> 
>> Qing
>>> 
>>> Richard.
>>> 
>>>> Thanks a lot for your help.
>>>> 
>>>> Qing
>>>> 
>>>> 
>>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> This is an update for our previous discussion. 
>>>>> 
>>>>> 1. I implemented the following two approaches in the latest upstream gcc:
>>>>>
>>>>> A. Add real initialization during gimplification, without maintaining the uninitialized warnings.
>>>>>
>>>>> D. Add calls to .DEFERRED_INIT during gimplification and expand .DEFERRED_INIT to real
>>>>> initialization during expand. Adjust the uninitialized pass for the new references to “.DEFERRED_INIT”.
>>>>>
>>>>> Note that in this initial implementation:
>>>>> 	** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of -ftrivial-auto-var-init=pattern
>>>>> 	   is not done yet.  Therefore, the performance data covers only -ftrivial-auto-var-init=zero.
>>>>>
>>>>> 	** I added a temporary option, -fauto-var-init-approach=A|B|C|D, to choose implementation A or D for
>>>>> 	   the runtime performance study.
>>>>> 	** I have not finished the uninitialized warnings maintenance work for D. (That might take more time than I expected.)
>>>>>
>>>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new gcc for the following 3 cases:
>>>>> 
>>>>> no: default. (-g -O2 -march=native )
>>>>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
>>>>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
>>>>> 
>>>>> I then computed the slowdown for both A and D as follows:
>>>>> 
>>>>> benchmarks          A/no     D/no
>>>>>
>>>>> 500.perlbench_r    1.25%    1.25%
>>>>> 502.gcc_r          0.68%    1.80%
>>>>> 505.mcf_r          0.68%    0.14%
>>>>> 520.omnetpp_r      4.83%    4.68%
>>>>> 523.xalancbmk_r    0.18%    1.96%
>>>>> 525.x264_r         1.55%    2.07%
>>>>> 531.deepsjeng_r   11.57%   11.85%
>>>>> 541.leela_r        0.64%    0.80%
>>>>> 557.xz_r          -0.41%   -0.41%
>>>>>
>>>>> 507.cactuBSSN_r    0.44%    0.44%
>>>>> 508.namd_r         0.34%    0.34%
>>>>> 510.parest_r       0.17%    0.25%
>>>>> 511.povray_r      56.57%   57.27%
>>>>> 519.lbm_r          0.00%    0.00%
>>>>> 521.wrf_r         -0.28%   -0.37%
>>>>> 526.blender_r     16.96%   17.71%
>>>>> 527.cam4_r         0.70%    0.53%
>>>>> 538.imagick_r      2.40%    2.40%
>>>>> 544.nab_r          0.00%   -0.65%
>>>>>
>>>>> avg                5.17%    5.37%
>>>>> 
>>>>> From the above data, we can see that in general the runtime slowdown of
>>>>> implementations A and D is similar for individual benchmarks.
>>>>>
>>>>> Several benchmarks slow down significantly with the newly added initialization under both
>>>>> A and D, for example 511.povray_r, 526.blender_r, and 531.deepsjeng_r. I will study a little
>>>>> more which kinds of new initializations introduce such slowdowns.
>>>>>
>>>>> From the study so far, I think that approach D should be good enough for our final implementation.
>>>>> So I will try to finish approach D with the following remaining work:
>>>>> 
>>>>>    ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>>>    ** complete the implementation of uninitialized warnings maintenance work for D. 
>>>>> 
>>>>> 
>>>>> Let me know if you have any comments and suggestions on my current and future work.
>>>>> 
>>>>> Thanks a lot for your help.
>>>>> 
>>>>> Qing
>>>>> 
>>>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>>> 
>>>>>> The following are the approaches I will implement and compare:
>>>>>>
>>>>>> Our final goal is to keep the uninitialized warnings and minimize the run-time performance cost.
>>>>>>
>>>>>> A. Add real initialization during gimplification, without maintaining the uninitialized warnings.
>>>>>> B. Add real initialization during gimplification, marking it with “artificial_init”.
>>>>>>  Adjust the uninitialized pass, maintaining the annotation and making sure the real init is not
>>>>>>  deleted from the fake init.
>>>>>> C. Mark the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
>>>>>>   maintain this “no_explicit_init” bit until after pass_late_warn_uninitialized or until pass_expand, then
>>>>>>   add real initialization for all DECLs that are marked with “no_explicit_init”.
>>>>>> D. Add .DEFERRED_INIT during gimplification and expand .DEFERRED_INIT to real
>>>>>>  initialization during expand. Adjust the uninitialized pass for the new references to “.DEFERRED_INIT”.
>>>>>>
>>>>>>
>>>>>> Of the above, approach A should have the minimum run-time cost and will be the baseline for the performance
>>>>>> comparison.
>>>>>>
>>>>>> I will then implement approach D. It is expected to have the most run-time overhead in the above list, but its
>>>>>> implementation should be the cleanest among B, C, and D. Let’s see how much more performance overhead this approach
>>>>>> adds. If the data is good, maybe we can avoid the effort of implementing B and C.
>>>>>>
>>>>>> If the performance of D is not good, I will implement B or C at that time.
>>>>>>
>>>>>> Let me know if you have any comments or suggestions.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> Qing
>>>>> 
>>>> 
>>>> 
>>> 
>>> -- 
>>> Richard Biener <rguenther@suse.de>
>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>> 
>> 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-13 15:35                                       ` Qing Zhao
@ 2021-01-13 15:40                                         ` Richard Biener
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Biener @ 2021-01-13 15:40 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Wed, 13 Jan 2021, Qing Zhao wrote:

> >  I'm mostly concerned about
> > stack-limited "processes" such as the linux kernel which I think
> > is a primary target of your work.
> 
> I don’t have any experience building the Linux kernel.
> Do we have to collect data for the Linux kernel at this time? Is the CPU2017 data not enough?

Well, it depends on the desired target.  The Linux kernel has an
8kb hard stack limit for kernel threads on x86_64 (IIRC).  You
don't have to do anything, it was just a suggestion.  For normal
programs, stack usage is probably the least important problem.

Richard.


-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-13  7:39                                 ` Richard Biener
  2021-01-13 15:06                                   ` Qing Zhao
@ 2021-01-14 21:16                                   ` Qing Zhao
  2021-01-15  8:11                                     ` Richard Biener
  1 sibling, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-01-14 21:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

Hi, 

More data on code size and compilation time with CPU2017:

******** Compilation time data: the numbers are the slowdown relative to the default “no”:

benchmarks          A/no     D/no

500.perlbench_r    5.19%    1.95%
502.gcc_r          0.46%   -0.23%
505.mcf_r          0.00%    0.00%
520.omnetpp_r      0.85%    0.00%
523.xalancbmk_r    0.79%   -0.40%
525.x264_r        -4.48%    0.00%
531.deepsjeng_r   16.67%   16.67%
541.leela_r        0.00%    0.00%
557.xz_r           0.00%    0.00%

507.cactuBSSN_r    1.16%    0.58%
508.namd_r         9.62%    8.65%
510.parest_r       0.48%    1.19%
511.povray_r       3.70%    3.70%
519.lbm_r          0.00%    0.00%
521.wrf_r          0.05%    0.02%
526.blender_r      0.33%    1.32%
527.cam4_r        -0.93%   -0.93%
538.imagick_r      1.32%    3.95%
544.nab_r          0.00%    0.00%

From the above data, it looks like the compilation time impact of implementations A and D is almost the same.

******** Code size data: the numbers are the code size increase relative to the default “no”:

benchmarks          A/no     D/no

500.perlbench_r    2.84%    0.34%
502.gcc_r          2.59%    0.35%
505.mcf_r          3.55%    0.39%
520.omnetpp_r      0.54%    0.03%
523.xalancbmk_r    0.36%    0.39%
525.x264_r         1.39%    0.13%
531.deepsjeng_r    2.15%   -1.12%
541.leela_r        0.50%   -0.20%
557.xz_r           0.31%    0.13%

507.cactuBSSN_r    5.00%   -0.01%
508.namd_r         3.64%   -0.07%
510.parest_r       1.12%    0.33%
511.povray_r       4.18%    1.16%
519.lbm_r          8.83%    6.44%
521.wrf_r          0.08%    0.02%
526.blender_r      1.63%    0.45%
527.cam4_r         0.16%    0.06%
538.imagick_r      3.18%   -0.80%
544.nab_r          5.76%   -1.11%

avg                2.52%    0.36%

From the above data, implementation D is better than A on every benchmark except 523.xalancbmk_r (0.39% vs. 0.36%). This surprises me; I am not sure what the reason is.
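
As a cross-check of the code size table, here is a minimal Python sketch (not part of the original measurements; the names CODE_SIZE, averages, and d_not_smaller are illustrative) that recomputes the two averages from the per-benchmark numbers and lists any benchmark where D does not come out smaller than A:

```python
# Code size increase (percent) relative to the default build, copied from the
# table above as (A, D) pairs. All names here are illustrative.
CODE_SIZE = {
    "500.perlbench_r": (2.84, 0.34),
    "502.gcc_r": (2.59, 0.35),
    "505.mcf_r": (3.55, 0.39),
    "520.omnetpp_r": (0.54, 0.03),
    "523.xalancbmk_r": (0.36, 0.39),
    "525.x264_r": (1.39, 0.13),
    "531.deepsjeng_r": (2.15, -1.12),
    "541.leela_r": (0.50, -0.20),
    "557.xz_r": (0.31, 0.13),
    "507.cactuBSSN_r": (5.00, -0.01),
    "508.namd_r": (3.64, -0.07),
    "510.parest_r": (1.12, 0.33),
    "511.povray_r": (4.18, 1.16),
    "519.lbm_r": (8.83, 6.44),
    "521.wrf_r": (0.08, 0.02),
    "526.blender_r": (1.63, 0.45),
    "527.cam4_r": (0.16, 0.06),
    "538.imagick_r": (3.18, -0.80),
    "544.nab_r": (5.76, -1.11),
}


def averages(table):
    """Return the mean A and D increase over all benchmarks, rounded to 2 places."""
    n = len(table)
    mean_a = sum(a for a, _ in table.values()) / n
    mean_d = sum(d for _, d in table.values()) / n
    return round(mean_a, 2), round(mean_d, 2)


def d_not_smaller(table):
    """List benchmarks where D grows the code at least as much as A."""
    return [name for name, (a, d) in table.items() if d >= a]


if __name__ == "__main__":
    print(averages(CODE_SIZE))
    print(d_not_smaller(CODE_SIZE))
```

The averages are plain arithmetic means over the 19 benchmarks, which is an assumption about how the "avg" row was computed.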

******** Stack usage data: I added -fstack-usage to the compilation line when compiling the CPU2017 benchmarks, and *.su files were generated for each of the modules.
Since there are a lot of such files and the stack size information is embedded in each of them, I picked just one benchmark, 511.povray_r, to check. It is the one that
has the most runtime overhead when adding initialization (both A and D).

I identified all the *.su files that differ between A and D and diffed them. It looks like the stack size is much higher with D than with A, for example:

$ diff build_base_auto_init.D.0000/bbox.su build_base_auto_init.A.0000/bbox.su
5c5
< bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)	160	static
---
> bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)	96	static

$ diff build_base_auto_init.D.0000/image.su build_base_auto_init.A.0000/image.su
9c9
< image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)	624	static
---
> image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)	272	static
….

It looks like implementation D has more impact on stack size than A.
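
Rather than diffing the *.su files by hand, the comparison could be scripted. The following is a minimal Python sketch under stated assumptions: each .su line is taken to be three tab-separated fields (function identifier, stack bytes, qualifier), as in the diff output above, and the helper names parse_su and stack_growth are hypothetical. The sample text combines the bbox.su and image.su entries shown above into one string per build purely for illustration.

```python
from typing import Dict, List, Tuple


def parse_su(text: str) -> Dict[str, Tuple[int, str]]:
    """Map each function entry in .su text to (stack bytes, qualifier)."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right: the function signature may contain spaces
        # and commas, but the last two tab-separated fields are size and qualifier.
        name, size, qualifier = line.rsplit("\t", 2)
        entries[name] = (int(size), qualifier)
    return entries


def stack_growth(base: str, other: str) -> List[Tuple[str, int, int]]:
    """Return (function, base bytes, other bytes) for entries whose size differs,
    largest growth first."""
    a, b = parse_su(base), parse_su(other)
    changed = [
        (name, a[name][0], b[name][0])
        for name in a.keys() & b.keys()
        if a[name][0] != b[name][0]
    ]
    return sorted(changed, key=lambda row: row[2] - row[1], reverse=True)


# Entries taken from the diff output above (approach A vs. approach D).
SU_A = (
    "bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, "
    "pov::BBOX_TREE**&, long int*, long int, long int)\t96\tstatic\n"
    "image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)\t272\tstatic\n"
)
SU_D = (
    "bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, "
    "pov::BBOX_TREE**&, long int*, long int, long int)\t160\tstatic\n"
    "image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)\t624\tstatic\n"
)

if __name__ == "__main__":
    for name, size_a, size_d in stack_growth(SU_A, SU_D):
        print(f"{size_a:6d} -> {size_d:6d}  {name}")
```

For a whole benchmark, one would read each pair of same-named .su files from the two build directories and pass their contents to stack_growth; that directory walk is left out here to keep the sketch short.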

Do you have any insight into the reason for this?

Let me know if you have any comments or suggestions.

Thanks.

Qing

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-14 21:16                                   ` Qing Zhao
@ 2021-01-15  8:11                                     ` Richard Biener
  2021-01-15 16:16                                       ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2021-01-15  8:11 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



On Thu, 14 Jan 2021, Qing Zhao wrote:

> Hi, 
> More data on code size and compilation time with CPU2017:
> 
> ********Compilation time data:   the numbers are the slowdown against the
> default “no”:
> 
> benchmarks  A/no D/no
>                         
> 500.perlbench_r 5.19% 1.95%
> 502.gcc_r 0.46% -0.23%
> 505.mcf_r 0.00% 0.00%
> 520.omnetpp_r 0.85% 0.00%
> 523.xalancbmk_r 0.79% -0.40%
> 525.x264_r -4.48% 0.00%
> 531.deepsjeng_r 16.67% 16.67%
> 541.leela_r  0.00%  0.00%
> 557.xz_r 0.00%  0.00%
>                         
> 507.cactuBSSN_r 1.16% 0.58%
> 508.namd_r 9.62% 8.65%
> 510.parest_r 0.48% 1.19%
> 511.povray_r 3.70% 3.70%
> 519.lbm_r 0.00% 0.00%
> 521.wrf_r 0.05% 0.02%
> 526.blender_r 0.33% 1.32%
> 527.cam4_r -0.93% -0.93%
> 538.imagick_r 1.32% 3.95%
> 544.nab_r  0.00% 0.00%
> 
> From the above data, looks like that the compilation time impact
> from implementation A and D are almost the same.
> *******code size data: the numbers are the code size increase against the
> default “no”:
> benchmarks A/no D/no
>                         
> 500.perlbench_r 2.84% 0.34%
> 502.gcc_r 2.59% 0.35%
> 505.mcf_r 3.55% 0.39%
> 520.omnetpp_r 0.54% 0.03%
> 523.xalancbmk_r 0.36%  0.39%
> 525.x264_r 1.39% 0.13%
> 531.deepsjeng_r 2.15% -1.12%
> 541.leela_r 0.50% -0.20%
> 557.xz_r 0.31% 0.13%
>                         
> 507.cactuBSSN_r 5.00% -0.01%
> 508.namd_r 3.64% -0.07%
> 510.parest_r 1.12% 0.33%
> 511.povray_r 4.18% 1.16%
> 519.lbm_r 8.83% 6.44%
> 521.wrf_r 0.08% 0.02%
> 526.blender_r 1.63% 0.45%
> 527.cam4_r  0.16% 0.06%
> 538.imagick_r 3.18% -0.80%
> 544.nab_r 5.76% -1.11%
> Avg 2.52% 0.36%
> 
> From the above data, the implementation D is always better than A, it’s a
> surprising to me, not sure what’s the reason for this.

D probably inhibits most interesting loop transforms (check SPEC FP
performance).  It will also most definitely disallow SRA which, when
an aggregate is not completely elided, tends to grow code.

> ********stack usage data, I added -fstack-usage to the compilation line when
> compiling CPU2017 benchmarks. And all the *.su files were generated for each
> of the modules.
> Since there a lot of such files, and the stack size information are embedded
> in each of the files.  I just picked up one benchmark 511.povray to
> check. Which is the one that 
> has the most runtime overhead when adding initialization (both A and D). 
> 
> I identified all the *.su files that are different between A and D and do a
> diff on those *.su files, and looks like that the stack size is much higher
> with D than that with A, for example:
> 
> $ diff build_base_auto_init.D.0000/bbox.su
> build_base_auto_init.A.0000/bbox.su
> 5c5
> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
> pov::BBOX_TREE**&, long int*, long int, long int) 160 static
> ---
> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
> pov::BBOX_TREE**&, long int*, long int, long int) 96 static
> 
> $ diff build_base_auto_init.D.0000/image.su
> build_base_auto_init.A.0000/image.su
> 9c9
> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624
> static
> ---
> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272
> static
> ….
> Looks like that implementation D has more stack size impact than A. 
> 
> Do you have any insight on what the reason for this?

D will keep all initialized aggregates as aggregates, and live, which
means stack will be allocated for them.  With A the usual optimizations
to reduce stack usage can be applied.

> Let me know if you have any comments and suggestions.

First of all I would check whether the prototype implementations
work as expected.

Richard.


> thanks.
> 
> Qing
>       On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de>
>       wrote:
>
>       On Tue, 12 Jan 2021, Qing Zhao wrote:
>
>             Hi, 
>
>             Just check in to see whether you have any comments
>             and suggestions on this:
>
>             FYI, I have been continue with Approach D
>             implementation since last week:
>
>             D. Adding  calls to .DEFFERED_INIT during
>             gimplification, expand the .DEFFERED_INIT during
>             expand to
>             real initialization. Adjusting uninitialized pass
>             with the new refs with “.DEFFERED_INIT”.
>
>             For the remaining work of Approach D:
>
>             ** complete the implementation of
>             -ftrivial-auto-var-init=pattern;
>             ** complete the implementation of uninitialized
>             warnings maintenance work for D. 
>
>             I have completed the uninitialized warnings
>             maintenance work for D.
>             And finished partial of the
>             -ftrivial-auto-var-init=pattern implementation. 
>
>             The following are remaining work of Approach D:
>
>               ** -ftrivial-auto-var-init=pattern for VLA;
>               **add a new attribute for variable:
>             __attribute((uninitialized)
>             the marked variable is uninitialized intentionaly
>             for performance purpose.
>               ** adding complete testing cases;
> 
>
>             Please let me know if you have any objection on my
>             current decision on implementing approach D. 
> 
>
>       Did you do any analysis on how stack usage and code size are
>       changed 
>       with approach D?  How does compile-time behave (we could gobble
>       up
>       lots of .DEFERRED_INIT calls I guess)?
>
>       Richard.
>
>             Thanks a lot for your help.
>
>             Qing
> 
>
>                   On Jan 5, 2021, at 1:05 PM, Qing Zhao
>                   via Gcc-patches
>                   <gcc-patches@gcc.gnu.org> wrote:
>
>                   Hi,
>
>                   This is an update for our previous
>                   discussion. 
>
>                   1. I implemented the following two
>                   different implementations in the latest
>                   upstream gcc:
>
>                   A. Adding real initialization during
>                   gimplification, not maintain the
>                   uninitialized warnings.
>
>                   D. Adding calls to .DEFERRED_INIT
>                   during gimplification, expanding the
>                   .DEFERRED_INIT during expand to
>                   real initialization. Adjusting the
>                   uninitialized pass for the new refs
>                   with “.DEFERRED_INIT”.
>
>                   Note, in this initial implementation,
>                   ** I ONLY implement
>                   -ftrivial-auto-var-init=zero, the
>                   implementation of
>                   -ftrivial-auto-var-init=pattern 
>                      is not done yet.  Therefore, the
>                   performance data is only about
>                   -ftrivial-auto-var-init=zero. 
>
>                   ** I added a temporary option
>                   -fauto-var-init-approach=A|B|C|D to
>                   choose implementation A or D for the
>                      runtime performance study.
>                   ** I didn’t finish the uninitialized
>                   warnings maintenance work for D. (That
>                   might take more time than I expected). 
>
>                   2. I collected runtime data for CPU2017
>                   on a x86 machine with this new gcc for
>                   the following 3 cases:
>
>                   no: default. (-g -O2 -march=native )
>                   A:  default +
>                    -ftrivial-auto-var-init=zero
>                   -fauto-var-init-approach=A 
>                   D:  default +
>                    -ftrivial-auto-var-init=zero
>                   -fauto-var-init-approach=D 
>
>                   And then compute the slowdown data for
>                   both A and D as following:
>
>                   benchmarks A / no D /no
>
>                   500.perlbench_r 1.25% 1.25%
>                   502.gcc_r 0.68% 1.80%
>                   505.mcf_r 0.68% 0.14%
>                   520.omnetpp_r 4.83% 4.68%
>                   523.xalancbmk_r 0.18% 1.96%
>                   525.x264_r 1.55% 2.07%
>                   531.deepsjeng_r 11.57% 11.85%
>                   541.leela_r 0.64% 0.80%
>                   557.xz_r -0.41% -0.41%
>
>                   507.cactuBSSN_r 0.44% 0.44%
>                   508.namd_r 0.34% 0.34%
>                   510.parest_r 0.17% 0.25%
>                   511.povray_r 56.57% 57.27%
>                   519.lbm_r 0.00% 0.00%
>                   521.wrf_r  -0.28% -0.37%
>                   526.blender_r 16.96% 17.71%
>                   527.cam4_r 0.70% 0.53%
>                   538.imagick_r 2.40% 2.40%
>                   544.nab_r 0.00% -0.65%
>
>                   avg 5.17% 5.37%
>
>                   From the above data, we can see that in
>                   general, the runtime performance
>                   slowdown for 
>                   implementation A and D are similar for
>                   individual benchmarks.
>
>                   Several benchmarks show a
>                   significant slowdown with the newly added
>                   initialization for both
>                   A and D, for example 511.povray_r,
>                   526.blender_r, and 531.deepsjeng_r. I
>                   will try to study a little bit
>                   more what kind of new initializations
>                   introduced such a slowdown. 
>
>                   From the current study so far, I think
>                   that approach D should be good enough
>                   for our final implementation. 
>                   So, I will try to finish approach D with
>                   the following remaining work
>
>                       ** complete the implementation of
>                   -ftrivial-auto-var-init=pattern;
>                       ** complete the implementation of
>                   uninitialized warnings maintenance work
>                   for D. 
> 
>
>                   Let me know if you have any comments and
>                   suggestions on my current and future
>                   work.
>
>                   Thanks a lot for your help.
>
>                   Qing
>
>                         On Dec 9, 2020, at 10:18 AM,
>                         Qing Zhao via Gcc-patches
>                         <gcc-patches@gcc.gnu.org>
>                         wrote:
>
>                         The following are the
>                         approaches I will implement
>                         and compare:
>
>                         Our final goal is to keep
>                         the uninitialized warning
>                         and minimize the run-time
>                         performance cost.
>
>                         A. Adding real
>                         initialization during
>                         gimplification, not maintain
>                         the uninitialized warnings.
>                         B. Adding real
>                         initialization during
>                         gimplification, marking them
>                         with “artificial_init”. 
>                           Adjusting uninitialized
>                         pass, maintaining the
>                         annotation, making sure the
>                         real init not
>                           Deleted from the fake
>                         init. 
>                         C.  Marking the DECL for an
>                         uninitialized auto variable
>                         as “no_explicit_init” during
>                         gimplification,
>                            maintain this
>                         “no_explicit_init” bit till
>                         after
>                         pass_late_warn_uninitialized,
>                         or till pass_expand, 
>                            add real initialization
>                         for all DECLs that are
>                         marked with
>                         “no_explicit_init”.
>                         D. Adding .DEFERRED_INIT
>                         during gimplification,
>                         expanding the .DEFERRED_INIT
>                         during expand to
>                           real initialization.
>                         Adjusting the uninitialized pass
>                         for the new refs with
>                         “.DEFERRED_INIT”.
> 
>
>                         Of the above, approach A
>                         will be the one that has
>                         the minimum run-time cost and
>                         will be the baseline for the
>                         performance
>                         comparison. 
>
>                         I will implement approach D
>                         next. This one is expected
>                         to have the most run-time
>                         overhead in the above
>                         list, but
>                         its implementation should be the
>                         cleanest among B, C, and D.
>                         Let’s see how much more
>                         performance overhead this
>                         approach
>                         adds. If the data is
>                         good, maybe we can avoid the
>                         effort to implement B and
>                         C. 
>
>                         If the performance of D is
>                         not good, I will implement B
>                         or C at that time.
>
>                         Let me know if you have any
>                         comment or suggestions.
>
>                         Thanks.
>
>                         Qing
> 
> 
> 
> 
>
>       -- 
>       Richard Biener <rguenther@suse.de>
>       SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>       Nuernberg,
>       Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 
> 
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-15  8:11                                     ` Richard Biener
@ 2021-01-15 16:16                                       ` Qing Zhao
  2021-01-15 17:22                                         ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-01-15 16:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



> On Jan 15, 2021, at 2:11 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> 
> 
> On Thu, 14 Jan 2021, Qing Zhao wrote:
> 
>> Hi, 
>> More data on code size and compilation time with CPU2017:
>> ********Compilation time data:   the numbers are the slowdown against the
>> default “no”:
>> benchmarks  A/no D/no
>>                         
>> 500.perlbench_r 5.19% 1.95%
>> 502.gcc_r 0.46% -0.23%
>> 505.mcf_r 0.00% 0.00%
>> 520.omnetpp_r 0.85% 0.00%
>> 523.xalancbmk_r 0.79% -0.40%
>> 525.x264_r -4.48% 0.00%
>> 531.deepsjeng_r 16.67% 16.67%
>> 541.leela_r  0.00%  0.00%
>> 557.xz_r 0.00%  0.00%
>>                         
>> 507.cactuBSSN_r 1.16% 0.58%
>> 508.namd_r 9.62% 8.65%
>> 510.parest_r 0.48% 1.19%
>> 511.povray_r 3.70% 3.70%
>> 519.lbm_r 0.00% 0.00%
>> 521.wrf_r 0.05% 0.02%
>> 526.blender_r 0.33% 1.32%
>> 527.cam4_r -0.93% -0.93%
>> 538.imagick_r 1.32% 3.95%
>> 544.nab_r  0.00% 0.00%
>> From the above data, it looks like the compilation-time impact
>> of implementations A and D is almost the same.
>> *******code size data: the numbers are the code size increase against the
>> default “no”:
>> benchmarks A/no D/no
>>                         
>> 500.perlbench_r 2.84% 0.34%
>> 502.gcc_r 2.59% 0.35%
>> 505.mcf_r 3.55% 0.39%
>> 520.omnetpp_r 0.54% 0.03%
>> 523.xalancbmk_r 0.36%  0.39%
>> 525.x264_r 1.39% 0.13%
>> 531.deepsjeng_r 2.15% -1.12%
>> 541.leela_r 0.50% -0.20%
>> 557.xz_r 0.31% 0.13%
>>                         
>> 507.cactuBSSN_r 5.00% -0.01%
>> 508.namd_r 3.64% -0.07%
>> 510.parest_r 1.12% 0.33%
>> 511.povray_r 4.18% 1.16%
>> 519.lbm_r 8.83% 6.44%
>> 521.wrf_r 0.08% 0.02%
>> 526.blender_r 1.63% 0.45%
>> 527.cam4_r  0.16% 0.06%
>> 538.imagick_r 3.18% -0.80%
>> 544.nab_r 5.76% -1.11%
>> Avg 2.52% 0.36%
>> From the above data, implementation D is almost always better than A, which is
>> surprising to me; I am not sure what the reason for this is.
> 
> D probably inhibits most interesting loop transforms (check SPEC FP
> performance).

The call to .DEFERRED_INIT is marked as ECF_CONST:

/* A function to represent an artificial initialization to an uninitialized
   automatic variable. The first argument is the variable itself, the
   second argument is the initialization type.  */
DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

So I assume that such a const call should minimize the impact on loop optimizations. But yes, it will still inhibit some of the loop transformations.

>  It will also most definitely disallow SRA which, when
> an aggregate is not completely elided, tends to grow code.

Makes sense to me. 

The run-time performance data for D and A is actually very similar, as I posted in the previous email (I list it again here for convenience).

Run-time performance overhead with A and D:

benchmarks        A/no      D/no

500.perlbench_r    1.25%     1.25%
502.gcc_r          0.68%     1.80%
505.mcf_r          0.68%     0.14%
520.omnetpp_r      4.83%     4.68%
523.xalancbmk_r    0.18%     1.96%
525.x264_r         1.55%     2.07%
531.deepsjeng_r   11.57%    11.85%
541.leela_r        0.64%     0.80%
557.xz_r          -0.41%    -0.41%

507.cactuBSSN_r    0.44%     0.44%
508.namd_r         0.34%     0.34%
510.parest_r       0.17%     0.25%
511.povray_r      56.57%    57.27%
519.lbm_r          0.00%     0.00%
521.wrf_r         -0.28%    -0.37%
526.blender_r     16.96%    17.71%
527.cam4_r         0.70%     0.53%
538.imagick_r      2.40%     2.40%
544.nab_r          0.00%    -0.65%

avg                5.17%     5.37%

Especially for the SPEC FP benchmarks, I didn’t see much performance difference between A and D. 
I guess that the RTL optimizations might be enough to get rid of most of the overhead introduced by the additional initialization. 
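
As a cross-check of the table above, the averages can be recomputed with a few lines of Python. This is only a sketch: the numbers are copied from the table, the split into an integer group and a floating-point group follows the blank line in the table, and the names INT, FP and mean_slowdown are mine, not part of any tool.

```python
# Run-time slowdown (%) versus the default build, copied from the table above.
# Each entry is (benchmark, A/no, D/no).
INT = [
    ("500.perlbench_r", 1.25, 1.25),
    ("502.gcc_r", 0.68, 1.80),
    ("505.mcf_r", 0.68, 0.14),
    ("520.omnetpp_r", 4.83, 4.68),
    ("523.xalancbmk_r", 0.18, 1.96),
    ("525.x264_r", 1.55, 2.07),
    ("531.deepsjeng_r", 11.57, 11.85),
    ("541.leela_r", 0.64, 0.80),
    ("557.xz_r", -0.41, -0.41),
]
FP = [
    ("507.cactuBSSN_r", 0.44, 0.44),
    ("508.namd_r", 0.34, 0.34),
    ("510.parest_r", 0.17, 0.25),
    ("511.povray_r", 56.57, 57.27),
    ("519.lbm_r", 0.00, 0.00),
    ("521.wrf_r", -0.28, -0.37),
    ("526.blender_r", 16.96, 17.71),
    ("527.cam4_r", 0.70, 0.53),
    ("538.imagick_r", 2.40, 2.40),
    ("544.nab_r", 0.00, -0.65),
]


def mean_slowdown(rows):
    """Return the arithmetic mean of the A and D columns as (mean_a, mean_d)."""
    n = len(rows)
    return (sum(r[1] for r in rows) / n, sum(r[2] for r in rows) / n)


if __name__ == "__main__":
    for label, rows in (("all", INT + FP), ("int", INT), ("fp", FP)):
        a, d = mean_slowdown(rows)
        print(f"{label:>3}: A {a:.2f}%  D {d:.2f}%")
```

The means over all 19 benchmarks come out as 5.17% for A and 5.37% for D, matching the avg row, and the gap between A and D stays small within the floating-point group as well (about 7.73% versus 7.79%).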

> 
>> ********Stack usage data: I added -fstack-usage to the compilation line when
>> compiling the CPU2017 benchmarks, and *.su files were generated for each
>> of the modules.
>> Since there are a lot of such files, and the stack size information is embedded
>> in each of them, I just picked one benchmark, 511.povray_r, to
>> check. It is the one that
>> has the most runtime overhead when adding initialization (both A and D).
>> I identified all the *.su files that differ between A and D and ran
>> diff on them, and it looks like the stack size is much higher
>> with D than with A, for example:
>> $ diff build_base_auto_init.D.0000/bbox.su
>> build_base_auto_init.A.0000/bbox.su
>> 5c5
>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
>> pov::BBOX_TREE**&, long int*, long int, long int) 160 static
>> ---
>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
>> pov::BBOX_TREE**&, long int*, long int, long int) 96 static
>> $ diff build_base_auto_init.D.0000/image.su
>> build_base_auto_init.A.0000/image.su
>> 9c9
>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624
>> static
>> ---
>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272
>> static
>> ….
>> It looks like implementation D has more stack size impact than A. 
>> Do you have any insight into the reason for this?
> 
> D will keep all initialized aggregates as aggregates and live which
> means stack will be allocated for it.  With A the usual optimizations
> to reduce stack usage can be applied.
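
To put numbers on the two diffs quoted above, a small script can compare the stack sizes recorded in the .su files. This is a sketch under assumptions: it treats each .su line as a location and function signature, then a size in bytes, then a qualifier such as "static", separated by whitespace. The helper names parse_su_line and stack_growth are hypothetical.

```python
def parse_su_line(line):
    """Split one -fstack-usage line into (function, size_in_bytes, qualifier).

    The function text may itself contain spaces, so the size and the
    qualifier are taken from the right-hand end of the line.
    """
    function, size, qualifier = line.rstrip().rsplit(None, 2)
    return function, int(size), qualifier


def stack_growth(line_d, line_a):
    """Return (function, bytes_D, bytes_A, D/A ratio) for a pair of lines."""
    func_d, size_d, _ = parse_su_line(line_d)
    func_a, size_a, _ = parse_su_line(line_a)
    if func_d != func_a:
        raise ValueError("lines describe different functions")
    return func_d, size_d, size_a, size_d / size_a


# The two pairs shown in the diffs above (D first, then A).
PAIRS = [
    ("bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, "
     "pov::BBOX_TREE**&, long int*, long int, long int) 160 static",
     "bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, "
     "pov::BBOX_TREE**&, long int*, long int, long int) 96 static"),
    ("image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624 static",
     "image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272 static"),
]

if __name__ == "__main__":
    for line_d, line_a in PAIRS:
        func, size_d, size_a, ratio = stack_growth(line_d, line_a)
        print(f"{func}: D={size_d} A={size_a} "
              f"(+{size_d - size_a} bytes, {ratio:.2f}x)")
```

For the two functions above this gives 64 extra bytes for sort_and_split and 352 extra bytes for bump_map with D.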

I checked the routine “pov::bump_map” in 511.povray_r, since its stack usage increases a lot 
with implementation D, by examining the IR immediately before the RTL expansion phase 
(image.cpp.244t.optimized). I found the following additional statements for the array elements:

void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double * normal)
{
…
  double p3[3];
  double p2[3];
  double p1[3];
  float colour3[5];
  float colour2[5];
  float colour1[5];
…
   # DEBUG BEGIN_STMT
  colour1 = .DEFERRED_INIT (colour1, 2);
  colour2 = .DEFERRED_INIT (colour2, 2);
  colour3 = .DEFERRED_INIT (colour3, 2);
  # DEBUG BEGIN_STMT
  MEM <double> [(double[3] *)&p1] = p1$0_144(D);
  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
  p1 = .DEFERRED_INIT (p1, 2);
  # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
  # DEBUG p1$0 => D#12
  # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
  # DEBUG p1$1 => D#11
  # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
  # DEBUG p1$2 => D#10
  MEM <double> [(double[3] *)&p2] = p2$0_109(D);
  MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
  MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
  p2 = .DEFERRED_INIT (p2, 2);
  # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
  # DEBUG p2$0 => D#9
  # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
  # DEBUG p2$1 => D#8
  # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
  # DEBUG p2$2 => D#7
  MEM <double> [(double[3] *)&p3] = p3$0_256(D);
  MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
  MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
  p3 = .DEFERRED_INIT (p3, 2);
  ….
}

I guess that the above “MEM <double> … = …” statements are the ones that make the difference. Which phase introduced them?
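
One way to quantify these statements is to scan the .optimized dump for the two kinds of lines shown above. The sketch below is a throwaway helper, not part of GCC. It relies only on the textual form seen in this dump, where an SSA name printed with a "(D)" suffix is a default definition (a value with no defining statement). The names scan_dump, DEFERRED_RE and UNDEF_STORE_RE are hypothetical.

```python
import re

# "x = .DEFERRED_INIT (x, 2);" : capture the variable being initialized.
DEFERRED_RE = re.compile(r"^\s*(\w+) = \.DEFERRED_INIT \((\w+), \d+\);")
# "MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);" : a store whose
# right-hand side is a default-definition SSA name, marked by "(D)".
UNDEF_STORE_RE = re.compile(
    r"^\s*MEM <[^>]+> \[\([^)]*\)&(\w+)[^\]]*\] = [\w$]+\(D\);")


def scan_dump(text):
    """Return (deferred, undef_stores).

    deferred     : variables assigned from .DEFERRED_INIT, in dump order
    undef_stores : per-variable count of stores of default-def values
    """
    deferred = []
    undef_stores = {}
    for line in text.splitlines():
        m = DEFERRED_RE.match(line)
        if m:
            deferred.append(m.group(1))
            continue
        m = UNDEF_STORE_RE.match(line)
        if m:
            undef_stores[m.group(1)] = undef_stores.get(m.group(1), 0) + 1
    return deferred, undef_stores


# A few lines copied from the dump above, used as sample input.
SAMPLE = """\
  colour1 = .DEFERRED_INIT (colour1, 2);
  MEM <double> [(double[3] *)&p1] = p1$0_144(D);
  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
  p1 = .DEFERRED_INIT (p1, 2);
  # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
"""

if __name__ == "__main__":
    deferred, undef_stores = scan_dump(SAMPLE)
    print("DEFERRED_INIT targets:", deferred)
    print("stores of default defs:", undef_stores)
```

Running scan_dump over the whole of image.cpp.244t.optimized for A and for D should show which arrays pick up the extra element-wise stores only under D.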

> 
>> Let me know if you have any comments and suggestions.
> 
> First of all I would check whether the prototype implementations
> work as expected.
I have already done such checks with small test cases, checking the IR generated with implementation A or D, mainly
focusing on *.c.006t.gimple and *.c.*t.expand; all worked as expected. 

For CPU2017, as in the example above, I also checked the IR for both A and D; it looks like everything worked as expected.

Thanks. 

Qing
> 
> Richard.
> 
> 
>> thanks.
>> Qing
>>      On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de>
>>      wrote:
>> 
>>      On Tue, 12 Jan 2021, Qing Zhao wrote:
>> 
>>            Hi, 
>> 
>>            Just check in to see whether you have any comments
>>            and suggestions on this:
>> 
>>            FYI, I have continued with the Approach D
>>            implementation since last week:
>> 
>>            D. Adding calls to .DEFERRED_INIT during
>>            gimplification, expanding the .DEFERRED_INIT during
>>            expand to
>>            real initialization. Adjusting the uninitialized pass
>>            for the new refs with “.DEFERRED_INIT”.
>> 
>>            For the remaining work of Approach D:
>> 
>>            ** complete the implementation of
>>            -ftrivial-auto-var-init=pattern;
>>            ** complete the implementation of uninitialized
>>            warnings maintenance work for D. 
>> 
>>            I have completed the uninitialized warnings
>>            maintenance work for D.
>>            And finished part of the
>>            -ftrivial-auto-var-init=pattern implementation. 
>> 
>>            The following is the remaining work for Approach D:
>> 
>>              ** -ftrivial-auto-var-init=pattern for VLA;
>>              ** add a new attribute for variables:
>>            __attribute__((uninitialized))
>>            the marked variable is left uninitialized intentionally
>>            for performance purposes.
>>              ** adding complete testing cases;
>> 
>>            Please let me know if you have any objection on my
>>            current decision on implementing approach D. 
>> 
>>      Did you do any analysis on how stack usage and code size are
>>      changed 
>>      with approach D?  How does compile-time behave (we could gobble
>>      up
>>      lots of .DEFERRED_INIT calls I guess)?
>> 
>>      Richard.
>> 
>>            Thanks a lot for your help.
>> 
>>            Qing
>> 
>>                  On Jan 5, 2021, at 1:05 PM, Qing Zhao
>>                  via Gcc-patches
>>                  <gcc-patches@gcc.gnu.org> wrote:
>> 
>>                  Hi,
>> 
>>                  This is an update for our previous
>>                  discussion. 
>> 
>>                  1. I implemented the following two
>>                  different implementations in the latest
>>                  upstream gcc:
>> 
>>                  A. Adding real initialization during
>>                  gimplification, not maintain the
>>                  uninitialized warnings.
>> 
>>                  D. Adding calls to .DEFERRED_INIT
>>                  during gimplification, expanding the
>>                  .DEFERRED_INIT during expand to
>>                  real initialization. Adjusting the
>>                  uninitialized pass for the new refs
>>                  with “.DEFERRED_INIT”.
>> 
>>                  Note, in this initial implementation,
>>                  ** I ONLY implement
>>                  -ftrivial-auto-var-init=zero, the
>>                  implementation of
>>                  -ftrivial-auto-var-init=pattern 
>>                     is not done yet.  Therefore, the
>>                  performance data is only about
>>                  -ftrivial-auto-var-init=zero. 
>> 
>>                  ** I added a temporary option
>>                  -fauto-var-init-approach=A|B|C|D to
>>                  choose implementation A or D for the
>>                     runtime performance study.
>>                  ** I didn’t finish the uninitialized
>>                  warnings maintenance work for D. (That
>>                  might take more time than I expected). 
>> 
>>                  2. I collected runtime data for CPU2017
>>                  on a x86 machine with this new gcc for
>>                  the following 3 cases:
>> 
>>                  no: default. (-g -O2 -march=native )
>>                  A:  default +
>>                   -ftrivial-auto-var-init=zero
>>                  -fauto-var-init-approach=A 
>>                  D:  default +
>>                   -ftrivial-auto-var-init=zero
>>                  -fauto-var-init-approach=D 
>> 
>>                  And then compute the slowdown data for
>>                  both A and D as following:
>> 
>>                  benchmarks A / no D /no
>> 
>>                  500.perlbench_r 1.25% 1.25%
>>                  502.gcc_r 0.68% 1.80%
>>                  505.mcf_r 0.68% 0.14%
>>                  520.omnetpp_r 4.83% 4.68%
>>                  523.xalancbmk_r 0.18% 1.96%
>>                  525.x264_r 1.55% 2.07%
>>                  531.deepsjeng_r 11.57% 11.85%
>>                  541.leela_r 0.64% 0.80%
>>                  557.xz_r -0.41% -0.41%
>> 
>>                  507.cactuBSSN_r 0.44% 0.44%
>>                  508.namd_r 0.34% 0.34%
>>                  510.parest_r 0.17% 0.25%
>>                  511.povray_r 56.57% 57.27%
>>                  519.lbm_r 0.00% 0.00%
>>                  521.wrf_r  -0.28% -0.37%
>>                  526.blender_r 16.96% 17.71%
>>                  527.cam4_r 0.70% 0.53%
>>                  538.imagick_r 2.40% 2.40%
>>                  544.nab_r 0.00% -0.65%
>> 
>>                  avg 5.17% 5.37%
>> 
>>                  From the above data, we can see that in
>>                  general, the runtime performance
>>                  slowdown for 
>>                  implementation A and D are similar for
>>                  individual benchmarks.
>> 
>>                  Several benchmarks show a
>>                  significant slowdown with the newly added
>>                  initialization for both
>>                  A and D, for example 511.povray_r,
>>                  526.blender_r, and 531.deepsjeng_r. I
>>                  will try to study a little bit
>>                  more what kind of new initializations
>>                  introduced such a slowdown. 
>> 
>>                  From the current study so far, I think
>>                  that approach D should be good enough
>>                  for our final implementation. 
>>                  So, I will try to finish approach D with
>>                  the following remaining work
>> 
>>                      ** complete the implementation of
>>                  -ftrivial-auto-var-init=pattern;
>>                      ** complete the implementation of
>>                  uninitialized warnings maintenance work
>>                  for D. 
>> 
>>                  Let me know if you have any comments and
>>                  suggestions on my current and future
>>                  work.
>> 
>>                  Thanks a lot for your help.
>> 
>>                  Qing
>> 
>>                        On Dec 9, 2020, at 10:18 AM,
>>                        Qing Zhao via Gcc-patches
>>                        <gcc-patches@gcc.gnu.org>
>>                        wrote:
>> 
>>                        The following are the
>>                        approaches I will implement
>>                        and compare:
>> 
>>                        Our final goal is to keep
>>                        the uninitialized warning
>>                        and minimize the run-time
>>                        performance cost.
>> 
>>                        A. Adding real
>>                        initialization during
>>                        gimplification, not maintain
>>                        the uninitialized warnings.
>>                        B. Adding real
>>                        initialization during
>>                        gimplification, marking them
>>                        with “artificial_init”. 
>>                          Adjusting uninitialized
>>                        pass, maintaining the
>>                        annotation, making sure the
>>                        real init not
>>                          Deleted from the fake
>>                        init. 
>>                        C.  Marking the DECL for an
>>                        uninitialized auto variable
>>                        as “no_explicit_init” during
>>                        gimplification,
>>                           maintain this
>>                        “no_explicit_init” bit till
>>                        after
>>                        pass_late_warn_uninitialized,
>>                        or till pass_expand, 
>>                           add real initialization
>>                        for all DECLs that are
>>                        marked with
>>                        “no_explicit_init”.
>>                        D. Adding .DEFERRED_INIT
>>                        during gimplification,
>>                        expanding the .DEFERRED_INIT
>>                        during expand to
>>                          real initialization.
>>                        Adjusting the uninitialized pass
>>                        for the new refs with
>>                        “.DEFERRED_INIT”.
>> 
>>                        Of the above, approach A
>>                        will be the one that has
>>                        the minimum run-time cost and
>>                        will be the baseline for the
>>                        performance
>>                        comparison. 
>> 
>>                        I will implement approach D
>>                        next. This one is expected
>>                        to have the most run-time
>>                        overhead in the above
>>                        list, but
>>                        its implementation should be the
>>                        cleanest among B, C, and D.
>>                        Let’s see how much more
>>                        performance overhead this
>>                        approach
>>                        adds. If the data is
>>                        good, maybe we can avoid the
>>                        effort to implement B and
>>                        C. 
>> 
>>                        If the performance of D is
>>                        not good, I will implement B
>>                        or C at that time.
>> 
>>                        Let me know if you have any
>>                        comment or suggestions.
>> 
>>                        Thanks.
>> 
>>                        Qing
>> 
>>      -- 
>>      Richard Biener <rguenther@suse.de>
>>      SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>>      Nuernberg,
>>      Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-15 16:16                                       ` Qing Zhao
@ 2021-01-15 17:22                                         ` Richard Biener
  2021-01-15 17:57                                           ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2021-01-15 17:22 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote:
>
>
>> On Jan 15, 2021, at 2:11 AM, Richard Biener <rguenther@suse.de>
>wrote:
>> 
>> 
>> 
>> On Thu, 14 Jan 2021, Qing Zhao wrote:
>> 
>>> Hi, 
>>> More data on code size and compilation time with CPU2017:
>>> ********Compilation time data:   the numbers are the slowdown
>against the
>>> default “no”:
>>> benchmarks  A/no D/no
>>>                         
>>> 500.perlbench_r 5.19% 1.95%
>>> 502.gcc_r 0.46% -0.23%
>>> 505.mcf_r 0.00% 0.00%
>>> 520.omnetpp_r 0.85% 0.00%
>>> 523.xalancbmk_r 0.79% -0.40%
>>> 525.x264_r -4.48% 0.00%
>>> 531.deepsjeng_r 16.67% 16.67%
>>> 541.leela_r  0.00%  0.00%
>>> 557.xz_r 0.00%  0.00%
>>>                         
>>> 507.cactuBSSN_r 1.16% 0.58%
>>> 508.namd_r 9.62% 8.65%
>>> 510.parest_r 0.48% 1.19%
>>> 511.povray_r 3.70% 3.70%
>>> 519.lbm_r 0.00% 0.00%
>>> 521.wrf_r 0.05% 0.02%
>>> 526.blender_r 0.33% 1.32%
>>> 527.cam4_r -0.93% -0.93%
>>> 538.imagick_r 1.32% 3.95%
>>> 544.nab_r  0.00% 0.00%
>>> From the above data, it looks like the compilation-time impact
>>> of implementations A and D is almost the same.
>>> *******code size data: the numbers are the code size increase
>against the
>>> default “no”:
>>> benchmarks A/no D/no
>>>                         
>>> 500.perlbench_r 2.84% 0.34%
>>> 502.gcc_r 2.59% 0.35%
>>> 505.mcf_r 3.55% 0.39%
>>> 520.omnetpp_r 0.54% 0.03%
>>> 523.xalancbmk_r 0.36%  0.39%
>>> 525.x264_r 1.39% 0.13%
>>> 531.deepsjeng_r 2.15% -1.12%
>>> 541.leela_r 0.50% -0.20%
>>> 557.xz_r 0.31% 0.13%
>>>                         
>>> 507.cactuBSSN_r 5.00% -0.01%
>>> 508.namd_r 3.64% -0.07%
>>> 510.parest_r 1.12% 0.33%
>>> 511.povray_r 4.18% 1.16%
>>> 519.lbm_r 8.83% 6.44%
>>> 521.wrf_r 0.08% 0.02%
>>> 526.blender_r 1.63% 0.45%
>>> 527.cam4_r  0.16% 0.06%
>>> 538.imagick_r 3.18% -0.80%
>>> 544.nab_r 5.76% -1.11%
>>> Avg 2.52% 0.36%
>>> From the above data, implementation D is almost always better than A, which is
>>> surprising to me; I am not sure what the reason for this is.
>> 
>> D probably inhibits most interesting loop transforms (check SPEC FP
>> performance).
>
>The call to .DEFERRED_INIT is marked as ECF_CONST:
>
>/* A function to represent an artificial initialization to an
>uninitialized
>   automatic variable. The first argument is the variable itself, the
>   second argument is the initialization type.  */
>DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW,
>NULL)
>
>So I assume that such a const call should minimize the impact on loop
>optimizations. But yes, it will still inhibit some of the loop
>transformations.
>
>>  It will also most definitely disallow SRA which, when
>> an aggregate is not completely elided, tends to grow code.
>
>Makes sense to me. 
>
>The run-time performance data for D and A are actually very similar as
>I posted in the previous email (I listed it here for convenience)
>
>Run-time performance overhead with A and D:
>
>benchmarks		A / no	D /no
>
>500.perlbench_r	1.25%	1.25%
>502.gcc_r		0.68%	1.80%
>505.mcf_r		0.68%	0.14%
>520.omnetpp_r	4.83%	4.68%
>523.xalancbmk_r	0.18%	1.96%
>525.x264_r		1.55%	2.07%
>531.deepsjeng_	11.57%	11.85%
>541.leela_r		0.64%	0.80%
>557.xz_			 -0.41%	-0.41%
>
>507.cactuBSSN_r	0.44%	0.44%
>508.namd_r		0.34%	0.34%
>510.parest_r		0.17%	0.25%
>511.povray_r		56.57%	57.27%
>519.lbm_r		0.00%	0.00%
>521.wrf_r			 -0.28%	-0.37%
>526.blender_r		16.96%	17.71%
>527.cam4_r		0.70%	0.53%
>538.imagick_r		2.40%	2.40%
>544.nab_r		0.00%	-0.65%
>
>avg				5.17%	5.37%
>
>Especially for the SPEC FP benchmarks, I didn’t see too much
>performance difference between A and D. 
>I guess that the RTL optimizations might be enough to get rid of most
>of the overhead introduced by the additional initialization. 
>
>> 
>>> ********stack usage data, I added -fstack-usage to the compilation
>line when
>>> compiling CPU2017 benchmarks. And all the *.su files were generated
>for each
>>> of the modules.
>>> Since there a lot of such files, and the stack size information are
>embedded
>>> in each of the files.  I just picked up one benchmark 511.povray to
>>> check. Which is the one that 
>>> has the most runtime overhead when adding initialization (both A and
>D). 
>>> I identified all the *.su files that are different between A and D
>and do a
>>> diff on those *.su files, and looks like that the stack size is much
>higher
>>> with D than that with A, for example:
>>> $ diff build_base_auto_init.D.0000/bbox.su
>>> build_base_auto_init.A.0000/bbox.su
>>> 5c5
>>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
>>> pov::BBOX_TREE**&, long int*, long int, long int) 160 static
>>> ---
>>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
>>> pov::BBOX_TREE**&, long int*, long int, long int) 96 static
>>> $ diff build_base_auto_init.D.0000/image.su
>>> build_base_auto_init.A.0000/image.su
>>> 9c9
>>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*,
>double*) 624
>>> static
>>> ---
>>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*,
>double*) 272
>>> static
>>> ….
>>> Looks like that implementation D has more stack size impact than A. 
>>> Do you have any insight on what the reason for this?
>> 
>> D will keep all initialized aggregates as aggregates and live which
>> means stack will be allocated for it.  With A the usual optimizations
>> to reduce stack usage can be applied.
>
>I checked the routine “pov::bump_map” in 511.povray_r, since it has a
>large stack increase due to implementation D, by examining the IR
>immediately before the RTL expansion phase (image.cpp.244t.optimized).
>I found that we have the following additional statements for the array
>elements:
>
>void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>* normal)
>{
>…
>  double p3[3];
>  double p2[3];
>  double p1[3];
>  float colour3[5];
>  float colour2[5];
>  float colour1[5];
>…
>   # DEBUG BEGIN_STMT
>  colour1 = .DEFERRED_INIT (colour1, 2);
>  colour2 = .DEFERRED_INIT (colour2, 2);
>  colour3 = .DEFERRED_INIT (colour3, 2);
>  # DEBUG BEGIN_STMT
>  MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>  p1 = .DEFERRED_INIT (p1, 2);
>  # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>  # DEBUG p1$0 => D#12
>  # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>  # DEBUG p1$1 => D#11
>  # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>  # DEBUG p1$2 => D#10
>  MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>  MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>  MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>  p2 = .DEFERRED_INIT (p2, 2);
>  # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>  # DEBUG p2$0 => D#9
>  # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>  # DEBUG p2$1 => D#8
>  # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>  # DEBUG p2$2 => D#7
>  MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>  MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>  MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>  p3 = .DEFERRED_INIT (p3, 2);
>  ….
>}
>
>I guess that the above “MEM <double>….. = …” are the ones that make the
>differences. Which phase introduced them?

Looks like SRA. But you can just dump all and grep for the first occurrence. 
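
For instance, a reduced test case along these lines could be used to try
that. This is only a sketch: the function dot3 and the file name repro.c
are made up for illustration and are not taken from 511.povray_r, and I
have not checked that this small case shows the same stores. Compile it
with something like "gcc -O2 -ftrivial-auto-var-init=zero -fdump-tree-all
-c repro.c" (option name as in the prototype discussed here) and then
grep the dump files, in pass-number order, for "MEM <double>".

```c
/* repro.c: hypothetical reduced test case, not taken from povray.
   The local array p has no initializer, so automatic initialization
   would apply to it.  Without that initialization, SRA can typically
   replace p with three scalars so that no stack slot is needed.  */
#include <stddef.h>

double
dot3 (const double *a, const double *b)
{
  double p[3];  /* uninitialized auto aggregate */

  for (size_t i = 0; i < 3; i++)
    p[i] = a[i] * b[i];

  return p[0] + p[1] + p[2];
}
```

The first dump file in which the extra stores show up names the pass that
introduced them.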


>> 
>>> Let me know if you have any comments and suggestions.
>> 
>> First of all I would check whether the prototype implementations
>> work as expected.
>I have already done such checks with small test cases, inspecting the
>IR generated with implementation A or D, mainly focusing on
>*.c.006t.gimple and *.c.*t.expand; all worked as expected.
>
>For CPU2017, as in the example above, I also checked the IR for
>both A and D; it looks like all worked as expected.
>
>Thanks. 
>
>Qing
>> 
>> Richard.
>> 
>> 
>>> thanks.
>>> Qing
>>>      On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de>
>>>      wrote:
>>> 
>>>      On Tue, 12 Jan 2021, Qing Zhao wrote:
>>> 
>>>            Hi, 
>>> 
>>>            Just check in to see whether you have any comments
>>>            and suggestions on this:
>>> 
>>>            FYI, I have been continuing with the Approach D
>>>            implementation since last week:
>>> 
>>>            D. Adding calls to .DEFERRED_INIT during
>>>            gimplification, expanding the .DEFERRED_INIT during
>>>            expand to
>>>            real initialization. Adjusting the uninitialized pass
>>>            for the new refs with “.DEFERRED_INIT”.
>>> 
>>>            For the remaining work of Approach D:
>>> 
>>>            ** complete the implementation of
>>>            -ftrivial-auto-var-init=pattern;
>>>            ** complete the implementation of uninitialized
>>>            warnings maintenance work for D. 
>>> 
>>>            I have completed the uninitialized warnings
>>>            maintenance work for D.
>>>            And finished partial of the
>>>            -ftrivial-auto-var-init=pattern implementation. 
>>> 
>>>            The following are remaining work of Approach D:
>>> 
>>>              ** -ftrivial-auto-var-init=pattern for VLA;
>>>              ** add a new attribute for variables:
>>>            __attribute__((uninitialized))
>>>            the marked variable is intentionally left uninitialized
>>>            for performance purposes.
>>>              ** adding complete testing cases;
>>> 
>>>            Please let me know if you have any objection on my
>>>            current decision on implementing approach D. 
>>> 
>>>      Did you do any analysis on how stack usage and code size are
>>>      changed 
>>>      with approach D?  How does compile-time behave (we could gobble
>>>      up
>>>      lots of .DEFERRED_INIT calls I guess)?
>>> 
>>>      Richard.
>>> 
>>>            Thanks a lot for your help.
>>> 
>>>            Qing
>>> 
>>>                  On Jan 5, 2021, at 1:05 PM, Qing Zhao
>>>                  via Gcc-patches
>>>                  <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>>                  Hi,
>>> 
>>>                  This is an update for our previous
>>>                  discussion. 
>>> 
>>>                  1. I implemented the following two
>>>                  different implementations in the latest
>>>                  upstream gcc:
>>> 
>>>                  A. Adding real initialization during
>>>                  gimplification, not maintain the
>>>                  uninitialized warnings.
>>> 
>>>                  D. Adding calls to .DEFERRED_INIT
>>>                  during gimplification, expanding the
>>>                  .DEFERRED_INIT during expand to
>>>                  real initialization. Adjusting the
>>>                  uninitialized pass for the new refs
>>>                  with “.DEFERRED_INIT”.
>>> 
>>>                  Note, in this initial implementation,
>>>                  ** I ONLY implement
>>>                  -ftrivial-auto-var-init=zero, the
>>>                  implementation of
>>>                  -ftrivial-auto-var-init=pattern 
>>>                     is not done yet.  Therefore, the
>>>                  performance data is only about
>>>                  -ftrivial-auto-var-init=zero. 
>>> 
>>>                  ** I added an temporary  option
>>>                  -fauto-var-init-approach=A|B|C|D  to
>>>                  choose implementation A or D for 
>>>                     runtime performance study.
>>>                  ** I didn’t finish the uninitialized
>>>                  warnings maintenance work for D. (That
>>>                  might take more time than I expected). 
>>> 
>>>                  2. I collected runtime data for CPU2017
>>>                  on a x86 machine with this new gcc for
>>>                  the following 3 cases:
>>> 
>>>                  no: default. (-g -O2 -march=native )
>>>                  A:  default +
>>>                   -ftrivial-auto-var-init=zero
>>>                  -fauto-var-init-approach=A 
>>>                  D:  default +
>>>                   -ftrivial-auto-var-init=zero
>>>                  -fauto-var-init-approach=D 
>>> 
>>>                  And then compute the slowdown data for
>>>                  both A and D as following:
>>> 
>>>                  benchmarks A / no D /no
>>> 
>>>                  500.perlbench_r 1.25% 1.25%
>>>                  502.gcc_r 0.68% 1.80%
>>>                  505.mcf_r 0.68% 0.14%
>>>                  520.omnetpp_r 4.83% 4.68%
>>>                  523.xalancbmk_r 0.18% 1.96%
>>>                  525.x264_r 1.55% 2.07%
>>>                  531.deepsjeng_r 11.57% 11.85%
>>>                  541.leela_r 0.64% 0.80%
>>>                  557.xz_r -0.41% -0.41%
>>> 
>>>                  507.cactuBSSN_r 0.44% 0.44%
>>>                  508.namd_r 0.34% 0.34%
>>>                  510.parest_r 0.17% 0.25%
>>>                  511.povray_r 56.57% 57.27%
>>>                  519.lbm_r 0.00% 0.00%
>>>                  521.wrf_r  -0.28% -0.37%
>>>                  526.blender_r 16.96% 17.71%
>>>                  527.cam4_r 0.70% 0.53%
>>>                  538.imagick_r 2.40% 2.40%
>>>                  544.nab_r 0.00% -0.65%
>>> 
>>>                  avg 5.17% 5.37%
>>> 
>>>                  From the above data, we can see that in
>>>                  general, the runtime performance
>>>                  slowdown for 
>>>                  implementation A and D are similar for
>>>                  individual benchmarks.
>>> 
>>>                  There are several benchmarks that have
>>>                  significant slowdown with the new added
>>>                  initialization for both
>>>                  A and D, for example, 511.povray_r,
>>>                  526.blender_r, and 531.deepsjeng_r, I
>>>                  will try to study a little bit
>>>                  more on what kind of new initializations
>>>                  introduced such slowdown. 
>>> 
>>>                  From the current study so far, I think
>>>                  that approach D should be good enough
>>>                  for our final implementation. 
>>>                  So, I will try to finish approach D with
>>>                  the following remaining work
>>> 
>>>                      ** complete the implementation of
>>>                  -ftrivial-auto-var-init=pattern;
>>>                      ** complete the implementation of
>>>                  uninitialized warnings maintenance work
>>>                  for D. 
>>> 
>>>                  Let me know if you have any comments and
>>>                  suggestions on my current and future
>>>                  work.
>>> 
>>>                  Thanks a lot for your help.
>>> 
>>>                  Qing
>>> 
>>>                        On Dec 9, 2020, at 10:18 AM,
>>>                        Qing Zhao via Gcc-patches
>>>                        <gcc-patches@gcc.gnu.org>
>>>                        wrote:
>>> 
>>>                        The following are the
>>>                        approaches I will implement
>>>                        and compare:
>>> 
>>>                        Our final goal is to keep
>>>                        the uninitialized warning
>>>                        and minimize the run-time
>>>                        performance cost.
>>> 
>>>                        A. Adding real
>>>                        initialization during
>>>                        gimplification, not maintain
>>>                        the uninitialized warnings.
>>>                        B. Adding real
>>>                        initialization during
>>>                        gimplification, marking them
>>>                        with “artificial_init”. 
>>>                          Adjusting uninitialized
>>>                        pass, maintaining the
>>>                        annotation, making sure the
>>>                        real init not
>>>                          Deleted from the fake
>>>                        init. 
>>>                        C.  Marking the DECL for an
>>>                        uninitialized auto variable
>>>                        as “no_explicit_init” during
>>>                        gimplification,
>>>                           maintain this
>>>                        “no_explicit_init” bit till
>>>                        after
>>>                        pass_late_warn_uninitialized,
>>>                        or till pass_expand, 
>>>                           add real initialization
>>>                        for all DECLs that are
>>>                        marked with
>>>                        “no_explicit_init”.
>>>                        D. Adding .DEFERRED_INIT
>>>                        during gimplification,
>>>                        expanding the .DEFERRED_INIT
>>>                        during expand to
>>>                          real initialization.
>>>                        Adjusting the uninitialized pass
>>>                        for the new refs with
>>>                        “.DEFERRED_INIT”.
>>> 
>>>                        In the above, approach A
>>>                        will be the one that have
>>>                        the minimum run-time cost,
>>>                        will be the base for the
>>>                        performance
>>>                        comparison. 
>>> 
>>>                        I will implement approach D
>>>                        then, this one is expected
>>>                        to have the most run-time
>>>                        overhead among the above
>>>                        list, but
>>>                        Implementation should be the
>>>                        cleanest among B, C, D.
>>>                        Let’s see how much more
>>>                        performance overhead this
>>>                        approach
>>>                        will be. If the data is
>>>                        good, maybe we can avoid the
>>>                        effort to implement B, and
>>>                        C. 
>>> 
>>>                        If the performance of D is
>>>                        not good, I will implement B
>>>                        or C at that time.
>>> 
>>>                        Let me know if you have any
>>>                        comment or suggestions.
>>> 
>>>                        Thanks.
>>> 
>>>                        Qing
>>> 
>>>      -- 
>>>      Richard Biener <rguenther@suse.de>
>>>      SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>>>      Nuernberg,
>>>      Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-15 17:22                                         ` Richard Biener
@ 2021-01-15 17:57                                           ` Qing Zhao
  2021-01-18 13:09                                             ` Richard Sandiford
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-01-15 17:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



> On Jan 15, 2021, at 11:22 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote:
>> 
>> 
>>> On Jan 15, 2021, at 2:11 AM, Richard Biener <rguenther@suse.de>
>> wrote:
>>> 
>>> 
>>> 
>>> On Thu, 14 Jan 2021, Qing Zhao wrote:
>>> 
>>>> Hi, 
>>>> More data on code size and compilation time with CPU2017:
>>>> ******** Compilation time data: the numbers are the slowdown against the
>>>> default “no”:
>>>> benchmarks        A/no     D/no
>>>> 
>>>> 500.perlbench_r    5.19%    1.95%
>>>> 502.gcc_r          0.46%   -0.23%
>>>> 505.mcf_r          0.00%    0.00%
>>>> 520.omnetpp_r      0.85%    0.00%
>>>> 523.xalancbmk_r    0.79%   -0.40%
>>>> 525.x264_r        -4.48%    0.00%
>>>> 531.deepsjeng_r   16.67%   16.67%
>>>> 541.leela_r        0.00%    0.00%
>>>> 557.xz_r           0.00%    0.00%
>>>> 
>>>> 507.cactuBSSN_r    1.16%    0.58%
>>>> 508.namd_r         9.62%    8.65%
>>>> 510.parest_r       0.48%    1.19%
>>>> 511.povray_r       3.70%    3.70%
>>>> 519.lbm_r          0.00%    0.00%
>>>> 521.wrf_r          0.05%    0.02%
>>>> 526.blender_r      0.33%    1.32%
>>>> 527.cam4_r        -0.93%   -0.93%
>>>> 538.imagick_r      1.32%    3.95%
>>>> 544.nab_r          0.00%    0.00%
>>>> 
>>>> From the above data, it looks like the compilation time impact of
>>>> implementations A and D is almost the same.
>>>> ******** Code size data: the numbers are the code size increase against
>>>> the default “no”:
>>>> benchmarks        A/no     D/no
>>>> 
>>>> 500.perlbench_r    2.84%    0.34%
>>>> 502.gcc_r          2.59%    0.35%
>>>> 505.mcf_r          3.55%    0.39%
>>>> 520.omnetpp_r      0.54%    0.03%
>>>> 523.xalancbmk_r    0.36%    0.39%
>>>> 525.x264_r         1.39%    0.13%
>>>> 531.deepsjeng_r    2.15%   -1.12%
>>>> 541.leela_r        0.50%   -0.20%
>>>> 557.xz_r           0.31%    0.13%
>>>> 
>>>> 507.cactuBSSN_r    5.00%   -0.01%
>>>> 508.namd_r         3.64%   -0.07%
>>>> 510.parest_r       1.12%    0.33%
>>>> 511.povray_r       4.18%    1.16%
>>>> 519.lbm_r          8.83%    6.44%
>>>> 521.wrf_r          0.08%    0.02%
>>>> 526.blender_r      1.63%    0.45%
>>>> 527.cam4_r         0.16%    0.06%
>>>> 538.imagick_r      3.18%   -0.80%
>>>> 544.nab_r          5.76%   -1.11%
>>>> 
>>>> avg                2.52%    0.36%
>>>> 
>>>> From the above data, implementation D is almost always better than A
>>>> (523.xalancbmk_r is the one exception). This is surprising to me, and
>>>> I am not sure what the reason is.
>>> 
>>> D probably inhibits most interesting loop transforms (check SPEC FP
>>> performance).
>> 
>> The call to .DEFERRED_INIT is marked as ECF_CONST:
>> 
>> /* A function to represent an artificial initialization to an uninitialized
>>    automatic variable.  The first argument is the variable itself, the
>>    second argument is the initialization type.  */
>> DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> So, I assume that such const call should minimize the impact to loop
>> optimizations. But yes, it will still inhibit some of the loop
>> transformations.
>> 
>>> It will also most definitely disallow SRA which, when
>>> an aggregate is not completely elided, tends to grow code.
>> 
>> Makes sense to me.
>> 
>> The run-time performance data for D and A are actually very similar as
>> I posted in the previous email (I listed it here for convenience)
>> 
>> Run-time performance overhead with A and D:
>> 
>> benchmarks        A/no     D/no
>> 
>> 500.perlbench_r    1.25%    1.25%
>> 502.gcc_r          0.68%    1.80%
>> 505.mcf_r          0.68%    0.14%
>> 520.omnetpp_r      4.83%    4.68%
>> 523.xalancbmk_r    0.18%    1.96%
>> 525.x264_r         1.55%    2.07%
>> 531.deepsjeng_r   11.57%   11.85%
>> 541.leela_r        0.64%    0.80%
>> 557.xz_r          -0.41%   -0.41%
>> 
>> 507.cactuBSSN_r    0.44%    0.44%
>> 508.namd_r         0.34%    0.34%
>> 510.parest_r       0.17%    0.25%
>> 511.povray_r      56.57%   57.27%
>> 519.lbm_r          0.00%    0.00%
>> 521.wrf_r         -0.28%   -0.37%
>> 526.blender_r     16.96%   17.71%
>> 527.cam4_r         0.70%    0.53%
>> 538.imagick_r      2.40%    2.40%
>> 544.nab_r          0.00%   -0.65%
>> 
>> avg                5.17%    5.37%
>> 
>> Especially for the SPEC FP benchmarks, I didn’t see too much
>> performance difference between A and D. 
>> I guess that the RTL optimizations might be enough to get rid of most
>> of the overhead introduced by the additional initialization. 
>> 
>>> 
>>>> ******** Stack usage data: I added -fstack-usage to the compilation line
>>>> when compiling the CPU2017 benchmarks, and *.su files were generated for
>>>> each of the modules.
>>>> Since there are a lot of such files, and the stack size information is
>>>> embedded in each of them, I just picked one benchmark, 511.povray_r, to
>>>> check. It is the one that has the most runtime overhead when adding
>>>> initialization (both A and D).
>>>> I identified all the *.su files that differ between A and D and ran a
>>>> diff on those files. It looks like the stack size is much higher with D
>>>> than with A, for example:
>>>> $ diff build_base_auto_init.D.0000/bbox.su build_base_auto_init.A.0000/bbox.su
>>>> 5c5
>>>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 160 static
>>>> ---
>>>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 96 static
>>>> $ diff build_base_auto_init.D.0000/image.su build_base_auto_init.A.0000/image.su
>>>> 9c9
>>>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624 static
>>>> ---
>>>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272 static
>>>> ….
>>>> It looks like implementation D has more stack size impact than A.
>>>> Do you have any insight into the reason for this?
>>> 
>>> D will keep all initialized aggregates as aggregates and live which
>>> means stack will be allocated for it.  With A the usual optimizations
>>> to reduce stack usage can be applied.
>> 
>> I checked the routine “pov::bump_map” in 511.povray_r, since it has a
>> large stack increase due to implementation D, by examining the IR
>> immediately before the RTL expansion phase (image.cpp.244t.optimized).
>> I found that we have the following additional statements for the array
>> elements:
>> 
>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>> * normal)
>> {
>> …
>> double p3[3];
>> double p2[3];
>> double p1[3];
>> float colour3[5];
>> float colour2[5];
>> float colour1[5];
>> …
>>  # DEBUG BEGIN_STMT
>> colour1 = .DEFERRED_INIT (colour1, 2);
>> colour2 = .DEFERRED_INIT (colour2, 2);
>> colour3 = .DEFERRED_INIT (colour3, 2);
>> # DEBUG BEGIN_STMT
>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>> p1 = .DEFERRED_INIT (p1, 2);
>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>> # DEBUG p1$0 => D#12
>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>> # DEBUG p1$1 => D#11
>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>> # DEBUG p1$2 => D#10
>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>> p2 = .DEFERRED_INIT (p2, 2);
>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>> # DEBUG p2$0 => D#9
>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>> # DEBUG p2$1 => D#8
>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>> # DEBUG p2$2 => D#7
>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>> p3 = .DEFERRED_INIT (p3, 2);
>> ….
>> }
>> 
>> I guess that the above “MEM <double>….. = …” are the ones that make the
>> differences. Which phase introduced them?
> 
> Looks like SRA. But you can just dump all and grep for the first occurrence. 

Yes, it looks like SRA is the one. The stores first appear in the esra dump:

image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
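
To spell out what I understand SRA to be doing here, the following is a
hand-written illustration, not compiler output. The function names
sum_aggregate and sum_scalarized are made up for this sketch.

```c
/* Hand-written illustration of scalar replacement of aggregates (SRA).
   Function names are hypothetical; this is not compiler output.  */

/* Before SRA: the three products live in a local array.  */
double
sum_aggregate (const double *a, const double *b)
{
  double p[3];
  p[0] = a[0] * b[0];
  p[1] = a[1] * b[1];
  p[2] = a[2] * b[2];
  return p[0] + p[1] + p[2];
}

/* After SRA, conceptually: each element becomes its own scalar
   (shown as p$0, p$1, p$2 in the dumps), which can live in a register.  */
double
sum_scalarized (const double *a, const double *b)
{
  double p0 = a[0] * b[0];
  double p1 = a[1] * b[1];
  double p2 = a[2] * b[2];
  return p0 + p1 + p2;
}
```

With D, the statement "p1 = .DEFERRED_INIT (p1, 2)" uses p1 as a whole
aggregate, so my reading is that SRA has to store the scalar replacements
back to memory in front of it. That would explain the "MEM <double>" stores
above and why the array keeps its stack slot.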

Qing
> 
> 
>>> 
>>>> Let me know if you have any comments and suggestions.
>>> 
>>> First of all I would check whether the prototype implementations
>>> work as expected.
>> I have already done such checks with small test cases, inspecting the
>> IR generated with implementation A or D, mainly focusing on
>> *.c.006t.gimple and *.c.*t.expand; all worked as expected.
>> 
>> For CPU2017, as in the example above, I also checked the IR for
>> both A and D; it looks like all worked as expected.
>> 
>> Thanks. 
>> 
>> Qing
>>> 
>>> Richard.
>>> 
>>> 
>>>> thanks.
>>>> Qing
>>>>     On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de>
>>>>     wrote:
>>>> 
>>>>     On Tue, 12 Jan 2021, Qing Zhao wrote:
>>>> 
>>>>           Hi, 
>>>> 
>>>>           Just check in to see whether you have any comments
>>>>           and suggestions on this:
>>>> 
>>>>           FYI, I have been continuing with the Approach D
>>>>           implementation since last week:
>>>> 
>>>>           D. Adding calls to .DEFERRED_INIT during
>>>>           gimplification, expanding the .DEFERRED_INIT during
>>>>           expand to
>>>>           real initialization. Adjusting the uninitialized pass
>>>>           for the new refs with “.DEFERRED_INIT”.
>>>> 
>>>>           For the remaining work of Approach D:
>>>> 
>>>>           ** complete the implementation of
>>>>           -ftrivial-auto-var-init=pattern;
>>>>           ** complete the implementation of uninitialized
>>>>           warnings maintenance work for D. 
>>>> 
>>>>           I have completed the uninitialized warnings
>>>>           maintenance work for D.
>>>>           And finished partial of the
>>>>           -ftrivial-auto-var-init=pattern implementation. 
>>>> 
>>>>           The following are remaining work of Approach D:
>>>> 
>>>>             ** -ftrivial-auto-var-init=pattern for VLA;
>>>>             ** add a new attribute for variables:
>>>>           __attribute__((uninitialized))
>>>>           the marked variable is intentionally left uninitialized
>>>>           for performance purposes.
>>>>             ** adding complete testing cases;
>>>> 
>>>>           Please let me know if you have any objection on my
>>>>           current decision on implementing approach D. 
>>>> 
>>>>     Did you do any analysis on how stack usage and code size are
>>>>     changed 
>>>>     with approach D?  How does compile-time behave (we could gobble
>>>>     up
>>>>     lots of .DEFERRED_INIT calls I guess)?
>>>> 
>>>>     Richard.
>>>> 
>>>>           Thanks a lot for your help.
>>>> 
>>>>           Qing
>>>> 
>>>>                 On Jan 5, 2021, at 1:05 PM, Qing Zhao
>>>>                 via Gcc-patches
>>>>                 <gcc-patches@gcc.gnu.org> wrote:
>>>> 
>>>>                 Hi,
>>>> 
>>>>                 This is an update for our previous
>>>>                 discussion. 
>>>> 
>>>>                 1. I implemented the following two
>>>>                 different implementations in the latest
>>>>                 upstream gcc:
>>>> 
>>>>                 A. Adding real initialization during
>>>>                 gimplification, not maintain the
>>>>                 uninitialized warnings.
>>>> 
>>>>                 D. Adding calls to .DEFERRED_INIT
>>>>                 during gimplification, expanding the
>>>>                 .DEFERRED_INIT during expand to
>>>>                 real initialization. Adjusting the
>>>>                 uninitialized pass for the new refs
>>>>                 with “.DEFERRED_INIT”.
>>>> 
>>>>                 Note, in this initial implementation,
>>>>                 ** I ONLY implement
>>>>                 -ftrivial-auto-var-init=zero, the
>>>>                 implementation of
>>>>                 -ftrivial-auto-var-init=pattern 
>>>>                    is not done yet.  Therefore, the
>>>>                 performance data is only about
>>>>                 -ftrivial-auto-var-init=zero. 
>>>> 
>>>>                 ** I added an temporary  option
>>>>                 -fauto-var-init-approach=A|B|C|D  to
>>>>                 choose implementation A or D for 
>>>>                    runtime performance study.
>>>>                 ** I didn’t finish the uninitialized
>>>>                 warnings maintenance work for D. (That
>>>>                 might take more time than I expected). 
>>>> 
>>>>                 2. I collected runtime data for CPU2017
>>>>                 on a x86 machine with this new gcc for
>>>>                 the following 3 cases:
>>>> 
>>>>                 no: default. (-g -O2 -march=native )
>>>>                 A:  default +
>>>>                  -ftrivial-auto-var-init=zero
>>>>                 -fauto-var-init-approach=A 
>>>>                 D:  default +
>>>>                  -ftrivial-auto-var-init=zero
>>>>                 -fauto-var-init-approach=D 
>>>> 
>>>>                 And then compute the slowdown data for
>>>>                 both A and D as following:
>>>> 
>>>>                 benchmarks A / no D /no
>>>> 
>>>>                 500.perlbench_r 1.25% 1.25%
>>>>                 502.gcc_r 0.68% 1.80%
>>>>                 505.mcf_r 0.68% 0.14%
>>>>                 520.omnetpp_r 4.83% 4.68%
>>>>                 523.xalancbmk_r 0.18% 1.96%
>>>>                 525.x264_r 1.55% 2.07%
>>>>                 531.deepsjeng_r 11.57% 11.85%
>>>>                 541.leela_r 0.64% 0.80%
>>>>                 557.xz_r -0.41% -0.41%
>>>> 
>>>>                 507.cactuBSSN_r 0.44% 0.44%
>>>>                 508.namd_r 0.34% 0.34%
>>>>                 510.parest_r 0.17% 0.25%
>>>>                 511.povray_r 56.57% 57.27%
>>>>                 519.lbm_r 0.00% 0.00%
>>>>                 521.wrf_r  -0.28% -0.37%
>>>>                 526.blender_r 16.96% 17.71%
>>>>                 527.cam4_r 0.70% 0.53%
>>>>                 538.imagick_r 2.40% 2.40%
>>>>                 544.nab_r 0.00% -0.65%
>>>> 
>>>>                 avg 5.17% 5.37%
>>>> 
>>>>                 From the above data, we can see that in
>>>>                 general, the runtime performance
>>>>                 slowdown for 
>>>>                 implementation A and D are similar for
>>>>                 individual benchmarks.
>>>> 
>>>>                 There are several benchmarks that have
>>>>                 significant slowdown with the new added
>>>>                 initialization for both
>>>>                 A and D, for example, 511.povray_r,
>>>>                 526.blender_r, and 531.deepsjeng_r, I
>>>>                 will try to study a little bit
>>>>                 more on what kind of new initializations
>>>>                 introduced such slowdown. 
>>>> 
>>>>                 From the current study so far, I think
>>>>                 that approach D should be good enough
>>>>                 for our final implementation. 
>>>>                 So, I will try to finish approach D with
>>>>                 the following remaining work
>>>> 
>>>>                     ** complete the implementation of
>>>>                 -ftrivial-auto-var-init=pattern;
>>>>                     ** complete the implementation of
>>>>                 uninitialized warnings maintenance work
>>>>                 for D. 
>>>> 
>>>>                 Let me know if you have any comments and
>>>>                 suggestions on my current and future
>>>>                 work.
>>>> 
>>>>                 Thanks a lot for your help.
>>>> 
>>>>                 Qing
>>>> 
>>>>                       On Dec 9, 2020, at 10:18 AM,
>>>>                       Qing Zhao via Gcc-patches
>>>>                       <gcc-patches@gcc.gnu.org>
>>>>                       wrote:
>>>> 
>>>>                       The following are the
>>>>                       approaches I will implement
>>>>                       and compare:
>>>> 
>>>>                       Our final goal is to keep
>>>>                       the uninitialized warning
>>>>                       and minimize the run-time
>>>>                       performance cost.
>>>> 
>>>>                       A. Adding real
>>>>                       initialization during
>>>>                       gimplification, not maintain
>>>>                       the uninitialized warnings.
>>>>                       B. Adding real
>>>>                       initialization during
>>>>                       gimplification, marking them
>>>>                       with “artificial_init”. 
>>>>                         Adjusting uninitialized
>>>>                       pass, maintaining the
>>>>                       annotation, making sure the
>>>>                       real init not
>>>>                         Deleted from the fake
>>>>                       init. 
>>>>                       C.  Marking the DECL for an
>>>>                       uninitialized auto variable
>>>>                       as “no_explicit_init” during
>>>>                       gimplification,
>>>>                          maintain this
>>>>                       “no_explicit_init” bit till
>>>>                       after
>>>>                       pass_late_warn_uninitialized,
>>>>                       or till pass_expand, 
>>>>                          add real initialization
>>>>                       for all DECLs that are
>>>>                       marked with
>>>>                       “no_explicit_init”.
>>>>                       D. Adding .DEFERRED_INIT
>>>>                       during gimplification,
>>>>                       expanding the .DEFERRED_INIT
>>>>                       during expand to
>>>>                         real initialization.
>>>>                       Adjusting the uninitialized pass
>>>>                       for the new refs with
>>>>                       “.DEFERRED_INIT”.
>>>> 
>>>>                       In the above, approach A
>>>>                       will be the one that have
>>>>                       the minimum run-time cost,
>>>>                       will be the base for the
>>>>                       performance
>>>>                       comparison. 
>>>> 
>>>>                       I will implement approach D
>>>>                       then, this one is expected
>>>>                       to have the most run-time
>>>>                       overhead among the above
>>>>                       list, but
>>>>                       Implementation should be the
>>>>                       cleanest among B, C, D.
>>>>                       Let’s see how much more
>>>>                       performance overhead this
>>>>                       approach
>>>>                       will be. If the data is
>>>>                       good, maybe we can avoid the
>>>>                       effort to implement B, and
>>>>                       C. 
>>>> 
>>>>                       If the performance of D is
>>>>                       not good, I will implement B
>>>>                       or C at that time.
>>>> 
>>>>                       Let me know if you have any
>>>>                       comment or suggestions.
>>>> 
>>>>                       Thanks.
>>>> 
>>>>                       Qing
>>>> 
>>>>     -- 
>>>>     Richard Biener <rguenther@suse.de>
>>>>     SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>>>>     Nuernberg,
>>>>     Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-15 17:57                                           ` Qing Zhao
@ 2021-01-18 13:09                                             ` Richard Sandiford
  2021-01-18 16:12                                               ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Sandiford @ 2021-01-18 13:09 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Biener, Richard Biener via Gcc-patches

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> D will keep all initialized aggregates as aggregates and live which
>>>> means stack will be allocated for it.  With A the usual optimizations
>>>> to reduce stack usage can be applied.
>>> 
>>> I checked the routine “pov::bump_map” in 511.povray_r since it
>>> has a lot stack increase 
>>> due to implementation D, by examine the IR immediate before RTL
>>> expansion phase.  
>>> (image.cpp.244t.optimized), I found that we have the following
>>> additional statements for the array elements:
>>> 
>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>> * normal)
>>> {
>>> …
>>> double p3[3];
>>> double p2[3];
>>> double p1[3];
>>> float colour3[5];
>>> float colour2[5];
>>> float colour1[5];
>>> …
>>>  # DEBUG BEGIN_STMT
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>> # DEBUG BEGIN_STMT
>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>> p1 = .DEFERRED_INIT (p1, 2);
>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>>> # DEBUG p1$0 => D#12
>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>>> # DEBUG p1$1 => D#11
>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>>> # DEBUG p1$2 => D#10
>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>> p2 = .DEFERRED_INIT (p2, 2);
>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>>> # DEBUG p2$0 => D#9
>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>>> # DEBUG p2$1 => D#8
>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>>> # DEBUG p2$2 => D#7
>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>> p3 = .DEFERRED_INIT (p3, 2);
>>> ….
>>> }
>>> 
>>> I guess that the above “MEM <double>….. = …” are the ones that make the
>>> differences. Which phase introduced them?
>> 
>> Looks like SRA. But you can just dump all and grep for the first occurrence. 
>
> Yes, looks like that SRA is the one:
>
> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);

I realise no-one was suggesting otherwise, but FWIW: SRA could easily
be extended to handle .DEFERRED_INIT if that's the main source of
excess stack usage.  A single .DEFERRED_INIT of an aggregate can
be split into .DEFERRED_INITs of individual components.
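
As a source-level analogy only (this is not GCC internals, and the function
names are made up for illustration), the C sketch below contrasts initialising
a small array as one aggregate with initialising three independent scalars.
In the first form the object is written as a whole, which tends to keep it in
memory; in the second, each component is a separate value that a compiler is
free to keep in a register.

```c
#include <string.h>

/* Whole-aggregate form: the array is initialised as a single object,
   so it has to exist as an object in memory while that happens.  */
double sum_aggregate(void)
{
    double p[3];
    memset(p, 0, sizeof p);      /* one initialisation of the whole array */
    p[0] = 1.0;
    p[1] = 2.0;
    p[2] = 3.0;
    return p[0] + p[1] + p[2];
}

/* Split form: every component is its own scalar with its own
   initialisation, so no array object is needed at all.  */
double sum_scalars(void)
{
    double p0 = 0.0, p1 = 0.0, p2 = 0.0;   /* one initialisation per component */
    p0 = 1.0;
    p1 = 2.0;
    p2 = 3.0;
    return p0 + p1 + p2;
}
```

Both functions compute the same result; the difference that matters here is
only whether the initialisation forces an aggregate to exist.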

In other words, the investigation you're doing looks like the right way
of deciding which passes are worth extending to handle .DEFERRED_INIT.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-18 13:09                                             ` Richard Sandiford
@ 2021-01-18 16:12                                               ` Qing Zhao
  2021-02-01 19:12                                                 ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-01-18 16:12 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener, Richard Biener via Gcc-patches



> On Jan 18, 2021, at 7:09 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> D will keep all initialized aggregates as aggregates and live which
>>>>> means stack will be allocated for it.  With A the usual optimizations
>>>>> to reduce stack usage can be applied.
>>>> 
>>>> I checked the routine “pov::bump_map” in 511.povray_r since it
>>>> has a lot stack increase 
>>>> due to implementation D, by examine the IR immediate before RTL
>>>> expansion phase.  
>>>> (image.cpp.244t.optimized), I found that we have the following
>>>> additional statements for the array elements:
>>>> 
>>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>>> * normal)
>>>> {
>>>> …
>>>> double p3[3];
>>>> double p2[3];
>>>> double p1[3];
>>>> float colour3[5];
>>>> float colour2[5];
>>>> float colour1[5];
>>>> …
>>>> # DEBUG BEGIN_STMT
>>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>>> # DEBUG BEGIN_STMT
>>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>>> p1 = .DEFERRED_INIT (p1, 2);
>>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>>>> # DEBUG p1$0 => D#12
>>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>>>> # DEBUG p1$1 => D#11
>>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>>>> # DEBUG p1$2 => D#10
>>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>>> p2 = .DEFERRED_INIT (p2, 2);
>>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>>>> # DEBUG p2$0 => D#9
>>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>>>> # DEBUG p2$1 => D#8
>>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>>>> # DEBUG p2$2 => D#7
>>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>>> p3 = .DEFERRED_INIT (p3, 2);
>>>> ….
>>>> }
>>>> 
>>>> I guess that the above “MEM <double>….. = …” are the ones that make the
>>>> differences. Which phase introduced them?
>>> 
>>> Looks like SRA. But you can just dump all and grep for the first occurrence. 
>> 
>> Yes, looks like that SRA is the one:
>> 
>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
> 
> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
> be extended to handle .DEFERRED_INIT if that's the main source of
> excess stack usage.  A single .DEFERRED_INIT of an aggregate can
> be split into .DEFERRED_INITs of individual components.

Thanks a lot for the suggestion,
I will study the code of SRA to see how to do this and then see whether this can resolve the issue.
> 
> In other words, the investigation you're doing looks like the right way
> of deciding which passes are worth extending to handle .DEFERRED_INIT.
Yes. From the study so far, it looks like the major issue with the .DEFERRED_INIT approach is the stack size increase.
Hopefully after resolving this issue, we will be done.

Qing

> 
> Thanks,
> Richard


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-01-18 16:12                                               ` Qing Zhao
@ 2021-02-01 19:12                                                 ` Qing Zhao
  2021-02-02  7:43                                                   ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-02-01 19:12 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Richard Biener via Gcc-patches, Richard Biener

Hi, Richard,

I have adjusted the SRA phase to split calls to .DEFERRED_INIT per your suggestion.

The routine “bump_map” in 511.povray now looks like the following:
...

 # DEBUG BEGIN_STMT
  xcoor = 0.0;
  ycoor = 0.0;
  # DEBUG BEGIN_STMT
  index = .DEFERRED_INIT (index, 2);
  index2 = .DEFERRED_INIT (index2, 2);
  index3 = .DEFERRED_INIT (index3, 2);
  # DEBUG BEGIN_STMT
  colour1 = .DEFERRED_INIT (colour1, 2);
  colour2 = .DEFERRED_INIT (colour2, 2);
  colour3 = .DEFERRED_INIT (colour3, 2);
  # DEBUG BEGIN_STMT
  p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
  # DEBUG p1$0 => p1$0_181
  p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
  # DEBUG p1$1 => p1$1_184
  p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
  # DEBUG p1$2 => p1$2_172
  p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
  # DEBUG p2$0 => p2$0_177
  p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
  # DEBUG p2$1 => p2$1_135
  p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
  # DEBUG p2$2 => p2$2_137
  p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
  # DEBUG p3$0 => p3$0_377
  p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
  # DEBUG p3$1 => p3$1_379
  p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
  # DEBUG p3$2 => p3$2_381


In the above, the calls to .DEFERRED_INIT for p1, p2 and p3 have all been split into calls to .DEFERRED_INIT for their individual components.

With this change, the stack usage numbers with -fstack-usage for approach A, old approach D and new D with the splitting in SRA are:

  Approach A    Approach D-old    Approach D-new
  272           624               368

From the above, we can see that splitting the call to DEFERRED_INIT in SRA can reduce the stack usage increase dramatically. 
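
To put numbers on the comparison, the overhead relative to approach A can be
worked out with a couple of helper functions. This is only a small sketch; the
function names are hypothetical, and the values are taken in whatever unit
-fstack-usage reports.

```c
/* Extra stack a variant uses over the baseline, in the unit reported
   by -fstack-usage.  */
long stack_overhead(long baseline, long measured)
{
    return measured - baseline;
}

/* The same overhead expressed as a percentage of the baseline.  */
double stack_overhead_percent(long baseline, long measured)
{
    return 100.0 * (double)(measured - baseline) / (double)baseline;
}
```

With the figures above, stack_overhead(272, 624) is 352 (about 129% over A)
and stack_overhead(272, 368) is 96 (about 35% over A), so the SRA splitting
removes 256 of the 352 extra.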

However, the stack size for D still appears to be bigger than for A.

I checked the IR again and found that alias analysis might be responsible for this (by comparing image.cpp.026t.ealias for both A and D):

(Due to the call to:

  colour1 = .DEFERRED_INIT (colour1, 2);
)

******Approach A:

Points_to analysis:

Constraints:
…
colour1 = &NULL
…
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
...
callarg(53) = &colour1
...
_53 = colour1

Points_to sets:
…
colour1 = { NULL ESCAPED NONLOCAL } same as _53
...
CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
...
callarg(53) = { NULL ESCAPED NONLOCAL colour1 }

******Approach D:

Points_to analysis:

Constraints:
…
callarg(19) = colour1
callarg(19) = &NONLOCAL
colour1 = callarg(19) + UNKNOWN
colour1 = &NONLOCAL
…
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
…
callarg(74) = &colour1
callarg(74) = callarg(74) + UNKNOWN
callarg(74) = *callarg(74) + UNKNOWN
…
_53 = colour1
_54 = _53
_55 = _54 + UNKNOWN
_55 = &NONLOCAL
_56 = colour1
_57 = _56
_58 = _57 + UNKNOWN
_58 = &NONLOCAL
_59 = _55 + UNKNOWN
_59 = _58 + UNKNOWN
_60 = colour1
_61 = _60
_62 = _61 + UNKNOWN
_62 = &NONLOCAL
_63 = _59 + UNKNOWN
_63 = _62 + UNKNOWN
_64 = _63 + UNKNOWN
..
Points_to set:
…
colour1 = { ESCAPED NONLOCAL } same as callarg(19)
…
CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
callarg(71) = { ESCAPED NONLOCAL }
callarg(72) = { ESCAPED NONLOCAL }
callarg(73) = { ESCAPED NONLOCAL }
callarg(74) = { ESCAPED NONLOCAL colour1 }

My question:

Is it possible to adjust alias analysis to resolve this issue?

thanks.

Qing

> On Jan 18, 2021, at 10:12 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
>>>>> I checked the routine “pov::bump_map” in 511.povray_r since it
>>>>> has a lot stack increase 
>>>>> due to implementation D, by examine the IR immediate before RTL
>>>>> expansion phase.  
>>>>> (image.cpp.244t.optimized), I found that we have the following
>>>>> additional statements for the array elements:
>>>>> 
>>>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>>>> * normal)
>>>>> {
>>>>> …
>>>>> double p3[3];
>>>>> double p2[3];
>>>>> double p1[3];
>>>>> float colour3[5];
>>>>> float colour2[5];
>>>>> float colour1[5];
>>>>> …
>>>>> # DEBUG BEGIN_STMT
>>>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>>>> # DEBUG BEGIN_STMT
>>>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>>>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>>>> p1 = .DEFERRED_INIT (p1, 2);
>>>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>>>>> # DEBUG p1$0 => D#12
>>>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>>>>> # DEBUG p1$1 => D#11
>>>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>>>>> # DEBUG p1$2 => D#10
>>>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>>>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>>>> p2 = .DEFERRED_INIT (p2, 2);
>>>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>>>>> # DEBUG p2$0 => D#9
>>>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>>>>> # DEBUG p2$1 => D#8
>>>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>>>>> # DEBUG p2$2 => D#7
>>>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>>>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>>>> p3 = .DEFERRED_INIT (p3, 2);
>>>>> ….
>>>>> }
>>>>> 
>>>>> I guess that the above “MEM <double>….. = …” are the ones that make the
>>>>> differences. Which phase introduced them?
>>>> 
>>>> Looks like SRA. But you can just dump all and grep for the first occurrence. 
>>> 
>>> Yes, looks like that SRA is the one:
>>> 
>>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
>>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
>>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
>> 
>> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
>> be extended to handle .DEFERRED_INIT if that's the main source of
>> excess stack usage.  A single .DEFERRED_INIT of an aggregate can
>> be split into .DEFERRED_INITs of individual components.
> 
> Thanks a lot for the suggestion,
> I will study the code of SRA to see how to do this and then see whether this can resolve the issue.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-02-01 19:12                                                 ` Qing Zhao
@ 2021-02-02  7:43                                                   ` Richard Biener
  2021-02-02 15:17                                                     ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2021-02-02  7:43 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Mon, 1 Feb 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion.
> 
> And now the routine “bump_map” in 511.povray is like following:
> ...
> 
>  # DEBUG BEGIN_STMT
>   xcoor = 0.0;
>   ycoor = 0.0;
>   # DEBUG BEGIN_STMT
>   index = .DEFERRED_INIT (index, 2);
>   index2 = .DEFERRED_INIT (index2, 2);
>   index3 = .DEFERRED_INIT (index3, 2);
>   # DEBUG BEGIN_STMT
>   colour1 = .DEFERRED_INIT (colour1, 2);
>   colour2 = .DEFERRED_INIT (colour2, 2);
>   colour3 = .DEFERRED_INIT (colour3, 2);
>   # DEBUG BEGIN_STMT
>   p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>   # DEBUG p1$0 => p1$0_181
>   p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>   # DEBUG p1$1 => p1$1_184
>   p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>   # DEBUG p1$2 => p1$2_172
>   p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>   # DEBUG p2$0 => p2$0_177
>   p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>   # DEBUG p2$1 => p2$1_135
>   p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>   # DEBUG p2$2 => p2$2_137
>   p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>   # DEBUG p3$0 => p3$0_377
>   p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>   # DEBUG p3$1 => p3$1_379
>   p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>   # DEBUG p3$2 => p3$2_381
> 
> 
> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of the components of p1, p2 and p3. 
> 
> With this change, the stack usage numbers with -fstack-usage for approach A, old approach D and new D with the splitting in SRA are:
> 
>   Approach A	Approach D-old	Approach D-new
> 
> 	272			624			368
> 
> From the above, we can see that splitting the call to DEFERRED_INIT in SRA can reduce the stack usage increase dramatically. 
> 
> However, looks like that the stack size for D is still bigger than A. 
> 
> I checked the IR again, and found that the alias analysis might be responsible for this (by compare the image.cpp.026t.ealias for both A and D):
> 
> (Due to the call to:
> 
>   colour1 = .DEFERRED_INIT (colour1, 2);
> )
> 
> ******Approach A:
> 
> Points_to analysis:
> 
> Constraints:
> …
> colour1 = &NULL
> …
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> ...
> callarg(53) = &colour1
> ...
> _53 = colour1
> 
> Points_to sets:
> …
> colour1 = { NULL ESCAPED NONLOCAL } same as _53
> ...
> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
> ...
> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
> 
> ******Approach D:
> 
> Points_to analysis:
> 
> Constraints:
> …
> callarg(19) = colour1
> callarg(19) = &NONLOCAL
> colour1 = callarg(19) + UNKNOWN
> colour1 = &NONLOCAL
> …
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> colour1 = &NONLOCAL
> …
> callarg(74) = &colour1
> callarg(74) = callarg(74) + UNKNOWN
> callarg(74) = *callarg(74) + UNKNOWN
> …
> _53 = colour1
> _54 = _53
> _55 = _54 + UNKNOWN
> _55 = &NONLOCAL
> _56 = colour1
> _57 = _56
> _58 = _57 + UNKNOWN
> _58 = &NONLOCAL
> _59 = _55 + UNKNOWN
> _59 = _58 + UNKNOWN
> _60 = colour1
> _61 = _60
> _62 = _61 + UNKNOWN
> _62 = &NONLOCAL
> _63 = _59 + UNKNOWN
> _63 = _62 + UNKNOWN
> _64 = _63 + UNKNOWN
> ..
> Points_to set:
> …
> colour1 = { ESCAPED NONLOCAL } same as callarg(19)
> …
> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
> callarg(71) = { ESCAPED NONLOCAL }
> callarg(72) = { ESCAPED NONLOCAL }
> callarg(73) = { ESCAPED NONLOCAL }
> callarg(74) = { ESCAPED NONLOCAL colour1 }
> 
> My question:
> 
> Is it possible to adjust alias analysis to resolve this issue?

You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c
find_func_aliases_for_call (it's not a builtin but you can look in
the respective subroutine for examples).  Specifically you want to
avoid making anything escaped or clobbered.

> thanks.
> 
> Qing
> 
> > On Jan 18, 2021, at 10:12 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > 
> >>>>> I checked the routine “pov::bump_map” in 511.povray_r since it
> >>>>> has a lot stack increase 
> >>>>> due to implementation D, by examine the IR immediate before RTL
> >>>>> expansion phase.  
> >>>>> (image.cpp.244t.optimized), I found that we have the following
> >>>>> additional statements for the array elements:
> >>>>> 
> >>>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
> >>>>> * normal)
> >>>>> {
> >>>>> …
> >>>>> double p3[3];
> >>>>> double p2[3];
> >>>>> double p1[3];
> >>>>> float colour3[5];
> >>>>> float colour2[5];
> >>>>> float colour1[5];
> >>>>> …
> >>>>> # DEBUG BEGIN_STMT
> >>>>> colour1 = .DEFERRED_INIT (colour1, 2);
> >>>>> colour2 = .DEFERRED_INIT (colour2, 2);
> >>>>> colour3 = .DEFERRED_INIT (colour3, 2);
> >>>>> # DEBUG BEGIN_STMT
> >>>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
> >>>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
> >>>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
> >>>>> p1 = .DEFERRED_INIT (p1, 2);
> >>>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
> >>>>> # DEBUG p1$0 => D#12
> >>>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
> >>>>> # DEBUG p1$1 => D#11
> >>>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
> >>>>> # DEBUG p1$2 => D#10
> >>>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
> >>>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
> >>>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
> >>>>> p2 = .DEFERRED_INIT (p2, 2);
> >>>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
> >>>>> # DEBUG p2$0 => D#9
> >>>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
> >>>>> # DEBUG p2$1 => D#8
> >>>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
> >>>>> # DEBUG p2$2 => D#7
> >>>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
> >>>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
> >>>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
> >>>>> p3 = .DEFERRED_INIT (p3, 2);
> >>>>> ….
> >>>>> }
> >>>>> 
> >>>>> I guess that the above “MEM <double>….. = …” are the ones that make the
> >>>>> differences. Which phase introduced them?
> >>>> 
> >>>> Looks like SRA. But you can just dump all and grep for the first occurrence. 
> >>> 
> >>> Yes, looks like that SRA is the one:
> >>> 
> >>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
> >>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
> >>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
> >> 
> >> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
> >> be extended to handle .DEFERRED_INIT if that's the main source of
> >> excess stack usage.  A single .DEFERRED_INIT of an aggregate can
> >> be split into .DEFERRED_INITs of individual components.
> > 
> > Thanks a lot for the suggestion,
> > I will study the code of SRA to see how to do this and then see whether this can resolve the issue.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-02-02  7:43                                                   ` Richard Biener
@ 2021-02-02 15:17                                                     ` Qing Zhao
  2021-02-02 23:32                                                       ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2021-02-02 15:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches



> On Feb 2, 2021, at 1:43 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> On Mon, 1 Feb 2021, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion.
>> 
>> And now the routine “bump_map” in 511.povray is like following:
>> ...
>> 
>> # DEBUG BEGIN_STMT
>>  xcoor = 0.0;
>>  ycoor = 0.0;
>>  # DEBUG BEGIN_STMT
>>  index = .DEFERRED_INIT (index, 2);
>>  index2 = .DEFERRED_INIT (index2, 2);
>>  index3 = .DEFERRED_INIT (index3, 2);
>>  # DEBUG BEGIN_STMT
>>  colour1 = .DEFERRED_INIT (colour1, 2);
>>  colour2 = .DEFERRED_INIT (colour2, 2);
>>  colour3 = .DEFERRED_INIT (colour3, 2);
>>  # DEBUG BEGIN_STMT
>>  p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>>  # DEBUG p1$0 => p1$0_181
>>  p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>>  # DEBUG p1$1 => p1$1_184
>>  p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>>  # DEBUG p1$2 => p1$2_172
>>  p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>>  # DEBUG p2$0 => p2$0_177
>>  p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>>  # DEBUG p2$1 => p2$1_135
>>  p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>>  # DEBUG p2$2 => p2$2_137
>>  p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>>  # DEBUG p3$0 => p3$0_377
>>  p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>>  # DEBUG p3$1 => p3$1_379
>>  p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>>  # DEBUG p3$2 => p3$2_381
>> 
>> 
>> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of the components of p1, p2 and p3. 
>> 
>> With this change, the stack usage numbers with -fstack-usage for approach A, old approach D and new D with the splitting in SRA are:
>> 
>>  Approach A	Approach D-old	Approach D-new
>> 
>> 	272			624			368
>> 
>> From the above, we can see that splitting the call to DEFERRED_INIT in SRA can reduce the stack usage increase dramatically. 
>> 
>> However, looks like that the stack size for D is still bigger than A. 
>> 
>> I checked the IR again, and found that the alias analysis might be responsible for this (by compare the image.cpp.026t.ealias for both A and D):
>> 
>> (Due to the call to:
>> 
>>  colour1 = .DEFERRED_INIT (colour1, 2);
>> )
>> 
>> ******Approach A:
>> 
>> Points_to analysis:
>> 
>> Constraints:
>> …
>> colour1 = &NULL
>> …
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> ...
>> callarg(53) = &colour1
>> ...
>> _53 = colour1
>> 
>> Points_to sets:
>> …
>> colour1 = { NULL ESCAPED NONLOCAL } same as _53
>> ...
>> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
>> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
>> ...
>> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
>> 
>> ******Approach D:
>> 
>> Points_to analysis:
>> 
>> Constraints:
>> …
>> callarg(19) = colour1
>> callarg(19) = &NONLOCAL
>> colour1 = callarg(19) + UNKNOWN
>> colour1 = &NONLOCAL
>> …
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> …
>> callarg(74) = &colour1
>> callarg(74) = callarg(74) + UNKNOWN
>> callarg(74) = *callarg(74) + UNKNOWN
>> …
>> _53 = colour1
>> _54 = _53
>> _55 = _54 + UNKNOWN
>> _55 = &NONLOCAL
>> _56 = colour1
>> _57 = _56
>> _58 = _57 + UNKNOWN
>> _58 = &NONLOCAL
>> _59 = _55 + UNKNOWN
>> _59 = _58 + UNKNOWN
>> _60 = colour1
>> _61 = _60
>> _62 = _61 + UNKNOWN
>> _62 = &NONLOCAL
>> _63 = _59 + UNKNOWN
>> _63 = _62 + UNKNOWN
>> _64 = _63 + UNKNOWN
>> ..
>> Points_to set:
>> …
>> colour1 = { ESCAPED NONLOCAL } same as callarg(19)
>> …
>> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
>> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
>> callarg(71) = { ESCAPED NONLOCAL }
>> callarg(72) = { ESCAPED NONLOCAL }
>> callarg(73) = { ESCAPED NONLOCAL }
>> callarg(74) = { ESCAPED NONLOCAL colour1 }
>> 
>> My question:
>> 
>> Is it possible to adjust alias analysis to resolve this issue?
> 
> You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c
> find_func_aliases_for_call (it's not a builtin but you can look in
> the respective subroutine for examples).  Specifically you want to
> avoid making anything escaped or clobbered.

Okay, thanks.

Will check on that.

Qing
>> 
> 
> -- 
> Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
  2021-02-02 15:17                                                     ` Qing Zhao
@ 2021-02-02 23:32                                                       ` Qing Zhao
  0 siblings, 0 replies; 56+ messages in thread
From: Qing Zhao @ 2021-02-02 23:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

Hi,

With the following patch:

[qinzhao@localhost gcc]$ git diff tree-ssa-structalias.c
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index cf653be..bd18841 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4851,6 +4851,30 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
   return false;
 }
 
+static void
+find_func_aliases_for_deferred_init (gcall *t)
+{
+  
+  tree lhsop = gimple_call_lhs (t);
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (t, 1));
+  auto_vec<ce_s, 2> lhsc;
+  auto_vec<ce_s, 4> rhsc;
+  struct constraint_expr temp;
+ 
+  get_constraint_for (lhsop, &lhsc);
+  if (init_type == AUTO_INIT_ZERO && flag_delete_null_pointer_checks)
+    temp.var = nothing_id;
+  else
+    temp.var = nonlocal_id;
+  temp.type = ADDRESSOF;
+  temp.offset = 0;
+  rhsc.safe_push (temp);
+
+  process_all_all_constraints (lhsc, rhsc);
+  return;
+}
+
 /* Create constraints for the call T.  */
 
 static void
@@ -4864,6 +4888,12 @@ find_func_aliases_for_call (struct function *fn, gcall *t)
       && find_func_aliases_for_builtin_call (fn, t))
     return;
 
+  if (gimple_call_internal_p (t, IFN_DEFERRED_INIT))
+    {
+      find_func_aliases_for_deferred_init (t);
+      return;
+    }
+
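
The choice the new helper makes for the right-hand side of the constraint can
be modelled outside GCC as a small decision function. This is a toy model
only: the enum and function names below are hypothetical stand-ins and are not
GCC's, and PTS_NULL simply corresponds to the "NULL" entry that appears in the
points-to dumps.

```c
/* Toy stand-ins for the two special points-to targets involved.  */
enum pts_target { PTS_NULL, PTS_NONLOCAL };

/* Toy stand-ins for the kind of automatic initialisation requested.  */
enum init_kind { INIT_PATTERN, INIT_ZERO };

/* Mirror of the branch in the patch: a zero-initialised variable may be
   treated as pointing to NULL when null pointer checks may be deleted;
   in every other case it is treated as pointing to NONLOCAL.  */
enum pts_target deferred_init_target(enum init_kind kind,
                                     int delete_null_pointer_checks)
{
    if (kind == INIT_ZERO && delete_null_pointer_checks)
        return PTS_NULL;
    return PTS_NONLOCAL;
}
```

In this model nothing is added to an escaped or clobbered set, which is the
property the suggested handling is after.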

The *.ealias dumps for the routine “bump_map” are exactly the same for approaches A and D. 
However, the stack size for D is still bigger than for A. 

Any suggestions?

Qing


On Feb 2, 2021, at 9:17 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Feb 2, 2021, at 1:43 AM, Richard Biener <rguenther@suse.de> wrote:
>> 
>> On Mon, 1 Feb 2021, Qing Zhao wrote:
>> 
>>> Hi, Richard,
>>> 
>>> I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion.
>>> 
>>> And now the routine “bump_map” in 511.povray is like following:
>>> ...
>>> 
>>> # DEBUG BEGIN_STMT
>>> xcoor = 0.0;
>>> ycoor = 0.0;
>>> # DEBUG BEGIN_STMT
>>> index = .DEFERRED_INIT (index, 2);
>>> index2 = .DEFERRED_INIT (index2, 2);
>>> index3 = .DEFERRED_INIT (index3, 2);
>>> # DEBUG BEGIN_STMT
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>> # DEBUG BEGIN_STMT
>>> p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>>> # DEBUG p1$0 => p1$0_181
>>> p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>>> # DEBUG p1$1 => p1$1_184
>>> p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>>> # DEBUG p1$2 => p1$2_172
>>> p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>>> # DEBUG p2$0 => p2$0_177
>>> p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>>> # DEBUG p2$1 => p2$1_135
>>> p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>>> # DEBUG p2$2 => p2$2_137
>>> p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>>> # DEBUG p3$0 => p3$0_377
>>> p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>>> # DEBUG p3$1 => p3$1_379
>>> p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>>> # DEBUG p3$2 => p3$2_381
>>> 
>>> 
>>> In the above, p1, p2, and p3 are all split into calls to DEFERRED_INIT on the components of p1, p2 and p3.
>>> 
>>> With this change, the stack usage numbers with -fstack-usage for approach A, old approach D and new D with the splitting in SRA are:
>>> 
>>> Approach A    Approach D-old    Approach D-new
>>>        272               624               368
>>> 
>>> From the above, we can see that splitting the call to DEFERRED_INIT in SRA can reduce the stack usage increase dramatically. 
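As a rough check on the figures above, here is a small C++ sketch that turns the three -fstack-usage numbers into overheads relative to approach A. The helper names `extra_bytes` and `percent_increase` are made up for illustration, and the sketch assumes the numbers are byte counts, which is what -fstack-usage reports.

```cpp
// Hypothetical helpers for comparing -fstack-usage figures; not part of GCC.

// Extra stack (in bytes) that a variant uses on top of the baseline.
long extra_bytes(long baseline, long measured) {
  return measured - baseline;
}

// The same difference, expressed as a percentage of the baseline.
double percent_increase(long baseline, long measured) {
  return 100.0 * static_cast<double>(measured - baseline) /
         static_cast<double>(baseline);
}
```

With the numbers above, `extra_bytes(272, 624)` is 352 (about 129% over A) and `extra_bytes(272, 368)` is 96 (about 35% over A), so the SRA split removes most of the increase but not all of it.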
>>> 
>>> However, it looks like the stack size for D is still bigger than for A.
>>> 
>>> I checked the IR again and found that alias analysis might be responsible for this (by comparing image.cpp.026t.ealias for A and D):
>>> 
>>> (Due to the call to:
>>> 
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> )
>>> 
>>> ******Approach A:
>>> 
>>> Points_to analysis:
>>> 
>>> Constraints:
>>> …
>>> colour1 = &NULL
>>> …
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> ...
>>> callarg(53) = &colour1
>>> ...
>>> _53 = colour1
>>> 
>>> Points_to sets:
>>> …
>>> colour1 = { NULL ESCAPED NONLOCAL } same as _53
>>> ...
>>> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
>>> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
>>> ...
>>> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
>>> 
>>> ******Approach D:
>>> 
>>> Points_to analysis:
>>> 
>>> Constraints:
>>> …
>>> callarg(19) = colour1
>>> callarg(19) = &NONLOCAL
>>> colour1 = callarg(19) + UNKNOWN
>>> colour1 = &NONLOCAL
>>> …
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> …
>>> callarg(74) = &colour1
>>> callarg(74) = callarg(74) + UNKNOWN
>>> callarg(74) = *callarg(74) + UNKNOWN
>>> …
>>> _53 = colour1
>>> _54 = _53
>>> _55 = _54 + UNKNOWN
>>> _55 = &NONLOCAL
>>> _56 = colour1
>>> _57 = _56
>>> _58 = _57 + UNKNOWN
>>> _58 = &NONLOCAL
>>> _59 = _55 + UNKNOWN
>>> _59 = _58 + UNKNOWN
>>> _60 = colour1
>>> _61 = _60
>>> _62 = _61 + UNKNOWN
>>> _62 = &NONLOCAL
>>> _63 = _59 + UNKNOWN
>>> _63 = _62 + UNKNOWN
>>> _64 = _63 + UNKNOWN
>>> ..
>>> Points_to set:
>>> …
>>> colour1 = { ESCAPED NONLOCAL } same as callarg(19)
>>> …
>>> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
>>> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
>>> callarg(71) = { ESCAPED NONLOCAL }
>>> callarg(72) = { ESCAPED NONLOCAL }
>>> callarg(73) = { ESCAPED NONLOCAL }
>>> callarg(74) = { ESCAPED NONLOCAL colour1 }
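To make the dump format above easier to read, here is a toy inclusion-based solver in C++ that handles only two of the constraint shapes shown: `x = &y` and `x = y`. It is a sketch for illustration, not GCC's solver in tree-ssa-structalias.c: the `Constraint` type and the `solve` function are hypothetical names, and it ignores offsets such as `+ UNKNOWN`, dereferences, and the special ESCAPED variable.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy constraint kinds: "lhs = &rhs" (AddressOf) and "lhs = rhs" (Copy).
enum class Kind { AddressOf, Copy };

struct Constraint {
  Kind kind;
  std::string lhs;
  std::string rhs;
};

using PointsTo = std::map<std::string, std::set<std::string>>;

// Apply every constraint repeatedly until no points-to set grows any more.
PointsTo solve(const std::vector<Constraint>& constraints) {
  PointsTo pts;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Constraint& c : constraints) {
      std::set<std::string>& dst = pts[c.lhs];
      const std::size_t before = dst.size();
      if (c.kind == Kind::AddressOf) {
        // lhs = &rhs: rhs itself becomes a member of lhs's set.
        dst.insert(c.rhs);
      } else {
        // lhs = rhs: lhs's set must include everything in rhs's set.
        const std::set<std::string> src = pts[c.rhs];
        dst.insert(src.begin(), src.end());
      }
      if (dst.size() != before) changed = true;
    }
  }
  return pts;
}
```

Feeding it three constraints shaped like the approach D dump (`callarg(19) = colour1`, `callarg(19) = &NONLOCAL`, `colour1 = callarg(19)`, with the `+ UNKNOWN` offset dropped) gives `colour1` and `callarg(19)` the same set, { NONLOCAL }. The real dump also lists ESCAPED, which this toy does not model.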
>>> 
>>> My question:
>>> 
>>> Is it possible to adjust alias analysis to resolve this issue?
>> 
>> You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c
>> find_func_aliases_for_call (it's not a builtin but you can look in
>> the respective subroutine for examples).  Specifically you want to
>> avoid making anything escaped or clobbered.
> 
> Okay, thanks.
> 
> Will check on that.
> 
> Qing
>>> 
>> 
>> -- 
>> Richard Biener <rguenther@suse.de>
>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


end of thread, last message 2021-02-02 23:32 UTC

Thread overview: 56+ messages
2020-11-23 23:05 How to traverse all the local variables that declared in the current routine? Qing Zhao
2020-11-24  7:32 ` Richard Biener
2020-11-24 15:47   ` Qing Zhao
2020-11-24 15:55     ` Richard Biener
2020-11-24 16:54       ` Qing Zhao
2020-11-25  9:11         ` Richard Biener
2020-11-25 17:41           ` Qing Zhao
2020-12-01 19:47           ` Qing Zhao
2020-12-02  8:45             ` Richard Biener
2020-12-02 15:36               ` Qing Zhao
2020-12-03  8:45                 ` Richard Biener
2020-12-03 16:07                   ` Qing Zhao
2020-12-03 16:36                     ` Richard Biener
2020-12-03 16:40                       ` Qing Zhao
2020-12-03 16:56                       ` Richard Sandiford
2020-11-26  0:08         ` Martin Sebor
2020-11-30 16:23           ` Qing Zhao
2020-11-30 17:18             ` Martin Sebor
2020-11-30 23:05               ` Qing Zhao
2020-12-03 17:32       ` Richard Sandiford
2020-12-03 23:04         ` Qing Zhao
2020-12-04  8:50         ` Richard Biener
2020-12-04 16:19           ` Qing Zhao
2020-12-07  7:12             ` Richard Biener
2020-12-07 16:20               ` Qing Zhao
2020-12-07 17:10                 ` Richard Sandiford
2020-12-07 17:36                   ` Qing Zhao
2020-12-07 18:05                     ` Richard Sandiford
2020-12-07 18:34                       ` Qing Zhao
2020-12-08  7:35                         ` Richard Biener
2020-12-08  7:40                 ` Richard Biener
2020-12-08 19:54                   ` Qing Zhao
2020-12-09  8:23                     ` Richard Biener
2020-12-09 15:04                       ` Qing Zhao
2020-12-09 15:12                         ` Richard Biener
2020-12-09 16:18                           ` Qing Zhao
2021-01-05 19:05                             ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao
2021-01-05 19:10                               ` Qing Zhao
2021-01-12 20:34                               ` Qing Zhao
2021-01-13  7:39                                 ` Richard Biener
2021-01-13 15:06                                   ` Qing Zhao
2021-01-13 15:10                                     ` Richard Biener
2021-01-13 15:35                                       ` Qing Zhao
2021-01-13 15:40                                         ` Richard Biener
2021-01-14 21:16                                   ` Qing Zhao
2021-01-15  8:11                                     ` Richard Biener
2021-01-15 16:16                                       ` Qing Zhao
2021-01-15 17:22                                         ` Richard Biener
2021-01-15 17:57                                           ` Qing Zhao
2021-01-18 13:09                                             ` Richard Sandiford
2021-01-18 16:12                                               ` Qing Zhao
2021-02-01 19:12                                                 ` Qing Zhao
2021-02-02  7:43                                                   ` Richard Biener
2021-02-02 15:17                                                     ` Qing Zhao
2021-02-02 23:32                                                       ` Qing Zhao
2020-12-07 17:21           ` How to traverse all the local variables that declared in the current routine? Richard Sandiford