* How to traverse all the local variables that declared in the current routine?
@ 2020-11-23 23:05 Qing Zhao
  2020-11-24  7:32 ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-11-23 23:05 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc Patches

Hi,

Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?

If not, what’s the best way to traverse the local variables?

Thanks.

Qing

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-23 23:05 How to traverse all the local variables that declared in the current routine? Qing Zhao
@ 2020-11-24  7:32 ` Richard Biener
  2020-11-24 15:47   ` Qing Zhao
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2020-11-24 7:32 UTC
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>
> If not, what’s the best way to traverse the local variables?

Depends on what for.  There's the source-level view you get by walking
BLOCK_VARS of the scope tree, there's cfun->local_variables
(FOR_EACH_LOCAL_DECL) and there's SSA names (FOR_EACH_SSA_NAME).

Richard.

>
> Thanks.
>
> Qing
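As a rough picture of the first view mentioned above (walking the variables of a scope tree), here is a toy C model. All names in it (toy_decl, toy_block, toy_walk_block_vars, toy_demo_count) are made up for illustration and are not GCC's tree types or API; the only assumption is that a scope holds a chain of its own variables plus nested and sibling scopes, which is loosely the shape that BLOCK_VARS, BLOCK_SUBBLOCKS and BLOCK_CHAIN expose.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT GCC's tree nodes.  */
struct toy_decl {
  const char *name;
  struct toy_decl *chain;       /* next variable declared in the same scope */
};

struct toy_block {
  struct toy_decl *vars;        /* variables declared directly in this scope */
  struct toy_block *subblocks;  /* first nested scope, if any */
  struct toy_block *chain;      /* next sibling scope, if any */
};

typedef void (*toy_visit_fn) (struct toy_decl *, void *);

/* Visit every variable of BLOCK, then recurse into nested scopes.  */
void
toy_walk_block_vars (struct toy_block *block, toy_visit_fn visit, void *data)
{
  for (struct toy_decl *d = block->vars; d; d = d->chain)
    visit (d, data);
  for (struct toy_block *sub = block->subblocks; sub; sub = sub->chain)
    toy_walk_block_vars (sub, visit, data);
}

/* Visitor that just counts the variables it is shown.  */
void
toy_count (struct toy_decl *d, void *data)
{
  (void) d;
  ++*(int *) data;
}

/* Models the scopes of:  { int a, b; { int c; } { int d; } }  */
int
toy_demo_count (void)
{
  struct toy_decl d = { "d", NULL }, c = { "c", NULL };
  struct toy_decl b = { "b", NULL }, a = { "a", &b };
  struct toy_block inner2 = { &d, NULL, NULL };
  struct toy_block inner1 = { &c, NULL, &inner2 };
  struct toy_block outer = { &a, &inner1, NULL };
  int n = 0;

  toy_walk_block_vars (&outer, toy_count, &n);
  return n;
}
```

A flat list of locals would replace the recursion with a single loop. The scope-tree walk is the one that keeps the source-level nesting visible.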
* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24  7:32 ` Richard Biener
@ 2020-11-24 15:47   ` Qing Zhao
  2020-11-24 15:55     ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Qing Zhao @ 2020-11-24 15:47 UTC
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches

> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi,
>>
>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>
>> If not, what’s the best way to traverse the local variables?
>
> Depends on what for.  There's the source-level view you get by walking
> BLOCK_VARS of the scope tree, there's cfun->local_variables
> (FOR_EACH_LOCAL_DECL) and there's SSA names (FOR_EACH_SSA_NAME).

I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
not explicitly initialized in the declaration. The basic idea is as follows:

** The proposal:

A. Add a new GCC option (same name and meaning as Clang):
   -ftrivial-auto-var-init=[pattern|zero], with pattern init similar to Clang's;

B. Add a new attribute for variables:
   __attribute__((uninitialized))
   The marked variable is intentionally left uninitialized for performance purposes.

C. The implementation needs to keep the current static warning on uninitialized
   variables untouched in order to avoid "forking the language".

** The implementation:

There are two major requirements for the implementation:

1. All auto-variables that do not have an explicit initializer should be initialized to
   zero by this option (same behavior as Clang).

2. Keep the current static warning on uninitialized variables untouched.

In order to satisfy 1, we should check whether an auto-variable has an initializer
or not;
In order to satisfy 2, we should add this new transformation after
"pass_late_warn_uninitialized".

So, we should be able to check whether an auto-variable has an initializer or not after “pass_late_warn_uninitialized”;
if not, then insert an initialization for it.

For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?

Another issue is, in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common”:

  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
  unsigned decl_is_initialized :1;

  /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
  #define DECL_IS_INITIALIZED(NODE) \
    (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)

Set this bit when setting DECL_INITIAL for the variables in the FE, then keep it
even though DECL_INITIAL might be NULLed.

Do you have any comments or suggestions?

Thanks a lot for the help.

Qing

> Richard.
>
>>
>> Thanks.
>>
>> Qing
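The selection rule in the proposal above (initialize only auto-variables that have no explicit initializer and are not opted out by the attribute) can be sketched with a toy model. Everything here is hypothetical: struct toy_var and the toy_* functions are illustration-only names, the bit-fields merely imitate the idea of the proposed decl_is_initialized bit, and nothing below is GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a local declaration; NOT GCC's tree_decl_common.  */
struct toy_var {
  const char *name;
  unsigned is_static : 1;           /* static storage, so not an auto-variable */
  unsigned decl_is_initialized : 1; /* had an initializer at its declaration */
  unsigned attr_uninitialized : 1;  /* user opted out via the attribute */
  unsigned needs_zero_init : 1;     /* result computed by the toy pass */
};

/* True if V is an auto-variable the option would have to initialize.  */
bool
toy_wants_auto_init (const struct toy_var *v)
{
  return !v->is_static && !v->decl_is_initialized && !v->attr_uninitialized;
}

/* Mark every variable that needs an inserted zero-init; return the count.  */
size_t
toy_mark_auto_init (struct toy_var *vars, size_t n)
{
  size_t marked = 0;

  for (size_t i = 0; i < n; i++)
    {
      vars[i].needs_zero_init = toy_wants_auto_init (&vars[i]);
      marked += vars[i].needs_zero_init;
    }
  return marked;
}

/* Four locals, of which only the first qualifies.  */
size_t
toy_demo_mark (void)
{
  struct toy_var vars[] = {
    { "a", 0, 0, 0, 0 },  /* int a;                      -> gets zero-init */
    { "b", 0, 1, 0, 0 },  /* int b = 1;                  -> already initialized */
    { "c", 0, 0, 1, 0 },  /* marked uninitialized        -> opted out */
    { "d", 1, 0, 0, 0 },  /* static int d;               -> not an auto-variable */
  };

  return toy_mark_auto_init (vars, sizeof vars / sizeof vars[0]);
}
```

The point of keeping a separate bit, rather than testing the initializer directly, is the one made above: the bit can survive even if the initializer itself is later cleared.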
* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 15:47   ` Qing Zhao
@ 2020-11-24 15:55     ` Richard Biener
  2020-11-24 16:54       ` Qing Zhao
  2020-12-03 17:32       ` Richard Sandiford
  0 siblings, 2 replies; 56+ messages in thread
From: Richard Biener @ 2020-11-24 15:55 UTC
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> > On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> Hi,
> >>
> >> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
> >>
> >> If not, what’s the best way to traverse the local variables?
> >
> > Depends on what for.  There's the source-level view you get by walking
> > BLOCK_VARS of the scope tree, there's cfun->local_variables
> > (FOR_EACH_LOCAL_DECL) and there's SSA names (FOR_EACH_SSA_NAME).
>
> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
> not explicitly initialized in the declaration. The basic idea is as follows:
>
> ** The proposal:
>
> A. Add a new GCC option (same name and meaning as Clang):
>    -ftrivial-auto-var-init=[pattern|zero], with pattern init similar to Clang's;
>
> B. Add a new attribute for variables:
>    __attribute__((uninitialized))
>    The marked variable is intentionally left uninitialized for performance purposes.
>
> C. The implementation needs to keep the current static warning on uninitialized
>    variables untouched in order to avoid "forking the language".
>
> ** The implementation:
>
> There are two major requirements for the implementation:
>
> 1. All auto-variables that do not have an explicit initializer should be initialized to
>    zero by this option (same behavior as Clang).
>
> 2. Keep the current static warning on uninitialized variables untouched.
>
> In order to satisfy 1, we should check whether an auto-variable has an initializer
> or not;
> In order to satisfy 2, we should add this new transformation after
> "pass_late_warn_uninitialized".
>
> So, we should be able to check whether an auto-variable has an initializer or not after “pass_late_warn_uninitialized”;
> if not, then insert an initialization for it.
>
> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?

Yes, but do you want to catch variables promoted to registers as well,
or just variables on the stack?

> Another issue is, in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common”:
>
>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>   unsigned decl_is_initialized :1;
>
>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>   #define DECL_IS_INITIALIZED(NODE) \
>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>
> Set this bit when setting DECL_INITIAL for the variables in the FE, then keep it
> even though DECL_INITIAL might be NULLed.

For locals it would be more reliable to set this flag during gimplification.

> Do you have any comments or suggestions?

As said above - do you want to cover registers as well as locals?  I'd do
the actual zeroing during RTL expansion instead, since otherwise you
have to figure out yourself whether a local is actually used (see
expand_stack_vars).

Note that optimization will already have made use of the "uninitialized"
state of locals, so depending on what the actual goal is here, "late" may
be too late.

Richard.

>
> Thanks a lot for the help.
>
> Qing
>
> > Richard.
> >
> >>
> >> Thanks.
> >>
> >> Qing
* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 15:55     ` Richard Biener
@ 2020-11-24 16:54       ` Qing Zhao
  2020-11-25  9:11         ` Richard Biener
  2020-11-26  0:08         ` Martin Sebor
  2020-12-03 17:32       ` Richard Sandiford
  1 sibling, 2 replies; 56+ messages in thread
From: Qing Zhao @ 2020-11-24 16:54 UTC
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches

> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>
>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>
>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>>>
>>>> If not, what’s the best way to traverse the local variables?
>>>
>>> Depends on what for.  There's the source-level view you get by walking
>>> BLOCK_VARS of the scope tree, there's cfun->local_variables
>>> (FOR_EACH_LOCAL_DECL) and there's SSA names (FOR_EACH_SSA_NAME).
>>
>> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration. The basic idea is as follows:
>>
>> ** The proposal:
>>
>> A. Add a new GCC option (same name and meaning as Clang):
>>    -ftrivial-auto-var-init=[pattern|zero], with pattern init similar to Clang's;
>>
>> B. Add a new attribute for variables:
>>    __attribute__((uninitialized))
>>    The marked variable is intentionally left uninitialized for performance purposes.
>>
>> C. The implementation needs to keep the current static warning on uninitialized
>>    variables untouched in order to avoid "forking the language".
>>
>> ** The implementation:
>>
>> There are two major requirements for the implementation:
>>
>> 1. All auto-variables that do not have an explicit initializer should be initialized to
>>    zero by this option (same behavior as Clang).
>>
>> 2. Keep the current static warning on uninitialized variables untouched.
>>
>> In order to satisfy 1, we should check whether an auto-variable has an initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>>
>> So, we should be able to check whether an auto-variable has an initializer or not after “pass_late_warn_uninitialized”;
>> if not, then insert an initialization for it.
>>
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>
> Yes, but do you want to catch variables promoted to registers as well,
> or just variables on the stack?

I think both, as long as they are source-level auto-variables. Then which one is better?

>> Another issue is, in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common”:
>>
>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>   unsigned decl_is_initialized :1;
>>
>>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>>   #define DECL_IS_INITIALIZED(NODE) \
>>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>
>> Set this bit when setting DECL_INITIAL for the variables in the FE, then keep it
>> even though DECL_INITIAL might be NULLed.
>
> For locals it would be more reliable to set this flag during gimplification.

You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:

  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
    {
      tree init = DECL_INITIAL (decl);
...
      if (init && init != error_mark_node)
        {
          if (!TREE_STATIC (decl))
            {
              DECL_IS_INITIALIZED (decl) = 1;
            }

Is this enough for all front ends? Are there other places where I need to maintain this bit?

>> Do you have any comments or suggestions?
>
> As said above - do you want to cover registers as well as locals?

All the locals from the source-code point of view should be covered. (From my study so far, it looks like Clang adds that phase in the FE.)
If GCC adds this phase in the FE, then the following design requirement

C. The implementation needs to keep the current static warning on uninitialized
   variables untouched in order to avoid "forking the language".

cannot be satisfied, since gcc’s uninitialized variable analysis is applied quite late.

So, we have to add this new phase after “pass_late_warn_uninitialized”.

> I'd do
> the actual zeroing during RTL expansion instead, since otherwise you
> have to figure out yourself whether a local is actually used (see expand_stack_vars).

Adding this new transformation during RTL expansion is okay. I will check on this in more detail to see how to add it to the RTL expansion phase.

> Note that optimization will already have made use of the "uninitialized" state
> of locals, so depending on what the actual goal is here, "late" may be too late.

This is a really good point…

In order to prevent optimizations from using the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
for this issue). However, if we have to meet the following requirement:

C. The implementation needs to keep the current static warning on uninitialized
   variables untouched in order to avoid "forking the language".

we have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.

So, this is a problem that is not easy to resolve.

Do you have any suggestions on this?

Qing

> Richard.
>
>>
>> Thanks a lot for the help.
>>
>> Qing
>>
>>> Richard.
>>>
>>>>
>>>> Thanks.
>>>>
>>>> Qing
* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-24 16:54       ` Qing Zhao
@ 2020-11-25  9:11         ` Richard Biener
  2020-11-25 17:41           ` Qing Zhao
  2020-12-01 19:47           ` Qing Zhao
  2020-11-26  0:08         ` Martin Sebor
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Biener @ 2020-11-25 9:11 UTC
  To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Nov 24, 2020 at 5:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>
>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>>
>>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>>>>
>>>>> If not, what’s the best way to traverse the local variables?
>>>>
>>>> Depends on what for.  There's the source-level view you get by walking
>>>> BLOCK_VARS of the scope tree, there's cfun->local_variables
>>>> (FOR_EACH_LOCAL_DECL) and there's SSA names (FOR_EACH_SSA_NAME).
>>>
>>> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>>> not explicitly initialized in the declaration. The basic idea is as follows:
>>>
>>> ** The proposal:
>>>
>>> A. Add a new GCC option (same name and meaning as Clang):
>>>    -ftrivial-auto-var-init=[pattern|zero], with pattern init similar to Clang's;
>>>
>>> B. Add a new attribute for variables:
>>>    __attribute__((uninitialized))
>>>    The marked variable is intentionally left uninitialized for performance purposes.
>>>
>>> C. The implementation needs to keep the current static warning on uninitialized
>>>    variables untouched in order to avoid "forking the language".
>>>
>>> ** The implementation:
>>>
>>> There are two major requirements for the implementation:
>>>
>>> 1. All auto-variables that do not have an explicit initializer should be initialized to
>>>    zero by this option (same behavior as Clang).
>>>
>>> 2. Keep the current static warning on uninitialized variables untouched.
>>>
>>> In order to satisfy 1, we should check whether an auto-variable has an initializer
>>> or not;
>>> In order to satisfy 2, we should add this new transformation after
>>> "pass_late_warn_uninitialized".
>>>
>>> So, we should be able to check whether an auto-variable has an initializer or not after “pass_late_warn_uninitialized”;
>>> if not, then insert an initialization for it.
>>>
>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>
>> Yes, but do you want to catch variables promoted to registers as well,
>> or just variables on the stack?
>
> I think both, as long as they are source-level auto-variables. Then which one is better?
>
>>> Another issue is, in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common”:
>>>
>>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>   unsigned decl_is_initialized :1;
>>>
>>>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>>>   #define DECL_IS_INITIALIZED(NODE) \
>>>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>
>>> Set this bit when setting DECL_INITIAL for the variables in the FE, then keep it
>>> even though DECL_INITIAL might be NULLed.
>>
>> For locals it would be more reliable to set this flag during gimplification.
>
> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:
>
>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>     {
>       tree init = DECL_INITIAL (decl);
> ...
>       if (init && init != error_mark_node)
>         {
>           if (!TREE_STATIC (decl))
>             {
>               DECL_IS_INITIALIZED (decl) = 1;
>             }
>
> Is this enough for all front ends? Are there other places where I need to maintain this bit?
>
>>> Do you have any comments or suggestions?
>>
>> As said above - do you want to cover registers as well as locals?
>
> All the locals from the source-code point of view should be covered. (From my study so far, it looks like Clang adds that phase in the FE.)
> If GCC adds this phase in the FE, then the following design requirement
>
> C. The implementation needs to keep the current static warning on uninitialized
>    variables untouched in order to avoid "forking the language".
>
> cannot be satisfied, since gcc’s uninitialized variable analysis is applied quite late.
>
> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>
>> I'd do
>> the actual zeroing during RTL expansion instead, since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars).
>
> Adding this new transformation during RTL expansion is okay. I will check on this in more detail to see how to add it to the RTL expansion phase.
>
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals, so depending on what the actual goal is here, "late" may be too late.
>
> This is a really good point…
>
> In order to prevent optimizations from using the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
> for this issue). However, if we have to meet the following requirement:

So is optimization supposed to pick up zero, or is it supposed to act
as if the initializer is unknown?

> C. The implementation needs to keep the current static warning on uninitialized
>    variables untouched in order to avoid "forking the language".
>
> we have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>
> So, this is a problem that is not easy to resolve.

Indeed, those are conflicting goals.

> Do you have any suggestions on this?

No, not any easy ones.  One option is doing more of the uninit analysis
early (there is already an early uninit pass), which would mean doing IPA
analysis and turning GCC into more of a static analysis tool.  There's the
analyzer now; not sure if that can employ an early LTO phase, for example.

Richard.

> Qing
>
>> Richard.
>>
>>> Thanks a lot for the help.
>>>
>>> Qing
>>>
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>>
>>>>> Qing
* Re: How to traverse all the local variables that declared in the current routine?
  2020-11-25  9:11         ` Richard Biener
@ 2020-11-25 17:41           ` Qing Zhao
  2020-12-01 19:47           ` Qing Zhao
  1 sibling, 0 replies; 56+ messages in thread
From: Qing Zhao @ 2020-11-25 17:41 UTC
  To: Richard Biener; +Cc: Richard Sandiford, gcc Patches

> On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> Hi,
>>
>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine?
>>
>> If not, what’s the best way to traverse the local variables?
>>
>>
>> Depends on what for.  There's the source-level view you get by walking
>> BLOCK_VARS of the scope tree, there's cfun->local_variables
>> (FOR_EACH_LOCAL_DECL) and there's SSA names (FOR_EACH_SSA_NAME).
>>
>>
>> I am planning to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration. The basic idea is as follows:
>>
>> ** The proposal:
>>
>> A. Add a new GCC option (same name and meaning as Clang):
>>    -ftrivial-auto-var-init=[pattern|zero], with pattern init similar to Clang's;
>>
>> B. Add a new attribute for variables:
>>    __attribute__((uninitialized))
>>    The marked variable is intentionally left uninitialized for performance purposes.
>>
>> C. The implementation needs to keep the current static warning on uninitialized
>>    variables untouched in order to avoid "forking the language".
>>
>> ** The implementation:
>>
>> There are two major requirements for the implementation:
>>
>> 1. All auto-variables that do not have an explicit initializer should be initialized to
>>    zero by this option (same behavior as Clang).
>>
>> 2. Keep the current static warning on uninitialized variables untouched.
>>
>> In order to satisfy 1, we should check whether an auto-variable has an initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>>
>> So, we should be able to check whether an auto-variable has an initializer or not after “pass_late_warn_uninitialized”;
>> if not, then insert an initialization for it.
>>
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>
>>
>> Yes, but do you want to catch variables promoted to registers as well,
>> or just variables on the stack?
>>
>>
>> I think both, as long as they are source-level auto-variables. Then which one is better?
>>
>>
>> Another issue is, in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common”:
>>
>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>   unsigned decl_is_initialized :1;
>>
>>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>>   #define DECL_IS_INITIALIZED(NODE) \
>>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>
>> Set this bit when setting DECL_INITIAL for the variables in the FE, then keep it
>> even though DECL_INITIAL might be NULLed.
>>
>>
>> For locals it would be more reliable to set this flag during gimplification.
>>
>>
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:
>>
>>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>     {
>>       tree init = DECL_INITIAL (decl);
>> ...
>>       if (init && init != error_mark_node)
>>         {
>>           if (!TREE_STATIC (decl))
>>             {
>>               DECL_IS_INITIALIZED (decl) = 1;
>>             }
>>
>> Is this enough for all front ends? Are there other places where I need to maintain this bit?
>>
>>
>> Do you have any comments or suggestions?
>>
>>
>> As said above - do you want to cover registers as well as locals?
>>
>>
>> All the locals from the source-code point of view should be covered. (From my study so far, it looks like Clang adds that phase in the FE.)
>> If GCC adds this phase in the FE, then the following design requirement
>>
>> C. The implementation needs to keep the current static warning on uninitialized
>>    variables untouched in order to avoid "forking the language".
>>
>> cannot be satisfied, since gcc’s uninitialized variable analysis is applied quite late.
>>
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>>
>> I'd do
>> the actual zeroing during RTL expansion instead, since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars).
>>
>>
>> Adding this new transformation during RTL expansion is okay. I will check on this in more detail to see how to add it to the RTL expansion phase.
>>
>>
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals, so depending on what the actual goal is here, "late" may be too late.
>>
>>
>> This is a really good point…
>>
>> In order to prevent optimizations from using the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
>> for this issue). However, if we have to meet the following requirement:
>
> So is optimization supposed to pick up zero, or is it supposed to act
> as if the initializer is unknown?

Good question!

Theoretically, the new option -ftrivial-auto-var-init=zero is supposed to add zero initialization to auto-variables that are
not explicitly initialized, in order to avoid possible undefined behavior. So, I think that with the new option specified,
compiler optimization should pick up the zero initialization.

Therefore, ideally, zero initializations should be inserted before optimizations. However, this conflicts with the requirement
to “keep the current static warning on uninitialized variables untouched in order to avoid "forking the language"”.

>> C. The implementation needs to keep the current static warning on uninitialized
>>    variables untouched in order to avoid "forking the language".
>>
>> we have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>>
>> So, this is a problem that is not easy to resolve.
>
> Indeed, those are conflicting goals.

Yes, this is the most difficult part of this task. I am not sure how Clang resolved this issue.

>> Do you have any suggestions on this?
>
> No, not any easy ones.  One option is doing more of the uninit analysis
> early (there is already an early uninit pass), which would mean doing IPA
> analysis and turning GCC into more of a static analysis tool.  There's the
> analyzer now; not sure if that can employ an early LTO phase, for example.

You mean to enhance “pass_early_warn_uninitialized” or “pass_analyzer” to catch more uninitialized cases, and then add the new
“zero initialization” after these passes?

However, both “pass_early_warn_uninitialized” and “pass_analyzer” still run after some early IPA optimizations, and these early optimizations
still act as if the initializers are unknown. So it looks like the conflict cannot be completely resolved.

Another thought:

If we still add the initializations at “pass_expand”, as you suggested in the previous email, GCC will be split into two parts:
the earlier part, before “pass_expand”, acts without the zero initialization and reports the uninitialized warnings based on this;
the later part, after “pass_expand”, picks up the zero initializations, so all the RTL optimizations are applied to the program
with the new zero initializations in place.

Would such an approach have any potentially big issues?

Qing

> Richard.
>
>> Qing
>>
>>
>> Richard.
>>
>>
>> Thanks a lot for the help.
>>
>> Qing
>>
>> Richard.
>>
>>
>> Thanks.
>>
>> Qing
* Re: How to traverse all the local variables that declared in the current routine? 2020-11-25 9:11 ` Richard Biener 2020-11-25 17:41 ` Qing Zhao @ 2020-12-01 19:47 ` Qing Zhao 2020-12-02 8:45 ` Richard Biener 1 sibling, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-01 19:47 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, gcc Patches Hi, Richard, Could you please comment on the following approach: Instead of adding the zero-initializer quite late at the pass “pass_expand”, we can add it as early as during gimplification. However, we will mark these new added zero-initializers as “artificial”. And passing this “artificial” information to “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”, in these two uninitialized variable analysis passes, (i.e., in tree-sea-uninit.c) We will update the checking on “ssa_undefined_value_p” to consider “artificial” zero-initializers. (i.e, if the def_stmt is marked with “artificial”, then it’s a undefined value). With such approach, we should be able to address all those conflicts. Do you see any obvious issue with this approach? Thanks a lot for your help. Qing > On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> >> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are >> not explicitly initialized in the declaration, the basic idea is following: >> >> ** The proposal: >> >> A. add a new GCC option: (same name and meaning as CLANG) >> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG; >> >> B. add a new attribute for variable: >> __attribute((uninitialized) >> the marked variable is uninitialized intentionaly for performance purpose. >> >> C. The implementation needs to keep the current static warning on uninitialized >> variables untouched in order to avoid "forking the language". 
>> ** The implementation:
>>
>> There are two major requirements for the implementation:
>>
>> 1. all auto-variables that do not have an explicit initializer should be initialized to
>> zero by this option. (Same behavior as CLANG)
>>
>> 2. keep the current static warning on uninitialized variables untouched.
>>
>> In order to satisfy 1, we should check whether an auto-variable has an initializer or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>>
>> So, we should be able to check whether an auto-variable has an initializer or not after “pass_late_warn_uninitialized”.
>> If not, then insert an initialization for it.
>>
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>
>>
>> I think both as long as they are source-level auto-variables.
>>
>> Then which one is better?
>>
>>
>> Another issue is, in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common” as:
>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>   unsigned decl_is_initialized : 1;
>>
>>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>>   #define DECL_IS_INITIALIZED(NODE) \
>>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>
>> Set this bit when setting DECL_INITIAL for the variables in the FE, then keep it
>> even though DECL_INITIAL might be NULLed.
>>
>>
>> For locals it would be more reliable to set this flag during gimplification.
>>
>>
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimplify_decl_expr” (gimplify.c) as follows:
>>
>>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>     {
>>       tree init = DECL_INITIAL (decl);
>>       ...
>>       if (init && init != error_mark_node)
>>         {
>>           if (!TREE_STATIC (decl))
>>             {
>>               DECL_IS_INITIALIZED (decl) = 1;
>>             }
>>
>> Is this enough for all front ends? Are there other places where I need to maintain this bit?
>>
>> Do you have any comments or suggestions?
>>
>>
>> As said above - do you want to cover registers as well as locals?
>>
>>
>> All the locals from the source-code point of view should be covered. (From my study so far, it looks like Clang adds that phase in the FE.)
>> If GCC adds this phase in the FE, then the following design requirement
>>
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>>
>> cannot be satisfied, since GCC’s uninitialized-variable analysis is applied quite late.
>>
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>>
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>>
>>
>> Adding this new transformation during RTL expansion is okay. I will check on this in more detail to see how to add it to the RTL expansion phase.
>>
>>
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>>
>>
>> This is a really good point…
>>
>> In order to prevent optimizations from using the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in the FE might be best
>> for this issue). However, if we have to meet the following requirement:
>
> So is optimization supposed to pick up zero or is it supposed to act
> as if the initializer is unknown?
>
>> C. The implementation needs to keep the current static warning on uninitialized
>> variables untouched in order to avoid "forking the language".
>>
>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”.
>>
>> So, this is a problem that is not easy to resolve.
>
> Indeed, those are conflicting goals.
>
>> Do you have any suggestion on this?
>
> No, not any easy ones. Doing more of the uninit analysis early (there
> is already an early uninit pass) which would mean doing IPA analysis,
> turning GCC into more of a static analysis tool. There's the analyzer
> now, not sure if that can employ an early LTO phase for example.
>
> Richard.

^ permalink raw reply [flat|nested] 56+ messages in thread
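The bookkeeping discussed in the quoted text above (walk every local, skip statics and anything that had an explicit initializer, and initialize the rest) can be modelled outside GCC. The following is only a toy sketch in plain C, not GCC code: `struct toy_decl`, its fields, and `toy_auto_init_locals` are hypothetical names standing in for a VAR_DECL, TREE_STATIC, the proposed DECL_IS_INITIALIZED bit, and a FOR_EACH_LOCAL_DECL-style loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local VAR_DECL; not a GCC type.  */
struct toy_decl {
  const char *name;
  bool is_static;        /* models TREE_STATIC: static locals are skipped */
  bool has_initializer;  /* models the proposed DECL_IS_INITIALIZED bit */
  bool auto_inited;      /* set when the toy pass "inserts" an initializer */
};

/* Walk every local once and mark those that would receive an
   automatic initializer.  Returns how many were marked.  */
size_t toy_auto_init_locals(struct toy_decl *locals, size_t count)
{
  size_t marked = 0;
  for (size_t i = 0; i < count; i++) {
    struct toy_decl *d = &locals[i];
    if (d->is_static || d->has_initializer)
      continue;  /* already has a defined value at declaration */
    d->auto_inited = true;
    marked++;
  }
  return marked;
}
```

The point of the separate `has_initializer` flag is the one made in the thread: the decision must survive even if the original initializer expression has been cleared by the time the walk runs.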
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-01 19:47 ` Qing Zhao @ 2020-12-02 8:45 ` Richard Biener 2020-12-02 15:36 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2020-12-02 8:45 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches

On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Richard,
>
> Could you please comment on the following approach:
>
> Instead of adding the zero-initializer quite late at the pass “pass_expand”, we can add it as early as during gimplification.
> However, we will mark these newly added zero-initializers as “artificial”, and pass this “artificial” information to
> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these two uninitialized-variable analysis passes
> (i.e., in tree-ssa-uninit.c) we will update the check on “ssa_undefined_value_p” to consider “artificial” zero-initializers
> (i.e., if the def_stmt is marked “artificial”, then it is an undefined value).
>
> With such an approach, we should be able to address all those conflicts.
>
> Do you see any obvious issue with this approach?

Yes, DSE will happily elide an explicit zero-init following the
artificial one, leading to false uninit diagnostics.

What's the intended purpose of the zero-init?

Richard.

> Thanks a lot for your help.
>
> Qing

^ permalink raw reply [flat|nested] 56+ messages in thread
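The DSE concern raised above can be illustrated with a small toy model. This is a sketch under strong assumptions, not how GCC's passes are written: the program is a straight-line list of constant stores with no reads in between, and `struct toy_store`, `toy_redundant_store_elim`, and `toy_warns_uninit` are hypothetical names. The first function drops a store that writes the value already stored; the second mimics an uninit check that treats an "artificial" definition as undefined.

```c
#include <stdbool.h>
#include <stddef.h>

/* One constant store to a local in a straight-line toy program.
   All names here are hypothetical; none of this is GCC code.  */
struct toy_store {
  int var;          /* index of the variable being written */
  int value;        /* constant stored */
  bool artificial;  /* inserted by the hypothetical auto-init step */
  bool live;        /* cleared once the toy pass removes the store */
};

/* Remove a store when the nearest earlier live store to the same
   variable already wrote the same value (a redundant store).  */
void toy_redundant_store_elim(struct toy_store *s, size_t n)
{
  for (size_t i = 1; i < n; i++)
    for (size_t j = i; j-- > 0; ) {
      if (!s[j].live || s[j].var != s[i].var)
        continue;
      if (s[j].value == s[i].value)
        s[i].live = false;
      break;  /* only the nearest earlier store matters */
    }
}

/* Toy uninit check at the end of the program: warn when the store
   reaching that point is missing or marked artificial.  */
bool toy_warns_uninit(const struct toy_store *s, size_t n, int var)
{
  for (size_t i = n; i-- > 0; )
    if (s[i].live && s[i].var == var)
      return s[i].artificial;
  return true;
}
```

With an artificial `x = 0` followed by the programmer's own `x = 0`, the check is quiet before the elimination step and warns after it, because only the artificial store is left to reach the use. That is the false diagnostic described in the reply.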
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-02 8:45 ` Richard Biener @ 2020-12-02 15:36 ` Qing Zhao 2020-12-03 8:45 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-02 15:36 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, gcc Patches, kees Cook

> On Dec 2, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> Yes, DSE will happily elide an explicit zero-init following the
> artificial one leading to false uninit diagnostics.

Indeed. This is a big issue. Other optimizations might also be affected by the new zero-init, which would change the behavior of the uninitialized analysis in later stages.

> What's the intended purpose of the zero-init?

The purpose of this new option is (from the original LLVM patch submission):

"Add an option to initialize automatic variables with either a pattern or with
zeroes. The default is still that automatic variables are uninitialized. Also
add attributes to request uninitialized on a per-variable basis, mainly to disable
initialization of large stack arrays when deemed too expensive.

This isn't meant to change the semantics of C and C++. Rather, it's meant to be
a last-resort when programmers inadvertently have some undefined behavior in
their code. This patch aims to make undefined behavior hurt less, which
security-minded people will be very happy about. Notably, this means that
there's no inadvertent information leak when:

• The compiler re-uses stack slots, and a value is used uninitialized.
• The compiler re-uses a register, and a value is used uninitialized.
• Stack structs / arrays / unions with padding are copied.

This patch only addresses stack and register information leaks. There's many
more infoleaks that we could address, and much more undefined behavior that
could be tamed. Let's keep this patch focused, and I'm happy to address related
issues elsewhere."

For more details, please refer to the LLVM code review discussion on this patch:
https://reviews.llvm.org/D54604

I also wrote a simple writeup for this task, based on my study and discussion with
Kees Cook (cc’ing him), as follows:

thanks.

Qing

Support stack variables auto-initialization in GCC

11/19/2020

Qing Zhao

=======================================================

** Background of the task:

The corresponding GCC bugzilla RFE was created on 9/3/2018:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210

A similar option for LLVM (around Nov, 2018)
https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
prompted a lot of discussion before it was committed.

(The following is quoted from the comments of Alexander Potapenko in
GCC bug 87210):

Finally, on Oct, 2019, upstream Clang supports force initialization
of stack variables under the -ftrivial-auto-var-init flag.

-ftrivial-auto-var-init=pattern initializes local variables with a 0xAA pattern
(actually it's more complicated, see https://reviews.llvm.org/D54604)

-ftrivial-auto-var-init=zero provides zero-initialization of locals.
This mode isn't officially supported yet and is hidden behind an additional
-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang flag.
This is done to avoid creating a C++ dialect where all variables are
zero-initialized.

Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that performs
the build with -ftrivial-auto-var-init=pattern. This one isn't widely adopted
yet, partially because initializing locals with 0xAA isn't fast enough.

Linus Torvalds is quite positive about zero-initializing the locals though,
see https://lkml.org/lkml/2019/7/30/1303:

"when a compiler has an option to initialize stack variables, it
would probably _also_ be a very good idea for that compiler to then
support a variable attribute that says "don't initialize _this_
variable, I will do that manually".
I also think that the "initialize with poison" is
pointless and wrong. Yes, it can find bugs, but it doesn't really help
improve the general situation, and people see it as a debugging tool,
not a "improve code quality and improve the life of kernel developers"
tool."

So having a flag similar to -ftrivial-auto-var-init=zero in GCC will be
appreciated by the Linux kernel community.

Currently, the kernel uses a GCC plugin to support stack variable
auto-initialization:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c

** Current situation:

A. Both the Microsoft compiler and CLANG (Apple and Google) already support
pattern init and zero init;
http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
Pattern init is used in development builds for debugging purposes, zero init is
used in production builds for security purposes.

B. For CLANG, even though zero init is controlled by
"-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang",
many end users already use it for production builds,
so this functionality cannot be removed anymore.
"-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
might be changed to a more meaningful name later in CLANG.

** My proposal:

A. add a new GCC option: (same name and meaning as CLANG)
-ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;

B. add a new attribute for variables:
__attribute__((uninitialized))
the marked variable is intentionally left uninitialized for performance reasons.

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".

^ permalink raw reply [flat|nested] 56+ messages in thread
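To make the "zero" and "pattern" modes from the writeup concrete, here is a hand-written approximation in C of what filling a local's storage amounts to, including any padding bytes inside a struct. It is a sketch only: the names `toy_init_mode`, `toy_record`, `toy_auto_init`, and `toy_count_bytes` are hypothetical, and the single repeated 0xAA byte is a simplification, since the thread itself notes that Clang's real pattern is more complicated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mode selector mirroring the two option values.  */
enum toy_init_mode { TOY_INIT_ZERO, TOY_INIT_PATTERN };

/* On common ABIs there are padding bytes between `tag` and `id`;
   that padding is what an information leak through a struct copy
   would expose.  */
struct toy_record {
  char tag;
  int id;
};

/* Fill the whole object representation, padding included.  */
void toy_auto_init(void *obj, size_t size, enum toy_init_mode mode)
{
  memset(obj, mode == TOY_INIT_ZERO ? 0x00 : 0xAA, size);
}

/* Count how many bytes of the object equal `byte`.  */
size_t toy_count_bytes(const void *obj, size_t size, unsigned char byte)
{
  const unsigned char *p = obj;
  size_t n = 0;
  for (size_t i = 0; i < size; i++)
    if (p[i] == byte)
      n++;
  return n;
}
```

Filling by object size rather than member by member is the design choice that matters here: assigning `tag` and `id` individually would leave the padding bytes with whatever the stack slot held before.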
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-02 15:36 ` Qing Zhao @ 2020-12-03 8:45 ` Richard Biener 2020-12-03 16:07 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2020-12-03 8:45 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches, kees Cook

On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> On Dec 2, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> Yes, DSE will happily elide an explicit zero-init following the
> artificial one leading to false uninit diagnostics.
>
> Indeed. This is a big issue. And other optimizations might also be impacted by the new zero-init, resulting in changed behavior
> of uninitialized analysis in the later stage.

I don't see how the issue can be resolved; you can't get both uninit
warnings and no uninitialized memory.
People can compile twice, once without -fzero-init to get uninit
warnings and once with -fzero-init to get the extra "security".

Richard.

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-03 8:45 ` Richard Biener @ 2020-12-03 16:07 ` Qing Zhao 2020-12-03 16:36 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-03 16:07 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, gcc Patches, kees Cook > On Dec 3, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote: >> >> >> >> On Dec 2, 2020, at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> >> Hi, Richard, >> >> Could you please comment on the following approach: >> >> Instead of adding the zero-initializer quite late at the pass “pass_expand”, we can add it as early as during gimplification. >> However, we will mark these new added zero-initializers as “artificial”. And passing this “artificial” information to >> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”, in these two uninitialized variable analysis passes, >> (i.e., in tree-sea-uninit.c) We will update the checking on “ssa_undefined_value_p” to consider “artificial” zero-initializers. >> (i.e, if the def_stmt is marked with “artificial”, then it’s a undefined value). >> >> With such approach, we should be able to address all those conflicts. >> >> Do you see any obvious issue with this approach? >> >> >> Yes, DSE will happily elide an explicit zero-init following the >> artificial one leading to false uninit diagnostics. >> >> >> Indeed. This is a big issue. And other optimizations might also be impacted by the new zero-init, resulting changed behavior >> of uninitialized analysis in the later stage. > > I don't see how the issue can be resolved, you can't get both, uninit > warnings and no uninitialized memory. 
> People can compile twice, once without -fzero-init to get uninit > warnings and once with -fzero-init to get > the extra "security". So, for GCC, you think that it’s okay to get rid of the following requirement: C. The implementation needs to keep the current static warning on uninitialized variables untouched in order to avoid "forking the language". Then we can add an explanation to the user documentation of the new -fzero-init option, and to that of -Wuninitialized, to inform users that -fzero-init will change the behavior of -Wuninitialized. In order to get the warnings, -fzero-init should not be added at the same time? With this requirement eliminated, the implementation will be much easier. We can add the new initialization during the gimplification phase. Then this new option will work for all languages. Is this reasonable? Thanks. Qing > > Richard. > >> >> What's the intended purpose of the zero-init? >> >> >> >> The purpose of this new option is: (from the original LLVM patch submission): >> >> "Add an option to initialize automatic variables with either a pattern or with >> zeroes. The default is still that automatic variables are uninitialized. Also >> add attributes to request uninitialized on a per-variable basis, mainly to disable >> initialization of large stack arrays when deemed too expensive. >> >> This isn't meant to change the semantics of C and C++. Rather, it's meant to be >> a last-resort when programmers inadvertently have some undefined behavior in >> their code. This patch aims to make undefined behavior hurt less, which >> security-minded people will be very happy about. Notably, this means that >> there's no inadvertent information leak when: >> >> • The compiler re-uses stack slots, and a value is used uninitialized. >> • The compiler re-uses a register, and a value is used uninitialized. >> • Stack structs / arrays / unions with padding are copied. >> This patch only addresses stack and register information leaks. 
There's many >> more infoleaks that we could address, and much more undefined behavior that >> could be tamed. Let's keep this patch focused, and I'm happy to address related >> issues elsewhere." >> >> For more details, please refer to the LLVM code review discussion on this patch: >> https://reviews.llvm.org/D54604 >> >> >> I also wrote a simple writeup for this task based on my study and discussion with >> Kees Cook (cc’ing him) as following: >> >> >> thanks. >> >> Qing >> >> Support stack variables auto-initialization in GCC >> >> 11/19/2020 >> >> Qing Zhao >> >> ======================================================= >> >> >> ** Background of the task: >> >> The correponding GCC bugzilla RFE was created on 9/3/2018: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210 >> >> A similar option for LLVM (around Nov, 2018) >> https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html >> had invoked a lot of discussion before committed. >> >> (The following are quoted from the comments of Alexander Potapenko in >> GCC bug 87210): >> >> Finally, on Oct, 2019, upstream Clang supports force initialization >> of stack variables under the -ftrivial-auto-var-init flag. >> >> -ftrivial-auto-var-init=pattern initializes local variables with a 0xAA pattern >> (actually it's more complicated, see https://reviews.llvm.org/D54604) >> >> -ftrivial-auto-var-init=zero provides zero-initialization of locals. >> This mode isn't officially supported yet and is hidden behind an additional >> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang flag. >> This is done to avoid creating a C++ dialect where all variables are >> zero-initialized. >> >> Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that performs >> the build with -ftrivial-auto-var-init=pattern. This one isn't widely adopted >> yet, partially because initializing locals with 0xAA isn't fast enough. 
>> >> Linus Torvalds is quite positive about zero-initializing the locals though, >> see https://lkml.org/lkml/2019/7/30/1303: >> >> "when a compiler has an option to initialize stack variables, it >> would probably _also_ be a very good idea for that compiler to then >> support a variable attribute that says "don't initialize _this_ >> variable, I will do that manually". >> I also think that the "initialize with poison" is >> pointless and wrong. Yes, it can find bugs, but it doesn't really help >> improve the general situation, and people see it as a debugging tool, >> not a "improve code quality and improve the life of kernel developers" >> tool. >> >> So having a flag similar to -ftrivial-auto-var-init=zero in GCC will be >> appreciated by the Linux kernel community. >> >> currently, kernel is using a gcc plugin to support stack variables >> auto-initialization: >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c >> >> ** Current situation: >> >> A. Both Microsoft compiler and CLANG (APPLE AND GOOGLE) support pattern init and >> zero init already; >> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html >> https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/ >> Pattern init is used in development build for debugging purpose, zero init is >> used in production build for security purpose. >> >> B. for CLANG, even though zero init is controlled by >> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang", >> many end users have used it for production build. >> this functionality cannot be removed anymore. >> "-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang" >> might be changed to more meaningful name later in CLANG. >> >> >> ** My proposal: >> >> A. add a new GCC option: (same name and meaning as CLANG) >> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG; >> >> B. 
add a new attribute for variable: >> __attribute((uninitialized) >> the marked variable is uninitialized intentionaly for performance purpose. >> >> C. The implementation needs to keep the current static warning on uninitialized >> variables untouched in order to avoid "forking the language”. >> >> >> >> On Nov 25, 2020, at 3:11 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> >> >> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are >> not explicitly initialized in the declaration, the basic idea is following: >> >> ** The proposal: >> >> A. add a new GCC option: (same name and meaning as CLANG) >> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG; >> >> B. add a new attribute for variable: >> __attribute((uninitialized) >> the marked variable is uninitialized intentionaly for performance purpose. >> >> C. The implementation needs to keep the current static warning on uninitialized >> variables untouched in order to avoid "forking the language". >> >> >> ** The implementation: >> >> There are two major requirements for the implementation: >> >> 1. all auto-variables that do not have an explicit initializer should be initialized to >> zero by this option. (Same behavior as CLANG) >> >> 2. keep the current static warning on uninitialized variables untouched. >> >> In order to satisfy 1, we should check whether an auto-variable has initializer >> or not; >> In order to satisfy 2, we should add this new transformation after >> "pass_late_warn_uninitialized". >> >> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”, >> If Not, then insert an initialization for it. >> >> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better? >> >> >> I think both as long as they are source-level auto-variables. Then which one is better? 
>> >> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> unsigned decl_is_initialized :1; >> >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >> #define DECL_IS_INITIALIZED(NODE) \ >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >> even though DECL_INITIAL might be NULLed. >> >> >> For locals it would be more reliable to set this flag-Wmaybe-uninitialized. >> >> >> >> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimpley_decl_expr” (gimplify.c) as following: >> >> if (VAR_P (decl) && !DECL_EXTERNAL (decl)) >> { >> tree init = DECL_INITIAL (decl); >> ... >> if (init && init != error_mark_node) >> { >> if (!TREE_STATIC (decl)) >> { >> DECL_IS_INITIALIZED(decl) = 1; >> } >> >> Is this enough for all Frontends? Are there other places that I need to maintain this bit? >> >> >> >> Do you have any comment and suggestions? >> >> >> As said above - do you want to cover registers as well as locals? >> >> >> All the locals from the source-code point of view should be covered. (From my study so far, looks like that Clang adds that phase in FE). >> If GCC adds this phase in FE, then the following design requirement >> >> C. The implementation needs to keep the current static warning on uninitialized >> variables untouched in order to avoid "forking the language”. >> >> cannot be satisfied. Since gcc’s uninitialized variables analysis is applied quite late. >> >> So, we have to add this new phase after “pass_late_warn_uninitialized”. >> >> I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see expand_stack_vars) >> >> >> Adding this new transformation during RTL expansion is okay. 
I will check on this in more details to see how to add it to RTL expansion phase. >> >> >> Note that optimization will already made have use of "uninitialized" state >> of locals so depending on what the actual goal is here "late" may be too late. >> >> >> This is a really good point… >> >> In order to avoid optimization to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best >> for this issue). However, if we have to met the following requirement: >> >> >> So is optimization supposed to pick up zero or is it supposed to act >> as if the initializer >> is unknown? >> >> C. The implementation needs to keep the current static warning on uninitialized >> variables untouched in order to avoid "forking the language”. >> >> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”. >> >> So, this is a problem that is not easy to resolve. >> >> >> Indeed, those are conflicting goals. >> >> Do you have suggestion on this? >> >> >> No, not any easy ones. Doing more of the uninit analysis early (there >> is already an early >> uninit pass) which would mean doing IPA analysis turing GCC into more >> of a static analysis >> tool. Theres the analyzer now, not sure if that can employ an early >> LTO phase for example. >> >> >> >> >> Richard. ^ permalink raw reply [flat|nested] 56+ messages in thread
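The objection discussed in this message, that dead store elimination (DSE) can elide a user's explicit zero-init when it follows a compiler-inserted "artificial" one and so lead to a false uninit diagnostic, can be illustrated with a small toy model. This is only a sketch in standalone C, not GCC code: `struct store`, `eliminate_redundant_stores`, `use_looks_uninitialized` and `false_positive_after_dse` are hypothetical names invented for the illustration, and the model tracks a single variable in straight-line code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the stores made to ONE local variable in straight-line
   code.  Hypothetical types for illustration; nothing here is a GCC API.  */
struct store {
    int value;        /* value written to the variable */
    bool artificial;  /* true if the compiler inserted this store */
};

/* Drop any store that writes the same value as the store kept just before
   it (a simplified redundant-store elimination).  Compacts the array in
   place and returns the new number of stores.  */
static size_t eliminate_redundant_stores(struct store *s, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (kept > 0 && s[kept - 1].value == s[i].value)
            continue;              /* later store is redundant: elide it */
        s[kept++] = s[i];
    }
    return kept;
}

/* Model of the proposed check: a use is reported as uninitialized when the
   store reaching it (the last one) is artificial, or when there is none.  */
static bool use_looks_uninitialized(const struct store *s, size_t n)
{
    return n == 0 || s[n - 1].artificial;
}

/* Scenario from the thread: an artificial zero-init followed by the user's
   own explicit zero-init.  Returns whether a later use would be flagged.  */
bool false_positive_after_dse(bool run_dse)
{
    struct store stores[] = {
        { 0, true  },   /* inserted by the option */
        { 0, false },   /* written by the user: x = 0; */
    };
    size_t n = sizeof stores / sizeof stores[0];
    if (run_dse)
        n = eliminate_redundant_stores(stores, n);
    return use_looks_uninitialized(stores, n);
}

/* Self-check of the two outcomes.  */
void check_dse_model(void)
{
    assert(!false_positive_after_dse(false)); /* both stores kept: no report */
    assert(false_positive_after_dse(true));   /* user store elided: report  */
}
```

In the model, the only store left after the elimination step is the artificial one, so a check that treats artificial definitions as undefined flags a variable that the user did initialize.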
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-03 16:07 ` Qing Zhao @ 2020-12-03 16:36 ` Richard Biener 2020-12-03 16:40 ` Qing Zhao 2020-12-03 16:56 ` Richard Sandiford 0 siblings, 2 replies; 56+ messages in thread From: Richard Biener @ 2020-12-03 16:36 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, gcc Patches, kees Cook On December 3, 2020 5:07:28 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote: > > >> On Dec 3, 2020, at 2:45 AM, Richard Biener ><richard.guenther@gmail.com> wrote: >> >> On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com ><mailto:QING.ZHAO@oracle.com>> wrote: >>> >>> >>> >>> On Dec 2, 2020, at 2:45 AM, Richard Biener ><richard.guenther@gmail.com> wrote: >>> >>> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> >wrote: >>> >>> >>> Hi, Richard, >>> >>> Could you please comment on the following approach: >>> >>> Instead of adding the zero-initializer quite late at the pass >“pass_expand”, we can add it as early as during gimplification. >>> However, we will mark these new added zero-initializers as >“artificial”. And passing this “artificial” information to >>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”, >in these two uninitialized variable analysis passes, >>> (i.e., in tree-sea-uninit.c) We will update the checking on >“ssa_undefined_value_p” to consider “artificial” zero-initializers. >>> (i.e, if the def_stmt is marked with “artificial”, then it’s a >undefined value). >>> >>> With such approach, we should be able to address all those >conflicts. >>> >>> Do you see any obvious issue with this approach? >>> >>> >>> Yes, DSE will happily elide an explicit zero-init following the >>> artificial one leading to false uninit diagnostics. >>> >>> >>> Indeed. This is a big issue. And other optimizations might also be >impacted by the new zero-init, resulting changed behavior >>> of uninitialized analysis in the later stage. 
>> >> I don't see how the issue can be resolved, you can't get both, uninit >> warnings and no uninitialized memory. >> People can compile twice, once without -fzero-init to get uninit >> warnings and once with -fzero-init to get >> the extra "security". > >So, for GCC, you think that it’s okay to get rid of the following >requirement: > >C. The implementation needs to keep the current static warning on >uninitialized >variables untouched in order to avoid "forking the language”. > >Then, we can add explanation in the user documentation of the new >-fzero-init and also >that of the -Wuninitialized to inform users that -fzero-init will >change the behavior of -Wuninitialized. >In order to get the warnings, -fzero-init should not be added at the >same time? > >With this requirement being eliminated, implementation will be much >easier. > >We can add the new initialization during simplification phase. Then >this new option will work >for all languages. Is this reasonable? I think that's reasonable indeed. Eventually doing the init after the early uninit pass is possible as well. Richard. >thanks. > >Qing > > > >> >> Richard. >> >>> >>> What's the intended purpose of the zero-init? >>> >>> >>> >>> The purpose of this new option is: (from the original LLVM patch >submission): >>> >>> "Add an option to initialize automatic variables with either a >pattern or with >>> zeroes. The default is still that automatic variables are >uninitialized. Also >>> add attributes to request uninitialized on a per-variable basis, >mainly to disable >>> initialization of large stack arrays when deemed too expensive. >>> >>> This isn't meant to change the semantics of C and C++. Rather, it's >meant to be >>> a last-resort when programmers inadvertently have some undefined >behavior in >>> their code. This patch aims to make undefined behavior hurt less, >which >>> security-minded people will be very happy about. 
Notably, this means >that >>> there's no inadvertent information leak when: >>> >>> • The compiler re-uses stack slots, and a value is used >uninitialized. >>> • The compiler re-uses a register, and a value is used >uninitialized. >>> • Stack structs / arrays / unions with padding are copied. >>> This patch only addresses stack and register information leaks. >There's many >>> more infoleaks that we could address, and much more undefined >behavior that >>> could be tamed. Let's keep this patch focused, and I'm happy to >address related >>> issues elsewhere." >>> >>> For more details, please refer to the LLVM code review discussion on >this patch: >>> https://reviews.llvm.org/D54604 >>> >>> >>> I also wrote a simple writeup for this task based on my study and >discussion with >>> Kees Cook (cc’ing him) as following: >>> >>> >>> thanks. >>> >>> Qing >>> >>> Support stack variables auto-initialization in GCC >>> >>> 11/19/2020 >>> >>> Qing Zhao >>> >>> ======================================================= >>> >>> >>> ** Background of the task: >>> >>> The correponding GCC bugzilla RFE was created on 9/3/2018: >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210 >>> >>> A similar option for LLVM (around Nov, 2018) >>> https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html >>> had invoked a lot of discussion before committed. >>> >>> (The following are quoted from the comments of Alexander Potapenko >in >>> GCC bug 87210): >>> >>> Finally, on Oct, 2019, upstream Clang supports force initialization >>> of stack variables under the -ftrivial-auto-var-init flag. >>> >>> -ftrivial-auto-var-init=pattern initializes local variables with a >0xAA pattern >>> (actually it's more complicated, see >https://reviews.llvm.org/D54604) >>> >>> -ftrivial-auto-var-init=zero provides zero-initialization of locals. 
>>> This mode isn't officially supported yet and is hidden behind an >additional >>> >-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang >flag. >>> This is done to avoid creating a C++ dialect where all variables are >>> zero-initialized. >>> >>> Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that >performs >>> the build with -ftrivial-auto-var-init=pattern. This one isn't >widely adopted >>> yet, partially because initializing locals with 0xAA isn't fast >enough. >>> >>> Linus Torvalds is quite positive about zero-initializing the locals >though, >>> see https://lkml.org/lkml/2019/7/30/1303: >>> >>> "when a compiler has an option to initialize stack variables, it >>> would probably _also_ be a very good idea for that compiler to then >>> support a variable attribute that says "don't initialize _this_ >>> variable, I will do that manually". >>> I also think that the "initialize with poison" is >>> pointless and wrong. Yes, it can find bugs, but it doesn't really >help >>> improve the general situation, and people see it as a debugging >tool, >>> not a "improve code quality and improve the life of kernel >developers" >>> tool. >>> >>> So having a flag similar to -ftrivial-auto-var-init=zero in GCC will >be >>> appreciated by the Linux kernel community. >>> >>> currently, kernel is using a gcc plugin to support stack variables >>> auto-initialization: >>> >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/gcc-plugins/structleak_plugin.c >>> >>> ** Current situation: >>> >>> A. Both Microsoft compiler and CLANG (APPLE AND GOOGLE) support >pattern init and >>> zero init already; >>> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html >>> >https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/ >>> Pattern init is used in development build for debugging purpose, >zero init is >>> used in production build for security purpose. >>> >>> B. 
for CLANG, even though zero init is controlled by >>> >"-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang", >>> many end users have used it for production build. >>> this functionality cannot be removed anymore. >>> >"-fenable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang" >>> might be changed to more meaningful name later in CLANG. >>> >>> >>> ** My proposal: >>> >>> A. add a new GCC option: (same name and meaning as CLANG) >>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as >CLANG; >>> >>> B. add a new attribute for variable: >>> __attribute((uninitialized) >>> the marked variable is uninitialized intentionaly for performance >purpose. >>> >>> C. The implementation needs to keep the current static warning on >uninitialized >>> variables untouched in order to avoid "forking the language”. >>> >>> >>> >>> On Nov 25, 2020, at 3:11 AM, Richard Biener ><richard.guenther@gmail.com> wrote: >>> >>> >>> >>> I am planing to add a new phase immediately after >“pass_late_warn_uninitialized” to initialize all auto-variables that >are >>> not explicitly initialized in the declaration, the basic idea is >following: >>> >>> ** The proposal: >>> >>> A. add a new GCC option: (same name and meaning as CLANG) >>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as >CLANG; >>> >>> B. add a new attribute for variable: >>> __attribute((uninitialized) >>> the marked variable is uninitialized intentionaly for performance >purpose. >>> >>> C. The implementation needs to keep the current static warning on >uninitialized >>> variables untouched in order to avoid "forking the language". >>> >>> >>> ** The implementation: >>> >>> There are two major requirements for the implementation: >>> >>> 1. all auto-variables that do not have an explicit initializer >should be initialized to >>> zero by this option. (Same behavior as CLANG) >>> >>> 2. keep the current static warning on uninitialized variables >untouched. 
>>> >>> In order to satisfy 1, we should check whether an auto-variable has >initializer >>> or not; >>> In order to satisfy 2, we should add this new transformation after >>> "pass_late_warn_uninitialized". >>> >>> So, we should be able to check whether an auto-variable has >initializer or not after “pass_late_warn_uninitialized”, >>> If Not, then insert an initialization for it. >>> >>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be >better? >>> >>> >>> I think both as long as they are source-level auto-variables. Then >which one is better? >>> >>> >>> Another issue is, in order to check whether an auto-variable has >initializer, I plan to add a new bit in “decl_common” as: >>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>> unsigned decl_is_initialized :1; >>> >>> /* IN VAR_DECL, set when the decl is initialized at the declaration. > */ >>> #define DECL_IS_INITIALIZED(NODE) \ >>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>> >>> set this bit when setting DECL_INITIAL for the variables in FE. then >keep it >>> even though DECL_INITIAL might be NULLed. >>> >>> >>> For locals it would be more reliable to set this >flag-Wmaybe-uninitialized. >>> >>> >>> >>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the >routine “gimpley_decl_expr” (gimplify.c) as following: >>> >>> if (VAR_P (decl) && !DECL_EXTERNAL (decl)) >>> { >>> tree init = DECL_INITIAL (decl); >>> ... >>> if (init && init != error_mark_node) >>> { >>> if (!TREE_STATIC (decl)) >>> { >>> DECL_IS_INITIALIZED(decl) = 1; >>> } >>> >>> Is this enough for all Frontends? Are there other places that I need >to maintain this bit? >>> >>> >>> >>> Do you have any comment and suggestions? >>> >>> >>> As said above - do you want to cover registers as well as locals? >>> >>> >>> All the locals from the source-code point of view should be covered. > (From my study so far, looks like that Clang adds that phase in FE). 
>>> If GCC adds this phase in FE, then the following design requirement >>> >>> C. The implementation needs to keep the current static warning on >uninitialized >>> variables untouched in order to avoid "forking the language”. >>> >>> cannot be satisfied. Since gcc’s uninitialized variables analysis >is applied quite late. >>> >>> So, we have to add this new phase after >“pass_late_warn_uninitialized”. >>> >>> I'd do >>> the actual zeroing during RTL expansion instead since otherwise you >>> have to figure youself whether a local is actually used (see >expand_stack_vars) >>> >>> >>> Adding this new transformation during RTL expansion is okay. I >will check on this in more details to see how to add it to RTL >expansion phase. >>> >>> >>> Note that optimization will already made have use of "uninitialized" >state >>> of locals so depending on what the actual goal is here "late" may be >too late. >>> >>> >>> This is a really good point… >>> >>> In order to avoid optimization to use the “uninitialized” state of >locals, we should add the zeroing phase as early as possible (adding it >in FE might be best >>> for this issue). However, if we have to met the following >requirement: >>> >>> >>> So is optimization supposed to pick up zero or is it supposed to act >>> as if the initializer >>> is unknown? >>> >>> C. The implementation needs to keep the current static warning on >uninitialized >>> variables untouched in order to avoid "forking the language”. >>> >>> We have to move the new phase after all the uninitialized analysis >is done in order to avoid “forking the language”. >>> >>> So, this is a problem that is not easy to resolve. >>> >>> >>> Indeed, those are conflicting goals. >>> >>> Do you have suggestion on this? >>> >>> >>> No, not any easy ones. Doing more of the uninit analysis early >(there >>> is already an early >>> uninit pass) which would mean doing IPA analysis turing GCC into >more >>> of a static analysis >>> tool. 
Theres the analyzer now, not sure if that can employ an early >>> LTO phase for example. >>> >>> >>> >>> >>> Richard. ^ permalink raw reply [flat|nested] 56+ messages in thread
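The earlier plan quoted in this message (walk the function's local declarations and give an initializer to every automatic variable that has none and is not marked as intentionally uninitialized) can be modelled in a few lines. The sketch below is standalone C, not GCC internals: `struct local_decl` and its fields are hypothetical stand-ins for what the thread calls DECL_IS_INITIALIZED, TREE_STATIC and the proposed `uninitialized` attribute, and the array loop merely plays the role of FOR_EACH_LOCAL_DECL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local variable declaration.  */
struct local_decl {
    const char *name;
    bool is_static;          /* static storage: not an auto variable */
    bool has_initializer;    /* models the proposed DECL_IS_INITIALIZED bit */
    bool attr_uninitialized; /* models __attribute__((uninitialized)) */
    bool auto_init_added;    /* set by the walk below */
};

/* Decide whether the option should add an initializer for this decl.  */
static bool needs_auto_init(const struct local_decl *d)
{
    return !d->is_static && !d->has_initializer && !d->attr_uninitialized;
}

/* Walk all locals of one function and mark those that get an
   automatically added initializer.  Returns how many were marked.  */
size_t add_auto_inits(struct local_decl *decls, size_t n)
{
    size_t added = 0;
    for (size_t i = 0; i < n; i++) {
        if (needs_auto_init(&decls[i])) {
            decls[i].auto_init_added = true;
            added++;
        }
    }
    return added;
}

/* Small worked example: only the local "b" qualifies.  */
size_t demo_add_auto_inits(void)
{
    struct local_decl decls[] = {
        { "a", false, true,  false, false },  /* int a = 1;             */
        { "b", false, false, false, false },  /* int b;                 */
        { "c", true,  false, false, false },  /* static int c;          */
        { "d", false, false, true,  false },  /* opted out by the user  */
    };
    size_t n = add_auto_inits(decls, sizeof decls / sizeof decls[0]);
    assert(decls[1].auto_init_added);
    return n;
}
```

The model says nothing about where in the pass pipeline such a walk should run, which is the open question in the thread.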
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-03 16:36 ` Richard Biener @ 2020-12-03 16:40 ` Qing Zhao 2020-12-03 16:56 ` Richard Sandiford 1 sibling, 0 replies; 56+ messages in thread From: Qing Zhao @ 2020-12-03 16:40 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, gcc Patches, kees Cook > On Dec 3, 2020, at 10:36 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On December 3, 2020 5:07:28 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote: >> >> >>>> of uninitialized analysis in the later stage. >>> >>> I don't see how the issue can be resolved, you can't get both, uninit >>> warnings and no uninitialized memory. >>> People can compile twice, once without -fzero-init to get uninit >>> warnings and once with -fzero-init to get >>> the extra "security". >> >> So, for GCC, you think that it’s okay to get rid of the following >> requirement: >> >> C. The implementation needs to keep the current static warning on >> uninitialized >> variables untouched in order to avoid "forking the language”. >> >> Then, we can add explanation in the user documentation of the new >> -fzero-init and also >> that of the -Wuninitialized to inform users that -fzero-init will >> change the behavior of -Wuninitialized. >> In order to get the warnings, -fzero-init should not be added at the >> same time? >> >> With this requirement being eliminated, implementation will be much >> easier. >> >> We can add the new initialization during simplification phase. Then >> this new option will work >> for all languages. Is this reasonable? > > I think that's reasonable indeed. Eventually doing the init after the early uninit pass is possible as well. Are you suggesting that the new pass be placed after the early uninit pass? If so, why? Qing > > Richard. > >> thanks. >> >> Qing >> >> >> >>> >>> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-03 16:36 ` Richard Biener 2020-12-03 16:40 ` Qing Zhao @ 2020-12-03 16:56 ` Richard Sandiford 1 sibling, 0 replies; 56+ messages in thread From: Richard Sandiford @ 2020-12-03 16:56 UTC (permalink / raw) To: Richard Biener via Gcc-patches; +Cc: Qing Zhao, Richard Biener, kees Cook Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On December 3, 2020 5:07:28 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote: >> >> >>> On Dec 3, 2020, at 2:45 AM, Richard Biener >><richard.guenther@gmail.com> wrote: >>> >>> On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao <QING.ZHAO@oracle.com >><mailto:QING.ZHAO@oracle.com>> wrote: >>>> >>>> >>>> >>>> On Dec 2, 2020, at 2:45 AM, Richard Biener >><richard.guenther@gmail.com> wrote: >>>> >>>> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao <QING.ZHAO@oracle.com> >>wrote: >>>> >>>> >>>> Hi, Richard, >>>> >>>> Could you please comment on the following approach: >>>> >>>> Instead of adding the zero-initializer quite late at the pass >>“pass_expand”, we can add it as early as during gimplification. >>>> However, we will mark these new added zero-initializers as >>“artificial”. And passing this “artificial” information to >>>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”, >>in these two uninitialized variable analysis passes, >>>> (i.e., in tree-sea-uninit.c) We will update the checking on >>“ssa_undefined_value_p” to consider “artificial” zero-initializers. >>>> (i.e, if the def_stmt is marked with “artificial”, then it’s a >>undefined value). >>>> >>>> With such approach, we should be able to address all those >>conflicts. >>>> >>>> Do you see any obvious issue with this approach? >>>> >>>> >>>> Yes, DSE will happily elide an explicit zero-init following the >>>> artificial one leading to false uninit diagnostics. >>>> >>>> >>>> Indeed. This is a big issue. 
And other optimizations might also be >>impacted by the new zero-init, resulting changed behavior >>>> of uninitialized analysis in the later stage. >>> >>> I don't see how the issue can be resolved, you can't get both, uninit >>> warnings and no uninitialized memory. >>> People can compile twice, once without -fzero-init to get uninit >>> warnings and once with -fzero-init to get >>> the extra "security". >> >>So, for GCC, you think that it’s okay to get rid of the following >>requirement: >> >>C. The implementation needs to keep the current static warning on >>uninitialized >>variables untouched in order to avoid "forking the language”. >> >>Then, we can add explanation in the user documentation of the new >>-fzero-init and also >>that of the -Wuninitialized to inform users that -fzero-init will >>change the behavior of -Wuninitialized. >>In order to get the warnings, -fzero-init should not be added at the >>same time? >> >>With this requirement being eliminated, implementation will be much >>easier. >> >>We can add the new initialization during simplification phase. Then >>this new option will work >>for all languages. Is this reasonable? > > I think that's reasonable indeed. Eventually doing the init after the early uninit pass is possible as well. Sorry to be awkward, but I kind-of disagree. IIRC, clang was able to give uninit warnings while implementing the initialisation as expected, so I think this is a GCC restriction rather than a fundamental incompatibility. I don't think it's reasonable to expect people to read the documentation of -ffoo for Clang and separately read the documentation of -ffoo for GCC. They'll at best read the documentation for one and (rightly) expect the other compiler to behave in a compatible way. I'm also not sure people would build twice in practice. I remember the issue of forking the language was discussed at length on the Clang dev list at the time (but I haven't gone back and re-read the thread, so I'm relying on memory here). 
Not forking the language was an important goal/requirement of the option and I don't think we should drop it when implementing the option in GCC. IMO, if we want to define a dialect of C/C++ in which uninitialised uses are always well defined rather than UB, we should do that as a separate option. If we're implementing the Clang options, we should continue to treat uninitialised uses as UB that triggers the same warnings as if the option wasn't passed. So TBH I'd rather not add the option until it can be implemented in a way that is compatible with Clang. Thanks, Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
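As a concrete picture of the behaviour argued for in this message, consider a function with a conditionally assigned local. Under the Clang-compatible semantics, reading `result` when `have_value` is zero stays undefined behaviour and should still be diagnosed as maybe-uninitialized; the option would only change which value happens to be observed. The sketch below is illustrative: `pick`, `fill_scratch` and the `NO_AUTO_INIT` macro are hypothetical names, and the macro expands to nothing on compilers that do not report support for the `uninitialized` attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Use the per-variable opt-out only where the compiler reports it.  */
#if defined(__has_attribute)
#  if __has_attribute(uninitialized)
#    define NO_AUTO_INIT __attribute__((uninitialized))
#  endif
#endif
#ifndef NO_AUTO_INIT
#  define NO_AUTO_INIT
#endif

/* `result` has no initializer.  Calling pick(0, ...) reads it
   uninitialized: still undefined behaviour, and still something the
   uninit warnings are expected to flag, whether or not the option is on.  */
int pick(int have_value, int value)
{
    int result;
    if (have_value)
        result = value;
    return result;
}

/* A scratch buffer that is fully written before it is read, so the author
   opts out of automatic initialization for performance.  */
int fill_scratch(int seed)
{
    int scratch[64] NO_AUTO_INIT;
    int sum = 0;
    for (size_t i = 0; i < 64; i++)
        scratch[i] = seed + (int)i;
    for (size_t i = 0; i < 64; i++)
        sum += scratch[i];
    return sum;
}

/* Only well-defined calls are exercised here.  */
void check_examples(void)
{
    assert(pick(1, 7) == 7);
    assert(fill_scratch(0) == 2016);  /* 0 + 1 + ... + 63 */
}
```

A build with optimization and `-Wall` would typically report `result` as possibly uninitialized; the position taken in this message is that enabling `-ftrivial-auto-var-init` should not make that report go away.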
* Re: How to traverse all the local variables that declared in the current routine? 2020-11-24 16:54 ` Qing Zhao 2020-11-25 9:11 ` Richard Biener @ 2020-11-26 0:08 ` Martin Sebor 2020-11-30 16:23 ` Qing Zhao 1 sibling, 1 reply; 56+ messages in thread From: Martin Sebor @ 2020-11-26 0:08 UTC (permalink / raw) To: Qing Zhao, Richard Biener; +Cc: Richard Sandiford, gcc Patches On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote: > > >> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >>> >>> >>> >>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote: >>>> >>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches >>>> <gcc-patches@gcc.gnu.org> wrote: >>>>> >>>>> Hi, >>>>> >>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine? >>>>> >>>>> If not, what’s the best way to traverse the local variables? >>>> >>>> Depends on what for. There's the source level view you get by walking >>>> BLOCK_VARS of the >>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and >>>> there's SSA names >>>> (FOR_EACH_SSA_NAME). >>> >>> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are >>> not explicitly initialized in the declaration, the basic idea is following: >>> >>> ** The proposal: >>> >>> A. add a new GCC option: (same name and meaning as CLANG) >>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG; >>> >>> B. add a new attribute for variable: >>> __attribute((uninitialized) >>> the marked variable is uninitialized intentionaly for performance purpose. >>> >>> C. The implementation needs to keep the current static warning on uninitialized >>> variables untouched in order to avoid "forking the language". 
>>> >>> >>> ** The implementation: >>> >>> There are two major requirements for the implementation: >>> >>> 1. all auto-variables that do not have an explicit initializer should be initialized to >>> zero by this option. (Same behavior as CLANG) >>> >>> 2. keep the current static warning on uninitialized variables untouched. >>> >>> In order to satisfy 1, we should check whether an auto-variable has initializer >>> or not; >>> In order to satisfy 2, we should add this new transformation after >>> "pass_late_warn_uninitialized". >>> >>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”, >>> If Not, then insert an initialization for it. >>> >>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better? >> >> Yes, but do you want to catch variables promoted to register as well >> or just variables >> on the stack? > > I think both as long as they are source-level auto-variables. Then which one is better? > >> >>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>> unsigned decl_is_initialized :1; >>> >>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>> #define DECL_IS_INITIALIZED(NODE) \ >>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>> >>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>> even though DECL_INITIAL might be NULLed. >> >> For locals it would be more reliable to set this flag during gimplification. > > You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimpley_decl_expr” (gimplify.c) as following: > > if (VAR_P (decl) && !DECL_EXTERNAL (decl)) > { > tree init = DECL_INITIAL (decl); > ... > if (init && init != error_mark_node) > { > if (!TREE_STATIC (decl)) > { > DECL_IS_INITIALIZED(decl) = 1; > } > > Is this enough for all Frontends? 
Are there other places that I need to maintain this bit? > > >> >>> Do you have any comment and suggestions? >> >> As said above - do you want to cover registers as well as locals? > > All the locals from the source-code point of view should be covered. (From my study so far, looks like that Clang adds that phase in FE). > If GCC adds this phase in FE, then the following design requirement > > C. The implementation needs to keep the current static warning on uninitialized > variables untouched in order to avoid "forking the language”. > > cannot be satisfied. Since gcc’s uninitialized variables analysis is applied quite late. > > So, we have to add this new phase after “pass_late_warn_uninitialized”. > >> I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see expand_stack_vars) > > Adding this new transformation during RTL expansion is okay. I will check on this in more details to see how to add it to RTL expansion phase. >> >> Note that optimization will already made have use of "uninitialized" state >> of locals so depending on what the actual goal is here "late" may be too late. > > This is a really good point… > > In order to avoid optimization to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best > for this issue). However, if we have to met the following requirement: > > C. The implementation needs to keep the current static warning on uninitialized > variables untouched in order to avoid "forking the language”. > > We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”. > > So, this is a problem that is not easy to resolve. > > Do you have suggestion on this? Not having thought about it very long or hard I'd be tempted to do it the other way around. 
For each use of an uninitialized variable found, first either issue or queue up a -Wuninitialized for it and then initialize it. Then (if queued) at some later point, issue the queued up -Wuninitialized. The last part would be done in tree-ssa-uninit.c where the remaining uses of uninitialized variables would trigger warnings and induce their initialization (if there were any left). Martin ^ permalink raw reply [flat|nested] 56+ messages in thread
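The ordering suggested above (diagnose first, then initialize) can be sketched as a toy model in C. This is not GCC code: `struct local_var`, `struct warning_queue`, and `warn_then_init` are hypothetical names invented for illustration, and the "analysis result" is just a boolean supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one local variable; not a GCC data structure. */
struct local_var {
    const char *name;
    bool has_initializer;  /* initialized at its declaration */
    bool used_uninit;      /* the analysis found a use before any assignment */
    bool zero_init_added;  /* set once the toy pass adds an initializer */
};

#define MAX_QUEUED 16

/* Hypothetical queue of diagnostics to be issued at some later point. */
struct warning_queue {
    const char *names[MAX_QUEUED];
    size_t count;
};

/* For each detected uninitialized use: queue the diagnostic first,
   then mark the variable as initialized. */
void warn_then_init(struct local_var *vars, size_t n, struct warning_queue *q)
{
    for (size_t i = 0; i < n; i++) {
        if (vars[i].has_initializer || !vars[i].used_uninit)
            continue;
        assert(q->count < MAX_QUEUED);
        q->names[q->count++] = vars[i].name;  /* remember the warning */
        vars[i].zero_init_added = true;       /* then add the initializer */
    }
}
```

In this model only variables that the analysis flags are initialized, so the set of initialized variables depends on what the analysis is able to detect.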
* Re: How to traverse all the local variables that declared in the current routine? 2020-11-26 0:08 ` Martin Sebor @ 2020-11-30 16:23 ` Qing Zhao 2020-11-30 17:18 ` Martin Sebor 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-11-30 16:23 UTC (permalink / raw) To: Martin Sebor; +Cc: Richard Biener, Richard Sandiford, gcc Patches Hi, Martin, Thanks a lot for your suggestion. > On Nov 25, 2020, at 6:08 PM, Martin Sebor <msebor@gmail.com> wrote: > > On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote: >>> On Nov 24, 2020, at 9:55 AM, Richard Biener <richard.guenther@gmail.com> wrote: >>> >>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >>>> >>>> >>>> >>>>> On Nov 24, 2020, at 1:32 AM, Richard Biener <richard.guenther@gmail.com> wrote: >>>>> >>>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches >>>>> <gcc-patches@gcc.gnu.org> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine? >>>>>> >>>>>> If not, what’s the best way to traverse the local variables? >>>>> >>>>> Depends on what for. There's the source level view you get by walking >>>>> BLOCK_VARS of the >>>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and >>>>> there's SSA names >>>>> (FOR_EACH_SSA_NAME). >>>> >>>> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are >>>> not explicitly initialized in the declaration, the basic idea is following: >>>> >>>> ** The proposal: >>>> >>>> A. add a new GCC option: (same name and meaning as CLANG) >>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG; >>>> >>>> B. add a new attribute for variable: >>>> __attribute((uninitialized) >>>> the marked variable is uninitialized intentionaly for performance purpose. >>>> >>>> C. 
The implementation needs to keep the current static warning on uninitialized >>>> variables untouched in order to avoid "forking the language". >>>> >>>> >>>> ** The implementation: >>>> >>>> There are two major requirements for the implementation: >>>> >>>> 1. all auto-variables that do not have an explicit initializer should be initialized to >>>> zero by this option. (Same behavior as CLANG) >>>> >>>> 2. keep the current static warning on uninitialized variables untouched. >>>> >>>> In order to satisfy 1, we should check whether an auto-variable has initializer >>>> or not; >>>> In order to satisfy 2, we should add this new transformation after >>>> "pass_late_warn_uninitialized". >>>> >>>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”, >>>> If Not, then insert an initialization for it. >>>> >>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better? >>> >>> Yes, but do you want to catch variables promoted to register as well >>> or just variables >>> on the stack? >> I think both as long as they are source-level auto-variables. Then which one is better? >>> >>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>> unsigned decl_is_initialized :1; >>>> >>>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>>> #define DECL_IS_INITIALIZED(NODE) \ >>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>> >>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>>> even though DECL_INITIAL might be NULLed. >>> >>> For locals it would be more reliable to set this flag during gimplification. 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimpley_decl_expr” (gimplify.c) as following: >> if (VAR_P (decl) && !DECL_EXTERNAL (decl)) >> { >> tree init = DECL_INITIAL (decl); >> ... >> if (init && init != error_mark_node) >> { >> if (!TREE_STATIC (decl)) >> { >> DECL_IS_INITIALIZED(decl) = 1; >> } >> Is this enough for all Frontends? Are there other places that I need to maintain this bit? >>> >>>> Do you have any comment and suggestions? >>> >>> As said above - do you want to cover registers as well as locals? >> All the locals from the source-code point of view should be covered. (From my study so far, looks like that Clang adds that phase in FE). >> If GCC adds this phase in FE, then the following design requirement >> C. The implementation needs to keep the current static warning on uninitialized >> variables untouched in order to avoid "forking the language”. >> cannot be satisfied. Since gcc’s uninitialized variables analysis is applied quite late. >> So, we have to add this new phase after “pass_late_warn_uninitialized”. >>> I'd do >>> the actual zeroing during RTL expansion instead since otherwise you >>> have to figure youself whether a local is actually used (see expand_stack_vars) >> Adding this new transformation during RTL expansion is okay. I will check on this in more details to see how to add it to RTL expansion phase. >>> >>> Note that optimization will already made have use of "uninitialized" state >>> of locals so depending on what the actual goal is here "late" may be too late. >> This is a really good point… >> In order to avoid optimization to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best >> for this issue). However, if we have to met the following requirement: >> C. The implementation needs to keep the current static warning on uninitialized >> variables untouched in order to avoid "forking the language”. 
>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”. >> So, this is a problem that is not easy to resolve. >> Do you have suggestion on this? > > Not having thought about it very long or hard I'd be tempted to do > it the other way around. For each use of an uninitialized variable > found, first either issue or queue up a -Wuninitialized for it and > then initialize it. Then (if queued) at some later point, issue > the queued up -Wuninitialized. The last part would be done in > tree-ssa-uninit.c where the remaining uses of uninitialized > variables would trigger warnings and induce their initialization > (if there were any left).

The major issue with this approach is that there are two passes for uninitialized variable analysis:

    pass_early_warn_uninitialized
    pass_late_warn_uninitialized

The early pass runs at the very beginning of the tree optimizer, while the late pass runs at a very late stage of it. If we add the initializations in the early pass, the newly added initializations will change the result of the late pass. This does not meet the requirement.

Am I missing anything here?

Qing > > Martin ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-11-30 16:23 ` Qing Zhao @ 2020-11-30 17:18 ` Martin Sebor 2020-11-30 23:05 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Martin Sebor @ 2020-11-30 17:18 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Biener, Richard Sandiford, gcc Patches On 11/30/20 9:23 AM, Qing Zhao wrote: > Hi, Martin, > > Thanks a lot for your suggestion. > >> On Nov 25, 2020, at 6:08 PM, Martin Sebor <msebor@gmail.com >> <mailto:msebor@gmail.com>> wrote: >> >> On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote: >>>> On Nov 24, 2020, at 9:55 AM, Richard Biener >>>> <richard.guenther@gmail.com <mailto:richard.guenther@gmail.com>> wrote: >>>> >>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com >>>> <mailto:QING.ZHAO@oracle.com>> wrote: >>>>> >>>>> >>>>> >>>>>> On Nov 24, 2020, at 1:32 AM, Richard Biener >>>>>> <richard.guenther@gmail.com <mailto:richard.guenther@gmail.com>> >>>>>> wrote: >>>>>> >>>>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches >>>>>> <gcc-patches@gcc.gnu.org <mailto:gcc-patches@gcc.gnu.org>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Does gcc provide an iterator to traverse all the local variables >>>>>>> that are declared in the current routine? >>>>>>> >>>>>>> If not, what’s the best way to traverse the local variables? >>>>>> >>>>>> Depends on what for. There's the source level view you get by walking >>>>>> BLOCK_VARS of the >>>>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and >>>>>> there's SSA names >>>>>> (FOR_EACH_SSA_NAME). >>>>> >>>>> I am planing to add a new phase immediately after >>>>> “pass_late_warn_uninitialized” to initialize all auto-variables >>>>> that are >>>>> not explicitly initialized in the declaration, the basic idea is >>>>> following: >>>>> >>>>> ** The proposal: >>>>> >>>>> A. 
add a new GCC option: (same name and meaning as CLANG) >>>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG; >>>>> >>>>> B. add a new attribute for variable: >>>>> __attribute((uninitialized) >>>>> the marked variable is uninitialized intentionaly for performance >>>>> purpose. >>>>> >>>>> C. The implementation needs to keep the current static warning on >>>>> uninitialized >>>>> variables untouched in order to avoid "forking the language". >>>>> >>>>> >>>>> ** The implementation: >>>>> >>>>> There are two major requirements for the implementation: >>>>> >>>>> 1. all auto-variables that do not have an explicit initializer >>>>> should be initialized to >>>>> zero by this option. (Same behavior as CLANG) >>>>> >>>>> 2. keep the current static warning on uninitialized variables >>>>> untouched. >>>>> >>>>> In order to satisfy 1, we should check whether an auto-variable has >>>>> initializer >>>>> or not; >>>>> In order to satisfy 2, we should add this new transformation after >>>>> "pass_late_warn_uninitialized". >>>>> >>>>> So, we should be able to check whether an auto-variable has >>>>> initializer or not after “pass_late_warn_uninitialized”, >>>>> If Not, then insert an initialization for it. >>>>> >>>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better? >>>> >>>> Yes, but do you want to catch variables promoted to register as well >>>> or just variables >>>> on the stack? >>> I think both as long as they are source-level auto-variables. Then >>> which one is better? >>>> >>>>> Another issue is, in order to check whether an auto-variable has >>>>> initializer, I plan to add a new bit in “decl_common” as: >>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>>> unsigned decl_is_initialized :1; >>>>> >>>>> /* IN VAR_DECL, set when the decl is initialized at the >>>>> declaration. 
*/ >>>>> #define DECL_IS_INITIALIZED(NODE) \ >>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>>> >>>>> set this bit when setting DECL_INITIAL for the variables in FE. >>>>> then keep it >>>>> even though DECL_INITIAL might be NULLed. >>>> >>>> For locals it would be more reliable to set this flag during >>>> gimplification. >>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the >>> routine “gimpley_decl_expr” (gimplify.c) as following: >>> if (VAR_P (decl) && !DECL_EXTERNAL (decl)) >>> { >>> tree init = DECL_INITIAL (decl); >>> ... >>> if (init && init != error_mark_node) >>> { >>> if (!TREE_STATIC (decl)) >>> { >>> DECL_IS_INITIALIZED(decl) = 1; >>> } >>> Is this enough for all Frontends? Are there other places that I need >>> to maintain this bit? >>>> >>>>> Do you have any comment and suggestions? >>>> >>>> As said above - do you want to cover registers as well as locals? >>> All the locals from the source-code point of view should be covered. >>> (From my study so far, looks like that Clang adds that phase in FE). >>> If GCC adds this phase in FE, then the following design requirement >>> C. The implementation needs to keep the current static warning on >>> uninitialized >>> variables untouched in order to avoid "forking the language”. >>> cannot be satisfied. Since gcc’s uninitialized variables analysis is >>> applied quite late. >>> So, we have to add this new phase after “pass_late_warn_uninitialized”. >>>> I'd do >>>> the actual zeroing during RTL expansion instead since otherwise you >>>> have to figure youself whether a local is actually used (see >>>> expand_stack_vars) >>> Adding this new transformation during RTL expansion is okay. I will >>> check on this in more details to see how to add it to RTL expansion >>> phase. >>>> >>>> Note that optimization will already made have use of "uninitialized" >>>> state >>>> of locals so depending on what the actual goal is here "late" may be >>>> too late. 
>>> This is a really good point… >>> In order to avoid optimization to use the “uninitialized” state of >>> locals, we should add the zeroing phase as early as possible (adding >>> it in FE might be best >>> for this issue). However, if we have to met the following requirement: >>> C. The implementation needs to keep the current static warning on >>> uninitialized >>> variables untouched in order to avoid "forking the language”. >>> We have to move the new phase after all the uninitialized analysis is >>> done in order to avoid “forking the language”. >>> So, this is a problem that is not easy to resolve. >>> Do you have suggestion on this? >> >> Not having thought about it very long or hard I'd be tempted to do >> it the other way around. For each use of an uninitialized variable >> found, first either issue or queue up a -Wuninitialized for it and >> then initialize it. Then (if queued) at some later point, issue >> the queued up -Wuninitialized. The last part would be done in >> tree-ssa-uninit.c where the remaining uses of uninitialized >> variables would trigger warnings and induce their initialization >> (if there were any left). > > > The major issue with this approach is: > > There are two passes for uninitialized variable analysis: > pass_early_warn_uninitialized > pass_late_warn_uninitialized > > The early pass is placed at the very beginning of the tree optimizer. > But the late pass is placed at the very late stage of the tree optimizer. > If we add the initializations at the early pass, the result of the late > pass will be changed by the new added initializations. This does not meet > the requirement. > > Do I miss anything here? I'm not sure. As I said, I'd consider issuing (or queuing up for issuing later) -Wuninitialized at the same time as initializing the uninitialized variables. 
With that approach I'd expect to diagnose all the same instances of uninitialized uses as the two passes do today (actually, I'd expect to diagnose more of them, including those Richard referred to above whose uninitialized state may have been made use of for optimization decisions(*)). Also with this approach the two existing warning passes would cease to serve their current purpose of hunting down uninitialized variables because by the time they ran all their uses would have been initialized (and warnings issued). One question in my mind is what to do with -Wmaybe-uninitialized. Should those also be initialized, even though they're not necessarily used? Or are you only hoping to tackle -Wuninitialized? Martin [*] With the initialization approach I'd expect concerns about the cost of losing those optimization opportunities. Although those could be addressed by making the initialization optional (i.e., opt-in). ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-11-30 17:18 ` Martin Sebor @ 2020-11-30 23:05 ` Qing Zhao 0 siblings, 0 replies; 56+ messages in thread From: Qing Zhao @ 2020-11-30 23:05 UTC (permalink / raw) To: Martin Sebor; +Cc: Richard Biener, Richard Sandiford, gcc Patches On Nov 30, 2020, at 11:18 AM, Martin Sebor <msebor@gmail.com> wrote: >>>>>>>> Does gcc provide an iterator to traverse all the local variables that are declared in the current routine? >>>>>>>> >>>>>>>> If not, what’s the best way to traverse the local variables? >>>>>>> >>>>>>> Depends on what for. There's the source level view you get by walking >>>>>>> BLOCK_VARS of the >>>>>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and >>>>>>> there's SSA names >>>>>>> (FOR_EACH_SSA_NAME). >>>>>> >>>>>> I am planing to add a new phase immediately after “pass_late_warn_uninitialized” to initialize all auto-variables that are >>>>>> not explicitly initialized in the declaration, the basic idea is following: >>>>>> >>>>>> ** The proposal: >>>>>> >>>>>> A. add a new GCC option: (same name and meaning as CLANG) >>>>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG; >>>>>> >>>>>> B. add a new attribute for variable: >>>>>> __attribute((uninitialized) >>>>>> the marked variable is uninitialized intentionaly for performance purpose. >>>>>> >>>>>> C. The implementation needs to keep the current static warning on uninitialized >>>>>> variables untouched in order to avoid "forking the language". >>>>>> >>>>>> >>>>>> ** The implementation: >>>>>> >>>>>> There are two major requirements for the implementation: >>>>>> >>>>>> 1. all auto-variables that do not have an explicit initializer should be initialized to >>>>>> zero by this option. (Same behavior as CLANG) >>>>>> >>>>>> 2. keep the current static warning on uninitialized variables untouched. 
>>>>>> >>>>>> In order to satisfy 1, we should check whether an auto-variable has initializer >>>>>> or not; >>>>>> In order to satisfy 2, we should add this new transformation after >>>>>> "pass_late_warn_uninitialized". >>>>>> >>>>>> So, we should be able to check whether an auto-variable has initializer or not after “pass_late_warn_uninitialized”, >>>>>> If Not, then insert an initialization for it. >>>>>> >>>>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better? >>>>> >>>>> Yes, but do you want to catch variables promoted to register as well >>>>> or just variables >>>>> on the stack? >>>> I think both as long as they are source-level auto-variables. Then which one is better? >>>>> >>>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>>>> unsigned decl_is_initialized :1; >>>>>> >>>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>>>>> #define DECL_IS_INITIALIZED(NODE) \ >>>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>>>> >>>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>>>>> even though DECL_INITIAL might be NULLed. >>>>> >>>>> For locals it would be more reliable to set this flag during gimplification. >>>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine “gimpley_decl_expr” (gimplify.c) as following: >>>> if (VAR_P (decl) && !DECL_EXTERNAL (decl)) >>>> { >>>> tree init = DECL_INITIAL (decl); >>>> ... >>>> if (init && init != error_mark_node) >>>> { >>>> if (!TREE_STATIC (decl)) >>>> { >>>> DECL_IS_INITIALIZED(decl) = 1; >>>> } >>>> Is this enough for all Frontends? Are there other places that I need to maintain this bit? >>>>> >>>>>> Do you have any comment and suggestions? >>>>> >>>>> As said above - do you want to cover registers as well as locals? 
>>>> All the locals from the source-code point of view should be covered. (From my study so far, looks like that Clang adds that phase in FE). >>>> If GCC adds this phase in FE, then the following design requirement >>>> C. The implementation needs to keep the current static warning on uninitialized >>>> variables untouched in order to avoid "forking the language”. >>>> cannot be satisfied. Since gcc’s uninitialized variables analysis is applied quite late. >>>> So, we have to add this new phase after “pass_late_warn_uninitialized”. >>>>> I'd do >>>>> the actual zeroing during RTL expansion instead since otherwise you >>>>> have to figure youself whether a local is actually used (see expand_stack_vars) >>>> Adding this new transformation during RTL expansion is okay. I will check on this in more details to see how to add it to RTL expansion phase. >>>>> >>>>> Note that optimization will already made have use of "uninitialized" state >>>>> of locals so depending on what the actual goal is here "late" may be too late. >>>> This is a really good point… >>>> In order to avoid optimization to use the “uninitialized” state of locals, we should add the zeroing phase as early as possible (adding it in FE might be best >>>> for this issue). However, if we have to met the following requirement: >>>> C. The implementation needs to keep the current static warning on uninitialized >>>> variables untouched in order to avoid "forking the language”. >>>> We have to move the new phase after all the uninitialized analysis is done in order to avoid “forking the language”. >>>> So, this is a problem that is not easy to resolve. >>>> Do you have suggestion on this? >>> >>> Not having thought about it very long or hard I'd be tempted to do >>> it the other way around. For each use of an uninitialized variable >>> found, first either issue or queue up a -Wuninitialized for it and >>> then initialize it. Then (if queued) at some later point, issue >>> the queued up -Wuninitialized. 
The last part would be done in >>> tree-ssa-uninit.c where the remaining uses of uninitialized >>> variables would trigger warnings and induce their initialization >>> (if there were any left). >> The major issue with this approach is: >> There are two passes for uninitialized variable analysis: >> pass_early_warn_uninitialized >> pass_late_warn_uninitialized >> The early pass is placed at the very beginning of the tree optimizer. But the late pass is placed at the very late stage of the tree optimizer. >> If we add the initializations at the early pass, the result of the late pass will be changed by the new added initializations. This does not meet >> the requirement. >> Do I miss anything here? > > I'm not sure. As I said, I'd consider issuing (or queuing up for > issuing later) -Wuninitialized at the same time as initializing > the uninitialized variables.

I considered this approach at the very beginning of my study, but later realized that it would not work. Take the following small example:

qinzhao@gcc10:~/Bugs/auto-init$ cat t1.c
void blah(int);

int foo_2 (int n, int l, int m, int r)
{
  int v;

  if ( (n < 10) && (m != 100) && (r < 20) )
    v = r;

  if (l > 100)
    if ( (n <= 8) && (m < 102) && (r < 19) )
      blah(v); /* { dg-warning "uninitialized" "real warning" } */

  return 0;
}

With the latest gcc and the following options:

qinzhao@gcc10:~/Bugs/auto-init$ /home/qinzhao/Install/latest_write/bin/gcc -Wuninitialized -Wmaybe-uninitialized -S t1.c
qinzhao@gcc10:~/Bugs/auto-init$

The latest gcc issues no uninitialized warning when no optimization is specified. But in this case it is clear that we should insert a zero initializer for the auto-variable “v”, even though the current uninitialized variable analysis pass cannot determine that “v” is uninitialized on some execution paths.
The above is just a simple example to show that we cannot rely on the result of the uninitialized variable analysis pass to decide which variables should be initialized. For security purposes, we should conservatively initialize all auto-variables that might not be initialized. That is, for every auto-variable that has no explicit initializer at the source level, we should insert an initializer. This is the current behavior of LLVM with -ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang. I believe that GCC should do the same for the security benefit. > With that approach I'd expect to > diagnose all the same instances of uninitialized uses as the two > passes do today (actually, I'd expect to diagnose more of them, > including those Richard referred to above whose uninitialized > state may have been made use of for optimization decisions(*)). > Also with this approach the two existing warning passes would > cease to serve their current purpose of hunting down uninitialized > variables because by the time they ran all their uses would have > been initialized (and warnings issued).

Inserting the zero-initializer before pass_late_warn_uninitialized will invalidate the current uninitialized variable analysis, which is unacceptable based on my current understanding. > One question in my mind is what to do with -Wmaybe-uninitialized. > Should those also be initialized, even though they're not necessarily > used? Or are you only hoping to tackle -Wuninitialized?

All the auto-variables that might not be initialized should be initialized with the new option. The decision on which auto-variables to initialize should be based on the source-level initializer: if an auto-variable does not have a source-level initializer, the compiler should add a zero-initializer for it.
Qing > > Martin > > [*] With the initialization approach I'd expect concerns about > the cost of losing those optimization opportunities. Although > those could be addressed by making the initialization optional > (i.e., opt-in). ^ permalink raw reply [flat|nested] 56+ messages in thread
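The rule stated above (no source-level initializer means the compiler adds a zero initializer) can be illustrated by writing the initializer by hand in the t1.c example. This is only a sketch of the intended run-time effect, not of how GCC would implement it; the `blah` definition, the two recording variables, and `value_seen_on_unassigned_path` are assumptions added so that the result can be observed.

```c
#include <assert.h>

/* Records what blah() saw so the effect is observable. This harness is an
   assumption added for illustration and is not part of t1.c. */
static int blah_calls;
static int blah_last_arg;

void blah(int x)
{
    blah_calls++;
    blah_last_arg = x;
}

/* foo_2 from t1.c, with the zero initializer written out by hand. */
int foo_2(int n, int l, int m, int r)
{
    int v = 0;  /* stands in for the compiler-inserted zero initializer */

    if ((n < 10) && (m != 100) && (r < 20))
        v = r;

    if (l > 100)
        if ((n <= 8) && (m < 102) && (r < 19))
            blah(v);

    return 0;
}

/* Drives the path on which v is never assigned but is still passed to blah(). */
int value_seen_on_unassigned_path(void)
{
    blah_calls = 0;
    foo_2(5, 101, 100, 5);  /* m == 100 skips the assignment; l > 100 reaches blah() */
    assert(blah_calls == 1);
    return blah_last_arg;
}
```

With the arguments chosen here the first condition fails because `m == 100`, so without the initializer `blah` would receive an indeterminate value; with it, the value is a predictable zero.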
* Re: How to traverse all the local variables that declared in the current routine? 2020-11-24 15:55 ` Richard Biener 2020-11-24 16:54 ` Qing Zhao @ 2020-12-03 17:32 ` Richard Sandiford 2020-12-03 23:04 ` Qing Zhao 2020-12-04 8:50 ` Richard Biener 1 sibling, 2 replies; 56+ messages in thread From: Richard Sandiford @ 2020-12-03 17:32 UTC (permalink / raw) To: Richard Biener via Gcc-patches Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> unsigned decl_is_initialized :1; >> >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >> #define DECL_IS_INITIALIZED(NODE) \ >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >> even though DECL_INITIAL might be NULLed. > > For locals it would be more reliable to set this flag during gimplification. > >> Do you have any comment and suggestions? > > As said above - do you want to cover registers as well as locals? I'd do > the actual zeroing during RTL expansion instead since otherwise you > have to figure youself whether a local is actually used (see expand_stack_vars) > > Note that optimization will already made have use of "uninitialized" state > of locals so depending on what the actual goal is here "late" may be too late. Haven't thought about this much, so it might be a daft idea, but would a compromise be to use a const internal function: X1 = .DEFERRED_INIT (X0, INIT) where the X0 argument is an uninitialised value and the INIT argument describes the initialisation pattern? 
So for a decl we'd have:

  X = .DEFERRED_INIT (X, INIT)

and for an SSA name we'd have:

  X_2 = .DEFERRED_INIT (X_1(D), INIT)

with all other uses of X_1(D) being replaced by X_2. The idea is that:

* Having the X0 argument would keep the uninitialised use of the variable around for the later warning passes.

* Using a const function should still allow the UB to be deleted as dead if X1 isn't needed.

* Having a function in the way should stop passes from taking advantage of direct uninitialised uses for optimisation.

This means we won't be able to optimise based on the actual init value at the gimple level, but that seems like a fair trade-off. AIUI this is really a security feature or anti-UB hardening feature (in the sense that users are more likely to see predictable behaviour “in the field” even if the program has UB).

Thanks, Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
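The end effect of the proposal above, where the old contents of the variable are ignored and replaced according to INIT, can be modelled in plain C. This is a rough sketch under stated assumptions, not GCC code: `model_deferred_init`, `enum init_kind`, and `PATTERN_BYTE` are hypothetical names, the fill byte is chosen arbitrarily for illustration, and the model works through a pointer because C cannot pass an indeterminate value to a function without itself invoking UB. It does not model how the internal function interacts with the warning passes or the optimisers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the INIT argument. */
enum init_kind { INIT_ZERO, INIT_PATTERN };

/* Hypothetical fill byte for pattern mode; chosen only for illustration. */
#define PATTERN_BYTE 0xAA

/* Models the run-time effect of X = .DEFERRED_INIT (X, INIT): the previous
   contents of the object are ignored and every byte is overwritten. */
void model_deferred_init(void *obj, size_t size, enum init_kind kind)
{
    assert(obj != NULL);
    memset(obj, kind == INIT_ZERO ? 0 : PATTERN_BYTE, size);
}

/* A local without a source-level initializer, read after the modelled init. */
int read_after_init(enum init_kind kind)
{
    int x;  /* no explicit initializer */
    model_deferred_init(&x, sizeof x, kind);
    return x;  /* well defined in this model: every byte of x was written */
}
```

In this model the read of `x` is predictable for either INIT kind, which is the hardening property the proposal is after.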
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-03 17:32 ` Richard Sandiford @ 2020-12-03 23:04 ` Qing Zhao 2020-12-04 8:50 ` Richard Biener 1 sibling, 0 replies; 56+ messages in thread From: Qing Zhao @ 2020-12-03 23:04 UTC (permalink / raw) To: Richard Sandiford; +Cc: Richard Biener via Gcc-patches, Richard Biener

Hi, Richard,

Thanks a lot for your suggestion. Actually, I like this idea.

My understanding of your suggestion is:

1. During the gimplification phase:

For each auto-variable that does not have an explicit initializer, insert the following initializer for it:

X = DEFERRED_INIT (X, INIT)

where DEFERRED_INIT is an internal const function, which can be defined as:

DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

Its two arguments are:
1st argument: the uninitialized auto-variable;
2nd argument: the initialization pattern (zero | pattern);

2. During the tree-to-SSA phase:

No change; the current tree-to-SSA phase should automatically rewrite the newly inserted statement as

X_2 = DEFERRED_INIT (X_1(D), INIT);

with all other uses of X_1(D) being replaced by X_2.

3. During the expand phase:

Expand each call to “DEFERRED_INIT (X, INIT)” to a zero or pattern initialization, depending on “INIT”.

Is the above understanding correct? Did I miss anything?

More comments and questions are embedded below:

> On Dec 3, 2020, at 11:32 AM, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>> unsigned decl_is_initialized :1; >>> >>> /* IN VAR_DECL, set when the decl is initialized at the declaration. 
*/ >>> #define DECL_IS_INITIALIZED(NODE) \ >>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>> >>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>> even though DECL_INITIAL might be NULLed. >> >> For locals it would be more reliable to set this flag during gimplification. >> >>> Do you have any comment and suggestions? >> >> As said above - do you want to cover registers as well as locals? I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see expand_stack_vars) >> >> Note that optimization will already made have use of "uninitialized" state >> of locals so depending on what the actual goal is here "late" may be too late. > > Haven't thought about this much, so it might be a daft idea, but would a > compromise be to use a const internal function: > > X1 = .DEFERRED_INIT (X0, INIT) > > where the X0 argument is an uninitialised value and the INIT argument > describes the initialisation pattern? So for a decl we'd have: > > X = .DEFERRED_INIT (X, INIT) > > and for an SSA name we'd have: > > X_2 = .DEFERRED_INIT (X_1(D), INIT) > > with all other uses of X_1(D) being replaced by X_2. The idea is that: > > * Having the X0 argument would keep the uninitialised use of the > variable around for the later warning passes. > > * Using a const function should still allow the UB to be deleted as dead > if X1 isn't needed. So, current GCC will delete the UB as dead code when X1 is not needed. With the new option, should we keep this behavior? > > * Having a function in the way should stop passes from taking advantage > of direct uninitialised uses for optimisation. This will resolve the issue we raised before with directly adding an “artificial” zero-initializer during gimplification. However, I am wondering whether the newly added const internal function calls will affect optimization and thereby change the behavior of the uninitialized analysis? 
> > This means we won't be able to optimise based on the actual init > value at the gimple level, but that seems like a fair trade-off. Yes, with this approach: at the gimple level, we will not be able to optimize based on the newly added init values; at the RTL level, we will be able to optimize based on them. RTL optimizations will be able to eliminate any redundancy introduced by these new initializations, reducing the cost of this option. > AIUI this is really a security feature or anti-UB hardening feature > (in the sense that users are more likely to see predictable behaviour > “in the field” even if the program has UB). Yes, this option is for security purposes, and it is currently used in production by Microsoft, Apple, Google, etc. Qing > > Thanks, > Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
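Step 3 of the summary in this message (expanding each deferred init to zero or to a pattern) can be modelled at the source level. This is only a sketch under stated assumptions, not GCC code: `model_deferred_init`, `init_kind`, `all_bytes_equal` and the pattern byte 0xAA are hypothetical names and values picked for illustration, since the thread does not specify a pattern. The model takes a pointer because C cannot pass an uninitialised object by value without undefined behaviour, whereas the proposed internal function takes and returns the value itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the INIT argument discussed in the thread. */
enum init_kind { INIT_ZERO, INIT_PATTERN };

/* Assumed pattern byte; the thread does not specify one. */
#define MODEL_PATTERN_BYTE 0xAA

/* Model of what expanding a deferred init would do to an object's storage:
   overwrite every byte with zero or with the pattern byte.  The object is
   only written, never read, so calling this on an uninitialised local is
   well defined. */
void model_deferred_init(void *object, size_t size, enum init_kind kind)
{
    memset(object, kind == INIT_ZERO ? 0 : MODEL_PATTERN_BYTE, size);
}

/* Returns 1 if every byte of the object equals `byte`, otherwise 0. */
int all_bytes_equal(const void *object, size_t size, unsigned char byte)
{
    const unsigned char *p = object;
    for (size_t i = 0; i < size; i++)
        if (p[i] != byte)
            return 0;
    return 1;
}
```

A caller would write `int x; model_deferred_init(&x, sizeof x, INIT_ZERO);` right after the declaration, which is roughly where the thread proposes to place the initialization.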
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-03 17:32 ` Richard Sandiford 2020-12-03 23:04 ` Qing Zhao @ 2020-12-04 8:50 ` Richard Biener 2020-12-04 16:19 ` Qing Zhao 2020-12-07 17:21 ` How to traverse all the local variables that declared in the current routine? Richard Sandiford 1 sibling, 2 replies; 56+ messages in thread From: Richard Biener @ 2020-12-04 8:50 UTC (permalink / raw) To: Richard Biener via Gcc-patches, Qing Zhao, Richard Biener, Richard Sandiford On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: > >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ > >> unsigned decl_is_initialized :1; > >> > >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ > >> #define DECL_IS_INITIALIZED(NODE) \ > >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) > >> > >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it > >> even though DECL_INITIAL might be NULLed. > > > > For locals it would be more reliable to set this flag during gimplification. > > > >> Do you have any comment and suggestions? > > > > As said above - do you want to cover registers as well as locals? I'd do > > the actual zeroing during RTL expansion instead since otherwise you > > have to figure youself whether a local is actually used (see expand_stack_vars) > > > > Note that optimization will already made have use of "uninitialized" state > > of locals so depending on what the actual goal is here "late" may be too late. 
> > Haven't thought about this much, so it might be a daft idea, but would a > compromise be to use a const internal function: > > X1 = .DEFERRED_INIT (X0, INIT) > > where the X0 argument is an uninitialised value and the INIT argument > describes the initialisation pattern? So for a decl we'd have: > > X = .DEFERRED_INIT (X, INIT) > > and for an SSA name we'd have: > > X_2 = .DEFERRED_INIT (X_1(D), INIT) > > with all other uses of X_1(D) being replaced by X_2. The idea is that: > > * Having the X0 argument would keep the uninitialised use of the > variable around for the later warning passes. > > * Using a const function should still allow the UB to be deleted as dead > if X1 isn't needed. > > * Having a function in the way should stop passes from taking advantage > of direct uninitialised uses for optimisation. > > This means we won't be able to optimise based on the actual init > value at the gimple level, but that seems like a fair trade-off. > AIUI this is really a security feature or anti-UB hardening feature > (in the sense that users are more likely to see predictable behaviour > “in the field” even if the program has UB).

The question is whether it's in line with people's expectations that explicitly zero-initialized code behaves differently from implicitly zero-initialized code with respect to optimization and secondary side-effects (late diagnostics, latent bugs, etc.).

Introducing a new concept like .DEFERRED_INIT is much more heavy-weight than an explicit zero initializer.

As for optimization, I fear you'll get a load of redundant zero-init actually emitted if you can only rely on RTL DSE/DCE to remove it.

Btw, I don't think there's any reason to cling onto Clang's semantics for a particular switch. We'll never be able to emulate 1:1 behavior and our -Wuninit behavior is probably vastly different already.

Richard.

> Thanks, > Richard

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-04 8:50 ` Richard Biener @ 2020-12-04 16:19 ` Qing Zhao 2020-12-07 7:12 ` Richard Biener 2020-12-07 17:21 ` How to traverse all the local variables that declared in the current routine? Richard Sandiford 1 sibling, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-04 16:19 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Biener via Gcc-patches, Richard Sandiford > On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford > <richard.sandiford@arm.com <mailto:richard.sandiford@arm.com>> wrote: >> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>> unsigned decl_is_initialized :1; >>>> >>>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>>> #define DECL_IS_INITIALIZED(NODE) \ >>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>> >>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>>> even though DECL_INITIAL might be NULLed. >>> >>> For locals it would be more reliable to set this flag during gimplification. >>> >>>> Do you have any comment and suggestions? >>> >>> As said above - do you want to cover registers as well as locals? I'd do >>> the actual zeroing during RTL expansion instead since otherwise you >>> have to figure youself whether a local is actually used (see expand_stack_vars) >>> >>> Note that optimization will already made have use of "uninitialized" state >>> of locals so depending on what the actual goal is here "late" may be too late. 
>> >> Haven't thought about this much, so it might be a daft idea, but would a >> compromise be to use a const internal function: >> >> X1 = .DEFERRED_INIT (X0, INIT) >> >> where the X0 argument is an uninitialised value and the INIT argument >> describes the initialisation pattern? So for a decl we'd have: >> >> X = .DEFERRED_INIT (X, INIT) >> >> and for an SSA name we'd have: >> >> X_2 = .DEFERRED_INIT (X_1(D), INIT) >> >> with all other uses of X_1(D) being replaced by X_2. The idea is that: >> >> * Having the X0 argument would keep the uninitialised use of the >> variable around for the later warning passes. >> >> * Using a const function should still allow the UB to be deleted as dead >> if X1 isn't needed. >> >> * Having a function in the way should stop passes from taking advantage >> of direct uninitialised uses for optimisation. >> >> This means we won't be able to optimise based on the actual init >> value at the gimple level, but that seems like a fair trade-off. >> AIUI this is really a security feature or anti-UB hardening feature >> (in the sense that users are more likely to see predictable behaviour >> “in the field” even if the program has UB). > > The question is whether it's in line of peoples expectation that > explicitely zero-initialized code behaves differently from > implicitely zero-initialized code with respect to optimization > and secondary side-effects (late diagnostics, latent bugs, etc.). > > Introducing a new concept like .DEFERRED_INIT is much more > heavy-weight than an explicit zero initializer.

What exactly do you mean by “heavy-weight”? More difficult to implement, much more run-time overhead, or both? Or something else?

The major benefit of the “.DEFERRED_INIT” approach is that it lets us keep the current -Wuninitialized analysis untouched and also pass the “uninitialized” info from the source-code level to “pass_expand”.

If we want to keep the current -Wuninitialized analysis untouched, this is quite a reasonable approach.
However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding a zero-initializer directly during gimplification should be much easier and simpler, and should also have smaller run-time overhead.

>
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.

Runtime overhead for -fauto-init=zero is an important consideration for the whole feature; we should minimize the runtime overhead of zero initialization since it will be used in production builds. We can do some run-time performance evaluation when we have an implementation ready.

>
> Btw, I don't think theres any reason to cling onto clangs semantics
> for a particular switch. We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably wastly different already.

From my study so far, yes, the current behavior of -Wuninitialized in Clang and GCC is not exactly the same.

For example, for the following small test case:

void blah(int);

int foo_2 (int n, int l, int m, int r)
{
  int v;

  if ( (n > 10) && (m != 100) && (r < 20) )
    v = r;

  if (l > 100)
    if ( (n <= 8) && (m < 102) && (r < 19) )
      blah(v); /* { dg-warning "uninitialized" "real warning" } */

  return 0;
}

GCC is able to report a maybe-uninitialized warning, but Clang cannot. It looks like GCC’s uninitialized analysis relies on more analysis and optimization information than Clang’s.

I am really curious how Clang implements its uninitialized analysis.

Qing

> > Richard. > >> Thanks, >> Richard

^ permalink raw reply [flat|nested] 56+ messages in thread
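To make the trade-off concrete, here is a hand-written sketch of what the test case looks like once an explicit zero initializer is added at the source level, which is conceptually what inserting the real initialization during gimplification would amount to. The function name `foo_2_zero_init` and the recording globals `blah_calls` and `blah_last_arg` are hypothetical additions so the behaviour can be checked; they are not part of the original test case.

```c
/* Hypothetical instrumentation: records calls to blah so the example can
   be checked.  Not part of the original test case. */
int blah_calls;
int blah_last_arg;

void blah(int value)
{
    blah_calls++;
    blah_last_arg = value;
}

/* Same control flow as foo_2 in the message above, but with the explicit
   zero initializer that zero auto-init would conceptually add. */
int foo_2_zero_init(int n, int l, int m, int r)
{
    int v = 0;

    if ((n > 10) && (m != 100) && (r < 20))
        v = r;

    if (l > 100)
        if ((n <= 8) && (m < 102) && (r < 19))
            blah(v);   /* always reached with v == 0: n > 10 and n <= 8 exclude each other */

    return 0;
}
```

With the initializer in place, the path that reaches `blah(v)` is well defined, but the variable no longer looks uninitialized to a later analysis, which is the concern about keeping the existing warning behaviour.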
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-04 16:19 ` Qing Zhao @ 2020-12-07 7:12 ` Richard Biener 2020-12-07 16:20 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2020-12-07 7:12 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Biener via Gcc-patches, Richard Sandiford On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: > /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ > unsigned decl_is_initialized :1; > > /* IN VAR_DECL, set when the decl is initialized at the declaration. */ > #define DECL_IS_INITIALIZED(NODE) \ > (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) > > set this bit when setting DECL_INITIAL for the variables in FE. then keep it > even though DECL_INITIAL might be NULLed. > > > For locals it would be more reliable to set this flag during gimplification. > > Do you have any comment and suggestions? > > > As said above - do you want to cover registers as well as locals? I'd do > the actual zeroing during RTL expansion instead since otherwise you > have to figure youself whether a local is actually used (see expand_stack_vars) > > Note that optimization will already made have use of "uninitialized" state > of locals so depending on what the actual goal is here "late" may be too late. 
> > > Haven't thought about this much, so it might be a daft idea, but would a > compromise be to use a const internal function: > > X1 = .DEFERRED_INIT (X0, INIT) > > where the X0 argument is an uninitialised value and the INIT argument > describes the initialisation pattern? So for a decl we'd have: > > X = .DEFERRED_INIT (X, INIT) > > and for an SSA name we'd have: > > X_2 = .DEFERRED_INIT (X_1(D), INIT) > > with all other uses of X_1(D) being replaced by X_2. The idea is that: > > * Having the X0 argument would keep the uninitialised use of the > variable around for the later warning passes. > > * Using a const function should still allow the UB to be deleted as dead > if X1 isn't needed. > > * Having a function in the way should stop passes from taking advantage > of direct uninitialised uses for optimisation. > > This means we won't be able to optimise based on the actual init > value at the gimple level, but that seems like a fair trade-off. > AIUI this is really a security feature or anti-UB hardening feature > (in the sense that users are more likely to see predictable behaviour > “in the field” even if the program has UB). > > > The question is whether it's in line of peoples expectation that > explicitely zero-initialized code behaves differently from > implicitely zero-initialized code with respect to optimization > and secondary side-effects (late diagnostics, latent bugs, etc.). > > Introducing a new concept like .DEFERRED_INIT is much more > heavy-weight than an explicit zero initializer. > > > What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? > > The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass > the “uninitialized” info from source code level to “pass_expand”. Well, "untouched" is a bit oversimplified. 
You do need to handle .DEFERRED_INIT as not being an initialization which will definitely get interesting.

> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. > > However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should > be much easier and simpler, and also smaller run-time overhead. > > > As for optimization I fear you'll get a load of redundant zero-init > actually emitted if you can just rely on RTL DSE/DCE to remove it. > > > Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero > Initialization since it will be used in production build. > We can do some run-time performance evaluation when we have an implementation ready.

Note there will be other passes "confused" by .DEFERRED_INIT. Note that there are going to be other considerations - namely where to emit the .DEFERRED_INIT - when emitting it during gimplification you can emit it at the start of the block of block-scope variables. When emitting after gimplification you have to emit at function start, which will probably make stack slot sharing inefficient because the deferred init will cause overlapping lifetimes. When emitting at a block boundary the .DEFERRED_INIT will act as a code-motion barrier (and it itself likely cannot be moved), so for example invariant motion will no longer happen. Likewise optimizations like SRA will be confused by .DEFERRED_INIT, which again will lead to bigger stack usage (and less optimization).

But sure, you can try to implement a few variants but definitely .DEFERRED_INIT will be the most work.

> Btw, I don't think theres any reason to cling onto clangs semantics > for a particular switch. We'll never be able to emulate 1:1 behavior > and our -Wuninit behavior is probably wastly different already. 
> > > From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same. > > For example, for the following small testing case: > void blah(int); > > int foo_2 (int n, int l, int m, int r) > { > int v; > > if ( (n > 10) && (m != 100) && (r < 20) ) > v = r; > > if (l > 100) > if ( (n <= 8) && (m < 102) && (r < 19) ) > blah(v); /* { dg-warning "uninitialized" "real warning" } */ > > return 0; > } > > GCC is able to report maybe uninitialized warning, but Clang cannot. > Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG. > > Really curious on how clang implement its uninitialized analysis? > > Qing > > > > > Richard. > > Thanks, > Richard > > ^ permalink raw reply [flat|nested] 56+ messages in thread
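The remark in this message about where the initialization is emitted can be pictured at the source level. In the sketch below the function and variable names are illustrative, and nothing here shows whether a particular GCC version actually shares the two stack slots. Each buffer is initialized at the start of its own block, so the lifetimes of `first` and `second` do not overlap; hoisting both initializations to function entry would make both buffers live at the same time.

```c
#include <string.h>

/* Two block-scoped buffers with disjoint lifetimes.  Each one is
   initialized at the start of its own block, mirroring initialization
   emitted at block scope rather than at function entry. */
int sum_two_blocks(int a, int b)
{
    int total = 0;

    {
        int first[64];
        memset(first, 0, sizeof first);    /* init at block start */
        first[0] = a;
        total += first[0] + first[63];
    }

    {
        int second[64];
        memset(second, 0, sizeof second);  /* init at block start */
        second[0] = b;
        total += second[0] + second[63];
    }

    return total;
}
```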
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-07 7:12 ` Richard Biener @ 2020-12-07 16:20 ` Qing Zhao 2020-12-07 17:10 ` Richard Sandiford 2020-12-08 7:40 ` Richard Biener 0 siblings, 2 replies; 56+ messages in thread From: Qing Zhao @ 2020-12-07 16:20 UTC (permalink / raw) To: Richard Biener, Richard Sandiford; +Cc: Richard Biener via Gcc-patches > On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote: >> >> >> >> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford >> <richard.sandiford@arm.com> wrote: >> >> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> unsigned decl_is_initialized :1; >> >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >> #define DECL_IS_INITIALIZED(NODE) \ >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >> even though DECL_INITIAL might be NULLed. >> >> >> For locals it would be more reliable to set this flag during gimplification. >> >> Do you have any comment and suggestions? >> >> >> As said above - do you want to cover registers as well as locals? I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see expand_stack_vars) >> >> Note that optimization will already made have use of "uninitialized" state >> of locals so depending on what the actual goal is here "late" may be too late. 
>> >> >> Haven't thought about this much, so it might be a daft idea, but would a >> compromise be to use a const internal function: >> >> X1 = .DEFERRED_INIT (X0, INIT) >> >> where the X0 argument is an uninitialised value and the INIT argument >> describes the initialisation pattern? So for a decl we'd have: >> >> X = .DEFERRED_INIT (X, INIT) >> >> and for an SSA name we'd have: >> >> X_2 = .DEFERRED_INIT (X_1(D), INIT) >> >> with all other uses of X_1(D) being replaced by X_2. The idea is that: >> >> * Having the X0 argument would keep the uninitialised use of the >> variable around for the later warning passes. >> >> * Using a const function should still allow the UB to be deleted as dead >> if X1 isn't needed. >> >> * Having a function in the way should stop passes from taking advantage >> of direct uninitialised uses for optimisation. >> >> This means we won't be able to optimise based on the actual init >> value at the gimple level, but that seems like a fair trade-off. >> AIUI this is really a security feature or anti-UB hardening feature >> (in the sense that users are more likely to see predictable behaviour >> “in the field” even if the program has UB). >> >> >> The question is whether it's in line of peoples expectation that >> explicitely zero-initialized code behaves differently from >> implicitely zero-initialized code with respect to optimization >> and secondary side-effects (late diagnostics, latent bugs, etc.). >> >> Introducing a new concept like .DEFERRED_INIT is much more >> heavy-weight than an explicit zero initializer. >> >> >> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? >> >> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >> the “uninitialized” info from source code level to “pass_expand”. > > Well, "untouched" is a bit oversimplified. 
You do need to handle > .DEFERRED_INIT as not > being an initialization which will definitely get interesting. Yes, during the uninitialized variable analysis pass, we should specially handle the defs from “.DEFERRED_INIT” and treat them as uninitialized. >> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. >> >> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should >> be much easier and simpler, and also smaller run-time overhead. >> >> >> As for optimization I fear you'll get a load of redundant zero-init >> actually emitted if you can just rely on RTL DSE/DCE to remove it. >> >> >> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero >> Initialization since it will be used in production build. >> We can do some run-time performance evaluation when we have an implementation ready. > > Note there will be other passes "confused" by .DEFERRED_INIT. Note > that there's going to be other > considerations - namely where to emit the .DEFERRED_INIT - when > emitting it during gimplification > you can emit it at the start of the block of block-scope variables. > When emitting after gimplification > you have to emit at function start which will probably make stack slot > sharing inefficient because > the deferred init will cause overlapping lifetimes. With emitting at > block boundary the .DEFERRED_INIT > will act as code-motion barrier (and it itself likely cannot be moved) > so for example invariant motion > will no longer happen. Likewise optimizations like SRA will be > confused by .DEFERRED_INIT which > again will lead to bigger stack usage (and less optimization). Yes, it looks like the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations. 
> > But sure, you can try implement a few variants but definitely > .DEFERRED_INIT will be the most > work.

How about implementing the following two approaches and comparing the run-time cost:

A. Insert the real initialization during the gimplification phase.
B. Insert the .DEFERRED_INIT call during the gimplification phase, and then expand this call to the real initialization during the expand phase.

Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.

Then we can decide which approach to go with.

What’s your opinion on this?

> >> Btw, I don't think theres any reason to cling onto clangs semantics >> for a particular switch. We'll never be able to emulate 1:1 behavior >> and our -Wuninit behavior is probably wastly different already. >> >> >> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same. >> >> For example, for the following small testing case: >> void blah(int); >> >> int foo_2 (int n, int l, int m, int r) >> { >> int v; >> >> if ( (n > 10) && (m != 100) && (r < 20) ) >> v = r; >> >> if (l > 100) >> if ( (n <= 8) && (m < 102) && (r < 19) ) >> blah(v); /* { dg-warning "uninitialized" "real warning" } */ >> >> return 0; >> } >> >> GCC is able to report maybe uninitialized warning, but Clang cannot. >> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG. >> >> Really curious on how clang implement its uninitialized analysis?

Actually, I spent a little time last Friday studying how Clang implements its uninitialized analysis, and noticed that Clang has a data flow analysis phase based on Clang's AST: http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html

Clang’s uninitialized analysis is based on this data flow analysis.
Therefore, adding initialization AFTER Clang’s uninitialized analysis phase is straightforward. However, GCC has no data flow analysis in the FE; the uninitialized variable analysis is done in the tree optimization phase. Therefore, it’s much more difficult to implement this feature in GCC than in Clang. Qing >> >> Qing >> >> >> >> >> Richard. >> >> Thanks, >> Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-07 16:20 ` Qing Zhao @ 2020-12-07 17:10 ` Richard Sandiford 2020-12-07 17:36 ` Qing Zhao 2020-12-08 7:40 ` Richard Biener 1 sibling, 1 reply; 56+ messages in thread From: Richard Sandiford @ 2020-12-07 17:10 UTC (permalink / raw) To: Qing Zhao via Gcc-patches Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote: >>> >>> >>> >>> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: >>> >>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford >>> <richard.sandiford@arm.com> wrote: >>> >>> >>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >>> >>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >>> >>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>> unsigned decl_is_initialized :1; >>> >>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>> #define DECL_IS_INITIALIZED(NODE) \ >>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>> >>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>> even though DECL_INITIAL might be NULLed. >>> >>> >>> For locals it would be more reliable to set this flag during gimplification. >>> >>> Do you have any comment and suggestions? >>> >>> >>> As said above - do you want to cover registers as well as locals? 
I'd do >>> the actual zeroing during RTL expansion instead since otherwise you >>> have to figure youself whether a local is actually used (see expand_stack_vars) >>> >>> Note that optimization will already made have use of "uninitialized" state >>> of locals so depending on what the actual goal is here "late" may be too late. >>> >>> >>> Haven't thought about this much, so it might be a daft idea, but would a >>> compromise be to use a const internal function: >>> >>> X1 = .DEFERRED_INIT (X0, INIT) >>> >>> where the X0 argument is an uninitialised value and the INIT argument >>> describes the initialisation pattern? So for a decl we'd have: >>> >>> X = .DEFERRED_INIT (X, INIT) >>> >>> and for an SSA name we'd have: >>> >>> X_2 = .DEFERRED_INIT (X_1(D), INIT) >>> >>> with all other uses of X_1(D) being replaced by X_2. The idea is that: >>> >>> * Having the X0 argument would keep the uninitialised use of the >>> variable around for the later warning passes. >>> >>> * Using a const function should still allow the UB to be deleted as dead >>> if X1 isn't needed. >>> >>> * Having a function in the way should stop passes from taking advantage >>> of direct uninitialised uses for optimisation. >>> >>> This means we won't be able to optimise based on the actual init >>> value at the gimple level, but that seems like a fair trade-off. >>> AIUI this is really a security feature or anti-UB hardening feature >>> (in the sense that users are more likely to see predictable behaviour >>> “in the field” even if the program has UB). >>> >>> >>> The question is whether it's in line of peoples expectation that >>> explicitely zero-initialized code behaves differently from >>> implicitely zero-initialized code with respect to optimization >>> and secondary side-effects (late diagnostics, latent bugs, etc.). >>> >>> Introducing a new concept like .DEFERRED_INIT is much more >>> heavy-weight than an explicit zero initializer. >>> >>> >>> What exactly you mean by “heavy-weight”? 
More difficult to implement or much more run-time overhead or both? Or something else? >>> >>> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >>> the “uninitialized” info from source code level to “pass_expand”. >> >> Well, "untouched" is a bit oversimplified. You do need to handle >> .DEFERRED_INIT as not >> being an initialization which will definitely get interesting. > > Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. Are you sure we need to do that? The point of having the first argument to .DEFERRED_INIT was that that argument would still provide an uninitialised use of the variable. And the values are passed and returned by value, so the lack of initialisation is explicit in the gcall itself, without knowing what the target function does. The idea is that we can essentially treat .DEFERRED_INIT as a normal (const) function call. I'd be surprised if many passes needed to handle it specially. Thanks, Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-07 17:10 ` Richard Sandiford @ 2020-12-07 17:36 ` Qing Zhao 2020-12-07 18:05 ` Richard Sandiford 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-07 17:36 UTC (permalink / raw) To: Richard Sandiford; +Cc: Qing Zhao via Gcc-patches, Richard Biener > On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote: >>>> >>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>> unsigned decl_is_initialized :1; >>>> >>>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>>> #define DECL_IS_INITIALIZED(NODE) \ >>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>> >>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>>> even though DECL_INITIAL might be NULLed. >>>> >>>> >>>> For locals it would be more reliable to set this flag during gimplification. >>>> >>>> Do you have any comment and suggestions? >>>> >>>> >>>> As said above - do you want to cover registers as well as locals? I'd do >>>> the actual zeroing during RTL expansion instead since otherwise you >>>> have to figure youself whether a local is actually used (see expand_stack_vars) >>>> >>>> Note that optimization will already made have use of "uninitialized" state >>>> of locals so depending on what the actual goal is here "late" may be too late. >>>> >>>> >>>> Haven't thought about this much, so it might be a daft idea, but would a >>>> compromise be to use a const internal function: >>>> >>>> X1 = .DEFERRED_INIT (X0, INIT) >>>> >>>> where the X0 argument is an uninitialised value and the INIT argument >>>> describes the initialisation pattern? 
So for a decl we'd have: >>>> >>>> X = .DEFERRED_INIT (X, INIT) >>>> >>>> and for an SSA name we'd have: >>>> >>>> X_2 = .DEFERRED_INIT (X_1(D), INIT) >>>> >>>> with all other uses of X_1(D) being replaced by X_2. The idea is that: >>>> >>>> * Having the X0 argument would keep the uninitialised use of the >>>> variable around for the later warning passes. >>>> >>>> * Using a const function should still allow the UB to be deleted as dead >>>> if X1 isn't needed. >>>> >>>> * Having a function in the way should stop passes from taking advantage >>>> of direct uninitialised uses for optimisation. >>>> >>>> This means we won't be able to optimise based on the actual init >>>> value at the gimple level, but that seems like a fair trade-off. >>>> AIUI this is really a security feature or anti-UB hardening feature >>>> (in the sense that users are more likely to see predictable behaviour >>>> “in the field” even if the program has UB). >>>> >>>> >>>> The question is whether it's in line of peoples expectation that >>>> explicitely zero-initialized code behaves differently from >>>> implicitely zero-initialized code with respect to optimization >>>> and secondary side-effects (late diagnostics, latent bugs, etc.). >>>> >>>> Introducing a new concept like .DEFERRED_INIT is much more >>>> heavy-weight than an explicit zero initializer. >>>> >>>> >>>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? >>>> >>>> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >>>> the “uninitialized” info from source code level to “pass_expand”. >>> >>> Well, "untouched" is a bit oversimplified. You do need to handle >>> .DEFERRED_INIT as not >>> being an initialization which will definitely get interesting. 
>>
>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>
> Are you sure we need to do that? The point of having the first argument
> to .DEFERRED_INIT was that that argument would still provide an
> uninitialised use of the variable. And the values are passed and
> returned by value, so the lack of initialisation is explicit in
> the gcall itself, without knowing what the target function does.
>
> The idea is that we can essentially treat .DEFERRED_INIT as a normal
> (const) function call. I'd be surprised if many passes needed to
> handle it specially.
>

I just checked with a small test case (to emulate the .DEFERRED_INIT approach):

qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
extern int DEFFERED_INIT (int, int) __attribute__ ((const));

int foo (int n, int r)
{
  int v;

  v = DEFFERED_INIT (v, 0);
  if (n < 10)
    v = r;

  return v;
}
qinzhao@gcc10:~/Bugs/auto-init$ sh t
/home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c
t.c: In function ‘foo’:
t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
    7 |   v = DEFFERED_INIT (v, 0);
      |       ^~~~~~~~~~~~~~~~~~~~

We can see that the current uninitialized variable analysis treats the newly added artificial initialization as the first use of the uninitialized variable, and therefore reports the warning there.
However, we should report the warning at “return v”.

So, I think that we still need to specifically handle the newly added artificial initialization during the uninitialized analysis phase.

Am I still missing anything?

Qing

> Thanks,
> Richard

^ permalink raw reply [flat|nested] 56+ messages in thread
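Editorial sketch, not part of the original message: the emulation above passes an uninitialised `v` to an external function, so it cannot be run meaningfully. The block below only illustrates the run-time behaviour that the artificial initialisation is meant to provide, assuming INIT selects zero initialisation. The helper name `deferred_init_zero` is hypothetical and is not a GCC interface.

```c
#include <assert.h>   /* only needed for the checks mentioned in the note below */

/* Hypothetical stand-in for the effect of X = .DEFERRED_INIT (X, INIT)
   when INIT selects zero initialisation.  Unlike the emulation in the
   message above, it takes no uninitialised argument, so calling it is
   well defined.  */
static int
deferred_init_zero (void)
{
  return 0;
}

int
foo (int n, int r)
{
  /* Models the initialisation that the compiler would insert.  */
  int v = deferred_init_zero ();

  if (n < 10)
    v = r;

  /* At source level this is the uninitialised use when n >= 10,
     which is where the message argues the warning belongs.  */
  return v;
}
```

With this model, `assert (foo (5, 7) == 7)` and `assert (foo (10, 7) == 0)` both hold: the path that never assigns `v` observes the inserted zero rather than an indeterminate value.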
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-07 17:36 ` Qing Zhao @ 2020-12-07 18:05 ` Richard Sandiford 2020-12-07 18:34 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Sandiford @ 2020-12-07 18:05 UTC (permalink / raw) To: Qing Zhao; +Cc: Qing Zhao via Gcc-patches, Richard Biener Qing Zhao <QING.ZHAO@ORACLE.COM> writes: >> On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote: >>>>> >>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>>> unsigned decl_is_initialized :1; >>>>> >>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>>>> #define DECL_IS_INITIALIZED(NODE) \ >>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>>> >>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>>>> even though DECL_INITIAL might be NULLed. >>>>> >>>>> >>>>> For locals it would be more reliable to set this flag during gimplification. >>>>> >>>>> Do you have any comment and suggestions? >>>>> >>>>> >>>>> As said above - do you want to cover registers as well as locals? I'd do >>>>> the actual zeroing during RTL expansion instead since otherwise you >>>>> have to figure youself whether a local is actually used (see expand_stack_vars) >>>>> >>>>> Note that optimization will already made have use of "uninitialized" state >>>>> of locals so depending on what the actual goal is here "late" may be too late. >>>>> >>>>> >>>>> Haven't thought about this much, so it might be a daft idea, but would a >>>>> compromise be to use a const internal function: >>>>> >>>>> X1 = .DEFERRED_INIT (X0, INIT) >>>>> >>>>> where the X0 argument is an uninitialised value and the INIT argument >>>>> describes the initialisation pattern? 
So for a decl we'd have: >>>>> >>>>> X = .DEFERRED_INIT (X, INIT) >>>>> >>>>> and for an SSA name we'd have: >>>>> >>>>> X_2 = .DEFERRED_INIT (X_1(D), INIT) >>>>> >>>>> with all other uses of X_1(D) being replaced by X_2. The idea is that: >>>>> >>>>> * Having the X0 argument would keep the uninitialised use of the >>>>> variable around for the later warning passes. >>>>> >>>>> * Using a const function should still allow the UB to be deleted as dead >>>>> if X1 isn't needed. >>>>> >>>>> * Having a function in the way should stop passes from taking advantage >>>>> of direct uninitialised uses for optimisation. >>>>> >>>>> This means we won't be able to optimise based on the actual init >>>>> value at the gimple level, but that seems like a fair trade-off. >>>>> AIUI this is really a security feature or anti-UB hardening feature >>>>> (in the sense that users are more likely to see predictable behaviour >>>>> “in the field” even if the program has UB). >>>>> >>>>> >>>>> The question is whether it's in line of peoples expectation that >>>>> explicitely zero-initialized code behaves differently from >>>>> implicitely zero-initialized code with respect to optimization >>>>> and secondary side-effects (late diagnostics, latent bugs, etc.). >>>>> >>>>> Introducing a new concept like .DEFERRED_INIT is much more >>>>> heavy-weight than an explicit zero initializer. >>>>> >>>>> >>>>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? >>>>> >>>>> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >>>>> the “uninitialized” info from source code level to “pass_expand”. >>>> >>>> Well, "untouched" is a bit oversimplified. You do need to handle >>>> .DEFERRED_INIT as not >>>> being an initialization which will definitely get interesting. 
>>> >>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. >> >> Are you sure we need to do that? The point of having the first argument >> to .DEFERRED_INIT was that that argument would still provide an >> uninitialised use of the variable. And the values are passed and >> returned by value, so the lack of initialisation is explicit in >> the gcall itself, without knowing what the target function does. >> >> The idea is that we can essentially treat .DEFERRED_INIT as a normal >> (const) function call. I'd be surprised if many passes needed to >> handle it specially. >> > > Just checked with a small testing case (to emulate the .DEFERRED_INIT approach): > > qinzhao@gcc10:~/Bugs/auto-init$ cat t.c > extern int DEFFERED_INIT (int, int) __attribute__ ((const)); > > int foo (int n, int r) > { > int v; > > v = DEFFERED_INIT (v, 0); > if (n < 10) > v = r; > > return v; > } > qinzhao@gcc10:~/Bugs/auto-init$ sh t > /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c > t.c: In function ‘foo’: > t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized] > 7 | v = DEFFERED_INIT (v, 0); > | ^~~~~~~~~~~~~~~~~~~~ > > We can see that the current uninitialized variable analysis treats the new added artificial initialization as the first use of the uninialized variable. Therefore report the warning there. > However, we should report warning at “return v”. Ah, OK, so this is about the quality of the warning, rather than about whether we report a warning or not? > So, I think that we still need to specifically handle the new added artificial initialization during uninitialized analysis phase. Yeah, that sounds like one approach. But if we're adding .DEFERRED_INIT in response to known uninitialised uses, two other approaches might be: (1) Give the call the same source location as one of the uninitialised uses. 
(2) Pass the locations of all uninitialised uses as additional arguments. The uninit pass would then be picking the source location differently from normal, but I don't know what effect it would have on the quality of diagnostics. One obvious problem is that if there are multiple uninitialised uses, some of them might get optimised away later. On the other hand, using early source locations might give better results in some cases. I guess it will depend. Thanks, Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-07 18:05 ` Richard Sandiford @ 2020-12-07 18:34 ` Qing Zhao 2020-12-08 7:35 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-07 18:34 UTC (permalink / raw) To: Richard Sandiford; +Cc: Qing Zhao via Gcc-patches, Richard Biener > On Dec 7, 2020, at 12:05 PM, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes: >>> On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote: >>>>>> >>>>>> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >>>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>>>> unsigned decl_is_initialized :1; >>>>>> >>>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >>>>>> #define DECL_IS_INITIALIZED(NODE) \ >>>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>>>> >>>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >>>>>> even though DECL_INITIAL might be NULLed. >>>>>> >>>>>> >>>>>> For locals it would be more reliable to set this flag during gimplification. >>>>>> >>>>>> Do you have any comment and suggestions? >>>>>> >>>>>> >>>>>> As said above - do you want to cover registers as well as locals? I'd do >>>>>> the actual zeroing during RTL expansion instead since otherwise you >>>>>> have to figure youself whether a local is actually used (see expand_stack_vars) >>>>>> >>>>>> Note that optimization will already made have use of "uninitialized" state >>>>>> of locals so depending on what the actual goal is here "late" may be too late. 
>>>>>> >>>>>> >>>>>> Haven't thought about this much, so it might be a daft idea, but would a >>>>>> compromise be to use a const internal function: >>>>>> >>>>>> X1 = .DEFERRED_INIT (X0, INIT) >>>>>> >>>>>> where the X0 argument is an uninitialised value and the INIT argument >>>>>> describes the initialisation pattern? So for a decl we'd have: >>>>>> >>>>>> X = .DEFERRED_INIT (X, INIT) >>>>>> >>>>>> and for an SSA name we'd have: >>>>>> >>>>>> X_2 = .DEFERRED_INIT (X_1(D), INIT) >>>>>> >>>>>> with all other uses of X_1(D) being replaced by X_2. The idea is that: >>>>>> >>>>>> * Having the X0 argument would keep the uninitialised use of the >>>>>> variable around for the later warning passes. >>>>>> >>>>>> * Using a const function should still allow the UB to be deleted as dead >>>>>> if X1 isn't needed. >>>>>> >>>>>> * Having a function in the way should stop passes from taking advantage >>>>>> of direct uninitialised uses for optimisation. >>>>>> >>>>>> This means we won't be able to optimise based on the actual init >>>>>> value at the gimple level, but that seems like a fair trade-off. >>>>>> AIUI this is really a security feature or anti-UB hardening feature >>>>>> (in the sense that users are more likely to see predictable behaviour >>>>>> “in the field” even if the program has UB). >>>>>> >>>>>> >>>>>> The question is whether it's in line of peoples expectation that >>>>>> explicitely zero-initialized code behaves differently from >>>>>> implicitely zero-initialized code with respect to optimization >>>>>> and secondary side-effects (late diagnostics, latent bugs, etc.). >>>>>> >>>>>> Introducing a new concept like .DEFERRED_INIT is much more >>>>>> heavy-weight than an explicit zero initializer. >>>>>> >>>>>> >>>>>> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? 
>>>>>> >>>>>> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >>>>>> the “uninitialized” info from source code level to “pass_expand”. >>>>> >>>>> Well, "untouched" is a bit oversimplified. You do need to handle >>>>> .DEFERRED_INIT as not >>>>> being an initialization which will definitely get interesting. >>>> >>>> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. >>> >>> Are you sure we need to do that? The point of having the first argument >>> to .DEFERRED_INIT was that that argument would still provide an >>> uninitialised use of the variable. And the values are passed and >>> returned by value, so the lack of initialisation is explicit in >>> the gcall itself, without knowing what the target function does. >>> >>> The idea is that we can essentially treat .DEFERRED_INIT as a normal >>> (const) function call. I'd be surprised if many passes needed to >>> handle it specially. >>> >> >> Just checked with a small testing case (to emulate the .DEFERRED_INIT approach): >> >> qinzhao@gcc10:~/Bugs/auto-init$ cat t.c >> extern int DEFFERED_INIT (int, int) __attribute__ ((const)); >> >> int foo (int n, int r) >> { >> int v; >> >> v = DEFFERED_INIT (v, 0); >> if (n < 10) >> v = r; >> >> return v; >> } >> qinzhao@gcc10:~/Bugs/auto-init$ sh t >> /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c >> t.c: In function ‘foo’: >> t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized] >> 7 | v = DEFFERED_INIT (v, 0); >> | ^~~~~~~~~~~~~~~~~~~~ >> >> We can see that the current uninitialized variable analysis treats the new added artificial initialization as the first use of the uninialized variable. Therefore report the warning there. >> However, we should report warning at “return v”. 
> > Ah, OK, so this is about the quality of the warning, rather than about > whether we report a warning or not? > >> So, I think that we still need to specifically handle the new added artificial initialization during uninitialized analysis phase. > > Yeah, that sounds like one approach. But if we're adding .DEFERRED_INIT > in response to known uninitialised uses, two other approaches might be: > > (1) Give the call the same source location as one of the uninitialised uses. > > (2) Pass the locations of all uninitialised uses as additional arguments. If we add .DEFERRED_INIT during gimplification phase, is the “uninitialized uses” information available at that time? Qing > > The uninit pass would then be picking the source location differently > from normal, but I don't know what effect it would have on the quality > of diagnostics. One obvious problem is that if there are multiple > uninitialised uses, some of them might get optimised away later. > On the other hand, using early source locations might give better > results in some cases. I guess it will depend. > > Thanks, > Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-07 18:34 ` Qing Zhao @ 2020-12-08 7:35 ` Richard Biener 0 siblings, 0 replies; 56+ messages in thread From: Richard Biener @ 2020-12-08 7:35 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Qing Zhao via Gcc-patches On Mon, Dec 7, 2020 at 7:34 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > On Dec 7, 2020, at 12:05 PM, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Qing Zhao <QING.ZHAO@ORACLE.COM> writes: > > On Dec 7, 2020, at 11:10 AM, Richard Sandiford <richard.sandiford@arm.com> wrote: > > > Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: > /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ > unsigned decl_is_initialized :1; > > /* IN VAR_DECL, set when the decl is initialized at the declaration. */ > #define DECL_IS_INITIALIZED(NODE) \ > (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) > > set this bit when setting DECL_INITIAL for the variables in FE. then keep it > even though DECL_INITIAL might be NULLed. > > > For locals it would be more reliable to set this flag during gimplification. > > Do you have any comment and suggestions? > > > As said above - do you want to cover registers as well as locals? I'd do > the actual zeroing during RTL expansion instead since otherwise you > have to figure youself whether a local is actually used (see expand_stack_vars) > > Note that optimization will already made have use of "uninitialized" state > of locals so depending on what the actual goal is here "late" may be too late. > > > Haven't thought about this much, so it might be a daft idea, but would a > compromise be to use a const internal function: > > X1 = .DEFERRED_INIT (X0, INIT) > > where the X0 argument is an uninitialised value and the INIT argument > describes the initialisation pattern? 
So for a decl we'd have: > > X = .DEFERRED_INIT (X, INIT) > > and for an SSA name we'd have: > > X_2 = .DEFERRED_INIT (X_1(D), INIT) > > with all other uses of X_1(D) being replaced by X_2. The idea is that: > > * Having the X0 argument would keep the uninitialised use of the > variable around for the later warning passes. > > * Using a const function should still allow the UB to be deleted as dead > if X1 isn't needed. > > * Having a function in the way should stop passes from taking advantage > of direct uninitialised uses for optimisation. > > This means we won't be able to optimise based on the actual init > value at the gimple level, but that seems like a fair trade-off. > AIUI this is really a security feature or anti-UB hardening feature > (in the sense that users are more likely to see predictable behaviour > “in the field” even if the program has UB). > > > The question is whether it's in line of peoples expectation that > explicitely zero-initialized code behaves differently from > implicitely zero-initialized code with respect to optimization > and secondary side-effects (late diagnostics, latent bugs, etc.). > > Introducing a new concept like .DEFERRED_INIT is much more > heavy-weight than an explicit zero initializer. > > > What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? > > The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass > the “uninitialized” info from source code level to “pass_expand”. > > > Well, "untouched" is a bit oversimplified. You do need to handle > .DEFERRED_INIT as not > being an initialization which will definitely get interesting. > > > Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. > > > Are you sure we need to do that? 
The point of having the first argument > to .DEFERRED_INIT was that that argument would still provide an > uninitialised use of the variable. And the values are passed and > returned by value, so the lack of initialisation is explicit in > the gcall itself, without knowing what the target function does. > > The idea is that we can essentially treat .DEFERRED_INIT as a normal > (const) function call. I'd be surprised if many passes needed to > handle it specially. > > > Just checked with a small testing case (to emulate the .DEFERRED_INIT approach): > > qinzhao@gcc10:~/Bugs/auto-init$ cat t.c > extern int DEFFERED_INIT (int, int) __attribute__ ((const)); > > int foo (int n, int r) > { > int v; > > v = DEFFERED_INIT (v, 0); > if (n < 10) > v = r; > > return v; > } > qinzhao@gcc10:~/Bugs/auto-init$ sh t > /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all -S t.c > t.c: In function ‘foo’: > t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized] > 7 | v = DEFFERED_INIT (v, 0); > | ^~~~~~~~~~~~~~~~~~~~ > > We can see that the current uninitialized variable analysis treats the new added artificial initialization as the first use of the uninialized variable. Therefore report the warning there. > However, we should report warning at “return v”. > > > Ah, OK, so this is about the quality of the warning, rather than about > whether we report a warning or not? > > So, I think that we still need to specifically handle the new added artificial initialization during uninitialized analysis phase. > > > Yeah, that sounds like one approach. But if we're adding .DEFERRED_INIT > in response to known uninitialised uses, two other approaches might be: > > (1) Give the call the same source location as one of the uninitialised uses. > > (2) Pass the locations of all uninitialised uses as additional arguments. > > > If we add .DEFERRED_INIT during gimplification phase, is the “uninitialized uses” information available at that time? No. 
> Qing > > > The uninit pass would then be picking the source location differently > from normal, but I don't know what effect it would have on the quality > of diagnostics. One obvious problem is that if there are multiple > uninitialised uses, some of them might get optimised away later. > On the other hand, using early source locations might give better > results in some cases. I guess it will depend. > > Thanks, > Richard > > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-07 16:20 ` Qing Zhao 2020-12-07 17:10 ` Richard Sandiford @ 2020-12-08 7:40 ` Richard Biener 2020-12-08 19:54 ` Qing Zhao 1 sibling, 1 reply; 56+ messages in thread From: Richard Biener @ 2020-12-08 7:40 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > > On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: > /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ > unsigned decl_is_initialized :1; > > /* IN VAR_DECL, set when the decl is initialized at the declaration. */ > #define DECL_IS_INITIALIZED(NODE) \ > (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) > > set this bit when setting DECL_INITIAL for the variables in FE. then keep it > even though DECL_INITIAL might be NULLed. > > > For locals it would be more reliable to set this flag during gimplification. > > Do you have any comment and suggestions? > > > As said above - do you want to cover registers as well as locals? I'd do > the actual zeroing during RTL expansion instead since otherwise you > have to figure youself whether a local is actually used (see expand_stack_vars) > > Note that optimization will already made have use of "uninitialized" state > of locals so depending on what the actual goal is here "late" may be too late. 
> > > Haven't thought about this much, so it might be a daft idea, but would a > compromise be to use a const internal function: > > X1 = .DEFERRED_INIT (X0, INIT) > > where the X0 argument is an uninitialised value and the INIT argument > describes the initialisation pattern? So for a decl we'd have: > > X = .DEFERRED_INIT (X, INIT) > > and for an SSA name we'd have: > > X_2 = .DEFERRED_INIT (X_1(D), INIT) > > with all other uses of X_1(D) being replaced by X_2. The idea is that: > > * Having the X0 argument would keep the uninitialised use of the > variable around for the later warning passes. > > * Using a const function should still allow the UB to be deleted as dead > if X1 isn't needed. > > * Having a function in the way should stop passes from taking advantage > of direct uninitialised uses for optimisation. > > This means we won't be able to optimise based on the actual init > value at the gimple level, but that seems like a fair trade-off. > AIUI this is really a security feature or anti-UB hardening feature > (in the sense that users are more likely to see predictable behaviour > “in the field” even if the program has UB). > > > The question is whether it's in line of peoples expectation that > explicitely zero-initialized code behaves differently from > implicitely zero-initialized code with respect to optimization > and secondary side-effects (late diagnostics, latent bugs, etc.). > > Introducing a new concept like .DEFERRED_INIT is much more > heavy-weight than an explicit zero initializer. > > > What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? > > The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass > the “uninitialized” info from source code level to “pass_expand”. > > > Well, "untouched" is a bit oversimplified. 
You do need to handle > .DEFERRED_INIT as not > being an initialization which will definitely get interesting. > > > Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. > > If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. > > However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should > be much easier and simpler, and also smaller run-time overhead. > > > As for optimization I fear you'll get a load of redundant zero-init > actually emitted if you can just rely on RTL DSE/DCE to remove it. > > > Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero > Initialization since it will be used in production build. > We can do some run-time performance evaluation when we have an implementation ready. > > > Note there will be other passes "confused" by .DEFERRED_INIT. Note > that there's going to be other > considerations - namely where to emit the .DEFERRED_INIT - when > emitting it during gimplification > you can emit it at the start of the block of block-scope variables. > When emitting after gimplification > you have to emit at function start which will probably make stack slot > sharing inefficient because > the deferred init will cause overlapping lifetimes. With emitting at > block boundary the .DEFERRED_INIT > will act as code-motion barrier (and it itself likely cannot be moved) > so for example invariant motion > will no longer happen. Likewise optimizations like SRA will be > confused by .DEFERRED_INIT which > again will lead to bigger stack usage (and less optimization). > > > Yes, looks like that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations. 
>
>
> But sure, you can try implement a few variants but definitely
> .DEFERRED_INIT will be the most
> work.
>
>
> How about implement the following two approaches and compare the run-time cost:
>
> A. Insert the real initialization during gimplification phase.
> B. Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase.
>
> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>
> And then decide which approach we will go with?
>
> What’s your opinion on this?

Well, in the end you have to try. Note that for the purpose of stack slot
sharing you do want the instrumentation to happen during gimplification.

Another possibility is to materialize .DEFERRED_INIT earlier than
expand, for example shortly after IPA optimizations, to avoid
pessimizing loop transforms and to allow SRA. At the point where you
materialize the inits you could run the late uninit warning pass
(which would then run earlier than usual but would still see
the .DEFERRED_INIT).

While users may be happy to pay some performance, stack usage is
probably more critical (just thinking of the kernel), so not
regressing there should be as important as preserving uninit
warnings (which, for practical purposes, I see as not important at all;
people can do "debug" builds without -fzero-init).

Richard.

>
> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch. We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably vastly different already.
>
>
> From my study so far, yes, the current behavior of -Wuninit for Clang and GCC is not exactly the same.
> > For example, for the following small testing case: > void blah(int); > > int foo_2 (int n, int l, int m, int r) > { > int v; > > if ( (n > 10) && (m != 100) && (r < 20) ) > v = r; > > if (l > 100) > if ( (n <= 8) && (m < 102) && (r < 19) ) > blah(v); /* { dg-warning "uninitialized" "real warning" } */ > > return 0; > } > > GCC is able to report maybe uninitialized warning, but Clang cannot. > Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG. > > Really curious on how clang implement its uninitialized analysis? > > > > Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday. > And noticed that CLANG has a data flow analysis phase based on CLANG's AST. > http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html > > And clang’s uninitialized analysis is based on this data flow analysis. > > Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward. > > However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase, > Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG. > > Qing > > > > Qing > > > > > Richard. > > Thanks, > Richard > > ^ permalink raw reply [flat|nested] 56+ messages in thread
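The stack-slot concern raised in this message can be illustrated with a toy model. This is only a sketch, not GCC code: the struct lifetime type, the peak_live_bytes function and the program-point numbers are made up for illustration. It assumes a local needs its slot from its first definition to the end of its scope, and that two locals may share a slot only when those ranges do not overlap.

```c
#include <stddef.h>

/* Toy model, not GCC code: a local occupies its stack slot for the
   half-open interval [start, end) of program points.  */
struct lifetime {
    int start;
    int end;
    size_t size;
};

/* Peak number of bytes live at once.  This is a lower bound on the
   frame size when slots may be shared only between locals whose
   lifetimes do not overlap.  */
size_t peak_live_bytes(const struct lifetime *v, size_t n)
{
    size_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        /* The peak is always reached at the start of some lifetime,
           so it is enough to sample those points.  */
        size_t live = 0;
        for (size_t j = 0; j < n; j++)
            if (v[j].start <= v[i].start && v[i].start < v[j].end)
                live += v[j].size;
        if (live > peak)
            peak = live;
    }
    return peak;
}
```

With two 64-byte buffers in sibling blocks, initializing each at the start of its own block gives lifetimes [10, 20) and [30, 40) and a peak of 64 bytes. Initializing both at function entry stretches them to [0, 20) and [0, 40), and the peak becomes 128 bytes, which is the overlapping-lifetime effect described above.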
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-08 7:40 ` Richard Biener @ 2020-12-08 19:54 ` Qing Zhao 2020-12-09 8:23 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-08 19:54 UTC (permalink / raw) To: Richard Biener, Richard Sandiford; +Cc: Richard Biener via Gcc-patches > On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote: >> >> >> >> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> >> >> >> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford >> <richard.sandiford@arm.com> wrote: >> >> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> unsigned decl_is_initialized :1; >> >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >> #define DECL_IS_INITIALIZED(NODE) \ >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >> even though DECL_INITIAL might be NULLed. >> >> >> For locals it would be more reliable to set this flag during gimplification. >> >> Do you have any comment and suggestions? >> >> >> As said above - do you want to cover registers as well as locals? 
I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see expand_stack_vars) >> >> Note that optimization will already made have use of "uninitialized" state >> of locals so depending on what the actual goal is here "late" may be too late. >> >> >> Haven't thought about this much, so it might be a daft idea, but would a >> compromise be to use a const internal function: >> >> X1 = .DEFERRED_INIT (X0, INIT) >> >> where the X0 argument is an uninitialised value and the INIT argument >> describes the initialisation pattern? So for a decl we'd have: >> >> X = .DEFERRED_INIT (X, INIT) >> >> and for an SSA name we'd have: >> >> X_2 = .DEFERRED_INIT (X_1(D), INIT) >> >> with all other uses of X_1(D) being replaced by X_2. The idea is that: >> >> * Having the X0 argument would keep the uninitialised use of the >> variable around for the later warning passes. >> >> * Using a const function should still allow the UB to be deleted as dead >> if X1 isn't needed. >> >> * Having a function in the way should stop passes from taking advantage >> of direct uninitialised uses for optimisation. >> >> This means we won't be able to optimise based on the actual init >> value at the gimple level, but that seems like a fair trade-off. >> AIUI this is really a security feature or anti-UB hardening feature >> (in the sense that users are more likely to see predictable behaviour >> “in the field” even if the program has UB). >> >> >> The question is whether it's in line of peoples expectation that >> explicitely zero-initialized code behaves differently from >> implicitely zero-initialized code with respect to optimization >> and secondary side-effects (late diagnostics, latent bugs, etc.). >> >> Introducing a new concept like .DEFERRED_INIT is much more >> heavy-weight than an explicit zero initializer. >> >> >> What exactly you mean by “heavy-weight”? 
More difficult to implement or much more run-time overhead or both? Or something else? >> >> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >> the “uninitialized” info from source code level to “pass_expand”. >> >> >> Well, "untouched" is a bit oversimplified. You do need to handle >> .DEFERRED_INIT as not >> being an initialization which will definitely get interesting. >> >> >> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. >> >> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. >> >> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should >> be much easier and simpler, and also smaller run-time overhead. >> >> >> As for optimization I fear you'll get a load of redundant zero-init >> actually emitted if you can just rely on RTL DSE/DCE to remove it. >> >> >> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero >> Initialization since it will be used in production build. >> We can do some run-time performance evaluation when we have an implementation ready. >> >> >> Note there will be other passes "confused" by .DEFERRED_INIT. Note >> that there's going to be other >> considerations - namely where to emit the .DEFERRED_INIT - when >> emitting it during gimplification >> you can emit it at the start of the block of block-scope variables. >> When emitting after gimplification >> you have to emit at function start which will probably make stack slot >> sharing inefficient because >> the deferred init will cause overlapping lifetimes. 
With emitting at >> block boundary the .DEFERRED_INIT >> will act as code-motion barrier (and it itself likely cannot be moved) >> so for example invariant motion >> will no longer happen. Likewise optimizations like SRA will be >> confused by .DEFERRED_INIT which >> again will lead to bigger stack usage (and less optimization). >> >> >> Yes, looks like that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations. >> >> >> But sure, you can try implement a few variants but definitely >> .DEFERRED_INIT will be the most >> work. >> >> >> How about implement the following two approaches and compare the run-time cost: >> >> A. Insert the real initialization during gimplification phase. >> B. Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase. >> >> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC. >> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC. >> >> And then decide which approach we will go with? >> >> What’s your opinion on this? > > Well, in the end you have to try. Note for the purpose of stack slot > sharing you do want the > instrumentation to happen during gimplification. > > Another possibility is to materialize .DEFERRED_INIT earlier than > expand, for example shortly > after IPA optimizations to avoid pessimizing loop transforms and allow > SRA. At the point you > materialize the inits you could run the late uninit warning pass > (which would then be earlier > than regular but would still see the .DEFERRED_INIT). If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above, the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”. Then we might miss some opportunities for the late uninitialized warning. I think that this is not we really want. 
> > While users may be happy to pay some performance stack usage is > probably more critical So, which pass is for computing the stack usage? > (just thinking of the kernel) so not regressing there should be as > important as preserving > uninit warnings (which I for practical purposes see not important at > all - people can do > "debug" builds without -fzero-init). It looks like the major issue with the “.DEFERRED_INIT” approach is that the newly inserted calls to the internal const function might inhibit some important tree optimizations. So I am reconsidering another approach that I raised at the very beginning: during the gimplification phase, mark the DECL of an auto variable without an initializer as “no_explicit_init”, then maintain this “no_explicit_init” bit until after pass_late_warn_uninitialized or until pass_expand, and add zero-initialization for all DECLs that are marked with “no_explicit_init”. This approach would not interfere with tree optimizations; however, I guess that maintaining this “no_explicit_init” bit might be very difficult? Do you have any comments on this approach? thanks. Qing > > Richard. > >> >> Btw, I don't think theres any reason to cling onto clangs semantics >> for a particular switch. We'll never be able to emulate 1:1 behavior >> and our -Wuninit behavior is probably wastly different already. >> >> >> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same. >> >> For example, for the following small testing case: >> void blah(int); >> >> int foo_2 (int n, int l, int m, int r) >> { >> int v; >> >> if ( (n > 10) && (m != 100) && (r < 20) ) >> v = r; >> >> if (l > 100) >> if ( (n <= 8) && (m < 102) && (r < 19) ) >> blah(v); /* { dg-warning "uninitialized" "real warning" } */ >> >> return 0; >> } >> >> GCC is able to report maybe uninitialized warning, but Clang cannot. 
>> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG. >> >> Really curious on how clang implement its uninitialized analysis? >> >> >> >> Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday. >> And noticed that CLANG has a data flow analysis phase based on CLANG's AST. >> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html >> >> And clang’s uninitialized analysis is based on this data flow analysis. >> >> Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward. >> >> However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase, >> Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG. >> >> Qing >> >> >> >> Qing >> >> >> >> >> Richard. >> >> Thanks, >> Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
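The “no_explicit_init” idea in this message could be pictured roughly as follows. This is a toy model under stated assumptions, not GCC internals: struct toy_decl and zero_flagged_locals are hypothetical names, the flag is assumed to have been recorded early (for example at gimplification), and the zeroing is assumed to run as one late step over all locals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a local variable declaration; this is
   not one of GCC's tree nodes.  */
struct toy_decl {
    const char *name;
    bool no_explicit_init;   /* assumed to be recorded early */
    unsigned char *storage;  /* the variable's bytes */
    size_t size;
};

/* Late step: zero every local that is still flagged as lacking an
   initializer.  Returns how many locals were zeroed.  */
size_t zero_flagged_locals(struct toy_decl *decls, size_t n)
{
    size_t zeroed = 0;
    for (size_t i = 0; i < n; i++) {
        if (decls[i].no_explicit_init) {
            memset(decls[i].storage, 0, decls[i].size);
            zeroed++;
        }
    }
    return zeroed;
}
```

The sketch only works if the flag survives every transformation between the point where it is set and the point where it is consumed, which is the maintenance difficulty the message asks about.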
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-08 19:54 ` Qing Zhao @ 2020-12-09 8:23 ` Richard Biener 2020-12-09 15:04 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2020-12-09 8:23 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > > On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > > On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: > /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ > unsigned decl_is_initialized :1; > > /* IN VAR_DECL, set when the decl is initialized at the declaration. */ > #define DECL_IS_INITIALIZED(NODE) \ > (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) > > set this bit when setting DECL_INITIAL for the variables in FE. then keep it > even though DECL_INITIAL might be NULLed. > > > For locals it would be more reliable to set this flag during gimplification. > > Do you have any comment and suggestions? > > > As said above - do you want to cover registers as well as locals? 
I'd do > the actual zeroing during RTL expansion instead since otherwise you > have to figure youself whether a local is actually used (see expand_stack_vars) > > Note that optimization will already made have use of "uninitialized" state > of locals so depending on what the actual goal is here "late" may be too late. > > > Haven't thought about this much, so it might be a daft idea, but would a > compromise be to use a const internal function: > > X1 = .DEFERRED_INIT (X0, INIT) > > where the X0 argument is an uninitialised value and the INIT argument > describes the initialisation pattern? So for a decl we'd have: > > X = .DEFERRED_INIT (X, INIT) > > and for an SSA name we'd have: > > X_2 = .DEFERRED_INIT (X_1(D), INIT) > > with all other uses of X_1(D) being replaced by X_2. The idea is that: > > * Having the X0 argument would keep the uninitialised use of the > variable around for the later warning passes. > > * Using a const function should still allow the UB to be deleted as dead > if X1 isn't needed. > > * Having a function in the way should stop passes from taking advantage > of direct uninitialised uses for optimisation. > > This means we won't be able to optimise based on the actual init > value at the gimple level, but that seems like a fair trade-off. > AIUI this is really a security feature or anti-UB hardening feature > (in the sense that users are more likely to see predictable behaviour > “in the field” even if the program has UB). > > > The question is whether it's in line of peoples expectation that > explicitely zero-initialized code behaves differently from > implicitely zero-initialized code with respect to optimization > and secondary side-effects (late diagnostics, latent bugs, etc.). > > Introducing a new concept like .DEFERRED_INIT is much more > heavy-weight than an explicit zero initializer. > > > What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? 
> > The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass > the “uninitialized” info from source code level to “pass_expand”. > > > Well, "untouched" is a bit oversimplified. You do need to handle > .DEFERRED_INIT as not > being an initialization which will definitely get interesting. > > > Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. > > If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. > > However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should > be much easier and simpler, and also smaller run-time overhead. > > > As for optimization I fear you'll get a load of redundant zero-init > actually emitted if you can just rely on RTL DSE/DCE to remove it. > > > Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero > Initialization since it will be used in production build. > We can do some run-time performance evaluation when we have an implementation ready. > > > Note there will be other passes "confused" by .DEFERRED_INIT. Note > that there's going to be other > considerations - namely where to emit the .DEFERRED_INIT - when > emitting it during gimplification > you can emit it at the start of the block of block-scope variables. > When emitting after gimplification > you have to emit at function start which will probably make stack slot > sharing inefficient because > the deferred init will cause overlapping lifetimes. With emitting at > block boundary the .DEFERRED_INIT > will act as code-motion barrier (and it itself likely cannot be moved) > so for example invariant motion > will no longer happen. 
Likewise optimizations like SRA will be > confused by .DEFERRED_INIT which > again will lead to bigger stack usage (and less optimization). > > > Yes, looks like that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations. > > > But sure, you can try implement a few variants but definitely > .DEFERRED_INIT will be the most > work. > > > How about implement the following two approaches and compare the run-time cost: > > A. Insert the real initialization during gimplification phase. > B. Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase. > > The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC. > The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC. > > And then decide which approach we will go with? > > What’s your opinion on this? > > > Well, in the end you have to try. Note for the purpose of stack slot > sharing you do want the > instrumentation to happen during gimplification. > > Another possibility is to materialize .DEFERRED_INIT earlier than > expand, for example shortly > after IPA optimizations to avoid pessimizing loop transforms and allow > SRA. At the point you > materialize the inits you could run the late uninit warning pass > (which would then be earlier > than regular but would still see the .DEFERRED_INIT). > > > If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above, > the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”. > Then we might miss some opportunities for the late uninitialized warning. I think that this is not we really > want. > > > While users may be happy to pay some performance stack usage is > probably more critical > > > So, which pass is for computing the stack usage? 
There is no pass doing that; stack slot assignment and sharing (when lifetimes do not overlap) is done by RTL expansion. > (just thinking of the kernel) so not regressing there should be as > important as preserving > uninit warnings (which I for practical purposes see not important at > all - people can do > "debug" builds without -fzero-init). > > > It looks like the major issue with the “.DEFERRED_INIT” approach is that the newly inserted calls to the internal const function > might inhibit some important tree optimizations. > > So I am reconsidering another approach that I raised at the very beginning: > > during the gimplification phase, mark the DECL of an auto variable without an initializer as “no_explicit_init”, then maintain this > “no_explicit_init” bit until after pass_late_warn_uninitialized or until pass_expand, and add zero-initialization for all DECLs that are > marked with “no_explicit_init”. > > This approach would not interfere with tree optimizations; however, I guess that maintaining this “no_explicit_init” bit > might be very difficult? > > Do you have any comments on this approach? As said earlier you'll still get optimistic propagation bypassing the still missing implicit zero init. Maybe that's OK - you don't get "garbage" but you'll get some other defined value. As said, you have to implement a few options and compare. Richard. > thanks. > > Qing > > > > Richard. > > > Btw, I don't think theres any reason to cling onto clangs semantics > for a particular switch. We'll never be able to emulate 1:1 behavior > and our -Wuninit behavior is probably wastly different already. > > > From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same. 
> > For example, for the following small testing case: > void blah(int); > > int foo_2 (int n, int l, int m, int r) > { > int v; > > if ( (n > 10) && (m != 100) && (r < 20) ) > v = r; > > if (l > 100) > if ( (n <= 8) && (m < 102) && (r < 19) ) > blah(v); /* { dg-warning "uninitialized" "real warning" } */ > > return 0; > } > > GCC is able to report maybe uninitialized warning, but Clang cannot. > Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG. > > Really curious on how clang implement its uninitialized analysis? > > > > Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday. > And noticed that CLANG has a data flow analysis phase based on CLANG's AST. > http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html > > And clang’s uninitialized analysis is based on this data flow analysis. > > Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward. > > However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase, > Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG. > > Qing > > > > Qing > > > > > Richard. > > Thanks, > Richard > > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-09 8:23 ` Richard Biener @ 2020-12-09 15:04 ` Qing Zhao 2020-12-09 15:12 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-09 15:04 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches > On Dec 9, 2020, at 2:23 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote: >> >> >> >> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com <mailto:richard.guenther@gmail.com>> wrote: >> >> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote: >> >> >> >> >> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> >> >> >> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford >> <richard.sandiford@arm.com> wrote: >> >> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> unsigned decl_is_initialized :1; >> >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >> #define DECL_IS_INITIALIZED(NODE) \ >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >> even though DECL_INITIAL might be NULLed. >> >> >> For locals it would be more reliable to set this flag during gimplification. >> >> Do you have any comment and suggestions? 
>> >> >> As said above - do you want to cover registers as well as locals? I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see expand_stack_vars) >> >> Note that optimization will already made have use of "uninitialized" state >> of locals so depending on what the actual goal is here "late" may be too late. >> >> >> Haven't thought about this much, so it might be a daft idea, but would a >> compromise be to use a const internal function: >> >> X1 = .DEFERRED_INIT (X0, INIT) >> >> where the X0 argument is an uninitialised value and the INIT argument >> describes the initialisation pattern? So for a decl we'd have: >> >> X = .DEFERRED_INIT (X, INIT) >> >> and for an SSA name we'd have: >> >> X_2 = .DEFERRED_INIT (X_1(D), INIT) >> >> with all other uses of X_1(D) being replaced by X_2. The idea is that: >> >> * Having the X0 argument would keep the uninitialised use of the >> variable around for the later warning passes. >> >> * Using a const function should still allow the UB to be deleted as dead >> if X1 isn't needed. >> >> * Having a function in the way should stop passes from taking advantage >> of direct uninitialised uses for optimisation. >> >> This means we won't be able to optimise based on the actual init >> value at the gimple level, but that seems like a fair trade-off. >> AIUI this is really a security feature or anti-UB hardening feature >> (in the sense that users are more likely to see predictable behaviour >> “in the field” even if the program has UB). >> >> >> The question is whether it's in line of peoples expectation that >> explicitely zero-initialized code behaves differently from >> implicitely zero-initialized code with respect to optimization >> and secondary side-effects (late diagnostics, latent bugs, etc.). >> >> Introducing a new concept like .DEFERRED_INIT is much more >> heavy-weight than an explicit zero initializer. 
>> >> >> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? >> >> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >> the “uninitialized” info from source code level to “pass_expand”. >> >> >> Well, "untouched" is a bit oversimplified. You do need to handle >> .DEFERRED_INIT as not >> being an initialization which will definitely get interesting. >> >> >> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. >> >> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. >> >> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should >> be much easier and simpler, and also smaller run-time overhead. >> >> >> As for optimization I fear you'll get a load of redundant zero-init >> actually emitted if you can just rely on RTL DSE/DCE to remove it. >> >> >> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero >> Initialization since it will be used in production build. >> We can do some run-time performance evaluation when we have an implementation ready. >> >> >> Note there will be other passes "confused" by .DEFERRED_INIT. Note >> that there's going to be other >> considerations - namely where to emit the .DEFERRED_INIT - when >> emitting it during gimplification >> you can emit it at the start of the block of block-scope variables. >> When emitting after gimplification >> you have to emit at function start which will probably make stack slot >> sharing inefficient because >> the deferred init will cause overlapping lifetimes. 
With emitting at >> block boundary the .DEFERRED_INIT >> will act as code-motion barrier (and it itself likely cannot be moved) >> so for example invariant motion >> will no longer happen. Likewise optimizations like SRA will be >> confused by .DEFERRED_INIT which >> again will lead to bigger stack usage (and less optimization). >> >> >> Yes, looks like that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations. >> >> >> But sure, you can try implement a few variants but definitely >> .DEFERRED_INIT will be the most >> work. >> >> >> How about implement the following two approaches and compare the run-time cost: >> >> A. Insert the real initialization during gimplification phase. >> B. Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase. >> >> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC. >> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC. >> >> And then decide which approach we will go with? >> >> What’s your opinion on this? >> >> >> Well, in the end you have to try. Note for the purpose of stack slot >> sharing you do want the >> instrumentation to happen during gimplification. >> >> Another possibility is to materialize .DEFERRED_INIT earlier than >> expand, for example shortly >> after IPA optimizations to avoid pessimizing loop transforms and allow >> SRA. At the point you >> materialize the inits you could run the late uninit warning pass >> (which would then be earlier >> than regular but would still see the .DEFERRED_INIT). >> >> >> If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above, >> the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”. >> Then we might miss some opportunities for the late uninitialized warning. 
I think that this is not we really >> want. >> >> >> While users may be happy to pay some performance stack usage is >> probably more critical >> >> >> So, which pass is for computing the stack usage? > > There is no pass doing that; stack slot assignment and sharing (when > lifetimes do > not overlap) is done by RTL expansion. Okay. I see. > >> (just thinking of the kernel) so not regressing there should be as >> important as preserving >> uninit warnings (which I for practical purposes see not important at >> all - people can do >> "debug" builds without -fzero-init). >> >> >> It looks like the major issue with the “.DEFERRED_INIT” approach is that the newly inserted calls to the internal const function >> might inhibit some important tree optimizations. >> >> So I am reconsidering another approach that I raised at the very beginning: >> >> during the gimplification phase, mark the DECL of an auto variable without an initializer as “no_explicit_init”, then maintain this >> “no_explicit_init” bit until after pass_late_warn_uninitialized or until pass_expand, and add zero-initialization for all DECLs that are >> marked with “no_explicit_init”. >> >> This approach would not interfere with tree optimizations; however, I guess that maintaining this “no_explicit_init” bit >> might be very difficult? >> >> Do you have any comments on this approach? > > As said earlier you'll still get optimistic propagation bypassing the > still missing > implicit zero init. Maybe that's OK - you don't get "garbage" but you'll get > some other defined value. > There is another approach: during the gimplification phase, add the real initialization to the uninitialized variables, but mark these initializations as “artificial_init”. Then update the uninitialized analysis phase to treat initializations marked with “artificial_init” as non-initializations, so that the uninitialized warnings are kept. 
Then we should be able to get the maximum optimization and also keep the uninitialized warning at the same time.

The Microsoft compiler seems to use this approach (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/):

"
Does InitAll Break Static Analysis?

Static analysis is incredibly useful at letting developers know they forgot to initialize before use.

The InitAll feature indicates if a variable assignment was caused by InitAll to both PREfast and the compiler backend (both of which have uninitialized warnings). This allows the analysis tools to ignore InitAll variable assignments for the purposes of these warnings. With InitAll enabled, a developer will still receive static analysis warnings if they forget to initialize a variable even if InitAll forcibly initializes it for them.
"

Any comment on this?

> As said, you have to implement a few options and compare.

Yes, I will do that. I just want to make sure which approaches we should implement and compare first.

Qing

>
> Richard.
>
>> thanks.
>>
>> Qing
>>
>>
>>
>> Richard.
>>
>>
>> Btw, I don't think theres any reason to cling onto clangs semantics
>> for a particular switch. We'll never be able to emulate 1:1 behavior
>> and our -Wuninit behavior is probably wastly different already.
>>
>>
>> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same.
>>
>> For example, for the following small testing case:
>> void blah(int);
>>
>> int foo_2 (int n, int l, int m, int r)
>> {
>>   int v;
>>
>>   if ( (n > 10) && (m != 100) && (r < 20) )
>>     v = r;
>>
>>   if (l > 100)
>>     if ( (n <= 8) && (m < 102) && (r < 19) )
>>       blah(v); /* { dg-warning "uninitialized" "real warning" } */
>>
>>   return 0;
>> }
>>
>> GCC is able to report maybe uninitialized warning, but Clang cannot.
>> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG.
>>
>> Really curious on how clang implement its uninitialized analysis?
>> >> >> >> Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday. >> And noticed that CLANG has a data flow analysis phase based on CLANG's AST. >> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html >> >> And clang’s uninitialized analysis is based on this data flow analysis. >> >> Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward. >> >> However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase, >> Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG. >> >> Qing >> >> >> >> Qing >> >> >> >> >> Richard. >> >> Thanks, >> Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
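To make the foo_2 test case quoted in this message concrete, here is a minimal C sketch of the path on which v is read without ever being assigned. It assumes that -ftrivial-auto-var-init=zero would behave as if v had an explicit zero initializer; the explicit "= 0", the recording stub for blah, and the globals blah_called and blah_arg are illustrative assumptions, not part of the proposal or of GCC.

```c
#include <stdbool.h>

/* Hypothetical recording stub: remembers whether and with what value
   blah() was called, so the interesting path can be observed.  */
static bool blah_called = false;
static int blah_arg = -1;

static void blah(int x)
{
  blah_called = true;
  blah_arg = x;
}

/* Same control flow as the quoted foo_2.  The "= 0" stands in for the
   implicit initialization that -ftrivial-auto-var-init=zero is proposed
   to add (assumption); without it, the read of v below is undefined.  */
static int foo_2(int n, int l, int m, int r)
{
  int v = 0;

  if ((n > 10) && (m != 100) && (r < 20))
    v = r;

  if (l > 100)
    if ((n <= 8) && (m < 102) && (r < 19))
      blah(v);  /* reached only when n <= 8, so v was never assigned */

  return 0;
}
```

Calling foo_2(8, 101, 50, 10) takes only the second branch, so under this assumption blah receives 0 rather than an indeterminate value, while the warning on the original source would ideally stay in place.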
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-09 15:04 ` Qing Zhao @ 2020-12-09 15:12 ` Richard Biener 2020-12-09 16:18 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2020-12-09 15:12 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On Wed, Dec 9, 2020 at 4:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > On Dec 9, 2020, at 2:23 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > > On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > > On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > > > > On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: > > Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: > /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ > unsigned decl_is_initialized :1; > > /* IN VAR_DECL, set when the decl is initialized at the declaration. */ > #define DECL_IS_INITIALIZED(NODE) \ > (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) > > set this bit when setting DECL_INITIAL for the variables in FE. then keep it > even though DECL_INITIAL might be NULLed. > > > For locals it would be more reliable to set this flag during gimplification. > > Do you have any comment and suggestions? > > > As said above - do you want to cover registers as well as locals? 
I'd do > the actual zeroing during RTL expansion instead since otherwise you > have to figure youself whether a local is actually used (see expand_stack_vars) > > Note that optimization will already made have use of "uninitialized" state > of locals so depending on what the actual goal is here "late" may be too late. > > > Haven't thought about this much, so it might be a daft idea, but would a > compromise be to use a const internal function: > > X1 = .DEFERRED_INIT (X0, INIT) > > where the X0 argument is an uninitialised value and the INIT argument > describes the initialisation pattern? So for a decl we'd have: > > X = .DEFERRED_INIT (X, INIT) > > and for an SSA name we'd have: > > X_2 = .DEFERRED_INIT (X_1(D), INIT) > > with all other uses of X_1(D) being replaced by X_2. The idea is that: > > * Having the X0 argument would keep the uninitialised use of the > variable around for the later warning passes. > > * Using a const function should still allow the UB to be deleted as dead > if X1 isn't needed. > > * Having a function in the way should stop passes from taking advantage > of direct uninitialised uses for optimisation. > > This means we won't be able to optimise based on the actual init > value at the gimple level, but that seems like a fair trade-off. > AIUI this is really a security feature or anti-UB hardening feature > (in the sense that users are more likely to see predictable behaviour > “in the field” even if the program has UB). > > > The question is whether it's in line of peoples expectation that > explicitely zero-initialized code behaves differently from > implicitely zero-initialized code with respect to optimization > and secondary side-effects (late diagnostics, latent bugs, etc.). > > Introducing a new concept like .DEFERRED_INIT is much more > heavy-weight than an explicit zero initializer. > > > What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? 
> > The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass > the “uninitialized” info from source code level to “pass_expand”. > > > Well, "untouched" is a bit oversimplified. You do need to handle > .DEFERRED_INIT as not > being an initialization which will definitely get interesting. > > > Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. > > If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. > > However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should > be much easier and simpler, and also smaller run-time overhead. > > > As for optimization I fear you'll get a load of redundant zero-init > actually emitted if you can just rely on RTL DSE/DCE to remove it. > > > Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero > Initialization since it will be used in production build. > We can do some run-time performance evaluation when we have an implementation ready. > > > Note there will be other passes "confused" by .DEFERRED_INIT. Note > that there's going to be other > considerations - namely where to emit the .DEFERRED_INIT - when > emitting it during gimplification > you can emit it at the start of the block of block-scope variables. > When emitting after gimplification > you have to emit at function start which will probably make stack slot > sharing inefficient because > the deferred init will cause overlapping lifetimes. With emitting at > block boundary the .DEFERRED_INIT > will act as code-motion barrier (and it itself likely cannot be moved) > so for example invariant motion > will no longer happen. 
Likewise optimizations like SRA will be > confused by .DEFERRED_INIT which > again will lead to bigger stack usage (and less optimization). > > > Yes, looks like that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations. > > > But sure, you can try implement a few variants but definitely > .DEFERRED_INIT will be the most > work. > > > How about implement the following two approaches and compare the run-time cost: > > A. Insert the real initialization during gimplification phase. > B. Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase. > > The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC. > The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC. > > And then decide which approach we will go with? > > What’s your opinion on this? > > > Well, in the end you have to try. Note for the purpose of stack slot > sharing you do want the > instrumentation to happen during gimplification. > > Another possibility is to materialize .DEFERRED_INIT earlier than > expand, for example shortly > after IPA optimizations to avoid pessimizing loop transforms and allow > SRA. At the point you > materialize the inits you could run the late uninit warning pass > (which would then be earlier > than regular but would still see the .DEFERRED_INIT). > > > If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above, > the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”. > Then we might miss some opportunities for the late uninitialized warning. I think that this is not we really > want. > > > While users may be happy to pay some performance stack usage is > probably more critical > > > So, which pass is for computing the stack usage? 
> > > There is no pass doing that, stack slot assignment and sharing (when > lifetimes do > not overlap) is done by RTL expansion. > > > Okay. I see. > > > (just thinking of the kernel) so not regressing there should be as > important as preserving > uninit warnings (which I for practical purposes see not important at > all - people can do > "debug" builds without -fzero-init). > > > Looks like that the major issue with the “.DERERRED_INIT” approach is: the new inserted calls to internal const function > might inhibit some important tree optimizations. > > So, I am thinking again the following another approach I raised in the very beginning: > > During gimplification phase, mark the DECL for an auto variable without initialization as “no_explicit_init”, then maintain this > “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, add zero-iniitiazation for all DECLs that are > marked with “no_explicit_init”. > > This approach will not have the issue to interrupt tree optimizations, however, I guess that “maintaining this “no_explicit_init” bit > might be very difficult? > > Do you have any comments on this approach? > > > As said earlier you'll still get optimistic propagation bypassing the > still missing > implicit zero init. Maybe that's OK - you don't get "garbage" but you'll get > some other defined value. > > > There is another approach: > > During gimplification phase, adding the real initialization to the uninitialized variables, but mark these initializations as “artificial_init”. > Then update the uninitialized analysis phase to handle these initializations marked with “artificial_init” specially as Non-initialization to > keep the uninitialized warnings. > > Then we should be able to get the maximum optimization and also keep the uninitialized warning at the same time. 
>
> Microsoft compiler seems used this approach: (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/)
>
> "
> Does InitAll Break Static Analysis?
>
> Static analysis is incredibly useful at letting developers know they forgot to initialize before use.
>
> The InitAll feature indicates if a variable assignment was caused by InitAll to both PREfast and the compiler backend (both of which have uninitialized warnings). This allows the analysis tools to ignore InitAll variable assignments for the purposes of these warnings. With InitAll enabled, a developer will still receive static analysis warnings if they forget to initialize a variable even if InitAll forcibly initializes it for them.
>
> “
>
> Any comment on this?

You have to try. Bits to implement are adjusting the uninit pass and maintaining the annotation, as well as making sure not to elide the real init because there's a 'fake' init (we have redundant store elimination, which works in this direction, just to name one example).

Richard.

> As said, you have to implement a few options and compare.
>
>
> Yes, I will do that, just make sure which approaches we should implement and compare first.
>
> Qing
>
>
> Richard.
>
> thanks.
>
> Qing
>
>
>
> Richard.
>
>
> Btw, I don't think theres any reason to cling onto clangs semantics
> for a particular switch. We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably wastly different already.
>
>
> From my study so far, yes, the currently behavior of -Wunit for Clang and GCC is not exactly the same.
>
> For example, for the following small testing case:
> void blah(int);
>
> int foo_2 (int n, int l, int m, int r)
> {
>   int v;
>
>   if ( (n > 10) && (m != 100) && (r < 20) )
>     v = r;
>
>   if (l > 100)
>     if ( (n <= 8) && (m < 102) && (r < 19) )
>       blah(v); /* { dg-warning "uninitialized" "real warning" } */
>
>   return 0;
> }
>
> GCC is able to report maybe uninitialized warning, but Clang cannot.
> Looks like that GCC’s uninitialized analysis relies on more analysis and optimization information than CLANG. > > Really curious on how clang implement its uninitialized analysis? > > > > Actually, I studied a little bit on how clang implement its uninitialized analysis last Friday. > And noticed that CLANG has a data flow analysis phase based on CLANG's AST. > http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html > > And clang’s uninitialized analysis is based on this data flow analysis. > > Therefore, adding initialization AFTER clang’s uninitialization analysis phase is straightforward. > > However, for GCC, we don’t have data flow analysis in FE. The uninitialized variable analysis is put in TREE optimization phase, > Therefore, it’s much more difficult to implement this feature in GCC than that in CLANG. > > Qing > > > > Qing > > > > > Richard. > > Thanks, > Richard > > ^ permalink raw reply [flat|nested] 56+ messages in thread
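The hazard Richard mentions in this message (a user-written init being elided because an equal 'fake' init precedes it) can be illustrated with a toy model. This is only a sketch under stated assumptions: struct store, eliminate_redundant_stores and has_real_init are hypothetical names invented for illustration and do not correspond to GCC data structures or passes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not GCC code): one store to a single local variable.  */
struct store {
  int value;
  bool artificial;  /* true if added by the compiler, not by the user */
  bool removed;     /* set when a pass deletes the store */
};

/* Naive redundant store elimination: a store is dropped when the
   previous live store already wrote the same value.  When
   keep_real_after_artificial is true, a user store is never treated
   as redundant with respect to an artificial one.  */
static void eliminate_redundant_stores(struct store *s, size_t n,
                                       bool keep_real_after_artificial)
{
  const struct store *prev = NULL;
  for (size_t i = 0; i < n; i++) {
    bool same = prev != NULL && prev->value == s[i].value;
    bool protect = keep_real_after_artificial && prev != NULL
                   && prev->artificial && !s[i].artificial;
    if (same && !protect)
      s[i].removed = true;
    else
      prev = &s[i];
  }
}

/* What an adjusted uninit analysis would ask: does any user-written
   initialization survive?  */
static bool has_real_init(const struct store *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!s[i].removed && !s[i].artificial)
      return true;
  return false;
}
```

With an artificial store of 0 followed by a user store of 0, the naive variant removes the user store, so an analysis that ignores artificial inits would see the variable as uninitialized; the protected variant keeps the user store.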
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-09 15:12 ` Richard Biener @ 2020-12-09 16:18 ` Qing Zhao 2021-01-05 19:05 ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2020-12-09 16:18 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches > On Dec 9, 2020, at 9:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 4:04 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote: >> >> >> >> On Dec 9, 2020, at 2:23 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> >> >> >> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> >> >> >> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> >> >> >> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford >> <richard.sandiford@arm.com> wrote: >> >> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> unsigned decl_is_initialized :1; >> >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >> #define DECL_IS_INITIALIZED(NODE) \ >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >> >> set this bit when setting DECL_INITIAL for the variables in FE. 
then keep it >> even though DECL_INITIAL might be NULLed. >> >> >> For locals it would be more reliable to set this flag during gimplification. >> >> Do you have any comment and suggestions? >> >> >> As said above - do you want to cover registers as well as locals? I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see expand_stack_vars) >> >> Note that optimization will already made have use of "uninitialized" state >> of locals so depending on what the actual goal is here "late" may be too late. >> >> >> Haven't thought about this much, so it might be a daft idea, but would a >> compromise be to use a const internal function: >> >> X1 = .DEFERRED_INIT (X0, INIT) >> >> where the X0 argument is an uninitialised value and the INIT argument >> describes the initialisation pattern? So for a decl we'd have: >> >> X = .DEFERRED_INIT (X, INIT) >> >> and for an SSA name we'd have: >> >> X_2 = .DEFERRED_INIT (X_1(D), INIT) >> >> with all other uses of X_1(D) being replaced by X_2. The idea is that: >> >> * Having the X0 argument would keep the uninitialised use of the >> variable around for the later warning passes. >> >> * Using a const function should still allow the UB to be deleted as dead >> if X1 isn't needed. >> >> * Having a function in the way should stop passes from taking advantage >> of direct uninitialised uses for optimisation. >> >> This means we won't be able to optimise based on the actual init >> value at the gimple level, but that seems like a fair trade-off. >> AIUI this is really a security feature or anti-UB hardening feature >> (in the sense that users are more likely to see predictable behaviour >> “in the field” even if the program has UB). 
>> >> >> The question is whether it's in line of peoples expectation that >> explicitely zero-initialized code behaves differently from >> implicitely zero-initialized code with respect to optimization >> and secondary side-effects (late diagnostics, latent bugs, etc.). >> >> Introducing a new concept like .DEFERRED_INIT is much more >> heavy-weight than an explicit zero initializer. >> >> >> What exactly you mean by “heavy-weight”? More difficult to implement or much more run-time overhead or both? Or something else? >> >> The major benefit of the approach of “.DEFERRED_INIT” is to enable us keep the current -Wuninitialized analysis untouched and also pass >> the “uninitialized” info from source code level to “pass_expand”. >> >> >> Well, "untouched" is a bit oversimplified. You do need to handle >> .DEFERRED_INIT as not >> being an initialization which will definitely get interesting. >> >> >> Yes, during uninitialized variable analysis pass, we should specially handle the defs with “.DEFERRED_INIT”, to treat them as uninitializations. >> >> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach. >> >> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding zero-initializer directly during gimplification should >> be much easier and simpler, and also smaller run-time overhead. >> >> >> As for optimization I fear you'll get a load of redundant zero-init >> actually emitted if you can just rely on RTL DSE/DCE to remove it. >> >> >> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature, we should minimize the runtime overhead for zero >> Initialization since it will be used in production build. >> We can do some run-time performance evaluation when we have an implementation ready. >> >> >> Note there will be other passes "confused" by .DEFERRED_INIT. 
Note >> that there's going to be other >> considerations - namely where to emit the .DEFERRED_INIT - when >> emitting it during gimplification >> you can emit it at the start of the block of block-scope variables. >> When emitting after gimplification >> you have to emit at function start which will probably make stack slot >> sharing inefficient because >> the deferred init will cause overlapping lifetimes. With emitting at >> block boundary the .DEFERRED_INIT >> will act as code-motion barrier (and it itself likely cannot be moved) >> so for example invariant motion >> will no longer happen. Likewise optimizations like SRA will be >> confused by .DEFERRED_INIT which >> again will lead to bigger stack usage (and less optimization). >> >> >> Yes, looks like that the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations. >> >> >> But sure, you can try implement a few variants but definitely >> .DEFERRED_INIT will be the most >> work. >> >> >> How about implement the following two approaches and compare the run-time cost: >> >> A. Insert the real initialization during gimplification phase. >> B. Insert the .DEFERRED_INIT during gimplification phase, and then expand this call to real initialization during expand phase. >> >> The Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC. >> The Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC. >> >> And then decide which approach we will go with? >> >> What’s your opinion on this? >> >> >> Well, in the end you have to try. Note for the purpose of stack slot >> sharing you do want the >> instrumentation to happen during gimplification. >> >> Another possibility is to materialize .DEFERRED_INIT earlier than >> expand, for example shortly >> after IPA optimizations to avoid pessimizing loop transforms and allow >> SRA. 
At the point you >> materialize the inits you could run the late uninit warning pass >> (which would then be earlier >> than regular but would still see the .DEFERRED_INIT). >> >> >> If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above, >> the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”. >> Then we might miss some opportunities for the late uninitialized warning. I think that this is not we really >> want. >> >> >> While users may be happy to pay some performance stack usage is >> probably more critical >> >> >> So, which pass is for computing the stack usage? >> >> >> There is no pass doing that, stack slot assignment and sharing (when >> lifetimes do >> not overlap) is done by RTL expansion. >> >> >> Okay. I see. >> >> >> (just thinking of the kernel) so not regressing there should be as >> important as preserving >> uninit warnings (which I for practical purposes see not important at >> all - people can do >> "debug" builds without -fzero-init). >> >> >> Looks like that the major issue with the “.DERERRED_INIT” approach is: the new inserted calls to internal const function >> might inhibit some important tree optimizations. >> >> So, I am thinking again the following another approach I raised in the very beginning: >> >> During gimplification phase, mark the DECL for an auto variable without initialization as “no_explicit_init”, then maintain this >> “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, add zero-iniitiazation for all DECLs that are >> marked with “no_explicit_init”. >> >> This approach will not have the issue to interrupt tree optimizations, however, I guess that “maintaining this “no_explicit_init” bit >> might be very difficult? >> >> Do you have any comments on this approach? >> >> >> As said earlier you'll still get optimistic propagation bypassing the >> still missing >> implicit zero init. 
Maybe that's OK - you don't get "garbage" but you'll get >> some other defined value. >> >> >> There is another approach: >> >> During gimplification phase, adding the real initialization to the uninitialized variables, but mark these initializations as “artificial_init”. >> Then update the uninitialized analysis phase to handle these initializations marked with “artificial_init” specially as Non-initialization to >> keep the uninitialized warnings. >> >> Then we should be able to get the maximum optimization and also keep the uninitialized warning at the same time. >> >> Microsoft compiler seems used this approach: (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/) >> >> " >> Does InitAll Break Static Analysis? >> >> Static analysis is incredibly useful at letting developers know they forgot to initialize before use. >> >> The InitAll feature indicates if a variable assignment was caused by InitAll to both PREfast and the compiler backend (both of which have uninitialized warnings). This allows the analysis tools to ignore InitAll variable assignments for the purposes of these warnings. With InitAll enabled, a developer will still receive static analysis warnings if they forget to initialize a variable even if InitAll forcibly initializes it for them. >> >> “ >> >> Any comment on this? > > You have to try. Bits to implement are adjusting the uninit pass and > maining the annotation > as well as making sure to not elide the real init because there's a > 'fake' init (we have redundant > store elimination which works in this direction for example just to name one). Okay, I see. > > Richard. > >> As said, you have to implement a few options and compare. >> >> >> Yes, I will do that, just make sure which approaches we should implement and compare first. The following are the approaches I will implement and compare: Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. A. 
Add real initialization during gimplification, without maintaining the uninitialized warnings.
B. Add real initialization during gimplification and mark it as “artificial_init”. Adjust the uninitialized pass, maintain the annotation, and make sure the real init is not deleted because of the fake init.
C. Mark the DECL of an uninitialized auto variable as “no_explicit_init” during gimplification, maintain this “no_explicit_init” bit until after pass_late_warn_uninitialized or until pass_expand, then add real initialization for all DECLs that are marked with “no_explicit_init”.
D. Add .DEFERRED_INIT during gimplification and expand the .DEFERRED_INIT to real initialization during expand. Adjust the uninitialized pass for the new refs to “.DEFERRED_INIT”.

Approach A is expected to have the minimum run-time cost and will be the baseline for the performance comparison.

I will implement approach D next. It is expected to have the most run-time overhead of the approaches above, but its implementation should be the cleanest among B, C and D. Let’s see how much more performance overhead this approach has. If the data is good, maybe we can avoid the effort of implementing B and C.

If the performance of D is not good, I will implement B or C at that time.

Let me know if you have any comments or suggestions.

Thanks.

Qing

^ permalink raw reply [flat|nested] 56+ messages in thread
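As a rough illustration of what "expand the call to real initialization" in approach D could amount to for an object in memory, here is a small C model. It is a sketch only: expand_deferred_init and HYPOTHETICAL_PATTERN_BYTE are made-up names, the fill byte for pattern mode is a placeholder because the thread does not fix a value, and the enum simply mirrors the auto_init_type values from the patch posted in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the enum auto_init_type values from the posted patch.  */
enum auto_init_type {
  AUTO_INIT_UNINITIALIZED = 0,
  AUTO_INIT_PATTERN = 1,
  AUTO_INIT_ZERO = 2
};

/* Hypothetical fill byte for pattern mode; the thread does not fix a
   value, so this is only a placeholder.  */
#define HYPOTHETICAL_PATTERN_BYTE 0xAA

/* Models X = .DEFERRED_INIT (X, INIT) once it is turned into a real
   initialization: the old contents of the object are ignored and
   replaced according to INIT.  */
static void expand_deferred_init(void *obj, size_t size,
                                 enum auto_init_type init)
{
  switch (init) {
  case AUTO_INIT_ZERO:
    memset(obj, 0, size);
    break;
  case AUTO_INIT_PATTERN:
    memset(obj, HYPOTHETICAL_PATTERN_BYTE, size);
    break;
  case AUTO_INIT_UNINITIALIZED:
    break;  /* leave the object untouched */
  }
}
```

The point of the model is that the result never depends on the previous contents, which is what lets the uninitialized use stay visible to warning passes while the run-time value is still predictable.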
* The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2020-12-09 16:18 ` Qing Zhao @ 2021-01-05 19:05 ` Qing Zhao 2021-01-05 19:10 ` Qing Zhao 2021-01-12 20:34 ` Qing Zhao 0 siblings, 2 replies; 56+ messages in thread From: Qing Zhao @ 2021-01-05 19:05 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

Hi,

This is an update for our previous discussion.

1. I implemented the following two approaches in the latest upstream gcc:

A. Add real initialization during gimplification, without maintaining the uninitialized warnings.

D. Add calls to .DEFERRED_INIT during gimplification, and expand the .DEFERRED_INIT to real initialization during expand. Adjust the uninitialized pass for the new refs to “.DEFERRED_INIT”.

Note, in this initial implementation:
** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of -ftrivial-auto-var-init=pattern is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero.
** I added a temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for the runtime performance study.
** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected.)

2. I collected runtime data for CPU2017 on an x86 machine with this new gcc for the following 3 cases:

no: default (-g -O2 -march=native)
A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D

And then computed the slowdown for both A and D as follows:

benchmarks          A / no    D / no

500.perlbench_r      1.25%     1.25%
502.gcc_r            0.68%     1.80%
505.mcf_r            0.68%     0.14%
520.omnetpp_r        4.83%     4.68%
523.xalancbmk_r      0.18%     1.96%
525.x264_r           1.55%     2.07%
531.deepsjeng_r     11.57%    11.85%
541.leela_r          0.64%     0.80%
557.xz_r            -0.41%    -0.41%

507.cactuBSSN_r      0.44%     0.44%
508.namd_r           0.34%     0.34%
510.parest_r         0.17%     0.25%
511.povray_r        56.57%    57.27%
519.lbm_r            0.00%     0.00%
521.wrf_r           -0.28%    -0.37%
526.blender_r       16.96%    17.71%
527.cam4_r           0.70%     0.53%
538.imagick_r        2.40%     2.40%
544.nab_r            0.00%    -0.65%

avg                  5.17%     5.37%

From the above data, we can see that in general the runtime slowdown of implementations A and D is similar for individual benchmarks.

There are several benchmarks that have a significant slowdown with the newly added initialization for both A and D, for example 511.povray_r, 526.blender_r, and 531.deepsjeng_r. I will try to study a little more what kind of new initializations introduced such slowdown.

From the study so far, I think that approach D should be good enough for our final implementation. So, I will try to finish approach D with the following remaining work:

** complete the implementation of -ftrivial-auto-var-init=pattern;
** complete the uninitialized warnings maintenance work for D.

Let me know if you have any comments and suggestions on my current and future work.

Thanks a lot for your help.

Qing

> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> The following are the approaches I will implement and compare:
>
> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost.
>
> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
> B. Adding real initialization during gimplification, marking them with “artificial_init”.
> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not
> Deleted from the fake init.
> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand,
> add real initialization for all DECLs that are marked with “no_explicit_init”.
> D. Adding .DEFERRED_INIT during gimplification, expand the .DEFERRED_INIT during expand to
> real initialization. Adjusting uninitialized pass with the new refs with “.DEFERRED_INIT”.
>
>
> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance
> comparison.
>
> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but
> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach
> will be. If the data is good, maybe we can avoid the effort to implement B, and C.
>
> If the performance of D is not good, I will implement B or C at that time.
>
> Let me know if you have any comment or suggestions.
>
> Thanks.
>
> Qing

^ permalink raw reply [flat|nested] 56+ messages in thread
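The avg row of the slowdown table in this message can be cross-checked with a few lines of C. The sketch below assumes the average is a plain arithmetic mean over the 19 listed benchmarks (the message does not say how it was computed); the array and function names are illustrative only.

```c
#include <stddef.h>

#define NUM_BENCHMARKS 19

/* Slowdown in percent, in the order the benchmarks are listed in the
   table (500.perlbench_r first, 544.nab_r last).  */
static const double slowdown_a[NUM_BENCHMARKS] = {
  1.25, 0.68, 0.68, 4.83, 0.18, 1.55, 11.57, 0.64, -0.41, 0.44,
  0.34, 0.17, 56.57, 0.00, -0.28, 16.96, 0.70, 2.40, 0.00
};
static const double slowdown_d[NUM_BENCHMARKS] = {
  1.25, 1.80, 0.14, 4.68, 1.96, 2.07, 11.85, 0.80, -0.41, 0.44,
  0.34, 0.25, 57.27, 0.00, -0.37, 17.71, 0.53, 2.40, -0.65
};

/* Arithmetic mean of n percentages; returns 0 for an empty input.  */
static double mean_percent(const double *v, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return n ? sum / (double) n : 0.0;
}
```

Under that assumption the means come out at roughly 5.17 for A and 5.37 for D, matching the reported row. Both averages are dominated by 511.povray_r, 526.blender_r and 531.deepsjeng_r, which is consistent with the plan to study those benchmarks first.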
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-05 19:05 ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao @ 2021-01-05 19:10 ` Qing Zhao 2021-01-12 20:34 ` Qing Zhao 1 sibling, 0 replies; 56+ messages in thread From: Qing Zhao @ 2021-01-05 19:10 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

I am attaching my current (incomplete) patch to gcc for your reference.

From a71eb73bee5857440c4ff67c4c82be115e0675cb Mon Sep 17 00:00:00 2001
From: qing zhao <qinzhao@gcc.gnu.org>
Date: Sat, 12 Dec 2020 00:02:28 +0100
Subject: [PATCH] First version of -ftrivial-auto-var-init

---
 gcc/common.opt            | 35 ++++++++++++++++++
 gcc/flag-types.h          | 14 ++++++++
 gcc/gimple-pretty-print.c |  2 +-
 gcc/gimplify.c            | 90 +++++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.c         | 20 +++++++++++
 gcc/internal-fn.def       |  5 +++
 gcc/tree-cfg.c            |  3 ++
 gcc/tree-ssa-uninit.c     |  3 ++
 gcc/tree-ssa.c            |  5 +++
 9 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 6645539f5e5..c4c4fc28ef7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3053,6 +3053,41 @@ ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
 Enable copy propagation of scalar-evolution information.
 
+ftrivial-auto-var-init=
+Common Joined RejectNegative Enum(auto_init_type) Var(flag_trivial_auto_var_init) Init(AUTO_INIT_UNINITIALIZED)
+-ftrivial-auto-var-init=[uninitialized|pattern|zero] Add initializations to automatic variables.
+
+Enum
+Name(auto_init_type) Type(enum auto_init_type) UnknownError(unrecognized automatic variable initialization type %qs)
+
+EnumValue
+Enum(auto_init_type) String(uninitialized) Value(AUTO_INIT_UNINITIALIZED)
+
+EnumValue
+Enum(auto_init_type) String(pattern) Value(AUTO_INIT_PATTERN)
+
+EnumValue
+Enum(auto_init_type) String(zero) Value(AUTO_INIT_ZERO)
+
+fauto-var-init-approach=
+Common Joined RejectNegative Enum(auto_init_approach) Var(flag_auto_init_approach) Init(AUTO_INIT_A)
+-fauto-var-init-approach=[A|B|C|D] Choose the approach to initialize automatic variables.
+
+Enum
+Name(auto_init_approach) Type(enum auto_init_approach) UnknownError(unrecognized automatic variable initialization approach %qs)
+
+EnumValue
+Enum(auto_init_approach) String(A) Value(AUTO_INIT_A)
+
+EnumValue
+Enum(auto_init_approach) String(B) Value(AUTO_INIT_B)
+
+EnumValue
+Enum(auto_init_approach) String(C) Value(AUTO_INIT_C)
+
+EnumValue
+Enum(auto_init_approach) String(D) Value(AUTO_INIT_D)
+
 ; -fverbose-asm causes extra commentary information to be produced in
 ; the generated assembly code (to make it more readable). This option
 ; is generally only of use to those who actually need to read the
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 9342bd87be3..bfd0692b82c 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -242,6 +242,20 @@ enum vect_cost_model {
   VECT_COST_MODEL_DEFAULT = 1
 };
 
+/* Automatic variable initialization type. */
+enum auto_init_type {
+  AUTO_INIT_UNINITIALIZED = 0,
+  AUTO_INIT_PATTERN = 1,
+  AUTO_INIT_ZERO = 2
+};
+
+enum auto_init_approach {
+  AUTO_INIT_A = 0,
+  AUTO_INIT_B = 1,
+  AUTO_INIT_C = 2,
+  AUTO_INIT_D = 3
+};
+
 /* Different instrumentation modes. */
 enum sanitize_code {
   /* AddressSanitizer. */
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 075d6e5208a..1044d54e8d3 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -81,7 +81,7 @@ newline_and_indent (pretty_printer *buffer, int spc)
 DEBUG_FUNCTION void
 debug_gimple_stmt (gimple *gs)
 {
-  print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
+  print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS|TDF_LINENO|TDF_ALIAS);
 }
 
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 54cb66bd1dd..1eb0747ea2f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1674,6 +1674,16 @@ gimplify_return_expr (tree stmt, gimple_seq *pre_p)
   return GS_ALL_DONE;
 }
 
+/* Return the value that is used to initialize the vla DECL based
+   on INIT_TYPE. */
+tree memset_init_node (enum auto_init_type init_type)
+{
+  if (init_type == AUTO_INIT_ZERO)
+    return integer_zero_node;
+  else
+    gcc_assert (0);
+}
+
 /* Gimplify a variable-length array DECL. */
 
 static void
@@ -1712,6 +1722,19 @@ gimplify_vla_decl (tree decl, gimple_seq *seq_p)
 
   gimplify_and_add (t, seq_p);
 
+  /* Add a call to memset to initialize this vla when the user requested. */
+  if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+      && !DECL_ARTIFICIAL (decl)
+      && VAR_P (decl)
+      && !DECL_EXTERNAL (decl)
+      && !TREE_STATIC (decl))
+    {
+      t = builtin_decl_implicit (BUILT_IN_MEMSET);
+      tree init_node = memset_init_node (flag_trivial_auto_var_init);
+      t = build_call_expr (t, 3, addr, init_node, DECL_SIZE_UNIT (decl));
+      gimplify_and_add (t, seq_p);
+    }
+
   /* Record the dynamic allocation associated with DECL if requested. */
   if (flag_callgraph_info & CALLGRAPH_INFO_DYNAMIC_ALLOC)
     record_dynamic_alloc (decl);
@@ -1734,6 +1757,63 @@ force_labels_r (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
   return NULL_TREE;
 }
 
+
+/* Build a call to internal const function DEFERRED_INIT,
+   1st argument: DECL;
+   2nd argument: INIT_TYPE;
+
+   as DEFERRED_INIT (DECL, INIT_TYPE)
+
+   DEFERRED_INIT is defined as:
+   DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL). */
+
+static gimple *
+build_deferred_init (tree decl,
+                     enum auto_init_type init_type)
+{
+  tree init_type_node =
+    build_int_cst (integer_type_node, (int) init_type);
+  return gimple_build_call_internal (IFN_DEFERRED_INIT, 2, decl, init_type_node);
+}
+
+
+/* Generate initialization to automatic variable DECL based on INIT_TYPE. */
+static void
+gimple_add_init_for_auto_var (tree decl,
+                              enum auto_init_type init_type,
+                              enum auto_init_approach init_approach,
+                              gimple_seq *seq_p)
+{
+  gcc_assert (VAR_P (decl) && !DECL_EXTERNAL (decl) && !TREE_STATIC (decl));
+  switch (init_type)
+    {
+    case AUTO_INIT_UNINITIALIZED:
+    case AUTO_INIT_PATTERN:
+      gcc_assert (0);
+      break;
+    case AUTO_INIT_ZERO:
+      if (init_approach == AUTO_INIT_A)
+        {
+          tree init = build_zero_cst (TREE_TYPE (decl));
+          init = build2 (INIT_EXPR, void_type_node, decl, init);
+          gimplify_and_add (init, seq_p);
+          ggc_free (init);
+        }
+      else if (init_approach == AUTO_INIT_D)
+        {
+          gimple *call = build_deferred_init (decl, AUTO_INIT_ZERO);
+          gimple_call_set_lhs (call, decl);
+          gimplify_seq_add_stmt (seq_p, call);
+        }
+      else
+        gcc_assert (0);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+
 /* Gimplify a DECL_EXPR node *STMT_P by making any necessary allocation
    and initialization explicit. */
 
@@ -1821,6 +1901,16 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
                  as they may contain a label address. */
               walk_tree (&init, force_labels_r, NULL, NULL);
             }
+          /* When there is no explicit initializer, if the user requested,
+             We should insert an artificial initializer for this automatic
+             variable for non vla variables. */
+          else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+                   && !TREE_STATIC (decl)
+                   && !is_vla)
+            gimple_add_init_for_auto_var (decl,
+                                          flag_trivial_auto_var_init,
+                                          flag_auto_init_approach,
+                                          seq_p);
         }
 
       return GS_ALL_DONE;
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 41223ff7d82..6eef6ddb259 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2971,6 +2971,26 @@ expand_UNIQUE (internal_fn, gcall *stmt)
     emit_insn (pattern);
 }
 
+/* Expand the IFN_DEFERRED_INIT function according to its second argument. */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      gcc_assert (0);
+    case AUTO_INIT_ZERO:
+      tree init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+    }
+}
+
 /* The size of an OpenACC compute dimension. */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 91a7bfea3ee..fd077d8b55c 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -347,6 +347,11 @@ DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (PHI, 0, NULL)
 
+/* A function to represent an artificial initialization to an uninitialized
+   automatic variable. The first argument is the variable itself, the
+   second argument is the initialization type. */
+DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+
 /* DIM_SIZE and DIM_POS return the size of a particular compute
    dimension and the executing thread's position within that
    dimension. DIM_POS is pure (and not const) so that it isn't
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index f59a0c05200..3717c6d26a5 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3433,6 +3433,9 @@ verify_gimple_call (gcall *stmt)
         }
     }
 
+  if (gimple_call_internal_p (stmt, IFN_DEFERRED_INIT))
+    return false;
+
   /* ??? The C frontend passes unpromoted arguments in case it
      didn't see a function declaration before the call. So for now
      leave the call arguments mostly unverified. Once we gimplify
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 516a7bd2c99..6c0946b0bc5 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -611,6 +611,9 @@ warn_uninitialized_vars (bool wmaybe_uninit)
           ssa_op_iter op_iter;
           tree use;
 
+          if (gimple_call_internal_p (stmt, IFN_DEFERRED_INIT))
+            continue;
+
           if (is_gimple_debug (stmt))
             continue;
 
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index a575979aa13..319e4150dc4 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1325,6 +1325,11 @@ ssa_undefined_value_p (tree t, bool partial)
   if (gimple_nop_p (def_stmt))
     return true;
 
+  /* The value is undefined iff the definition statement is a call
+     to .DEFERRED_INIT function. */
+  if (gimple_call_internal_p (def_stmt, IFN_DEFERRED_INIT))
+    return true;
+
   /* Check if the complex was not only partially defined. */
   if (partial && is_gimple_assign (def_stmt)
       && gimple_assign_rhs_code (def_stmt) == COMPLEX_EXPR)
-- 
2.11.0

> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> This is an update for our previous discussion.
>
> 1. I implemented the following two different implementations in the latest upstream gcc:
>
> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
>
> D. Adding calls to .DEFERRED_INIT during gimplification, expand the .DEFERRED_INIT during expand to
> real initialization. Adjusting uninitialized pass with the new refs with “.DEFERRED_INIT”.
> > Note, in this initial implementation, > ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern > is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero. > > ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for > runtime performance study. > ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). > > 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases: > > no: default. (-g -O2 -march=native ) > A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A > D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D > > And then compute the slowdown data for both A and D as following: > > benchmarks A / no D /no > > 500.perlbench_r 1.25% 1.25% > 502.gcc_r 0.68% 1.80% > 505.mcf_r 0.68% 0.14% > 520.omnetpp_r 4.83% 4.68% > 523.xalancbmk_r 0.18% 1.96% > 525.x264_r 1.55% 2.07% > 531.deepsjeng_ 11.57% 11.85% > 541.leela_r 0.64% 0.80% > 557.xz_ -0.41% -0.41% > > 507.cactuBSSN_r 0.44% 0.44% > 508.namd_r 0.34% 0.34% > 510.parest_r 0.17% 0.25% > 511.povray_r 56.57% 57.27% > 519.lbm_r 0.00% 0.00% > 521.wrf_r -0.28% -0.37% > 526.blender_r 16.96% 17.71% > 527.cam4_r 0.70% 0.53% > 538.imagick_r 2.40% 2.40% > 544.nab_r 0.00% -0.65% > > avg 5.17% 5.37% > > From the above data, we can see that in general, the runtime performance slowdown for > implementation A and D are similar for individual benchmarks. > > There are several benchmarks that have significant slowdown with the new added initialization for both > A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit > more on what kind of new initializations introduced such slowdown. > > From the current study so far, I think that approach D should be good enough for our final implementation. 
> So, I will try to finish approach D with the following remaining work > > ** complete the implementation of -ftrivial-auto-var-init=pattern; > ** complete the implementation of uninitialized warnings maintenance work for D. > > > Let me know if you have any comments and suggestions on my current and future work. > > Thanks a lot for your help. > > Qing > >> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >> >> The following are the approaches I will implement and compare: >> >> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. >> >> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. >> B. Adding real initialization during gimplification, marking them with “artificial_init”. >> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not >> Deleted from the fake init. >> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, >> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, >> add real initialization for all DECLs that are marked with “no_explicit_init”. >> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >> >> >> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance >> comparison. >> >> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but >> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach >> will be. If the data is good, maybe we can avoid the effort to implement B, and C. >> >> If the performance of D is not good, I will implement B or C at that time. 
>> >> Let me know if you have any comment or suggestions. >> >> Thanks. >> >> Qing > ^ permalink raw reply [flat|nested] 56+ messages in thread
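For readers who want a source-level picture of the gimplify_vla_decl hunk in the patch above: with -ftrivial-auto-var-init=zero it emits a memset over the whole array right after the VLA is allocated. The following is a minimal C sketch of the equivalent hand-written code, not the compiler's actual output; the function name `sum_zeroed_vla` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written equivalent of what the patch arranges for a VLA that has
   no explicit initializer: zero the whole object before any use.
   Returns the sum of the elements, which is 0 because every element
   was zeroed before being read. */
static long sum_zeroed_vla(size_t n)
{
    assert(n > 0);                 /* a VLA must have a nonzero length */

    int vla[n];                    /* declared without an initializer */
    memset(vla, 0, sizeof vla);    /* stands in for the inserted call */

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += vla[i];
    return sum;
}
```

Here `sizeof vla` plays the role that DECL_SIZE_UNIT (decl) plays in the patch: the size of the whole array in bytes, known only at run time.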
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-05 19:05 ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao 2021-01-05 19:10 ` Qing Zhao @ 2021-01-12 20:34 ` Qing Zhao 2021-01-13 7:39 ` Richard Biener 1 sibling, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-01-12 20:34 UTC (permalink / raw) To: Richard Biener, Richard Sandiford; +Cc: Richard Biener via Gcc-patches

Hi,

I am just checking in to see whether you have any comments or suggestions on this.

FYI, I have been continuing with the implementation of Approach D since last week:

D. Adding calls to .DEFERRED_INIT during gimplification, expanding .DEFERRED_INIT during expand to real initialization. Adjusting the uninitialized pass for the new references to “.DEFERRED_INIT”.

The remaining work for Approach D was:

** complete the implementation of -ftrivial-auto-var-init=pattern;
** complete the uninitialized warnings maintenance work for D.

I have completed the uninitialized warnings maintenance work for D and finished part of the -ftrivial-auto-var-init=pattern implementation.

The following work remains for Approach D:

** -ftrivial-auto-var-init=pattern for VLAs;
** add a new variable attribute, __attribute__((uninitialized)): the marked variable is intentionally left uninitialized for performance reasons;
** add complete test cases.

Please let me know if you have any objection to my current decision to implement approach D.

Thanks a lot for your help.

Qing

> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> This is an update for our previous discussion.
>
> 1. I implemented the following two different implementations in the latest upstream gcc:
>
> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
>
> D.
Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > > Note, in this initial implementation, > ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern > is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero. > > ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for > runtime performance study. > ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). > > 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases: > > no: default. (-g -O2 -march=native ) > A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A > D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D > > And then compute the slowdown data for both A and D as following: > > benchmarks A / no D /no > > 500.perlbench_r 1.25% 1.25% > 502.gcc_r 0.68% 1.80% > 505.mcf_r 0.68% 0.14% > 520.omnetpp_r 4.83% 4.68% > 523.xalancbmk_r 0.18% 1.96% > 525.x264_r 1.55% 2.07% > 531.deepsjeng_ 11.57% 11.85% > 541.leela_r 0.64% 0.80% > 557.xz_ -0.41% -0.41% > > 507.cactuBSSN_r 0.44% 0.44% > 508.namd_r 0.34% 0.34% > 510.parest_r 0.17% 0.25% > 511.povray_r 56.57% 57.27% > 519.lbm_r 0.00% 0.00% > 521.wrf_r -0.28% -0.37% > 526.blender_r 16.96% 17.71% > 527.cam4_r 0.70% 0.53% > 538.imagick_r 2.40% 2.40% > 544.nab_r 0.00% -0.65% > > avg 5.17% 5.37% > > From the above data, we can see that in general, the runtime performance slowdown for > implementation A and D are similar for individual benchmarks. 
> > There are several benchmarks that have significant slowdown with the new added initialization for both > A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit > more on what kind of new initializations introduced such slowdown. > > From the current study so far, I think that approach D should be good enough for our final implementation. > So, I will try to finish approach D with the following remaining work > > ** complete the implementation of -ftrivial-auto-var-init=pattern; > ** complete the implementation of uninitialized warnings maintenance work for D. > > > Let me know if you have any comments and suggestions on my current and future work. > > Thanks a lot for your help. > > Qing > >> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >> >> The following are the approaches I will implement and compare: >> >> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. >> >> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. >> B. Adding real initialization during gimplification, marking them with “artificial_init”. >> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not >> Deleted from the fake init. >> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, >> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, >> add real initialization for all DECLs that are marked with “no_explicit_init”. >> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >> >> >> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance >> comparison. 
>> >> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but >> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach >> will be. If the data is good, maybe we can avoid the effort to implement B, and C. >> >> If the performance of D is not good, I will implement B or C at that time. >> >> Let me know if you have any comment or suggestions. >> >> Thanks. >> >> Qing > ^ permalink raw reply [flat|nested] 56+ messages in thread
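The remaining item "-ftrivial-auto-var-init=pattern for VLA" from the message above can be pictured as the same memset-style fill used for zero initialization, but with a nonzero repeating byte. The sketch below is a minimal C illustration under that assumption. The fill byte 0xFE and both helper names are hypothetical, since the thread does not say which pattern will be used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fill byte, for illustration only; the thread does not
   specify the pattern value. */
#define PATTERN_BYTE 0xFE

/* Fill an object of SIZE bytes with the repeating pattern byte. */
static void pattern_fill(void *obj, size_t size)
{
    assert(obj != NULL);
    memset(obj, PATTERN_BYTE, size);
}

/* Return 1 if every byte of the object equals the pattern byte,
   0 otherwise. */
static int is_pattern_filled(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++)
        if (p[i] != PATTERN_BYTE)
            return 0;
    return 1;
}
```

A byte-wise fill works for a VLA because only its total size in bytes is needed, and that is exactly what is available at the point where the array is allocated.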
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-12 20:34 ` Qing Zhao @ 2021-01-13 7:39 ` Richard Biener 2021-01-13 15:06 ` Qing Zhao 2021-01-14 21:16 ` Qing Zhao 0 siblings, 2 replies; 56+ messages in thread From: Richard Biener @ 2021-01-13 7:39 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On Tue, 12 Jan 2021, Qing Zhao wrote: > Hi, > > Just check in to see whether you have any comments and suggestions on this: > > FYI, I have been continue with Approach D implementation since last week: > > D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > > For the remaining work of Approach D: > > ** complete the implementation of -ftrivial-auto-var-init=pattern; > ** complete the implementation of uninitialized warnings maintenance work for D. > > I have completed the uninitialized warnings maintenance work for D. > And finished partial of the -ftrivial-auto-var-init=pattern implementation. > > The following are remaining work of Approach D: > > ** -ftrivial-auto-var-init=pattern for VLA; > **add a new attribute for variable: > __attribute((uninitialized) > the marked variable is uninitialized intentionaly for performance purpose. > ** adding complete testing cases; > > > Please let me know if you have any objection on my current decision on implementing approach D. Did you do any analysis on how stack usage and code size are changed with approach D? How does compile-time behave (we could gobble up lots of .DEFERRED_INIT calls I guess)? Richard. > Thanks a lot for your help. > > Qing > > > > On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > Hi, > > > > This is an update for our previous discussion. > > > > 1. 
I implemented the following two different implementations in the latest upstream gcc: > > > > A. Adding real initialization during gimplification, not maintain the uninitialized warnings. > > > > D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > > real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > > > > Note, in this initial implementation, > > ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern > > is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero. > > > > ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for > > runtime performance study. > > ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). > > > > 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases: > > > > no: default. 
(-g -O2 -march=native ) > > A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A > > D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D > > > > And then compute the slowdown data for both A and D as following: > > > > benchmarks A / no D /no > > > > 500.perlbench_r 1.25% 1.25% > > 502.gcc_r 0.68% 1.80% > > 505.mcf_r 0.68% 0.14% > > 520.omnetpp_r 4.83% 4.68% > > 523.xalancbmk_r 0.18% 1.96% > > 525.x264_r 1.55% 2.07% > > 531.deepsjeng_ 11.57% 11.85% > > 541.leela_r 0.64% 0.80% > > 557.xz_ -0.41% -0.41% > > > > 507.cactuBSSN_r 0.44% 0.44% > > 508.namd_r 0.34% 0.34% > > 510.parest_r 0.17% 0.25% > > 511.povray_r 56.57% 57.27% > > 519.lbm_r 0.00% 0.00% > > 521.wrf_r -0.28% -0.37% > > 526.blender_r 16.96% 17.71% > > 527.cam4_r 0.70% 0.53% > > 538.imagick_r 2.40% 2.40% > > 544.nab_r 0.00% -0.65% > > > > avg 5.17% 5.37% > > > > From the above data, we can see that in general, the runtime performance slowdown for > > implementation A and D are similar for individual benchmarks. > > > > There are several benchmarks that have significant slowdown with the new added initialization for both > > A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit > > more on what kind of new initializations introduced such slowdown. > > > > From the current study so far, I think that approach D should be good enough for our final implementation. > > So, I will try to finish approach D with the following remaining work > > > > ** complete the implementation of -ftrivial-auto-var-init=pattern; > > ** complete the implementation of uninitialized warnings maintenance work for D. > > > > > > Let me know if you have any comments and suggestions on my current and future work. > > > > Thanks a lot for your help. 
> > > > Qing > > > >> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > >> > >> The following are the approaches I will implement and compare: > >> > >> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. > >> > >> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. > >> B. Adding real initialization during gimplification, marking them with “artificial_init”. > >> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not > >> Deleted from the fake init. > >> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, > >> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, > >> add real initialization for all DECLs that are marked with “no_explicit_init”. > >> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > >> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > >> > >> > >> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance > >> comparison. > >> > >> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but > >> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach > >> will be. If the data is good, maybe we can avoid the effort to implement B, and C. > >> > >> If the performance of D is not good, I will implement B or C at that time. > >> > >> Let me know if you have any comment or suggestions. > >> > >> Thanks. > >> > >> Qing > > > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-13 7:39 ` Richard Biener @ 2021-01-13 15:06 ` Qing Zhao 2021-01-13 15:10 ` Richard Biener 2021-01-14 21:16 ` Qing Zhao 1 sibling, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-01-13 15:06 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 12 Jan 2021, Qing Zhao wrote:
>
>> Hi,
>>
>> Just check in to see whether you have any comments and suggestions on this:
>>
>> FYI, I have been continue with Approach D implementation since last week:
>>
>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the .DEFERRED_INIT during expand to
>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFERRED_INIT”.
>>
>> For the remaining work of Approach D:
>>
>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>> ** complete the implementation of uninitialized warnings maintenance work for D.
>>
>> I have completed the uninitialized warnings maintenance work for D.
>> And finished partial of the -ftrivial-auto-var-init=pattern implementation.
>>
>> The following are remaining work of Approach D:
>>
>> ** -ftrivial-auto-var-init=pattern for VLA;
>> **add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is uninitialized intentionally for performance purpose.
>> ** adding complete testing cases;
>>
>>
>> Please let me know if you have any objection on my current decision on implementing approach D.
>
> Did you do any analysis on how stack usage and code size are changed
> with approach D?

I did the code size comparison (I will provide the data in another email). According to this data, D works better than A in general, which was actually a surprise to me.

I have not measured stack usage yet. I am not sure how to collect stack usage data; do you have any suggestions?

> How does compile-time behave (we could gobble up
> lots of .DEFERRED_INIT calls I guess)?

I can collect this data too and report it later.

Thanks.

Qing

>
> Richard.
>
>> Thanks a lot for your help.
>>
>> Qing
>>
>>
>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>
>>> Hi,
>>>
>>> This is an update for our previous discussion.
>>>
>>> 1. I implemented the following two different implementations in the latest upstream gcc:
>>>
>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings.
>>>
>>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the .DEFERRED_INIT during expand to
>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFERRED_INIT”.
>>>
>>> Note, in this initial implementation,
>>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern
>>> is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero.
>>>
>>> ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for
>>> runtime performance study.
>>> ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected).
>>>
>>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases:
>>>
>>> no: default.
(-g -O2 -march=native ) >>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A >>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D >>> >>> And then compute the slowdown data for both A and D as following: >>> >>> benchmarks A / no D /no >>> >>> 500.perlbench_r 1.25% 1.25% >>> 502.gcc_r 0.68% 1.80% >>> 505.mcf_r 0.68% 0.14% >>> 520.omnetpp_r 4.83% 4.68% >>> 523.xalancbmk_r 0.18% 1.96% >>> 525.x264_r 1.55% 2.07% >>> 531.deepsjeng_ 11.57% 11.85% >>> 541.leela_r 0.64% 0.80% >>> 557.xz_ -0.41% -0.41% >>> >>> 507.cactuBSSN_r 0.44% 0.44% >>> 508.namd_r 0.34% 0.34% >>> 510.parest_r 0.17% 0.25% >>> 511.povray_r 56.57% 57.27% >>> 519.lbm_r 0.00% 0.00% >>> 521.wrf_r -0.28% -0.37% >>> 526.blender_r 16.96% 17.71% >>> 527.cam4_r 0.70% 0.53% >>> 538.imagick_r 2.40% 2.40% >>> 544.nab_r 0.00% -0.65% >>> >>> avg 5.17% 5.37% >>> >>> From the above data, we can see that in general, the runtime performance slowdown for >>> implementation A and D are similar for individual benchmarks. >>> >>> There are several benchmarks that have significant slowdown with the new added initialization for both >>> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit >>> more on what kind of new initializations introduced such slowdown. >>> >>> From the current study so far, I think that approach D should be good enough for our final implementation. >>> So, I will try to finish approach D with the following remaining work >>> >>> ** complete the implementation of -ftrivial-auto-var-init=pattern; >>> ** complete the implementation of uninitialized warnings maintenance work for D. >>> >>> >>> Let me know if you have any comments and suggestions on my current and future work. >>> >>> Thanks a lot for your help. 
>>> >>> Qing >>> >>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >>>> >>>> The following are the approaches I will implement and compare: >>>> >>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. >>>> >>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. >>>> B. Adding real initialization during gimplification, marking them with “artificial_init”. >>>> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not >>>> Deleted from the fake init. >>>> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, >>>> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, >>>> add real initialization for all DECLs that are marked with “no_explicit_init”. >>>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >>>> >>>> >>>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance >>>> comparison. >>>> >>>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but >>>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach >>>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. >>>> >>>> If the performance of D is not good, I will implement B or C at that time. >>>> >>>> Let me know if you have any comment or suggestions. >>>> >>>> Thanks. 
>>>> >>>> Qing >>> >> >> > > -- > Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-13 15:06 ` Qing Zhao @ 2021-01-13 15:10 ` Richard Biener 2021-01-13 15:35 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2021-01-13 15:10 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On Wed, 13 Jan 2021, Qing Zhao wrote: > > > > On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote: > > > > On Tue, 12 Jan 2021, Qing Zhao wrote: > > > >> Hi, > >> > >> Just check in to see whether you have any comments and suggestions on this: > >> > >> FYI, I have been continue with Approach D implementation since last week: > >> > >> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > >> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > >> > >> For the remaining work of Approach D: > >> > >> ** complete the implementation of -ftrivial-auto-var-init=pattern; > >> ** complete the implementation of uninitialized warnings maintenance work for D. > >> > >> I have completed the uninitialized warnings maintenance work for D. > >> And finished partial of the -ftrivial-auto-var-init=pattern implementation. > >> > >> The following are remaining work of Approach D: > >> > >> ** -ftrivial-auto-var-init=pattern for VLA; > >> **add a new attribute for variable: > >> __attribute((uninitialized) > >> the marked variable is uninitialized intentionaly for performance purpose. > >> ** adding complete testing cases; > >> > >> > >> Please let me know if you have any objection on my current decision on implementing approach D. > > > > Did you do any analysis on how stack usage and code size are changed > > with approach D? > > I did the code size change comparison (I will provide the data in another email). And with this data, D works better than A in general. (This is surprise to me actually). 
> > But not the stack usage. Not sure how to collect the stack usage data, > do you have any suggestion on this? There is -fstack-usage you could use, then of course watching the stack segment at runtime. I'm mostly concerned about stack-limited "processes" such as the linux kernel which I think is a primary target of your work. Richard. > > > How does compile-time behave (we could gobble up > > lots of .DEFERRED_INIT calls I guess)? > I can collect this data too and report it later. > > Thanks. > > Qing > > > > Richard. > > > >> Thanks a lot for your help. > >> > >> Qing > >> > >> > >>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > >>> > >>> Hi, > >>> > >>> This is an update for our previous discussion. > >>> > >>> 1. I implemented the following two different implementations in the latest upstream gcc: > >>> > >>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. > >>> > >>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > >>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > >>> > >>> Note, in this initial implementation, > >>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern > >>> is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero. > >>> > >>> ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for > >>> runtime performance study. > >>> ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). > >>> > >>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases: > >>> > >>> no: default. 
(-g -O2 -march=native ) > >>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A > >>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D > >>> > >>> And then compute the slowdown data for both A and D as following: > >>> > >>> benchmarks A / no D /no > >>> > >>> 500.perlbench_r 1.25% 1.25% > >>> 502.gcc_r 0.68% 1.80% > >>> 505.mcf_r 0.68% 0.14% > >>> 520.omnetpp_r 4.83% 4.68% > >>> 523.xalancbmk_r 0.18% 1.96% > >>> 525.x264_r 1.55% 2.07% > >>> 531.deepsjeng_ 11.57% 11.85% > >>> 541.leela_r 0.64% 0.80% > >>> 557.xz_ -0.41% -0.41% > >>> > >>> 507.cactuBSSN_r 0.44% 0.44% > >>> 508.namd_r 0.34% 0.34% > >>> 510.parest_r 0.17% 0.25% > >>> 511.povray_r 56.57% 57.27% > >>> 519.lbm_r 0.00% 0.00% > >>> 521.wrf_r -0.28% -0.37% > >>> 526.blender_r 16.96% 17.71% > >>> 527.cam4_r 0.70% 0.53% > >>> 538.imagick_r 2.40% 2.40% > >>> 544.nab_r 0.00% -0.65% > >>> > >>> avg 5.17% 5.37% > >>> > >>> From the above data, we can see that in general, the runtime performance slowdown for > >>> implementation A and D are similar for individual benchmarks. > >>> > >>> There are several benchmarks that have significant slowdown with the new added initialization for both > >>> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit > >>> more on what kind of new initializations introduced such slowdown. > >>> > >>> From the current study so far, I think that approach D should be good enough for our final implementation. > >>> So, I will try to finish approach D with the following remaining work > >>> > >>> ** complete the implementation of -ftrivial-auto-var-init=pattern; > >>> ** complete the implementation of uninitialized warnings maintenance work for D. > >>> > >>> > >>> Let me know if you have any comments and suggestions on my current and future work. > >>> > >>> Thanks a lot for your help. 
> >>> > >>> Qing > >>> > >>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > >>>> > >>>> The following are the approaches I will implement and compare: > >>>> > >>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. > >>>> > >>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. > >>>> B. Adding real initialization during gimplification, marking them with “artificial_init”. > >>>> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not > >>>> Deleted from the fake init. > >>>> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, > >>>> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, > >>>> add real initialization for all DECLs that are marked with “no_explicit_init”. > >>>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > >>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > >>>> > >>>> > >>>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance > >>>> comparison. > >>>> > >>>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but > >>>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach > >>>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. > >>>> > >>>> If the performance of D is not good, I will implement B or C at that time. > >>>> > >>>> Let me know if you have any comment or suggestions. > >>>> > >>>> Thanks. 
> >>>> > >>>> Qing > >>> > >> > >> > > > > -- > > Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
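The -fstack-usage suggestion above can be turned into a per-function comparison between two builds. The following is only a minimal sketch, not part of the thread: it assumes each `.su` record has the shape `location:function <size> <qualifier>` separated by whitespace, and the records in `BUILD_A` and `BUILD_D` are hypothetical placeholders, not measured data. The helper names (`parse_su_line`, `parse_su`, `stack_growth`) are made up for illustration.

```python
"""Compare per-function stack usage between two builds compiled with -fstack-usage."""


def parse_su_line(line):
    """Split one .su record into (location_and_function, size_in_bytes, qualifier).

    C++ signatures contain spaces, so the two trailing fields are peeled off
    from the right instead of splitting from the left.
    """
    name, size, qualifier = line.rstrip().rsplit(None, 2)
    return name, int(size), qualifier


def parse_su(text):
    """Map each function record to its reported stack size in bytes."""
    sizes = {}
    for line in text.splitlines():
        if line.strip():
            name, size, _ = parse_su_line(line)
            sizes[name] = size
    return sizes


def stack_growth(base_text, new_text):
    """Return {function: (base_size, new_size)} for functions whose size differs."""
    base, new = parse_su(base_text), parse_su(new_text)
    return {
        name: (base[name], new[name])
        for name in sorted(base.keys() & new.keys())
        if base[name] != new[name]
    }


# Hypothetical records for illustration only (not measured data).
BUILD_A = "demo.c:3:5:fill\t48\tstatic\ndemo.c:9:5:main\t16\tstatic\n"
BUILD_D = "demo.c:3:5:fill\t80\tstatic\ndemo.c:9:5:main\t16\tstatic\n"

if __name__ == "__main__":
    for name, (a, d) in stack_growth(BUILD_A, BUILD_D).items():
        print(f"{name}: {a} -> {d} bytes ({d - a:+d})")
```

In practice the two strings would be read from the matching `.su` files of the two build directories. Functions that exist in only one build are ignored here, which is a simplification.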
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-13 15:10 ` Richard Biener @ 2021-01-13 15:35 ` Qing Zhao 2021-01-13 15:40 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-01-13 15:35 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches > On Jan 13, 2021, at 9:10 AM, Richard Biener <rguenther@suse.de> wrote: > > On Wed, 13 Jan 2021, Qing Zhao wrote: > >> >> >>> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote: >>> >>> On Tue, 12 Jan 2021, Qing Zhao wrote: >>> >>>> Hi, >>>> >>>> Just check in to see whether you have any comments and suggestions on this: >>>> >>>> FYI, I have been continue with Approach D implementation since last week: >>>> >>>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >>>> >>>> For the remaining work of Approach D: >>>> >>>> ** complete the implementation of -ftrivial-auto-var-init=pattern; >>>> ** complete the implementation of uninitialized warnings maintenance work for D. >>>> >>>> I have completed the uninitialized warnings maintenance work for D. >>>> And finished partial of the -ftrivial-auto-var-init=pattern implementation. >>>> >>>> The following are remaining work of Approach D: >>>> >>>> ** -ftrivial-auto-var-init=pattern for VLA; >>>> **add a new attribute for variable: >>>> __attribute((uninitialized) >>>> the marked variable is uninitialized intentionaly for performance purpose. >>>> ** adding complete testing cases; >>>> >>>> >>>> Please let me know if you have any objection on my current decision on implementing approach D. >>> >>> Did you do any analysis on how stack usage and code size are changed >>> with approach D? >> >> I did the code size change comparison (I will provide the data in another email). 
And with this data, D works better than A in general. (This is surprise to me actually). >> >> But not the stack usage. Not sure how to collect the stack usage data, >> do you have any suggestion on this? > > There is -fstack-usage you could use, then of course watching > the stack segment at runtime. I can do this for CPU2017 to collect the stack usage data and report back. > I'm mostly concerned about > stack-limited "processes" such as the linux kernel which I think > is a primary target of your work. I don’t have any experience building the Linux kernel. Do we have to collect data for the Linux kernel at this time? Is the CPU2017 data not enough? Qing > > Richard. > >> >>> How does compile-time behave (we could gobble up >>> lots of .DEFERRED_INIT calls I guess)? >> I can collect this data too and report it later. >> >> Thanks. >> >> Qing >>> >>> Richard. >>> >>>> Thanks a lot for your help. >>>> >>>> Qing >>>> >>>> >>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >>>>> >>>>> Hi, >>>>> >>>>> This is an update for our previous discussion. >>>>> >>>>> 1. I implemented the following two different implementations in the latest upstream gcc: >>>>> >>>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. >>>>> >>>>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >>>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >>>>> >>>>> Note, in this initial implementation, >>>>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern >>>>> is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero. >>>>> >>>>> ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for >>>>> runtime performance study. >>>>> ** I didn’t finish the uninitialized warnings maintenance work for D.
(That might take more time than I expected). >>>>> >>>>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases: >>>>> >>>>> no: default. (-g -O2 -march=native ) >>>>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A >>>>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D >>>>> >>>>> And then compute the slowdown data for both A and D as following: >>>>> >>>>> benchmarks A / no D /no >>>>> >>>>> 500.perlbench_r 1.25% 1.25% >>>>> 502.gcc_r 0.68% 1.80% >>>>> 505.mcf_r 0.68% 0.14% >>>>> 520.omnetpp_r 4.83% 4.68% >>>>> 523.xalancbmk_r 0.18% 1.96% >>>>> 525.x264_r 1.55% 2.07% >>>>> 531.deepsjeng_ 11.57% 11.85% >>>>> 541.leela_r 0.64% 0.80% >>>>> 557.xz_ -0.41% -0.41% >>>>> >>>>> 507.cactuBSSN_r 0.44% 0.44% >>>>> 508.namd_r 0.34% 0.34% >>>>> 510.parest_r 0.17% 0.25% >>>>> 511.povray_r 56.57% 57.27% >>>>> 519.lbm_r 0.00% 0.00% >>>>> 521.wrf_r -0.28% -0.37% >>>>> 526.blender_r 16.96% 17.71% >>>>> 527.cam4_r 0.70% 0.53% >>>>> 538.imagick_r 2.40% 2.40% >>>>> 544.nab_r 0.00% -0.65% >>>>> >>>>> avg 5.17% 5.37% >>>>> >>>>> From the above data, we can see that in general, the runtime performance slowdown for >>>>> implementation A and D are similar for individual benchmarks. >>>>> >>>>> There are several benchmarks that have significant slowdown with the new added initialization for both >>>>> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit >>>>> more on what kind of new initializations introduced such slowdown. >>>>> >>>>> From the current study so far, I think that approach D should be good enough for our final implementation. >>>>> So, I will try to finish approach D with the following remaining work >>>>> >>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern; >>>>> ** complete the implementation of uninitialized warnings maintenance work for D. 
>>>>> >>>>> >>>>> Let me know if you have any comments and suggestions on my current and future work. >>>>> >>>>> Thanks a lot for your help. >>>>> >>>>> Qing >>>>> >>>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >>>>>> >>>>>> The following are the approaches I will implement and compare: >>>>>> >>>>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. >>>>>> >>>>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. >>>>>> B. Adding real initialization during gimplification, marking them with “artificial_init”. >>>>>> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not >>>>>> Deleted from the fake init. >>>>>> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, >>>>>> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, >>>>>> add real initialization for all DECLs that are marked with “no_explicit_init”. >>>>>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >>>>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >>>>>> >>>>>> >>>>>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance >>>>>> comparison. >>>>>> >>>>>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but >>>>>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach >>>>>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. >>>>>> >>>>>> If the performance of D is not good, I will implement B or C at that time. >>>>>> >>>>>> Let me know if you have any comment or suggestions. >>>>>> >>>>>> Thanks. 
>>>>>> >>>>>> Qing >>>>> >>>> >>>> >>> >>> -- >>> Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de> <mailto:rguenther@suse.de <mailto:rguenther@suse.de>>> >>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, >>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) >> >> > > -- > Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-13 15:35 ` Qing Zhao @ 2021-01-13 15:40 ` Richard Biener 0 siblings, 0 replies; 56+ messages in thread From: Richard Biener @ 2021-01-13 15:40 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On Wed, 13 Jan 2021, Qing Zhao wrote: > > > > On Jan 13, 2021, at 9:10 AM, Richard Biener <rguenther@suse.de> wrote: > > > > On Wed, 13 Jan 2021, Qing Zhao wrote: > > > >> > >> > >>> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote: > >>> > >>> On Tue, 12 Jan 2021, Qing Zhao wrote: > >>> > >>>> Hi, > >>>> > >>>> Just check in to see whether you have any comments and suggestions on this: > >>>> > >>>> FYI, I have been continue with Approach D implementation since last week: > >>>> > >>>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > >>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > >>>> > >>>> For the remaining work of Approach D: > >>>> > >>>> ** complete the implementation of -ftrivial-auto-var-init=pattern; > >>>> ** complete the implementation of uninitialized warnings maintenance work for D. > >>>> > >>>> I have completed the uninitialized warnings maintenance work for D. > >>>> And finished partial of the -ftrivial-auto-var-init=pattern implementation. > >>>> > >>>> The following are remaining work of Approach D: > >>>> > >>>> ** -ftrivial-auto-var-init=pattern for VLA; > >>>> **add a new attribute for variable: > >>>> __attribute((uninitialized) > >>>> the marked variable is uninitialized intentionaly for performance purpose. > >>>> ** adding complete testing cases; > >>>> > >>>> > >>>> Please let me know if you have any objection on my current decision on implementing approach D. > >>> > >>> Did you do any analysis on how stack usage and code size are changed > >>> with approach D? 
> >> > >> I did the code size change comparison (I will provide the data in another email). And with this data, D works better than A in general. (This is surprise to me actually). > >> > >> But not the stack usage. Not sure how to collect the stack usage data, > >> do you have any suggestion on this? > > > > There is -fstack-usage you could use, then of course watching > > the stack segment at runtime. > > I can do this for CPU2017 to collect the stack usage data and report back. > > > I'm mostly concerned about > > stack-limited "processes" such as the linux kernel which I think > > is a primary target of your work. > > I don’t have any experience on building linux kernel. > Do we have to collect data for linux kernel at this time? Is CPU2017 data not enough? Well, it depends on the desired target. The Linux kernel has an 8 kB hard stack limit for kernel threads on x86_64 (IIRC). You don't have to do anything; it was just a suggestion. For normal programs, stack usage is probably the least important problem. Richard. > Qing > > > > Richard. > > > >> > >>> How does compile-time behave (we could gobble up > >>> lots of .DEFERRED_INIT calls I guess)? > >> I can collect this data too and report it later. > >> > >> Thanks. > >> > >> Qing > >>> > >>> Richard. > >>> > >>>> Thanks a lot for your help. > >>>> > >>>> Qing > >>>> > >>>> > >>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > >>>>> > >>>>> Hi, > >>>>> > >>>>> This is an update for our previous discussion. > >>>>> > >>>>> 1. I implemented the following two different implementations in the latest upstream gcc: > >>>>> > >>>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. > >>>>> > >>>>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > >>>>> real initialization.
> >>>>> > >>>>> Note, in this initial implementation, > >>>>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern > >>>>> is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero. > >>>>> > >>>>> ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for > >>>>> runtime performance study. > >>>>> ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). > >>>>> > >>>>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases: > >>>>> > >>>>> no: default. (-g -O2 -march=native ) > >>>>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A > >>>>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D > >>>>> > >>>>> And then compute the slowdown data for both A and D as following: > >>>>> > >>>>> benchmarks A / no D /no > >>>>> > >>>>> 500.perlbench_r 1.25% 1.25% > >>>>> 502.gcc_r 0.68% 1.80% > >>>>> 505.mcf_r 0.68% 0.14% > >>>>> 520.omnetpp_r 4.83% 4.68% > >>>>> 523.xalancbmk_r 0.18% 1.96% > >>>>> 525.x264_r 1.55% 2.07% > >>>>> 531.deepsjeng_ 11.57% 11.85% > >>>>> 541.leela_r 0.64% 0.80% > >>>>> 557.xz_ -0.41% -0.41% > >>>>> > >>>>> 507.cactuBSSN_r 0.44% 0.44% > >>>>> 508.namd_r 0.34% 0.34% > >>>>> 510.parest_r 0.17% 0.25% > >>>>> 511.povray_r 56.57% 57.27% > >>>>> 519.lbm_r 0.00% 0.00% > >>>>> 521.wrf_r -0.28% -0.37% > >>>>> 526.blender_r 16.96% 17.71% > >>>>> 527.cam4_r 0.70% 0.53% > >>>>> 538.imagick_r 2.40% 2.40% > >>>>> 544.nab_r 0.00% -0.65% > >>>>> > >>>>> avg 5.17% 5.37% > >>>>> > >>>>> From the above data, we can see that in general, the runtime performance slowdown for > >>>>> implementation A and D are similar for individual benchmarks. 
> >>>>> > >>>>> There are several benchmarks that have significant slowdown with the new added initialization for both > >>>>> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit > >>>>> more on what kind of new initializations introduced such slowdown. > >>>>> > >>>>> From the current study so far, I think that approach D should be good enough for our final implementation. > >>>>> So, I will try to finish approach D with the following remaining work > >>>>> > >>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern; > >>>>> ** complete the implementation of uninitialized warnings maintenance work for D. > >>>>> > >>>>> > >>>>> Let me know if you have any comments and suggestions on my current and future work. > >>>>> > >>>>> Thanks a lot for your help. > >>>>> > >>>>> Qing > >>>>> > >>>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > >>>>>> > >>>>>> The following are the approaches I will implement and compare: > >>>>>> > >>>>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. > >>>>>> > >>>>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. > >>>>>> B. Adding real initialization during gimplification, marking them with “artificial_init”. > >>>>>> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not > >>>>>> Deleted from the fake init. > >>>>>> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, > >>>>>> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, > >>>>>> add real initialization for all DECLs that are marked with “no_explicit_init”. > >>>>>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to > >>>>>> real initialization. 
Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. > >>>>>> > >>>>>> > >>>>>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance > >>>>>> comparison. > >>>>>> > >>>>>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but > >>>>>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach > >>>>>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. > >>>>>> > >>>>>> If the performance of D is not good, I will implement B or C at that time. > >>>>>> > >>>>>> Let me know if you have any comment or suggestions. > >>>>>> > >>>>>> Thanks. > >>>>>> > >>>>>> Qing > >>>>> > >>>> > >>>> > >>> > >>> -- > >>> Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de> <mailto:rguenther@suse.de <mailto:rguenther@suse.de>>> > >>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > >>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > >> > >> > > > > -- > > Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
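For a stack-limited target like the one mentioned above, one way to use the `.su` output is to flag functions whose reported frame exceeds a chosen budget. This is only a sketch under stated assumptions: the per-function budget of 1024 bytes is an arbitrary illustrative value (the limit quoted above applies to the whole stack, and the real figure depends on architecture and kernel configuration), the records in `SAMPLE` are hypothetical, and `over_budget` is a made-up helper name. It relies on the documented meaning of the qualifiers: for `static` and `dynamic,bounded` the number is an upper bound for that function, while for plain `dynamic` it covers only the bounded part.

```python
"""Flag functions whose -fstack-usage record exceeds a per-function budget."""


def over_budget(su_text, budget_bytes):
    """Return (function, size, qualifier) records that deserve a closer look.

    A record is flagged when its size exceeds the budget, or when the
    qualifier is plain "dynamic", because the reported size is then not an
    upper bound on what the function may use.
    """
    flagged = []
    for line in su_text.splitlines():
        if not line.strip():
            continue
        # Function signatures may contain spaces, so split from the right.
        name, size, qualifier = line.rsplit(None, 2)
        size = int(size)
        if size > budget_bytes or qualifier == "dynamic":
            flagged.append((name, size, qualifier))
    # Largest frames first.
    return sorted(flagged, key=lambda record: record[1], reverse=True)


# Hypothetical records for illustration only (not measured data).
SAMPLE = (
    "k.c:1:6:small\t64\tstatic\n"
    "k.c:9:6:big\t2048\tstatic\n"
    "k.c:20:6:vla\t32\tdynamic\n"
    "k.c:30:6:args\t96\tdynamic,bounded\n"
)

if __name__ == "__main__":
    for name, size, qualifier in over_budget(SAMPLE, budget_bytes=1024):
        print(f"{name}: {size} bytes ({qualifier})")
```

Per-function frame sizes say nothing about call depth, so a clean result here does not prove that a call chain fits within the limit; it only narrows down where to look.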
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-13 7:39 ` Richard Biener 2021-01-13 15:06 ` Qing Zhao @ 2021-01-14 21:16 ` Qing Zhao 2021-01-15 8:11 ` Richard Biener 1 sibling, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-01-14 21:16 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

Hi,

More data on code size and compilation time with CPU2017:

******** Compilation time data: the numbers are the slowdown against the default “no”:

benchmarks            A/no      D/no
500.perlbench_r      5.19%     1.95%
502.gcc_r            0.46%    -0.23%
505.mcf_r            0.00%     0.00%
520.omnetpp_r        0.85%     0.00%
523.xalancbmk_r      0.79%    -0.40%
525.x264_r          -4.48%     0.00%
531.deepsjeng_r     16.67%    16.67%
541.leela_r          0.00%     0.00%
557.xz_r             0.00%     0.00%
507.cactuBSSN_r      1.16%     0.58%
508.namd_r           9.62%     8.65%
510.parest_r         0.48%     1.19%
511.povray_r         3.70%     3.70%
519.lbm_r            0.00%     0.00%
521.wrf_r            0.05%     0.02%
526.blender_r        0.33%     1.32%
527.cam4_r          -0.93%    -0.93%
538.imagick_r        1.32%     3.95%
544.nab_r            0.00%     0.00%

From the above data, it looks like the compilation time impact of implementations A and D is almost the same.

******** Code size data: the numbers are the code size increase against the default “no”:

benchmarks            A/no      D/no
500.perlbench_r      2.84%     0.34%
502.gcc_r            2.59%     0.35%
505.mcf_r            3.55%     0.39%
520.omnetpp_r        0.54%     0.03%
523.xalancbmk_r      0.36%     0.39%
525.x264_r           1.39%     0.13%
531.deepsjeng_r      2.15%    -1.12%
541.leela_r          0.50%    -0.20%
557.xz_r             0.31%     0.13%
507.cactuBSSN_r      5.00%    -0.01%
508.namd_r           3.64%    -0.07%
510.parest_r         1.12%     0.33%
511.povray_r         4.18%     1.16%
519.lbm_r            8.83%     6.44%
521.wrf_r            0.08%     0.02%
526.blender_r        1.63%     0.45%
527.cam4_r           0.16%     0.06%
538.imagick_r        3.18%    -0.80%
544.nab_r            5.76%    -1.11%

Avg                  2.52%     0.36%

From the above data, implementation D is better than A on every benchmark except 523.xalancbmk_r (0.39% vs. 0.36%). This is surprising to me; I am not sure what the reason is.

******** Stack usage data: I added -fstack-usage to the compilation line when compiling the CPU2017 benchmarks.
All the *.su files were generated, one for each of the modules. Since there are a lot of such files, and the stack size information is embedded in each of them, I just picked one benchmark, 511.povray, to check. It is the one that has the most runtime overhead when adding initialization (for both A and D).

I identified all the *.su files that differ between A and D and ran diff on them. It looks like the stack size is much higher with D than with A, for example:

$ diff build_base_auto_init.D.0000/bbox.su build_base_auto_init.A.0000/bbox.su
5c5
< bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 160 static
---
> bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 96 static

$ diff build_base_auto_init.D.0000/image.su build_base_auto_init.A.0000/image.su
9c9
< image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624 static
---
> image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272 static

….

It looks like implementation D has more stack size impact than A. Do you have any insight into the reason for this?

Let me know if you have any comments and suggestions.

Thanks.

Qing

> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> wrote: > > On Tue, 12 Jan 2021, Qing Zhao wrote: > >> Hi, >> >> Just check in to see whether you have any comments and suggestions on this: >> >> FYI, I have been continue with Approach D implementation since last week: >> >> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >> >> For the remaining work of Approach D: >> >> ** complete the implementation of -ftrivial-auto-var-init=pattern; >> ** complete the implementation of uninitialized warnings maintenance work for D.
>> >> I have completed the uninitialized warnings maintenance work for D. >> And finished partial of the -ftrivial-auto-var-init=pattern implementation. >> >> The following are remaining work of Approach D: >> >> ** -ftrivial-auto-var-init=pattern for VLA; >> **add a new attribute for variable: >> __attribute((uninitialized) >> the marked variable is uninitialized intentionaly for performance purpose. >> ** adding complete testing cases; >> >> >> Please let me know if you have any objection on my current decision on implementing approach D. > > Did you do any analysis on how stack usage and code size are changed > with approach D? How does compile-time behave (we could gobble up > lots of .DEFERRED_INIT calls I guess)? > > Richard. > >> Thanks a lot for your help. >> >> Qing >> >> >>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >>> >>> Hi, >>> >>> This is an update for our previous discussion. >>> >>> 1. I implemented the following two different implementations in the latest upstream gcc: >>> >>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. >>> >>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >>> >>> Note, in this initial implementation, >>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of -ftrivial-auto-var-init=pattern >>> is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero. >>> >>> ** I added an temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for >>> runtime performance study. >>> ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected). >>> >>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc for the following 3 cases: >>> >>> no: default. 
(-g -O2 -march=native ) >>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A >>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D >>> >>> And then compute the slowdown data for both A and D as following: >>> >>> benchmarks A / no D /no >>> >>> 500.perlbench_r 1.25% 1.25% >>> 502.gcc_r 0.68% 1.80% >>> 505.mcf_r 0.68% 0.14% >>> 520.omnetpp_r 4.83% 4.68% >>> 523.xalancbmk_r 0.18% 1.96% >>> 525.x264_r 1.55% 2.07% >>> 531.deepsjeng_ 11.57% 11.85% >>> 541.leela_r 0.64% 0.80% >>> 557.xz_ -0.41% -0.41% >>> >>> 507.cactuBSSN_r 0.44% 0.44% >>> 508.namd_r 0.34% 0.34% >>> 510.parest_r 0.17% 0.25% >>> 511.povray_r 56.57% 57.27% >>> 519.lbm_r 0.00% 0.00% >>> 521.wrf_r -0.28% -0.37% >>> 526.blender_r 16.96% 17.71% >>> 527.cam4_r 0.70% 0.53% >>> 538.imagick_r 2.40% 2.40% >>> 544.nab_r 0.00% -0.65% >>> >>> avg 5.17% 5.37% >>> >>> From the above data, we can see that in general, the runtime performance slowdown for >>> implementation A and D are similar for individual benchmarks. >>> >>> There are several benchmarks that have significant slowdown with the new added initialization for both >>> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will try to study a little bit >>> more on what kind of new initializations introduced such slowdown. >>> >>> From the current study so far, I think that approach D should be good enough for our final implementation. >>> So, I will try to finish approach D with the following remaining work >>> >>> ** complete the implementation of -ftrivial-auto-var-init=pattern; >>> ** complete the implementation of uninitialized warnings maintenance work for D. >>> >>> >>> Let me know if you have any comments and suggestions on my current and future work. >>> >>> Thanks a lot for your help. 
>>> >>> Qing >>> >>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >>>> >>>> The following are the approaches I will implement and compare: >>>> >>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost. >>>> >>>> A. Adding real initialization during gimplification, not maintain the uninitialized warnings. >>>> B. Adding real initialization during gimplification, marking them with “artificial_init”. >>>> Adjusting uninitialized pass, maintaining the annotation, making sure the real init not >>>> Deleted from the fake init. >>>> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification, >>>> maintain this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, >>>> add real initialization for all DECLs that are marked with “no_explicit_init”. >>>> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to >>>> real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. >>>> >>>> >>>> In the above, approach A will be the one that have the minimum run-time cost, will be the base for the performance >>>> comparison. >>>> >>>> I will implement approach D then, this one is expected to have the most run-time overhead among the above list, but >>>> Implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach >>>> will be. If the data is good, maybe we can avoid the effort to implement B, and C. >>>> >>>> If the performance of D is not good, I will implement B or C at that time. >>>> >>>> Let me know if you have any comment or suggestions. >>>> >>>> Thanks. 
>>>> >>>> Qing >>> >> >> > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
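The .su comparison described in the message above can be sketched in C. This is only an illustrative helper, not something from the thread or from GCC: the names su_line_size and su_growth are hypothetical, and the code assumes that every .su line ends with a size field followed by a qualifier such as "static", as in the diff output quoted above. Because a C++ signature can itself contain spaces, the line is parsed from the right.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the stack size in bytes recorded on one line of a
   -fstack-usage (.su) file, or -1 if the line does not end in
   "<whitespace><size><whitespace><qualifier>".  */
long su_line_size(const char *line)
{
    size_t end = strlen(line);

    /* Skip trailing whitespace, including a newline.  */
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;
    /* Skip the qualifier field, e.g. "static".  */
    while (end > 0 && !isspace((unsigned char)line[end - 1]))
        end--;
    /* Skip the separator between size and qualifier.  */
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;

    /* line[start..end) should now be the run of size digits.  */
    size_t start = end;
    while (start > 0 && isdigit((unsigned char)line[start - 1]))
        start--;
    if (start == end)
        return -1;                      /* no digits found */
    if (start == 0 || !isspace((unsigned char)line[start - 1]))
        return -1;                      /* digits are part of another token */

    return strtol(line + start, NULL, 10);
}

/* Store in *growth how many more bytes the D build uses than the A
   build for the same function.  Returns 0 on success, -1 if either
   line is malformed.  */
int su_growth(const char *d_line, const char *a_line, long *growth)
{
    long d = su_line_size(d_line);
    long a = su_line_size(a_line);

    if (d < 0 || a < 0)
        return -1;
    *growth = d - a;
    return 0;
}
```

Fed the two image.su lines from the diff above, su_growth reports the difference between 624 and 272 bytes; matching the lines by function name across the two build directories is left to the caller.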
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-14 21:16 ` Qing Zhao @ 2021-01-15 8:11 ` Richard Biener 2021-01-15 16:16 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2021-01-15 8:11 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

On Thu, 14 Jan 2021, Qing Zhao wrote:

> Hi,
> More data on code size and compilation time with CPU2017:
>
> ********Compilation time data: the numbers are the slowdown against the
> default “no”:
>
> benchmarks          A/no      D/no
>
> 500.perlbench_r     5.19%     1.95%
> 502.gcc_r           0.46%    -0.23%
> 505.mcf_r           0.00%     0.00%
> 520.omnetpp_r       0.85%     0.00%
> 523.xalancbmk_r     0.79%    -0.40%
> 525.x264_r         -4.48%     0.00%
> 531.deepsjeng_r    16.67%    16.67%
> 541.leela_r         0.00%     0.00%
> 557.xz_r            0.00%     0.00%
>
> 507.cactuBSSN_r     1.16%     0.58%
> 508.namd_r          9.62%     8.65%
> 510.parest_r        0.48%     1.19%
> 511.povray_r        3.70%     3.70%
> 519.lbm_r           0.00%     0.00%
> 521.wrf_r           0.05%     0.02%
> 526.blender_r       0.33%     1.32%
> 527.cam4_r         -0.93%    -0.93%
> 538.imagick_r       1.32%     3.95%
> 544.nab_r           0.00%     0.00%
>
> From the above data, looks like that the compilation time impact
> from implementation A and D are almost the same.
> *******code size data: the numbers are the code size increase against the
> default “no”:
>
> benchmarks          A/no      D/no
>
> 500.perlbench_r     2.84%     0.34%
> 502.gcc_r           2.59%     0.35%
> 505.mcf_r           3.55%     0.39%
> 520.omnetpp_r       0.54%     0.03%
> 523.xalancbmk_r     0.36%     0.39%
> 525.x264_r          1.39%     0.13%
> 531.deepsjeng_r     2.15%    -1.12%
> 541.leela_r         0.50%    -0.20%
> 557.xz_r            0.31%     0.13%
>
> 507.cactuBSSN_r     5.00%    -0.01%
> 508.namd_r          3.64%    -0.07%
> 510.parest_r        1.12%     0.33%
> 511.povray_r        4.18%     1.16%
> 519.lbm_r           8.83%     6.44%
> 521.wrf_r           0.08%     0.02%
> 526.blender_r       1.63%     0.45%
> 527.cam4_r          0.16%     0.06%
> 538.imagick_r       3.18%    -0.80%
> 544.nab_r           5.76%    -1.11%
> Avg                 2.52%     0.36%
>
> From the above data, the implementation D is always better than A, it’s a
> surprising to me, not sure what’s the reason for this.

D probably inhibits most interesting loop transforms (check SPEC FP
performance). It will also most definitely disallow SRA which, when
an aggregate is not completely elided, tends to grow code.

> ********stack usage data, I added -fstack-usage to the compilation line when
> compiling CPU2017 benchmarks. And all the *.su files were generated for each
> of the modules.
> Since there a lot of such files, and the stack size information are embedded
> in each of the files. I just picked up one benchmark 511.povray to
> check. Which is the one that
> has the most runtime overhead when adding initialization (both A and D).
>
> I identified all the *.su files that are different between A and D and do a
> diff on those *.su files, and looks like that the stack size is much higher
> with D than that with A, for example:
>
> $ diff build_base_auto_init.D.0000/bbox.su
> build_base_auto_init.A.0000/bbox.su
> 5c5
> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
> pov::BBOX_TREE**&, long int*, long int, long int) 160 static
> ---
> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
> pov::BBOX_TREE**&, long int*, long int, long int) 96 static
>
> $ diff build_base_auto_init.D.0000/image.su
> build_base_auto_init.A.0000/image.su
> 9c9
> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624
> static
> ---
> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272
> static
> ….
> Looks like that implementation D has more stack size impact than A.
>
> Do you have any insight on what the reason for this?

D will keep all initialized aggregates as aggregates, and live, which
means stack will be allocated for them. With A the usual optimizations
to reduce stack usage can be applied.

> Let me know if you have any comments and suggestions.

First of all I would check whether the prototype implementations
work as expected.

Richard.

> thanks.
>
> Qing
> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de>
> wrote:
>
> On Tue, 12 Jan 2021, Qing Zhao wrote:
>
> Hi,
>
> Just check in to see whether you have any comments
> and suggestions on this:
>
> FYI, I have been continue with Approach D
> implementation since last week:
>
> D. Adding calls to .DEFFERED_INIT during
> gimplification, expand the .DEFFERED_INIT during
> expand to
> real initialization. Adjusting uninitialized pass
> with the new refs with “.DEFFERED_INIT”.
>
> For the remaining work of Approach D:
>
> ** complete the implementation of
> -ftrivial-auto-var-init=pattern;
> ** complete the implementation of uninitialized
> warnings maintenance work for D.
> > I have completed the uninitialized warnings > maintenance work for D. > And finished partial of the > -ftrivial-auto-var-init=pattern implementation. > > The following are remaining work of Approach D: > > ** -ftrivial-auto-var-init=pattern for VLA; > **add a new attribute for variable: > __attribute((uninitialized) > the marked variable is uninitialized intentionaly > for performance purpose. > ** adding complete testing cases; > > > Please let me know if you have any objection on my > current decision on implementing approach D. > > > Did you do any analysis on how stack usage and code size are > changed > with approach D? How does compile-time behave (we could gobble > up > lots of .DEFERRED_INIT calls I guess)? > > Richard. > > Thanks a lot for your help. > > Qing > > > On Jan 5, 2021, at 1:05 PM, Qing Zhao > via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > Hi, > > This is an update for our previous > discussion. > > 1. I implemented the following two > different implementations in the latest > upstream gcc: > > A. Adding real initialization during > gimplification, not maintain the > uninitialized warnings. > > D. Adding calls to .DEFFERED_INIT > during gimplification, expand the > .DEFFERED_INIT during expand to > real initialization. Adjusting > uninitialized pass with the new refs > with “.DEFFERED_INIT”. > > Note, in this initial implementation, > ** I ONLY implement > -ftrivial-auto-var-init=zero, the > implementation of > -ftrivial-auto-var-init=pattern > is not done yet. Therefore, the > performance data is only about > -ftrivial-auto-var-init=zero. > > ** I added an temporary option > -fauto-var-init-approach=A|B|C|D to > choose implementation A or D for > runtime performance study. > ** I didn’t finish the uninitialized > warnings maintenance work for D. (That > might take more time than I expected). > > 2. I collected runtime data for CPU2017 > on a x86 machine with this new gcc for > the following 3 cases: > > no: default. 
(-g -O2 -march=native ) > A: default + > -ftrivial-auto-var-init=zero > -fauto-var-init-approach=A > D: default + > -ftrivial-auto-var-init=zero > -fauto-var-init-approach=D > > And then compute the slowdown data for > both A and D as following: > > benchmarks A / no D /no > > 500.perlbench_r 1.25% 1.25% > 502.gcc_r 0.68% 1.80% > 505.mcf_r 0.68% 0.14% > 520.omnetpp_r 4.83% 4.68% > 523.xalancbmk_r 0.18% 1.96% > 525.x264_r 1.55% 2.07% > 531.deepsjeng_ 11.57% 11.85% > 541.leela_r 0.64% 0.80% > 557.xz_ -0.41% -0.41% > > 507.cactuBSSN_r 0.44% 0.44% > 508.namd_r 0.34% 0.34% > 510.parest_r 0.17% 0.25% > 511.povray_r 56.57% 57.27% > 519.lbm_r 0.00% 0.00% > 521.wrf_r -0.28% -0.37% > 526.blender_r 16.96% 17.71% > 527.cam4_r 0.70% 0.53% > 538.imagick_r 2.40% 2.40% > 544.nab_r 0.00% -0.65% > > avg 5.17% 5.37% > > From the above data, we can see that in > general, the runtime performance > slowdown for > implementation A and D are similar for > individual benchmarks. > > There are several benchmarks that have > significant slowdown with the new added > initialization for both > A and D, for example, 511.povray_r, > 526.blender_, and 531.deepsjeng_r, I > will try to study a little bit > more on what kind of new initializations > introduced such slowdown. > > From the current study so far, I think > that approach D should be good enough > for our final implementation. > So, I will try to finish approach D with > the following remaining work > > ** complete the implementation of > -ftrivial-auto-var-init=pattern; > ** complete the implementation of > uninitialized warnings maintenance work > for D. > > > Let me know if you have any comments and > suggestions on my current and future > work. > > Thanks a lot for your help. 
> > Qing > > On Dec 9, 2020, at 10:18 AM, > Qing Zhao via Gcc-patches > <gcc-patches@gcc.gnu.org> > wrote: > > The following are the > approaches I will implement > and compare: > > Our final goal is to keep > the uninitialized warning > and minimize the run-time > performance cost. > > A. Adding real > initialization during > gimplification, not maintain > the uninitialized warnings. > B. Adding real > initialization during > gimplification, marking them > with “artificial_init”. > Adjusting uninitialized > pass, maintaining the > annotation, making sure the > real init not > Deleted from the fake > init. > C. Marking the DECL for an > uninitialized auto variable > as “no_explicit_init” during > gimplification, > maintain this > “no_explicit_init” bit till > after > pass_late_warn_uninitialized, > or till pass_expand, > add real initialization > for all DECLs that are > marked with > “no_explicit_init”. > D. Adding .DEFFERED_INIT > during gimplification, > expand the .DEFFERED_INIT > during expand to > real initialization. > Adjusting uninitialized pass > with the new refs with > “.DEFFERED_INIT”. > > > In the above, approach A > will be the one that have > the minimum run-time cost, > will be the base for the > performance > comparison. > > I will implement approach D > then, this one is expected > to have the most run-time > overhead among the above > list, but > Implementation should be the > cleanest among B, C, D. > Let’s see how much more > performance overhead this > approach > will be. If the data is > good, maybe we can avoid the > effort to implement B, and > C. > > If the performance of D is > not good, I will implement B > or C at that time. > > Let me know if you have any > comment or suggestions. > > Thanks. 
> > Qing > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > > > > ^ permalink raw reply [flat|nested] 56+ messages in thread
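Richard's explanation of the stack-size difference can be pictured with a hand-written C analogue of approach A. This is only a sketch under assumptions: the function scaled_norm2 is hypothetical, the explicit initializer stands in for the store that approach A would add during gimplification, and whether a given GCC version actually scalarizes the array is not verified here. Because the initialization is ordinary code and p is only accessed through constant indices, SRA is free to split p into three scalars so that no stack slot is needed for it; under approach D, per the message above, the aggregate would stay live as an aggregate.

```c
/* Squared length of v scaled by s, optionally ignoring the z component.
   The explicit zero initializer is a stand-in for the initialization
   that approach A would insert for an uninitialized "double p[3];".  */
double scaled_norm2(const double *v, double s, int use_z)
{
    double p[3] = { 0.0, 0.0, 0.0 };

    p[0] = v[0] * s;
    p[1] = v[1] * s;
    if (use_z)
        p[2] = v[2] * s;   /* otherwise p[2] keeps its zero value */

    /* Only constant-index accesses: a typical candidate for
       scalar replacement of aggregates.  */
    return p[0] * p[0] + p[1] * p[1] + p[2] * p[2];
}
```

Compiling such a function with -fstack-usage under each approach and comparing the .su entries is one way to observe the effect on a small test case before looking at the whole benchmark.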
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-15 8:11 ` Richard Biener @ 2021-01-15 16:16 ` Qing Zhao 2021-01-15 17:22 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-01-15 16:16 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches > On Jan 15, 2021, at 2:11 AM, Richard Biener <rguenther@suse.de> wrote: > > > > On Thu, 14 Jan 2021, Qing Zhao wrote: > >> Hi, >> More data on code size and compilation time with CPU2017: >> ********Compilation time data: the numbers are the slowdown against the >> default “no”: >> benchmarks A/no D/no >> >> 500.perlbench_r 5.19% 1.95% >> 502.gcc_r 0.46% -0.23% >> 505.mcf_r 0.00% 0.00% >> 520.omnetpp_r 0.85% 0.00% >> 523.xalancbmk_r 0.79% -0.40% >> 525.x264_r -4.48% 0.00% >> 531.deepsjeng_r 16.67% 16.67% >> 541.leela_r 0.00% 0.00% >> 557.xz_r 0.00% 0.00% >> >> 507.cactuBSSN_r 1.16% 0.58% >> 508.namd_r 9.62% 8.65% >> 510.parest_r 0.48% 1.19% >> 511.povray_r 3.70% 3.70% >> 519.lbm_r 0.00% 0.00% >> 521.wrf_r 0.05% 0.02% >> 526.blender_r 0.33% 1.32% >> 527.cam4_r -0.93% -0.93% >> 538.imagick_r 1.32% 3.95% >> 544.nab_r 0.00% 0.00% >> From the above data, looks like that the compilation time impact >> from implementation A and D are almost the same. 
>> *******code size data: the numbers are the code size increase against the
>> default “no”:
>>
>> benchmarks          A/no      D/no
>>
>> 500.perlbench_r     2.84%     0.34%
>> 502.gcc_r           2.59%     0.35%
>> 505.mcf_r           3.55%     0.39%
>> 520.omnetpp_r       0.54%     0.03%
>> 523.xalancbmk_r     0.36%     0.39%
>> 525.x264_r          1.39%     0.13%
>> 531.deepsjeng_r     2.15%    -1.12%
>> 541.leela_r         0.50%    -0.20%
>> 557.xz_r            0.31%     0.13%
>>
>> 507.cactuBSSN_r     5.00%    -0.01%
>> 508.namd_r          3.64%    -0.07%
>> 510.parest_r        1.12%     0.33%
>> 511.povray_r        4.18%     1.16%
>> 519.lbm_r           8.83%     6.44%
>> 521.wrf_r           0.08%     0.02%
>> 526.blender_r       1.63%     0.45%
>> 527.cam4_r          0.16%     0.06%
>> 538.imagick_r       3.18%    -0.80%
>> 544.nab_r           5.76%    -1.11%
>> Avg                 2.52%     0.36%
>> From the above data, the implementation D is always better than A, it’s a
>> surprising to me, not sure what’s the reason for this.
>
> D probably inhibits most interesting loop transforms (check SPEC FP
> performance).

The call to .DEFERRED_INIT is marked as ECF_CONST:

/* A function to represent an artificial initialization to an uninitialized
   automatic variable. The first argument is the variable itself, the
   second argument is the initialization type.  */
DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

So I assume that such a const call should minimize the impact on loop optimizations. But yes, it will still inhibit some of the loop transformations.

> It will also most definitely disallow SRA which, when
> an aggregate is not completely elided, tends to grow code.

Makes sense to me.
The run-time performance data for D and A are actually very similar, as I posted in the previous email (I list it here again for convenience).

Run-time performance overhead with A and D:

benchmarks          A / no    D / no

500.perlbench_r     1.25%     1.25%
502.gcc_r           0.68%     1.80%
505.mcf_r           0.68%     0.14%
520.omnetpp_r       4.83%     4.68%
523.xalancbmk_r     0.18%     1.96%
525.x264_r          1.55%     2.07%
531.deepsjeng_     11.57%    11.85%
541.leela_r         0.64%     0.80%
557.xz_            -0.41%    -0.41%

507.cactuBSSN_r     0.44%     0.44%
508.namd_r          0.34%     0.34%
510.parest_r        0.17%     0.25%
511.povray_r       56.57%    57.27%
519.lbm_r           0.00%     0.00%
521.wrf_r          -0.28%    -0.37%
526.blender_r      16.96%    17.71%
527.cam4_r          0.70%     0.53%
538.imagick_r       2.40%     2.40%
544.nab_r           0.00%    -0.65%

avg                 5.17%     5.37%

Especially for the SPEC FP benchmarks, I didn’t see much performance difference between A and D. I guess that the RTL optimizations might be enough to get rid of most of the overhead introduced by the additional initialization.

>
>> ********stack usage data, I added -fstack-usage to the compilation line when
>> compiling CPU2017 benchmarks. And all the *.su files were generated for each
>> of the modules.
>> Since there a lot of such files, and the stack size information are embedded
>> in each of the files. I just picked up one benchmark 511.povray to
>> check. Which is the one that
>> has the most runtime overhead when adding initialization (both A and D).
>> I identified all the *.su files that are different between A and D and do a
>> diff on those *.su files, and looks like that the stack size is much higher
>> with D than that with A, for example:
>> $ diff build_base_auto_init.D.0000/bbox.su
>> build_base_auto_init.A.0000/bbox.su
>> 5c5
>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
>> pov::BBOX_TREE**&, long int*, long int, long int) 160 static
>> ---
>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**,
>> pov::BBOX_TREE**&, long int*, long int, long int) 96 static
>> $ diff build_base_auto_init.D.0000/image.su
>> build_base_auto_init.A.0000/image.su
>> 9c9
>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624
>> static
>> ---
>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272
>> static
>> ….
>> Looks like that implementation D has more stack size impact than A.
>> Do you have any insight on what the reason for this?
>
> D will keep all initialized aggregates as aggregates and live which
> means stack will be allocated for it. With A the usual optimizations
> to reduce stack usage can be applied.

I checked the routine “pov::bump_map” in 511.povray_r, since it has a large stack increase with implementation D, by examining the IR immediately before the RTL expansion phase (image.cpp.244t.optimized). I found that we have the following additional statements for the array elements:

void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double * normal)
{
…
  double p3[3];
  double p2[3];
  double p1[3];
  float colour3[5];
  float colour2[5];
  float colour1[5];
…
  # DEBUG BEGIN_STMT
  colour1 = .DEFERRED_INIT (colour1, 2);
  colour2 = .DEFERRED_INIT (colour2, 2);
  colour3 = .DEFERRED_INIT (colour3, 2);
  # DEBUG BEGIN_STMT
  MEM <double> [(double[3] *)&p1] = p1$0_144(D);
  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
  p1 = .DEFERRED_INIT (p1, 2);
  # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
  # DEBUG p1$0 => D#12
  # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
  # DEBUG p1$1 => D#11
  # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
  # DEBUG p1$2 => D#10
  MEM <double> [(double[3] *)&p2] = p2$0_109(D);
  MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
  MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
  p2 = .DEFERRED_INIT (p2, 2);
  # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
  # DEBUG p2$0 => D#9
  # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
  # DEBUG p2$1 => D#8
  # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
  # DEBUG p2$2 => D#7
  MEM <double> [(double[3] *)&p3] = p3$0_256(D);
  MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
  MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
  p3 = .DEFERRED_INIT (p3, 2);
  ….
}

I guess that the above “MEM <double>….. = …” statements are the ones that make the difference. Which phase introduced them?

>
>> Let me know if you have any comments and suggestions.
>
> First of all I would check whether the prototype implementations
> work as expected.

I have already done such checks with small test cases, checking the IR generated with implementation A or D, mainly focusing on *.c.006t.gimple and *.c.*t.expand; all worked as expected. For CPU2017, as in the example above, I also checked the IR for both A and D, and it looks like all worked as expected.
Thanks. Qing > > Richard. > > >> thanks. >> Qing >> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> >> wrote: >> >> On Tue, 12 Jan 2021, Qing Zhao wrote: >> >> Hi, >> >> Just check in to see whether you have any comments >> and suggestions on this: >> >> FYI, I have been continue with Approach D >> implementation since last week: >> >> D. Adding calls to .DEFFERED_INIT during >> gimplification, expand the .DEFFERED_INIT during >> expand to >> real initialization. Adjusting uninitialized pass >> with the new refs with “.DEFFERED_INIT”. >> >> For the remaining work of Approach D: >> >> ** complete the implementation of >> -ftrivial-auto-var-init=pattern; >> ** complete the implementation of uninitialized >> warnings maintenance work for D. >> >> I have completed the uninitialized warnings >> maintenance work for D. >> And finished partial of the >> -ftrivial-auto-var-init=pattern implementation. >> >> The following are remaining work of Approach D: >> >> ** -ftrivial-auto-var-init=pattern for VLA; >> **add a new attribute for variable: >> __attribute((uninitialized) >> the marked variable is uninitialized intentionaly >> for performance purpose. >> ** adding complete testing cases; >> >> Please let me know if you have any objection on my >> current decision on implementing approach D. >> >> Did you do any analysis on how stack usage and code size are >> changed >> with approach D? How does compile-time behave (we could gobble >> up >> lots of .DEFERRED_INIT calls I guess)? >> >> Richard. >> >> Thanks a lot for your help. >> >> Qing >> >> On Jan 5, 2021, at 1:05 PM, Qing Zhao >> via Gcc-patches >> <gcc-patches@gcc.gnu.org> wrote: >> >> Hi, >> >> This is an update for our previous >> discussion. >> >> 1. I implemented the following two >> different implementations in the latest >> upstream gcc: >> >> A. Adding real initialization during >> gimplification, not maintain the >> uninitialized warnings. >> >> D. 
Adding calls to .DEFFERED_INIT >> during gimplification, expand the >> .DEFFERED_INIT during expand to >> real initialization. Adjusting >> uninitialized pass with the new refs >> with “.DEFFERED_INIT”. >> >> Note, in this initial implementation, >> ** I ONLY implement >> -ftrivial-auto-var-init=zero, the >> implementation of >> -ftrivial-auto-var-init=pattern >> is not done yet. Therefore, the >> performance data is only about >> -ftrivial-auto-var-init=zero. >> >> ** I added an temporary option >> -fauto-var-init-approach=A|B|C|D to >> choose implementation A or D for >> runtime performance study. >> ** I didn’t finish the uninitialized >> warnings maintenance work for D. (That >> might take more time than I expected). >> >> 2. I collected runtime data for CPU2017 >> on a x86 machine with this new gcc for >> the following 3 cases: >> >> no: default. (-g -O2 -march=native ) >> A: default + >> -ftrivial-auto-var-init=zero >> -fauto-var-init-approach=A >> D: default + >> -ftrivial-auto-var-init=zero >> -fauto-var-init-approach=D >> >> And then compute the slowdown data for >> both A and D as following: >> >> benchmarks A / no D /no >> >> 500.perlbench_r 1.25% 1.25% >> 502.gcc_r 0.68% 1.80% >> 505.mcf_r 0.68% 0.14% >> 520.omnetpp_r 4.83% 4.68% >> 523.xalancbmk_r 0.18% 1.96% >> 525.x264_r 1.55% 2.07% >> 531.deepsjeng_ 11.57% 11.85% >> 541.leela_r 0.64% 0.80% >> 557.xz_ -0.41% -0.41% >> >> 507.cactuBSSN_r 0.44% 0.44% >> 508.namd_r 0.34% 0.34% >> 510.parest_r 0.17% 0.25% >> 511.povray_r 56.57% 57.27% >> 519.lbm_r 0.00% 0.00% >> 521.wrf_r -0.28% -0.37% >> 526.blender_r 16.96% 17.71% >> 527.cam4_r 0.70% 0.53% >> 538.imagick_r 2.40% 2.40% >> 544.nab_r 0.00% -0.65% >> >> avg 5.17% 5.37% >> >> From the above data, we can see that in >> general, the runtime performance >> slowdown for >> implementation A and D are similar for >> individual benchmarks. 
>> >> There are several benchmarks that have >> significant slowdown with the new added >> initialization for both >> A and D, for example, 511.povray_r, >> 526.blender_, and 531.deepsjeng_r, I >> will try to study a little bit >> more on what kind of new initializations >> introduced such slowdown. >> >> From the current study so far, I think >> that approach D should be good enough >> for our final implementation. >> So, I will try to finish approach D with >> the following remaining work >> >> ** complete the implementation of >> -ftrivial-auto-var-init=pattern; >> ** complete the implementation of >> uninitialized warnings maintenance work >> for D. >> >> Let me know if you have any comments and >> suggestions on my current and future >> work. >> >> Thanks a lot for your help. >> >> Qing >> >> On Dec 9, 2020, at 10:18 AM, >> Qing Zhao via Gcc-patches >> <gcc-patches@gcc.gnu.org> >> wrote: >> >> The following are the >> approaches I will implement >> and compare: >> >> Our final goal is to keep >> the uninitialized warning >> and minimize the run-time >> performance cost. >> >> A. Adding real >> initialization during >> gimplification, not maintain >> the uninitialized warnings. >> B. Adding real >> initialization during >> gimplification, marking them >> with “artificial_init”. >> Adjusting uninitialized >> pass, maintaining the >> annotation, making sure the >> real init not >> Deleted from the fake >> init. >> C. Marking the DECL for an >> uninitialized auto variable >> as “no_explicit_init” during >> gimplification, >> maintain this >> “no_explicit_init” bit till >> after >> pass_late_warn_uninitialized, >> or till pass_expand, >> add real initialization >> for all DECLs that are >> marked with >> “no_explicit_init”. >> D. Adding .DEFFERED_INIT >> during gimplification, >> expand the .DEFFERED_INIT >> during expand to >> real initialization. >> Adjusting uninitialized pass >> with the new refs with >> “.DEFFERED_INIT”. 
>> >> In the above, approach A >> will be the one that have >> the minimum run-time cost, >> will be the base for the >> performance >> comparison. >> >> I will implement approach D >> then, this one is expected >> to have the most run-time >> overhead among the above >> list, but >> Implementation should be the >> cleanest among B, C, D. >> Let’s see how much more >> performance overhead this >> approach >> will be. If the data is >> good, maybe we can avoid the >> effort to implement B, and >> C. >> >> If the performance of D is >> not good, I will implement B >> or C at that time. >> >> Let me know if you have any >> comment or suggestions. >> >> Thanks. >> >> Qing >> >> -- >> Richard Biener <rguenther@suse.de> >> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 >> Nuernberg, >> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
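The ECF_CONST point in the message above can be illustrated at the source level. The sketch below is an analogy only: it uses the GCC `const` function attribute as the user-visible counterpart of a const call, the functions scale and sum_scaled are hypothetical, and it says nothing about how .DEFERRED_INIT itself is treated by any particular pass. A const function has no side effects and its result depends only on its arguments, so the compiler is allowed to evaluate a loop-invariant call once instead of on every iteration.

```c
/* Result depends only on x: no memory reads, no side effects.
   __attribute__((const)) is a GCC/Clang extension.  */
__attribute__((const)) static int scale(int x)
{
    return x * 3 + 1;
}

/* Adds scale(k) to the sum n times.  Since scale is const and k does
   not change inside the loop, the call is loop-invariant and the
   optimizer may compute it a single time.  */
long sum_scaled(int k, int n)
{
    long sum = 0;

    for (int i = 0; i < n; i++)
        sum += scale(k);
    return sum;
}
```

As the message notes, being const reduces the cost but does not remove it: the deferred-init call still takes the variable as an argument and assigns to it, so some loop transformations can remain blocked.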
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-15 16:16 ` Qing Zhao @ 2021-01-15 17:22 ` Richard Biener 2021-01-15 17:57 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2021-01-15 17:22 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM> wrote: > > >> On Jan 15, 2021, at 2:11 AM, Richard Biener <rguenther@suse.de> >wrote: >> >> >> >> On Thu, 14 Jan 2021, Qing Zhao wrote: >> >>> Hi, >>> More data on code size and compilation time with CPU2017: >>> ********Compilation time data: the numbers are the slowdown >against the >>> default “no”: >>> benchmarks A/no D/no >>> >>> 500.perlbench_r 5.19% 1.95% >>> 502.gcc_r 0.46% -0.23% >>> 505.mcf_r 0.00% 0.00% >>> 520.omnetpp_r 0.85% 0.00% >>> 523.xalancbmk_r 0.79% -0.40% >>> 525.x264_r -4.48% 0.00% >>> 531.deepsjeng_r 16.67% 16.67% >>> 541.leela_r 0.00% 0.00% >>> 557.xz_r 0.00% 0.00% >>> >>> 507.cactuBSSN_r 1.16% 0.58% >>> 508.namd_r 9.62% 8.65% >>> 510.parest_r 0.48% 1.19% >>> 511.povray_r 3.70% 3.70% >>> 519.lbm_r 0.00% 0.00% >>> 521.wrf_r 0.05% 0.02% >>> 526.blender_r 0.33% 1.32% >>> 527.cam4_r -0.93% -0.93% >>> 538.imagick_r 1.32% 3.95% >>> 544.nab_r 0.00% 0.00% >>> From the above data, looks like that the compilation time impact >>> from implementation A and D are almost the same. 
>>> *******code size data: the numbers are the code size increase >against the >>> default “no”: >>> benchmarks A/no D/no >>> >>> 500.perlbench_r 2.84% 0.34% >>> 502.gcc_r 2.59% 0.35% >>> 505.mcf_r 3.55% 0.39% >>> 520.omnetpp_r 0.54% 0.03% >>> 523.xalancbmk_r 0.36% 0.39% >>> 525.x264_r 1.39% 0.13% >>> 531.deepsjeng_r 2.15% -1.12% >>> 541.leela_r 0.50% -0.20% >>> 557.xz_r 0.31% 0.13% >>> >>> 507.cactuBSSN_r 5.00% -0.01% >>> 508.namd_r 3.64% -0.07% >>> 510.parest_r 1.12% 0.33% >>> 511.povray_r 4.18% 1.16% >>> 519.lbm_r 8.83% 6.44% >>> 521.wrf_r 0.08% 0.02% >>> 526.blender_r 1.63% 0.45% >>> 527.cam4_r 0.16% 0.06% >>> 538.imagick_r 3.18% -0.80% >>> 544.nab_r 5.76% -1.11% >>> Avg 2.52% 0.36% >>> From the above data, the implementation D is always better than A, >it’s a >>> surprising to me, not sure what’s the reason for this. >> >> D probably inhibits most interesting loop transforms (check SPEC FP >> performance). > >The call to .DEFERRED_INIT is marked as ECF_CONST: > >/* A function to represent an artifical initialization to an >uninitialized > automatic variable. The first argument is the variable itself, the > second argument is the initialization type. */ >DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, >NULL) > >So, I assume that such const call should minimize the impact to loop >optimizations. But yes, it will still inhibit some of the loop >transformations. > >> It will also most definitely disallow SRA which, when >> an aggregate is not completely elided, tends to grow code. > >Make sense to me. 
> >The run-time performance data for D and A are actually very similar as >I posted in the previous email (I listed it here for convenience) > >Run-time performance overhead with A and D: > >benchmarks A / no D /no > >500.perlbench_r 1.25% 1.25% >502.gcc_r 0.68% 1.80% >505.mcf_r 0.68% 0.14% >520.omnetpp_r 4.83% 4.68% >523.xalancbmk_r 0.18% 1.96% >525.x264_r 1.55% 2.07% >531.deepsjeng_ 11.57% 11.85% >541.leela_r 0.64% 0.80% >557.xz_ -0.41% -0.41% > >507.cactuBSSN_r 0.44% 0.44% >508.namd_r 0.34% 0.34% >510.parest_r 0.17% 0.25% >511.povray_r 56.57% 57.27% >519.lbm_r 0.00% 0.00% >521.wrf_r -0.28% -0.37% >526.blender_r 16.96% 17.71% >527.cam4_r 0.70% 0.53% >538.imagick_r 2.40% 2.40% >544.nab_r 0.00% -0.65% > >avg 5.17% 5.37% > >Especially for the SPEC FP benchmarks, I didn’t see too much >performance difference between A and D. >I guess that the RTL optimizations might be enough to get rid of most >of the overhead introduced by the additional initialization. > >> >>> ********stack usage data, I added -fstack-usage to the compilation >line when >>> compiling CPU2017 benchmarks. And all the *.su files were generated >for each >>> of the modules. >>> Since there a lot of such files, and the stack size information are >embedded >>> in each of the files. I just picked up one benchmark 511.povray to >>> check. Which is the one that >>> has the most runtime overhead when adding initialization (both A and >D). 
>>> I identified all the *.su files that are different between A and D >and do a >>> diff on those *.su files, and looks like that the stack size is much >higher >>> with D than that with A, for example: >>> $ diff build_base_auto_init.D.0000/bbox.su >>> build_base_auto_init.A.0000/bbox.su5c5 >>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, >>> pov::BBOX_TREE**&, long int*, long int, long int) 160 static >>> --- >>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, >>> pov::BBOX_TREE**&, long int*, long int, long int) 96 static >>> $ diff build_base_auto_init.D.0000/image.su >>> build_base_auto_init.A.0000/image.su >>> 9c9 >>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, >double*) 624 >>> static >>> --- >>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, >double*) 272 >>> static >>> …. >>> Looks like that implementation D has more stack size impact than A. >>> Do you have any insight on what the reason for this? >> >> D will keep all initialized aggregates as aggregates and live which >> means stack will be allocated for it. With A the usual optimizations >> to reduce stack usage can be applied. > >I checked the routine “poverties::bump_map” in 511.povray_r since it >has a lot stack increase >due to implementation D, by examine the IR immediate before RTL >expansion phase. 
>(image.cpp.244t.optimized), I found that we have the following >additional statements for the array elements: > >void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double >* normal) >{ >… > double p3[3]; > double p2[3]; > double p1[3]; > float colour3[5]; > float colour2[5]; > float colour1[5]; >… > # DEBUG BEGIN_STMT > colour1 = .DEFERRED_INIT (colour1, 2); > colour2 = .DEFERRED_INIT (colour2, 2); > colour3 = .DEFERRED_INIT (colour3, 2); > # DEBUG BEGIN_STMT > MEM <double> [(double[3] *)&p1] = p1$0_144(D); > MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D); > MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D); > p1 = .DEFERRED_INIT (p1, 2); > # DEBUG D#12 => MEM <double> [(double[3] *)&p1] > # DEBUG p1$0 => D#12 > # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B] > # DEBUG p1$1 => D#11 > # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B] > # DEBUG p1$2 => D#10 > MEM <double> [(double[3] *)&p2] = p2$0_109(D); > MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D); > MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D); > p2 = .DEFERRED_INIT (p2, 2); > # DEBUG D#9 => MEM <double> [(double[3] *)&p2] > # DEBUG p2$0 => D#9 > # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B] > # DEBUG p2$1 => D#8 > # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B] > # DEBUG p2$2 => D#7 > MEM <double> [(double[3] *)&p3] = p3$0_256(D); > MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D); > MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D); > p3 = .DEFERRED_INIT (p3, 2); > …. >} > >I guess that the above “MEM <double>….. = …” are the ones that make the >differences. Which phase introduced them? Looks like SRA. But you can just dump all and grep for the first occurrence. >> >>> Let me know if you have any comments and suggestions. >> >> First of all I would check whether the prototype implementations >> work as expected. 
>I have done such check with small testing cases already, checking the >IR generated with the implementation A or D, mainly >Focus on *.c.006t.gimple. and *.c.*t.expand, all worked as expected. > >For the CPU2017, for example as the above, I also checked the IR for >both A and D, looks like all worked as expected. > >Thanks. > >Qing >> >> Richard. >> >> >>> thanks. >>> Qing >>> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> >>> wrote: >>> >>> On Tue, 12 Jan 2021, Qing Zhao wrote: >>> >>> Hi, >>> >>> Just check in to see whether you have any comments >>> and suggestions on this: >>> >>> FYI, I have been continue with Approach D >>> implementation since last week: >>> >>> D. Adding calls to .DEFFERED_INIT during >>> gimplification, expand the .DEFFERED_INIT during >>> expand to >>> real initialization. Adjusting uninitialized pass >>> with the new refs with “.DEFFERED_INIT”. >>> >>> For the remaining work of Approach D: >>> >>> ** complete the implementation of >>> -ftrivial-auto-var-init=pattern; >>> ** complete the implementation of uninitialized >>> warnings maintenance work for D. >>> >>> I have completed the uninitialized warnings >>> maintenance work for D. >>> And finished partial of the >>> -ftrivial-auto-var-init=pattern implementation. >>> >>> The following are remaining work of Approach D: >>> >>> ** -ftrivial-auto-var-init=pattern for VLA; >>> **add a new attribute for variable: >>> __attribute((uninitialized) >>> the marked variable is uninitialized intentionaly >>> for performance purpose. >>> ** adding complete testing cases; >>> >>> Please let me know if you have any objection on my >>> current decision on implementing approach D. >>> >>> Did you do any analysis on how stack usage and code size are >>> changed >>> with approach D? How does compile-time behave (we could gobble >>> up >>> lots of .DEFERRED_INIT calls I guess)? >>> >>> Richard. >>> >>> Thanks a lot for your help. 
>>> >>> Qing >>> >>> On Jan 5, 2021, at 1:05 PM, Qing Zhao >>> via Gcc-patches >>> <gcc-patches@gcc.gnu.org> wrote: >>> >>> Hi, >>> >>> This is an update for our previous >>> discussion. >>> >>> 1. I implemented the following two >>> different implementations in the latest >>> upstream gcc: >>> >>> A. Adding real initialization during >>> gimplification, not maintain the >>> uninitialized warnings. >>> >>> D. Adding calls to .DEFFERED_INIT >>> during gimplification, expand the >>> .DEFFERED_INIT during expand to >>> real initialization. Adjusting >>> uninitialized pass with the new refs >>> with “.DEFFERED_INIT”. >>> >>> Note, in this initial implementation, >>> ** I ONLY implement >>> -ftrivial-auto-var-init=zero, the >>> implementation of >>> -ftrivial-auto-var-init=pattern >>> is not done yet. Therefore, the >>> performance data is only about >>> -ftrivial-auto-var-init=zero. >>> >>> ** I added an temporary option >>> -fauto-var-init-approach=A|B|C|D to >>> choose implementation A or D for >>> runtime performance study. >>> ** I didn’t finish the uninitialized >>> warnings maintenance work for D. (That >>> might take more time than I expected). >>> >>> 2. I collected runtime data for CPU2017 >>> on a x86 machine with this new gcc for >>> the following 3 cases: >>> >>> no: default. 
(-g -O2 -march=native ) >>> A: default + >>> -ftrivial-auto-var-init=zero >>> -fauto-var-init-approach=A >>> D: default + >>> -ftrivial-auto-var-init=zero >>> -fauto-var-init-approach=D >>> >>> And then compute the slowdown data for >>> both A and D as following: >>> >>> benchmarks A / no D /no >>> >>> 500.perlbench_r 1.25% 1.25% >>> 502.gcc_r 0.68% 1.80% >>> 505.mcf_r 0.68% 0.14% >>> 520.omnetpp_r 4.83% 4.68% >>> 523.xalancbmk_r 0.18% 1.96% >>> 525.x264_r 1.55% 2.07% >>> 531.deepsjeng_ 11.57% 11.85% >>> 541.leela_r 0.64% 0.80% >>> 557.xz_ -0.41% -0.41% >>> >>> 507.cactuBSSN_r 0.44% 0.44% >>> 508.namd_r 0.34% 0.34% >>> 510.parest_r 0.17% 0.25% >>> 511.povray_r 56.57% 57.27% >>> 519.lbm_r 0.00% 0.00% >>> 521.wrf_r -0.28% -0.37% >>> 526.blender_r 16.96% 17.71% >>> 527.cam4_r 0.70% 0.53% >>> 538.imagick_r 2.40% 2.40% >>> 544.nab_r 0.00% -0.65% >>> >>> avg 5.17% 5.37% >>> >>> From the above data, we can see that in >>> general, the runtime performance >>> slowdown for >>> implementation A and D are similar for >>> individual benchmarks. >>> >>> There are several benchmarks that have >>> significant slowdown with the new added >>> initialization for both >>> A and D, for example, 511.povray_r, >>> 526.blender_, and 531.deepsjeng_r, I >>> will try to study a little bit >>> more on what kind of new initializations >>> introduced such slowdown. >>> >>> From the current study so far, I think >>> that approach D should be good enough >>> for our final implementation. >>> So, I will try to finish approach D with >>> the following remaining work >>> >>> ** complete the implementation of >>> -ftrivial-auto-var-init=pattern; >>> ** complete the implementation of >>> uninitialized warnings maintenance work >>> for D. >>> >>> Let me know if you have any comments and >>> suggestions on my current and future >>> work. >>> >>> Thanks a lot for your help. 
>>> >>> Qing >>> >>> On Dec 9, 2020, at 10:18 AM, >>> Qing Zhao via Gcc-patches >>> <gcc-patches@gcc.gnu.org> >>> wrote: >>> >>> The following are the >>> approaches I will implement >>> and compare: >>> >>> Our final goal is to keep >>> the uninitialized warning >>> and minimize the run-time >>> performance cost. >>> >>> A. Adding real >>> initialization during >>> gimplification, not maintain >>> the uninitialized warnings. >>> B. Adding real >>> initialization during >>> gimplification, marking them >>> with “artificial_init”. >>> Adjusting uninitialized >>> pass, maintaining the >>> annotation, making sure the >>> real init not >>> Deleted from the fake >>> init. >>> C. Marking the DECL for an >>> uninitialized auto variable >>> as “no_explicit_init” during >>> gimplification, >>> maintain this >>> “no_explicit_init” bit till >>> after >>> pass_late_warn_uninitialized, >>> or till pass_expand, >>> add real initialization >>> for all DECLs that are >>> marked with >>> “no_explicit_init”. >>> D. Adding .DEFFERED_INIT >>> during gimplification, >>> expand the .DEFFERED_INIT >>> during expand to >>> real initialization. >>> Adjusting uninitialized pass >>> with the new refs with >>> “.DEFFERED_INIT”. >>> >>> In the above, approach A >>> will be the one that have >>> the minimum run-time cost, >>> will be the base for the >>> performance >>> comparison. >>> >>> I will implement approach D >>> then, this one is expected >>> to have the most run-time >>> overhead among the above >>> list, but >>> Implementation should be the >>> cleanest among B, C, D. >>> Let’s see how much more >>> performance overhead this >>> approach >>> will be. If the data is >>> good, maybe we can avoid the >>> effort to implement B, and >>> C. >>> >>> If the performance of D is >>> not good, I will implement B >>> or C at that time. >>> >>> Let me know if you have any >>> comment or suggestions. >>> >>> Thanks. 
>>> >>> Qing >>> >>> -- >>> Richard Biener <rguenther@suse.de> >>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 >>> Nuernberg, >>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-15 17:22 ` Richard Biener @ 2021-01-15 17:57 ` Qing Zhao 2021-01-18 13:09 ` Richard Sandiford 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-01-15 17:57 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches > On Jan 15, 2021, at 11:22 AM, Richard Biener <rguenther@suse.de> wrote: > > On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> wrote: >> >> >>> On Jan 15, 2021, at 2:11 AM, Richard Biener <rguenther@suse.de> >> wrote: >>> >>> >>> >>> On Thu, 14 Jan 2021, Qing Zhao wrote: >>> >>>> Hi, >>>> More data on code size and compilation time with CPU2017: >>>> ********Compilation time data: the numbers are the slowdown >> against the >>>> default “no”: >>>> benchmarks A/no D/no >>>> >>>> 500.perlbench_r 5.19% 1.95% >>>> 502.gcc_r 0.46% -0.23% >>>> 505.mcf_r 0.00% 0.00% >>>> 520.omnetpp_r 0.85% 0.00% >>>> 523.xalancbmk_r 0.79% -0.40% >>>> 525.x264_r -4.48% 0.00% >>>> 531.deepsjeng_r 16.67% 16.67% >>>> 541.leela_r 0.00% 0.00% >>>> 557.xz_r 0.00% 0.00% >>>> >>>> 507.cactuBSSN_r 1.16% 0.58% >>>> 508.namd_r 9.62% 8.65% >>>> 510.parest_r 0.48% 1.19% >>>> 511.povray_r 3.70% 3.70% >>>> 519.lbm_r 0.00% 0.00% >>>> 521.wrf_r 0.05% 0.02% >>>> 526.blender_r 0.33% 1.32% >>>> 527.cam4_r -0.93% -0.93% >>>> 538.imagick_r 1.32% 3.95% >>>> 544.nab_r 0.00% 0.00% >>>> From the above data, looks like that the compilation time impact >>>> from implementation A and D are almost the same. 
>>>> *******code size data: the numbers are the code size increase >> against the >>>> default “no”: >>>> benchmarks A/no D/no >>>> >>>> 500.perlbench_r 2.84% 0.34% >>>> 502.gcc_r 2.59% 0.35% >>>> 505.mcf_r 3.55% 0.39% >>>> 520.omnetpp_r 0.54% 0.03% >>>> 523.xalancbmk_r 0.36% 0.39% >>>> 525.x264_r 1.39% 0.13% >>>> 531.deepsjeng_r 2.15% -1.12% >>>> 541.leela_r 0.50% -0.20% >>>> 557.xz_r 0.31% 0.13% >>>> >>>> 507.cactuBSSN_r 5.00% -0.01% >>>> 508.namd_r 3.64% -0.07% >>>> 510.parest_r 1.12% 0.33% >>>> 511.povray_r 4.18% 1.16% >>>> 519.lbm_r 8.83% 6.44% >>>> 521.wrf_r 0.08% 0.02% >>>> 526.blender_r 1.63% 0.45% >>>> 527.cam4_r 0.16% 0.06% >>>> 538.imagick_r 3.18% -0.80% >>>> 544.nab_r 5.76% -1.11% >>>> Avg 2.52% 0.36% >>>> From the above data, the implementation D is always better than A, >> it’s a >>>> surprising to me, not sure what’s the reason for this. >>> >>> D probably inhibits most interesting loop transforms (check SPEC FP >>> performance). >> >> The call to .DEFERRED_INIT is marked as ECF_CONST: >> >> /* A function to represent an artifical initialization to an >> uninitialized >> automatic variable. The first argument is the variable itself, the >> second argument is the initialization type. */ >> DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, >> NULL) >> >> So, I assume that such const call should minimize the impact to loop >> optimizations. But yes, it will still inhibit some of the loop >> transformations. >> >>> It will also most definitely disallow SRA which, when >>> an aggregate is not completely elided, tends to grow code. >> >> Make sense to me. 
>> >> The run-time performance data for D and A are actually very similar as >> I posted in the previous email (I listed it here for convenience) >> >> Run-time performance overhead with A and D: >> >> benchmarks A / no D /no >> >> 500.perlbench_r 1.25% 1.25% >> 502.gcc_r 0.68% 1.80% >> 505.mcf_r 0.68% 0.14% >> 520.omnetpp_r 4.83% 4.68% >> 523.xalancbmk_r 0.18% 1.96% >> 525.x264_r 1.55% 2.07% >> 531.deepsjeng_ 11.57% 11.85% >> 541.leela_r 0.64% 0.80% >> 557.xz_ -0.41% -0.41% >> >> 507.cactuBSSN_r 0.44% 0.44% >> 508.namd_r 0.34% 0.34% >> 510.parest_r 0.17% 0.25% >> 511.povray_r 56.57% 57.27% >> 519.lbm_r 0.00% 0.00% >> 521.wrf_r -0.28% -0.37% >> 526.blender_r 16.96% 17.71% >> 527.cam4_r 0.70% 0.53% >> 538.imagick_r 2.40% 2.40% >> 544.nab_r 0.00% -0.65% >> >> avg 5.17% 5.37% >> >> Especially for the SPEC FP benchmarks, I didn’t see too much >> performance difference between A and D. >> I guess that the RTL optimizations might be enough to get rid of most >> of the overhead introduced by the additional initialization. >> >>> >>>> ********stack usage data, I added -fstack-usage to the compilation >> line when >>>> compiling CPU2017 benchmarks. And all the *.su files were generated >> for each >>>> of the modules. >>>> Since there a lot of such files, and the stack size information are >> embedded >>>> in each of the files. I just picked up one benchmark 511.povray to >>>> check. Which is the one that >>>> has the most runtime overhead when adding initialization (both A and >> D). 
>>>> I identified all the *.su files that are different between A and D >> and do a >>>> diff on those *.su files, and looks like that the stack size is much >> higher >>>> with D than that with A, for example: >>>> $ diff build_base_auto_init.D.0000/bbox.su >>>> build_base_auto_init.A.0000/bbox.su5c5 >>>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, >>>> pov::BBOX_TREE**&, long int*, long int, long int) 160 static >>>> --- >>>>> bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, >>>> pov::BBOX_TREE**&, long int*, long int, long int) 96 static >>>> $ diff build_base_auto_init.D.0000/image.su >>>> build_base_auto_init.A.0000/image.su >>>> 9c9 >>>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, >> double*) 624 >>>> static >>>> --- >>>>> image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, >> double*) 272 >>>> static >>>> …. >>>> Looks like that implementation D has more stack size impact than A. >>>> Do you have any insight on what the reason for this? >>> >>> D will keep all initialized aggregates as aggregates and live which >>> means stack will be allocated for it. With A the usual optimizations >>> to reduce stack usage can be applied. >> >> I checked the routine “poverties::bump_map” in 511.povray_r since it >> has a lot stack increase >> due to implementation D, by examine the IR immediate before RTL >> expansion phase. 
>> (image.cpp.244t.optimized), I found that we have the following >> additional statements for the array elements: >> >> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double >> * normal) >> { >> … >> double p3[3]; >> double p2[3]; >> double p1[3]; >> float colour3[5]; >> float colour2[5]; >> float colour1[5]; >> … >> # DEBUG BEGIN_STMT >> colour1 = .DEFERRED_INIT (colour1, 2); >> colour2 = .DEFERRED_INIT (colour2, 2); >> colour3 = .DEFERRED_INIT (colour3, 2); >> # DEBUG BEGIN_STMT >> MEM <double> [(double[3] *)&p1] = p1$0_144(D); >> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D); >> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D); >> p1 = .DEFERRED_INIT (p1, 2); >> # DEBUG D#12 => MEM <double> [(double[3] *)&p1] >> # DEBUG p1$0 => D#12 >> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B] >> # DEBUG p1$1 => D#11 >> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B] >> # DEBUG p1$2 => D#10 >> MEM <double> [(double[3] *)&p2] = p2$0_109(D); >> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D); >> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D); >> p2 = .DEFERRED_INIT (p2, 2); >> # DEBUG D#9 => MEM <double> [(double[3] *)&p2] >> # DEBUG p2$0 => D#9 >> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B] >> # DEBUG p2$1 => D#8 >> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B] >> # DEBUG p2$2 => D#7 >> MEM <double> [(double[3] *)&p3] = p3$0_256(D); >> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D); >> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D); >> p3 = .DEFERRED_INIT (p3, 2); >> …. >> } >> >> I guess that the above “MEM <double>….. = …” are the ones that make the >> differences. Which phase introduced them? > > Looks like SRA. But you can just dump all and grep for the first occurrence. 
Yes, looks like that SRA is the one: image.cpp.035t.esra: MEM <double> [(double[3] *)&p1] = p1$0_195(D); image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D); image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D); Qing > > >>> >>>> Let me know if you have any comments and suggestions. >>> >>> First of all I would check whether the prototype implementations >>> work as expected. >> I have done such check with small testing cases already, checking the >> IR generated with the implementation A or D, mainly >> Focus on *.c.006t.gimple. and *.c.*t.expand, all worked as expected. >> >> For the CPU2017, for example as the above, I also checked the IR for >> both A and D, looks like all worked as expected. >> >> Thanks. >> >> Qing >>> >>> Richard. >>> >>> >>>> thanks. >>>> Qing >>>> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguenther@suse.de> >>>> wrote: >>>> >>>> On Tue, 12 Jan 2021, Qing Zhao wrote: >>>> >>>> Hi, >>>> >>>> Just check in to see whether you have any comments >>>> and suggestions on this: >>>> >>>> FYI, I have been continue with Approach D >>>> implementation since last week: >>>> >>>> D. Adding calls to .DEFFERED_INIT during >>>> gimplification, expand the .DEFFERED_INIT during >>>> expand to >>>> real initialization. Adjusting uninitialized pass >>>> with the new refs with “.DEFFERED_INIT”. >>>> >>>> For the remaining work of Approach D: >>>> >>>> ** complete the implementation of >>>> -ftrivial-auto-var-init=pattern; >>>> ** complete the implementation of uninitialized >>>> warnings maintenance work for D. >>>> >>>> I have completed the uninitialized warnings >>>> maintenance work for D. >>>> And finished partial of the >>>> -ftrivial-auto-var-init=pattern implementation. 
>>>> >>>> The following are remaining work of Approach D: >>>> >>>> ** -ftrivial-auto-var-init=pattern for VLA; >>>> **add a new attribute for variable: >>>> __attribute((uninitialized) >>>> the marked variable is uninitialized intentionaly >>>> for performance purpose. >>>> ** adding complete testing cases; >>>> >>>> Please let me know if you have any objection on my >>>> current decision on implementing approach D. >>>> >>>> Did you do any analysis on how stack usage and code size are >>>> changed >>>> with approach D? How does compile-time behave (we could gobble >>>> up >>>> lots of .DEFERRED_INIT calls I guess)? >>>> >>>> Richard. >>>> >>>> Thanks a lot for your help. >>>> >>>> Qing >>>> >>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao >>>> via Gcc-patches >>>> <gcc-patches@gcc.gnu.org> wrote: >>>> >>>> Hi, >>>> >>>> This is an update for our previous >>>> discussion. >>>> >>>> 1. I implemented the following two >>>> different implementations in the latest >>>> upstream gcc: >>>> >>>> A. Adding real initialization during >>>> gimplification, not maintain the >>>> uninitialized warnings. >>>> >>>> D. Adding calls to .DEFFERED_INIT >>>> during gimplification, expand the >>>> .DEFFERED_INIT during expand to >>>> real initialization. Adjusting >>>> uninitialized pass with the new refs >>>> with “.DEFFERED_INIT”. >>>> >>>> Note, in this initial implementation, >>>> ** I ONLY implement >>>> -ftrivial-auto-var-init=zero, the >>>> implementation of >>>> -ftrivial-auto-var-init=pattern >>>> is not done yet. Therefore, the >>>> performance data is only about >>>> -ftrivial-auto-var-init=zero. >>>> >>>> ** I added an temporary option >>>> -fauto-var-init-approach=A|B|C|D to >>>> choose implementation A or D for >>>> runtime performance study. >>>> ** I didn’t finish the uninitialized >>>> warnings maintenance work for D. (That >>>> might take more time than I expected). >>>> >>>> 2. 
I collected runtime data for CPU2017 >>>> on a x86 machine with this new gcc for >>>> the following 3 cases: >>>> >>>> no: default. (-g -O2 -march=native ) >>>> A: default + >>>> -ftrivial-auto-var-init=zero >>>> -fauto-var-init-approach=A >>>> D: default + >>>> -ftrivial-auto-var-init=zero >>>> -fauto-var-init-approach=D >>>> >>>> And then compute the slowdown data for >>>> both A and D as following: >>>> >>>> benchmarks A / no D /no >>>> >>>> 500.perlbench_r 1.25% 1.25% >>>> 502.gcc_r 0.68% 1.80% >>>> 505.mcf_r 0.68% 0.14% >>>> 520.omnetpp_r 4.83% 4.68% >>>> 523.xalancbmk_r 0.18% 1.96% >>>> 525.x264_r 1.55% 2.07% >>>> 531.deepsjeng_ 11.57% 11.85% >>>> 541.leela_r 0.64% 0.80% >>>> 557.xz_ -0.41% -0.41% >>>> >>>> 507.cactuBSSN_r 0.44% 0.44% >>>> 508.namd_r 0.34% 0.34% >>>> 510.parest_r 0.17% 0.25% >>>> 511.povray_r 56.57% 57.27% >>>> 519.lbm_r 0.00% 0.00% >>>> 521.wrf_r -0.28% -0.37% >>>> 526.blender_r 16.96% 17.71% >>>> 527.cam4_r 0.70% 0.53% >>>> 538.imagick_r 2.40% 2.40% >>>> 544.nab_r 0.00% -0.65% >>>> >>>> avg 5.17% 5.37% >>>> >>>> From the above data, we can see that in >>>> general, the runtime performance >>>> slowdown for >>>> implementation A and D are similar for >>>> individual benchmarks. >>>> >>>> There are several benchmarks that have >>>> significant slowdown with the new added >>>> initialization for both >>>> A and D, for example, 511.povray_r, >>>> 526.blender_, and 531.deepsjeng_r, I >>>> will try to study a little bit >>>> more on what kind of new initializations >>>> introduced such slowdown. >>>> >>>> From the current study so far, I think >>>> that approach D should be good enough >>>> for our final implementation. >>>> So, I will try to finish approach D with >>>> the following remaining work >>>> >>>> ** complete the implementation of >>>> -ftrivial-auto-var-init=pattern; >>>> ** complete the implementation of >>>> uninitialized warnings maintenance work >>>> for D. 
>>>> >>>> Let me know if you have any comments and >>>> suggestions on my current and future >>>> work. >>>> >>>> Thanks a lot for your help. >>>> >>>> Qing >>>> >>>> On Dec 9, 2020, at 10:18 AM, >>>> Qing Zhao via Gcc-patches >>>> <gcc-patches@gcc.gnu.org> >>>> wrote: >>>> >>>> The following are the >>>> approaches I will implement >>>> and compare: >>>> >>>> Our final goal is to keep >>>> the uninitialized warning >>>> and minimize the run-time >>>> performance cost. >>>> >>>> A. Adding real >>>> initialization during >>>> gimplification, not maintain >>>> the uninitialized warnings. >>>> B. Adding real >>>> initialization during >>>> gimplification, marking them >>>> with “artificial_init”. >>>> Adjusting uninitialized >>>> pass, maintaining the >>>> annotation, making sure the >>>> real init not >>>> Deleted from the fake >>>> init. >>>> C. Marking the DECL for an >>>> uninitialized auto variable >>>> as “no_explicit_init” during >>>> gimplification, >>>> maintain this >>>> “no_explicit_init” bit till >>>> after >>>> pass_late_warn_uninitialized, >>>> or till pass_expand, >>>> add real initialization >>>> for all DECLs that are >>>> marked with >>>> “no_explicit_init”. >>>> D. Adding .DEFFERED_INIT >>>> during gimplification, >>>> expand the .DEFFERED_INIT >>>> during expand to >>>> real initialization. >>>> Adjusting uninitialized pass >>>> with the new refs with >>>> “.DEFFERED_INIT”. >>>> >>>> In the above, approach A >>>> will be the one that have >>>> the minimum run-time cost, >>>> will be the base for the >>>> performance >>>> comparison. >>>> >>>> I will implement approach D >>>> then, this one is expected >>>> to have the most run-time >>>> overhead among the above >>>> list, but >>>> Implementation should be the >>>> cleanest among B, C, D. >>>> Let’s see how much more >>>> performance overhead this >>>> approach >>>> will be. If the data is >>>> good, maybe we can avoid the >>>> effort to implement B, and >>>> C. 
>>>> >>>> If the performance of D is >>>> not good, I will implement B >>>> or C at that time. >>>> >>>> Let me know if you have any >>>> comment or suggestions. >>>> >>>> Thanks. >>>> >>>> Qing >>>> >>>> -- >>>> Richard Biener <rguenther@suse.de> >>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 >>>> Nuernberg, >>>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
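The stack comparison quoted in this message was done by diffing the `*.su` files produced by -fstack-usage by hand. A minimal sketch in C of pulling the byte count out of one such line, assuming the layout visible in the diff output quoted above (location and function name, then the size, then a qualifier such as `static`). `su_stack_bytes` is a hypothetical helper written for illustration; it is not part of GCC.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the stack size recorded on one
   -fstack-usage line, or -1 if the line does not end in
   "<size> <qualifier>".  The function name field may itself contain
   spaces, so the line is scanned from the right. */
long su_stack_bytes(const char *line)
{
    assert(line != NULL);
    const char *end = line + strlen(line);

    /* Drop trailing whitespace (for example a newline). */
    while (end > line && isspace((unsigned char)end[-1]))
        end--;

    /* Step back over the qualifier field ("static", "dynamic", ...). */
    const char *qual = end;
    while (qual > line && !isspace((unsigned char)qual[-1]))
        qual--;
    if (qual == end || qual == line)
        return -1;

    /* Step back over the separator between size and qualifier. */
    const char *size_end = qual;
    while (size_end > line && isspace((unsigned char)size_end[-1]))
        size_end--;

    /* Step back over the digits of the size field. */
    const char *size = size_end;
    while (size > line && isdigit((unsigned char)size[-1]))
        size--;
    if (size == size_end
        || (size > line && !isspace((unsigned char)size[-1])))
        return -1;

    return strtol(size, NULL, 10);
}
```

Applied to the two `pov::bump_map` lines quoted above, this would yield 624 for the approach D build and 272 for the approach A build.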
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-15 17:57 ` Qing Zhao @ 2021-01-18 13:09 ` Richard Sandiford 2021-01-18 16:12 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Sandiford @ 2021-01-18 13:09 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Biener, Richard Biener via Gcc-patches Qing Zhao <QING.ZHAO@ORACLE.COM> writes: >>>> D will keep all initialized aggregates as aggregates and live which >>>> means stack will be allocated for it. With A the usual optimizations >>>> to reduce stack usage can be applied. >>> >>> I checked the routine “poverties::bump_map” in 511.povray_r since it >>> has a lot stack increase >>> due to implementation D, by examine the IR immediate before RTL >>> expansion phase. >>> (image.cpp.244t.optimized), I found that we have the following >>> additional statements for the array elements: >>> >>> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double >>> * normal) >>> { >>> … >>> double p3[3]; >>> double p2[3]; >>> double p1[3]; >>> float colour3[5]; >>> float colour2[5]; >>> float colour1[5]; >>> … >>> # DEBUG BEGIN_STMT >>> colour1 = .DEFERRED_INIT (colour1, 2); >>> colour2 = .DEFERRED_INIT (colour2, 2); >>> colour3 = .DEFERRED_INIT (colour3, 2); >>> # DEBUG BEGIN_STMT >>> MEM <double> [(double[3] *)&p1] = p1$0_144(D); >>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D); >>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D); >>> p1 = .DEFERRED_INIT (p1, 2); >>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1] >>> # DEBUG p1$0 => D#12 >>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B] >>> # DEBUG p1$1 => D#11 >>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B] >>> # DEBUG p1$2 => D#10 >>> MEM <double> [(double[3] *)&p2] = p2$0_109(D); >>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D); >>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D); >>> p2 = .DEFERRED_INIT (p2, 2); >>> # DEBUG D#9 => MEM <double> 
[(double[3] *)&p2] >>> # DEBUG p2$0 => D#9 >>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B] >>> # DEBUG p2$1 => D#8 >>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B] >>> # DEBUG p2$2 => D#7 >>> MEM <double> [(double[3] *)&p3] = p3$0_256(D); >>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D); >>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D); >>> p3 = .DEFERRED_INIT (p3, 2); >>> …. >>> } >>> >>> I guess that the above “MEM <double>….. = …” are the ones that make the >>> differences. Which phase introduced them? >> >> Looks like SRA. But you can just dump all and grep for the first occurrence. > > Yes, looks like that SRA is the one: > > image.cpp.035t.esra: MEM <double> [(double[3] *)&p1] = p1$0_195(D); > image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D); > image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D); I realise no-one was suggesting otherwise, but FWIW: SRA could easily be extended to handle .DEFERRED_INIT if that's the main source of excess stack usage. A single .DEFERRED_INIT of an aggregate can be split into .DEFERRED_INITs of individual components. In other words, the investigation you're doing looks like the right way of deciding which passes are worth extending to handle .DEFERRED_INIT. Thanks, Richard ^ permalink raw reply [flat|nested] 56+ messages in thread
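The splitting suggested above can be pictured at source level. The sketch below is only an analogy in plain C, not GCC internals: `whole_init` zero-fills a small array as a single aggregate, while `split_init` uses three independent scalars, which is roughly the shape SRA aims for. Both function names are hypothetical, and the `memset` merely stands in for a zero-pattern `.DEFERRED_INIT` (it assumes an all-zero byte pattern represents 0.0, as on IEEE 754 targets).

```c
#include <assert.h>
#include <string.h>

/* Aggregate form: the array is initialized as one object, so
   conceptually it needs a stack slot for the whole of p. */
double whole_init(const double in[3])
{
    assert(in != NULL);
    double p[3];
    memset(p, 0, sizeof p);   /* stand-in for p = .DEFERRED_INIT (p, ...) */
    for (int i = 0; i < 3; i++)
        p[i] += in[i];
    return p[0] + p[1] + p[2];
}

/* Split form: one initialization per component, so each component is
   an independent scalar that can be kept in a register. */
double split_init(const double in[3])
{
    assert(in != NULL);
    double p0 = 0.0;          /* stand-in for p$0 = .DEFERRED_INIT (...) */
    double p1 = 0.0;          /* stand-in for p$1 = .DEFERRED_INIT (...) */
    double p2 = 0.0;          /* stand-in for p$2 = .DEFERRED_INIT (...) */
    p0 += in[0];
    p1 += in[1];
    p2 += in[2];
    return p0 + p1 + p2;
}
```

Both functions compute the same value; the difference is only in whether the initialized object has to exist as an aggregate.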
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-18 13:09 ` Richard Sandiford @ 2021-01-18 16:12 ` Qing Zhao 2021-02-01 19:12 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-01-18 16:12 UTC (permalink / raw) To: Richard Sandiford; +Cc: Richard Biener, Richard Biener via Gcc-patches > On Jan 18, 2021, at 7:09 AM, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Qing Zhao <QING.ZHAO@ORACLE.COM> writes: >>>>> D will keep all initialized aggregates as aggregates and live which >>>>> means stack will be allocated for it. With A the usual optimizations >>>>> to reduce stack usage can be applied. >>>> >>>> I checked the routine “poverties::bump_map” in 511.povray_r since it >>>> has a lot stack increase >>>> due to implementation D, by examine the IR immediate before RTL >>>> expansion phase. >>>> (image.cpp.244t.optimized), I found that we have the following >>>> additional statements for the array elements: >>>> >>>> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double >>>> * normal) >>>> { >>>> … >>>> double p3[3]; >>>> double p2[3]; >>>> double p1[3]; >>>> float colour3[5]; >>>> float colour2[5]; >>>> float colour1[5]; >>>> … >>>> # DEBUG BEGIN_STMT >>>> colour1 = .DEFERRED_INIT (colour1, 2); >>>> colour2 = .DEFERRED_INIT (colour2, 2); >>>> colour3 = .DEFERRED_INIT (colour3, 2); >>>> # DEBUG BEGIN_STMT >>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D); >>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D); >>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D); >>>> p1 = .DEFERRED_INIT (p1, 2); >>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1] >>>> # DEBUG p1$0 => D#12 >>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B] >>>> # DEBUG p1$1 => D#11 >>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B] >>>> # DEBUG p1$2 => D#10 >>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D); >>>> MEM <double> [(double[3] *)&p2 + 8B] = 
p2$1_111(D); >>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D); >>>> p2 = .DEFERRED_INIT (p2, 2); >>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2] >>>> # DEBUG p2$0 => D#9 >>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B] >>>> # DEBUG p2$1 => D#8 >>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B] >>>> # DEBUG p2$2 => D#7 >>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D); >>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D); >>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D); >>>> p3 = .DEFERRED_INIT (p3, 2); >>>> …. >>>> } >>>> >>>> I guess that the above “MEM <double>….. = …” are the ones that make the >>>> differences. Which phase introduced them? >>> >>> Looks like SRA. But you can just dump all and grep for the first occurrence. >> >> Yes, looks like that SRA is the one: >> >> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1] = p1$0_195(D); >> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D); >> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
>
> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
> be extended to handle .DEFERRED_INIT if that's the main source of
> excess stack usage. A single .DEFERRED_INIT of an aggregate can
> be split into .DEFERRED_INITs of individual components.

Thanks a lot for the suggestion. I will study the SRA code to see how to do this, and then check whether it resolves the issue.

>
> In other words, the investigation you're doing looks like the right way
> of deciding which passes are worth extending to handle .DEFERRED_INIT.

Yes. From the study so far, it looks like the major issue with the .DEFERRED_INIT approach is the stack size increase. Hopefully, once this issue is resolved, we will be done.

Qing
>
> Thanks,
> Richard

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-01-18 16:12 ` Qing Zhao @ 2021-02-01 19:12 ` Qing Zhao 2021-02-02 7:43 ` Richard Biener 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-02-01 19:12 UTC (permalink / raw) To: Richard Sandiford; +Cc: Richard Biener via Gcc-patches, Richard Biener

Hi, Richard,

I have adjusted the SRA phase to split calls to DEFERRED_INIT per your suggestion.

And now the routine “bump_map” in 511.povray looks like the following:
...

  # DEBUG BEGIN_STMT
  xcoor = 0.0;
  ycoor = 0.0;
  # DEBUG BEGIN_STMT
  index = .DEFERRED_INIT (index, 2);
  index2 = .DEFERRED_INIT (index2, 2);
  index3 = .DEFERRED_INIT (index3, 2);
  # DEBUG BEGIN_STMT
  colour1 = .DEFERRED_INIT (colour1, 2);
  colour2 = .DEFERRED_INIT (colour2, 2);
  colour3 = .DEFERRED_INIT (colour3, 2);
  # DEBUG BEGIN_STMT
  p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
  # DEBUG p1$0 => p1$0_181
  p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
  # DEBUG p1$1 => p1$1_184
  p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
  # DEBUG p1$2 => p1$2_172
  p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
  # DEBUG p2$0 => p2$0_177
  p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
  # DEBUG p2$1 => p2$1_135
  p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
  # DEBUG p2$2 => p2$2_137
  p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
  # DEBUG p3$0 => p3$0_377
  p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
  # DEBUG p3$1 => p3$1_379
  p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
  # DEBUG p3$2 => p3$2_381

In the above, p1, p2, and p3 have all been split into calls to DEFERRED_INIT on their individual components.

With this change, the stack usage numbers for “bump_map” (in bytes, as reported by -fstack-usage) for approach A, the old approach D, and the new approach D with the splitting in SRA are:

  Approach A    Approach D-old    Approach D-new
     272             624               368

From the above, we can see that splitting the call to DEFERRED_INIT in SRA reduces the stack usage increase dramatically (from 624 down to 368).

However, it looks like the stack size for D is still bigger than for A (368 vs. 272).
I checked the IR again and found that alias analysis might be responsible for this (by comparing image.cpp.026t.ealias for A and D):

(Due to the call to:

  colour1 = .DEFERRED_INIT (colour1, 2);
)

******Approach A:

Points_to analysis:

Constraints:
…
colour1 = &NULL
…
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
...
callarg(53) = &colour1
...
_53 = colour1

Points_to sets:
…
colour1 = { NULL ESCAPED NONLOCAL } same as _53
...
CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
...
callarg(53) = { NULL ESCAPED NONLOCAL colour1 }

******Approach D:

Points_to analysis:

Constraints:
…
callarg(19) = colour1
callarg(19) = &NONLOCAL
colour1 = callarg(19) + UNKNOWN
colour1 = &NONLOCAL
…
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
colour1 = &NONLOCAL
…
callarg(74) = &colour1
callarg(74) = callarg(74) + UNKNOWN
callarg(74) = *callarg(74) + UNKNOWN
…
_53 = colour1
_54 = _53
_55 = _54 + UNKNOWN
_55 = &NONLOCAL
_56 = colour1
_57 = _56
_58 = _57 + UNKNOWN
_58 = &NONLOCAL
_59 = _55 + UNKNOWN
_59 = _58 + UNKNOWN
_60 = colour1
_61 = _60
_62 = _61 + UNKNOWN
_62 = &NONLOCAL
_63 = _59 + UNKNOWN
_63 = _62 + UNKNOWN
_64 = _63 + UNKNOWN
..
Points_to set:
…
colour1 = { ESCAPED NONLOCAL } same as callarg(19)
…
CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
callarg(71) = { ESCAPED NONLOCAL }
callarg(72) = { ESCAPED NONLOCAL }
callarg(73) = { ESCAPED NONLOCAL }
callarg(74) = { ESCAPED NONLOCAL colour1 }

My question:

Is it possible to adjust alias analysis to resolve this issue?

Thanks.
Qing
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-02-01 19:12 ` Qing Zhao @ 2021-02-02 7:43 ` Richard Biener 2021-02-02 15:17 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Richard Biener @ 2021-02-02 7:43 UTC (permalink / raw) To: Qing Zhao; +Cc: Richard Sandiford, Richard Biener via Gcc-patches On Mon, 1 Feb 2021, Qing Zhao wrote: > Hi, Richard, > > I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion. > > And now the routine “bump_map” in 511.povray is like following: > ... > > # DEBUG BEGIN_STMT > xcoor = 0.0; > ycoor = 0.0; > # DEBUG BEGIN_STMT > index = .DEFERRED_INIT (index, 2); > index2 = .DEFERRED_INIT (index2, 2); > index3 = .DEFERRED_INIT (index3, 2); > # DEBUG BEGIN_STMT > colour1 = .DEFERRED_INIT (colour1, 2); > colour2 = .DEFERRED_INIT (colour2, 2); > colour3 = .DEFERRED_INIT (colour3, 2); > # DEBUG BEGIN_STMT > p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2); > # DEBUG p1$0 => p1$0_181 > p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2); > # DEBUG p1$1 => p1$1_184 > p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2); > # DEBUG p1$2 => p1$2_172 > p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2); > # DEBUG p2$0 => p2$0_177 > p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2); > # DEBUG p2$1 => p2$1_135 > p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2); > # DEBUG p2$2 => p2$2_137 > p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2); > # DEBUG p3$0 => p3$0_377 > p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2); > # DEBUG p3$1 => p3$1_379 > p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2); > # DEBUG p3$2 => p3$2_381 > > > In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of the components of p1, p2 and p3. 
> > With this change, the stack usage numbers with -fstack-usage for approach A, old approach D and new D with the splitting in SRA are: > > Approach A Approach D-old Approach D-new > > 272 624 368 > > From the above, we can see that splitting the call to DEFERRED_INIT in SRA can reduce the stack usage increase dramatically. > > However, looks like that the stack size for D is still bigger than A. > > I checked the IR again, and found that the alias analysis might be responsible for this (by compare the image.cpp.026t.ealias for both A and D): > > (Due to the call to: > > colour1 = .DEFERRED_INIT (colour1, 2); > ) > > ******Approach A: > > Points_to analysis: > > Constraints: > … > colour1 = &NULL > … > colour1 = &NONLOCAL > colour1 = &NONLOCAL > colour1 = &NONLOCAL > colour1 = &NONLOCAL > colour1 = &NONLOCAL > ... > callarg(53) = &colour1 > ... > _53 = colour1 > > Points_to sets: > … > colour1 = { NULL ESCAPED NONLOCAL } same as _53 > ... > CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 } > CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48) > ... > callarg(53) = { NULL ESCAPED NONLOCAL colour1 } > > ******Apprach D: > > Points_to analysis: > > Constraints: > … > callarg(19) = colour1 > callarg(19) = &NONLOCAL > colour1 = callarg(19) + UNKNOWN > colour1 = &NONLOCAL > … > colour1 = &NONLOCAL > colour1 = &NONLOCAL > colour1 = &NONLOCAL > colour1 = &NONLOCAL > colour1 = &NONLOCAL > … > callarg(74) = &colour1 > callarg(74) = callarg(74) + UNKNOWN > callarg(74) = *callarg(74) + UNKNOWN > … > _53 = colour1 > _54 = _53 > _55 = _54 + UNKNOWN > _55 = &NONLOCAL > _56 = colour1 > _57 = _56 > _58 = _57 + UNKNOWN > _58 = &NONLOCAL > _59 = _55 + UNKNOWN > _59 = _58 + UNKNOWN > _60 = colour1 > _61 = _60 > _62 = _61 + UNKNOWN > _62 = &NONLOCAL > _63 = _59 + UNKNOWN > _63 = _62 + UNKNOWN > _64 = _63 + UNKNOWN > .. 
> Points_to set: > … > colour1 = { ESCAPED NONLOCAL } same as callarg(19) > … > CALLUSED(69) = { ESCAPED NONLOCAL index colour1 } > CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69) > callarg(71) = { ESCAPED NONLOCAL } > callarg(72) = { ESCAPED NONLOCAL } > callarg(73) = { ESCAPED NONLOCAL } > callarg(74) = { ESCAPED NONLOCAL colour1 } > > My question: > > Is it possible to adjust alias analysis to resolve this issue? You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c find_func_aliases_for_call (it's not a builtin but you can look in the respective subroutine for examples). Specifically you want to avoid making anything escaped or clobbered. > thanks. > > Qing -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-02-02 7:43 ` Richard Biener @ 2021-02-02 15:17 ` Qing Zhao 2021-02-02 23:32 ` Qing Zhao 0 siblings, 1 reply; 56+ messages in thread From: Qing Zhao @ 2021-02-02 15:17 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches > On Feb 2, 2021, at 1:43 AM, Richard Biener <rguenther@suse.de> wrote: > > On Mon, 1 Feb 2021, Qing Zhao wrote: > >> Hi, Richard, >> >> I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion. >> >> And now the routine “bump_map” in 511.povray is like following: >> ... >> >> # DEBUG BEGIN_STMT >> xcoor = 0.0; >> ycoor = 0.0; >> # DEBUG BEGIN_STMT >> index = .DEFERRED_INIT (index, 2); >> index2 = .DEFERRED_INIT (index2, 2); >> index3 = .DEFERRED_INIT (index3, 2); >> # DEBUG BEGIN_STMT >> colour1 = .DEFERRED_INIT (colour1, 2); >> colour2 = .DEFERRED_INIT (colour2, 2); >> colour3 = .DEFERRED_INIT (colour3, 2); >> # DEBUG BEGIN_STMT >> p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2); >> # DEBUG p1$0 => p1$0_181 >> p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2); >> # DEBUG p1$1 => p1$1_184 >> p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2); >> # DEBUG p1$2 => p1$2_172 >> p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2); >> # DEBUG p2$0 => p2$0_177 >> p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2); >> # DEBUG p2$1 => p2$1_135 >> p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2); >> # DEBUG p2$2 => p2$2_137 >> p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2); >> # DEBUG p3$0 => p3$0_377 >> p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2); >> # DEBUG p3$1 => p3$1_379 >> p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2); >> # DEBUG p3$2 => p3$2_381 >> >> >> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of the components of p1, p2 and p3. 
>> >> With this change, the stack usage numbers with -fstack-usage for approach A, old approach D and new D with the splitting in SRA are: >> >> Approach A Approach D-old Approach D-new >> >> 272 624 368 >> >> From the above, we can see that splitting the call to DEFERRED_INIT in SRA can reduce the stack usage increase dramatically. >> >> However, looks like that the stack size for D is still bigger than A. >> >> I checked the IR again, and found that the alias analysis might be responsible for this (by compare the image.cpp.026t.ealias for both A and D): >> >> (Due to the call to: >> >> colour1 = .DEFERRED_INIT (colour1, 2); >> ) >> >> ******Approach A: >> >> Points_to analysis: >> >> Constraints: >> … >> colour1 = &NULL >> … >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> ... >> callarg(53) = &colour1 >> ... >> _53 = colour1 >> >> Points_to sets: >> … >> colour1 = { NULL ESCAPED NONLOCAL } same as _53 >> ... >> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 } >> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48) >> ... >> callarg(53) = { NULL ESCAPED NONLOCAL colour1 } >> >> ******Apprach D: >> >> Points_to analysis: >> >> Constraints: >> … >> callarg(19) = colour1 >> callarg(19) = &NONLOCAL >> colour1 = callarg(19) + UNKNOWN >> colour1 = &NONLOCAL >> … >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> … >> callarg(74) = &colour1 >> callarg(74) = callarg(74) + UNKNOWN >> callarg(74) = *callarg(74) + UNKNOWN >> … >> _53 = colour1 >> _54 = _53 >> _55 = _54 + UNKNOWN >> _55 = &NONLOCAL >> _56 = colour1 >> _57 = _56 >> _58 = _57 + UNKNOWN >> _58 = &NONLOCAL >> _59 = _55 + UNKNOWN >> _59 = _58 + UNKNOWN >> _60 = colour1 >> _61 = _60 >> _62 = _61 + UNKNOWN >> _62 = &NONLOCAL >> _63 = _59 + UNKNOWN >> _63 = _62 + UNKNOWN >> _64 = _63 + UNKNOWN >> .. 
>> Points_to set: >> … >> colour1 = { ESCAPED NONLOCAL } same as callarg(19) >> … >> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 } >> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69) >> callarg(71) = { ESCAPED NONLOCAL } >> callarg(72) = { ESCAPED NONLOCAL } >> callarg(73) = { ESCAPED NONLOCAL } >> callarg(74) = { ESCAPED NONLOCAL colour1 } >> >> My question: >> >> Is it possible to adjust alias analysis to resolve this issue? > > You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c > find_func_aliases_for_call (it's not a builtin but you can look in > the respective subroutine for examples). Specifically you want to > avoid making anything escaped or clobbered. Okay, thanks. Will check on that. Qing >> > > -- > Richard Biener <rguenther@suse.de <mailto:rguenther@suse.de>> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init 2021-02-02 15:17 ` Qing Zhao @ 2021-02-02 23:32 ` Qing Zhao 0 siblings, 0 replies; 56+ messages in thread From: Qing Zhao @ 2021-02-02 23:32 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Sandiford, Richard Biener via Gcc-patches

Hi,

With the following patch:

[qinzhao@localhost gcc]$ git diff tree-ssa-structalias.c
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index cf653be..bd18841 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4851,6 +4851,30 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
   return false;
 }
 
+static void
+find_func_aliases_for_deferred_init (gcall *t)
+{
+
+  tree lhsop = gimple_call_lhs (t);
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (t, 1));
+  auto_vec<ce_s, 2> lhsc;
+  auto_vec<ce_s, 4> rhsc;
+  struct constraint_expr temp;
+
+  get_constraint_for (lhsop, &lhsc);
+  if (init_type == AUTO_INIT_ZERO && flag_delete_null_pointer_checks)
+    temp.var = nothing_id;
+  else
+    temp.var = nonlocal_id;
+  temp.type = ADDRESSOF;
+  temp.offset = 0;
+  rhsc.safe_push (temp);
+
+  process_all_all_constraints (lhsc, rhsc);
+  return;
+}
+
 /* Create constraints for the call T. */
 
 static void
@@ -4864,6 +4888,12 @@ find_func_aliases_for_call (struct function *fn, gcall *t)
       && find_func_aliases_for_builtin_call (fn, t))
     return;
 
+  if (gimple_call_internal_p (t, IFN_DEFERRED_INIT))
+    {
+      find_func_aliases_for_deferred_init (t);
+      return;
+    }
+

The *.ealias dumps for the routine “bump_map” are now exactly the same for approaches A and D.
However, the stack size for D is still bigger than for A.

Any suggestions?
Qing
^ permalink raw reply [flat|nested] 56+ messages in thread
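The decision the patch above makes can be restated outside of GCC as a tiny standalone function. This is only a toy model: the enum names and values below are hypothetical stand-ins (the real auto_init_type enumerators, nothing_id and nonlocal_id belong to the GCC sources and to the patch under discussion), and the function touches no GCC data structure. It just spells out the rule the patch appears to implement: a zero-initialized value is treated as pointing to nothing when null pointer checks may be deleted, and in every other case it is conservatively treated as pointing to non-local memory.

```c
/* Toy stand-ins; names and values are illustrative, not GCC's.  */
enum toy_init_type { TOY_INIT_PATTERN, TOY_INIT_ZERO };
enum toy_pt_target { TOY_PT_NOTHING, TOY_PT_NONLOCAL };

/* Mirror of the if/else in the patch: pick what the result of a
   deferred init is assumed to point to.  The second argument plays
   the role of flag_delete_null_pointer_checks.  */
enum toy_pt_target
toy_deferred_init_target (enum toy_init_type init_type,
                          int delete_null_pointer_checks)
{
  if (init_type == TOY_INIT_ZERO && delete_null_pointer_checks)
    return TOY_PT_NOTHING;
  return TOY_PT_NONLOCAL;
}
```

Note that neither branch adds the initialized variable itself to an escaped or clobbered set, which matches the earlier advice to avoid making anything escaped or clobbered.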
* Re: How to traverse all the local variables that declared in the current routine? 2020-12-04 8:50 ` Richard Biener 2020-12-04 16:19 ` Qing Zhao @ 2020-12-07 17:21 ` Richard Sandiford 1 sibling, 0 replies; 56+ messages in thread From: Richard Sandiford @ 2020-12-07 17:21 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Biener via Gcc-patches, Qing Zhao Richard Biener <richard.guenther@gmail.com> writes: > On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: >> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote: >> >> Another issue is, in order to check whether an auto-variable has initializer, I plan to add a new bit in “decl_common” as: >> >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> >> unsigned decl_is_initialized :1; >> >> >> >> /* IN VAR_DECL, set when the decl is initialized at the declaration. */ >> >> #define DECL_IS_INITIALIZED(NODE) \ >> >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >> >> >> >> set this bit when setting DECL_INITIAL for the variables in FE. then keep it >> >> even though DECL_INITIAL might be NULLed. >> > >> > For locals it would be more reliable to set this flag during gimplification. >> > >> >> Do you have any comment and suggestions? >> > >> > As said above - do you want to cover registers as well as locals? I'd do >> > the actual zeroing during RTL expansion instead since otherwise you >> > have to figure youself whether a local is actually used (see expand_stack_vars) >> > >> > Note that optimization will already made have use of "uninitialized" state >> > of locals so depending on what the actual goal is here "late" may be too late. 
>>
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>>
>>   X1 = .DEFERRED_INIT (X0, INIT)
>>
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern? So for a decl we'd have:
>>
>>   X = .DEFERRED_INIT (X, INIT)
>>
>> and for an SSA name we'd have:
>>
>>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>
>> with all other uses of X_1(D) being replaced by X_2. The idea is that:
>>
>> * Having the X0 argument would keep the uninitialised use of the
>>   variable around for the later warning passes.
>>
>> * Using a const function should still allow the UB to be deleted as dead
>>   if X1 isn't needed.
>>
>> * Having a function in the way should stop passes from taking advantage
>>   of direct uninitialised uses for optimisation.
>>
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>
> The question is whether it's in line with people's expectations that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).

From my understanding, that's OK. I don't think this option is like -g,
which is supposed to have no observable effect other than adding or
removing debug info. It's OK for implicit zero initialisation to be
slower than explicit zero initialisation. After all, if someone actively
wants something to be initialised to zero, they're still expected to
do it in the source code. The implicit initialisation is just a
safety net.
Similarly, I think it's OK that code won't be optimised identically
with and without .DEFERRED_INIT (or whatever other mechanism we use),
and so won't provide identical late warnings. In both cases we should
just do our best to diagnose what we can.

> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch. We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably vastly different already.

Yeah, this isn't about trying to match compilers diagnostic-for-diagnostic.
It's more about matching them principle-for-principle.

Thanks,
Richard
^ permalink raw reply [flat|nested] 56+ messages in thread
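The rewrite described in the quoted proposal ("all other uses of X_1(D) being replaced by X_2") can be sketched with a toy statement list. This is not GCC code: the struct toy_stmt type, the integer encoding of SSA names and the function toy_replace_uses are all made up for illustration, and the statement X_2 = .DEFERRED_INIT (X_1(D), INIT) itself is assumed to be emitted separately, so it is not part of the array being rewritten.

```c
#include <stddef.h>

/* Toy statement "lhs = op (rhs[0], rhs[1])".  SSA names are encoded
   as small integers purely for this illustration.  */
struct toy_stmt
{
  int lhs;
  int rhs[2];
};

/* Replace every use of the uninitialised default definition UNDEF
   with FRESH (the result of the deferred init) in STMTS[0..N).
   Returns the number of operands rewritten.  Left-hand sides are
   never touched, since UNDEF has no defining statement here.  */
size_t
toy_replace_uses (struct toy_stmt *stmts, size_t n, int undef, int fresh)
{
  size_t rewritten = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < 2; j++)
      if (stmts[i].rhs[j] == undef)
        {
          stmts[i].rhs[j] = fresh;
          rewritten++;
        }

  return rewritten;
}
```

After such a rewrite the only remaining use of the uninitialised name would be the argument of the deferred-init call itself, which is what keeps it visible to the later warning passes while hiding it from optimisations that exploit direct uninitialised uses.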
end of thread, other threads:[~2021-02-02 23:32 UTC | newest]

Thread overview: 56+ messages
2020-11-23 23:05 How to traverse all the local variables that declared in the current routine? Qing Zhao
2020-11-24 7:32 ` Richard Biener
2020-11-24 15:47 ` Qing Zhao
2020-11-24 15:55 ` Richard Biener
2020-11-24 16:54 ` Qing Zhao
2020-11-25 9:11 ` Richard Biener
2020-11-25 17:41 ` Qing Zhao
2020-12-01 19:47 ` Qing Zhao
2020-12-02 8:45 ` Richard Biener
2020-12-02 15:36 ` Qing Zhao
2020-12-03 8:45 ` Richard Biener
2020-12-03 16:07 ` Qing Zhao
2020-12-03 16:36 ` Richard Biener
2020-12-03 16:40 ` Qing Zhao
2020-12-03 16:56 ` Richard Sandiford
2020-11-26 0:08 ` Martin Sebor
2020-11-30 16:23 ` Qing Zhao
2020-11-30 17:18 ` Martin Sebor
2020-11-30 23:05 ` Qing Zhao
2020-12-03 17:32 ` Richard Sandiford
2020-12-03 23:04 ` Qing Zhao
2020-12-04 8:50 ` Richard Biener
2020-12-04 16:19 ` Qing Zhao
2020-12-07 7:12 ` Richard Biener
2020-12-07 16:20 ` Qing Zhao
2020-12-07 17:10 ` Richard Sandiford
2020-12-07 17:36 ` Qing Zhao
2020-12-07 18:05 ` Richard Sandiford
2020-12-07 18:34 ` Qing Zhao
2020-12-08 7:35 ` Richard Biener
2020-12-08 7:40 ` Richard Biener
2020-12-08 19:54 ` Qing Zhao
2020-12-09 8:23 ` Richard Biener
2020-12-09 15:04 ` Qing Zhao
2020-12-09 15:12 ` Richard Biener
2020-12-09 16:18 ` Qing Zhao
2021-01-05 19:05 ` The performance data for two different implementation of new security feature -ftrivial-auto-var-init Qing Zhao
2021-01-05 19:10 ` Qing Zhao
2021-01-12 20:34 ` Qing Zhao
2021-01-13 7:39 ` Richard Biener
2021-01-13 15:06 ` Qing Zhao
2021-01-13 15:10 ` Richard Biener
2021-01-13 15:35 ` Qing Zhao
2021-01-13 15:40 ` Richard Biener
2021-01-14 21:16 ` Qing Zhao
2021-01-15 8:11 ` Richard Biener
2021-01-15 16:16 ` Qing Zhao
2021-01-15 17:22 ` Richard Biener
2021-01-15 17:57 ` Qing Zhao
2021-01-18 13:09 ` Richard Sandiford
2021-01-18 16:12 ` Qing Zhao
2021-02-01 19:12 ` Qing Zhao
2021-02-02 7:43 ` Richard Biener
2021-02-02 15:17 ` Qing Zhao
2021-02-02 23:32 ` Qing Zhao
2020-12-07 17:21 ` How to traverse all the local variables that declared in the current routine? Richard Sandiford