From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by sourceware.org (Postfix) with ESMTPS id 2902E386103F for ; Mon, 30 Nov 2020 23:05:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2902E386103F Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0AUN5qRG088248; Mon, 30 Nov 2020 23:05:52 GMT Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 353egkfs1t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Mon, 30 Nov 2020 23:05:51 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0AUMuDhP147120; Mon, 30 Nov 2020 23:05:51 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3020.oracle.com with ESMTP id 3540ar95kk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 30 Nov 2020 23:05:50 +0000 Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0AUN5nRH020073; Mon, 30 Nov 2020 23:05:50 GMT Received: from dhcp-10-154-168-119.vpn.oracle.com (/10.154.168.119) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 30 Nov 2020 15:05:49 -0800 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: Re: How to traverse all the local variables that declared in the current routine? From: Qing Zhao In-Reply-To: <08c7d0ff-468b-994f-1299-6e76eb137b43@gmail.com> Date: Mon, 30 Nov 2020 17:05:48 -0600 Cc: Richard Biener , Richard Sandiford , gcc Patches Content-Transfer-Encoding: quoted-printable Message-Id: <5C8DCC08-7926-457F-90F0-B05F62D3ECFD@ORACLE.COM> References: <217BE64F-A623-4453-B45B-D38B66B71B72@ORACLE.COM> <15EA64C7-D75F-4CE1-92C8-6940186A512A@ORACLE.COM> <29CCC07E-6203-46BA-8D35-8FF7F55563A2@ORACLE.COM> <08c7d0ff-468b-994f-1299-6e76eb137b43@gmail.com> To: Martin Sebor X-Mailer: Apple Mail (2.3608.120.23.2.4) X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9821 signatures=668682 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 bulkscore=0 phishscore=0 mlxscore=0 adultscore=0 malwarescore=0 suspectscore=3 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2011300142 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9821 signatures=668682 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 bulkscore=0 suspectscore=3 phishscore=0 mlxlogscore=999 lowpriorityscore=0 malwarescore=0 priorityscore=1501 spamscore=0 impostorscore=0 clxscore=1015 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2011300143 X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Nov 2020 23:05:58 -0000 On Nov 30, 2020, at 11:18 AM, Martin Sebor wrote: >>>>>>>> Does gcc provide an iterator to traverse all the local = variables that are declared in the current routine? >>>>>>>>=20 >>>>>>>> If not, what=E2=80=99s the best way to traverse the local = variables? >>>>>>>=20 >>>>>>> Depends on what for. There's the source level view you get by = walking >>>>>>> BLOCK_VARS of the >>>>>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) = and >>>>>>> there's SSA names >>>>>>> (FOR_EACH_SSA_NAME). >>>>>>=20 >>>>>> I am planing to add a new phase immediately after = =E2=80=9Cpass_late_warn_uninitialized=E2=80=9D to initialize all = auto-variables that are >>>>>> not explicitly initialized in the declaration, the basic idea is = following: >>>>>>=20 >>>>>> ** The proposal: >>>>>>=20 >>>>>> A. add a new GCC option: (same name and meaning as CLANG) >>>>>> -ftrivial-auto-var-init=3D[pattern|zero], similar pattern init as = CLANG; >>>>>>=20 >>>>>> B. add a new attribute for variable: >>>>>> __attribute((uninitialized) >>>>>> the marked variable is uninitialized intentionaly for performance = purpose. >>>>>>=20 >>>>>> C. The implementation needs to keep the current static warning on = uninitialized >>>>>> variables untouched in order to avoid "forking the language". >>>>>>=20 >>>>>>=20 >>>>>> ** The implementation: >>>>>>=20 >>>>>> There are two major requirements for the implementation: >>>>>>=20 >>>>>> 1. all auto-variables that do not have an explicit initializer = should be initialized to >>>>>> zero by this option. (Same behavior as CLANG) >>>>>>=20 >>>>>> 2. keep the current static warning on uninitialized variables = untouched. >>>>>>=20 >>>>>> In order to satisfy 1, we should check whether an auto-variable = has initializer >>>>>> or not; >>>>>> In order to satisfy 2, we should add this new transformation = after >>>>>> "pass_late_warn_uninitialized". >>>>>>=20 >>>>>> So, we should be able to check whether an auto-variable has = initializer or not after =E2=80=9Cpass_late_warn_uninitialized=E2=80=9D, >>>>>> If Not, then insert an initialization for it. >>>>>>=20 >>>>>> For this purpose, I guess that =E2=80=9CFOR_EACH_LOCAL_DECL=E2=80=9D= might be better? >>>>>=20 >>>>> Yes, but do you want to catch variables promoted to register as = well >>>>> or just variables >>>>> on the stack? >>>> I think both as long as they are source-level auto-variables. Then = which one is better? >>>>>=20 >>>>>> Another issue is, in order to check whether an auto-variable has = initializer, I plan to add a new bit in =E2=80=9Cdecl_common=E2=80=9D = as: >>>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>>>> unsigned decl_is_initialized :1; >>>>>>=20 >>>>>> /* IN VAR_DECL, set when the decl is initialized at the = declaration. */ >>>>>> #define DECL_IS_INITIALIZED(NODE) \ >>>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>>>>=20 >>>>>> set this bit when setting DECL_INITIAL for the variables in FE. = then keep it >>>>>> even though DECL_INITIAL might be NULLed. >>>>>=20 >>>>> For locals it would be more reliable to set this flag during = gimplification. >>>> You mean I can set the flag =E2=80=9CDECL_IS_INITIALIZED (decl)=E2=80= =9D inside the routine =E2=80=9Cgimpley_decl_expr=E2=80=9D (gimplify.c) = as following: >>>> if (VAR_P (decl) && !DECL_EXTERNAL (decl)) >>>> { >>>> tree init =3D DECL_INITIAL (decl); >>>> ... >>>> if (init && init !=3D error_mark_node) >>>> { >>>> if (!TREE_STATIC (decl)) >>>> { >>>> DECL_IS_INITIALIZED(decl) =3D 1; >>>> } >>>> Is this enough for all Frontends? Are there other places that I = need to maintain this bit? >>>>>=20 >>>>>> Do you have any comment and suggestions? >>>>>=20 >>>>> As said above - do you want to cover registers as well as locals? >>>> All the locals from the source-code point of view should be = covered. (=46rom my study so far, looks like that Clang adds that = phase in FE). >>>> If GCC adds this phase in FE, then the following design requirement >>>> C. The implementation needs to keep the current static warning on = uninitialized >>>> variables untouched in order to avoid "forking the language=E2=80=9D.= >>>> cannot be satisfied. Since gcc=E2=80=99s uninitialized variables = analysis is applied quite late. >>>> So, we have to add this new phase after = =E2=80=9Cpass_late_warn_uninitialized=E2=80=9D. >>>>> I'd do >>>>> the actual zeroing during RTL expansion instead since otherwise = you >>>>> have to figure youself whether a local is actually used (see = expand_stack_vars) >>>> Adding this new transformation during RTL expansion is okay. I = will check on this in more details to see how to add it to RTL expansion = phase. >>>>>=20 >>>>> Note that optimization will already made have use of = "uninitialized" state >>>>> of locals so depending on what the actual goal is here "late" may = be too late. >>>> This is a really good point=E2=80=A6 >>>> In order to avoid optimization to use the =E2=80=9Cuninitialized=E2=80= =9D state of locals, we should add the zeroing phase as early as = possible (adding it in FE might be best >>>> for this issue). However, if we have to met the following = requirement: >>>> C. The implementation needs to keep the current static warning on = uninitialized >>>> variables untouched in order to avoid "forking the language=E2=80=9D.= >>>> We have to move the new phase after all the uninitialized analysis = is done in order to avoid =E2=80=9Cforking the language=E2=80=9D. >>>> So, this is a problem that is not easy to resolve. >>>> Do you have suggestion on this? >>>=20 >>> Not having thought about it very long or hard I'd be tempted to do >>> it the other way around. For each use of an uninitialized variable >>> found, first either issue or queue up a -Wuninitialized for it and >>> then initialize it. Then (if queued) at some later point, issue >>> the queued up -Wuninitialized. The last part would be done in >>> tree-ssa-uninit.c where the remaining uses of uninitialized >>> variables would trigger warnings and induce their initialization >>> (if there were any left). >> The major issue with this approach is: >> There are two passes for uninitialized variable analysis: >> pass_early_warn_uninitialized >> pass_late_warn_uninitialized >> The early pass is placed at the very beginning of the tree optimizer. = But the late pass is placed at the very late stage of the tree = optimizer. >> If we add the initializations at the early pass, the result of the = late pass will be changed by the new added initializations. This does = not meet >> the requirement. >> Do I miss anything here? >=20 > I'm not sure. As I said, I'd consider issuing (or queuing up for > issuing later) -Wuninitialized at the same time as initializing > the uninitialized variables. I have considered this approach in the very beginning of my study, but = later I realized that it would not work. For example, for the following small example: qinzhao@gcc10:~/Bugs/auto-init$ cat t1.c void blah(int); int foo_2 (int n, int l, int m, int r) { int v; if ( (n < 10) && (m !=3D 100) && (r < 20) ) v =3D r; if (l > 100) if ( (n <=3D 8) && (m < 102) && (r < 19) ) blah(v); /* { dg-warning "uninitialized" "real warning" } */ return 0; } With the latest gcc and the following options: qinzhao@gcc10:~/Bugs/auto-init$ = /home/qinzhao/Install/latest_write/bin/gcc -Wuninitialized = -Wmaybe-uninitialized -S t1.c qinzhao@gcc10:~/Bugs/auto-init$=20 We can see that there is no any uninitialized warning issued by the = latest gcc if no optimization is specified. But for this case, It=E2=80=99s clear that we should insert a zero initializer for = auto-variable =E2=80=9Cv=E2=80=9D even though the current uninitialized = variable analysis pass is not able to determine =E2=80=9Cv=E2=80=9D is not initialized in some = execution paths.=20 The above is just a simple example to show that we cannot rely on the = result of the uninitialized variable analysis pass to decide which variable should be initialized.=20 For security purpose, we should conservatively initialize all = auto-variables that might not be initialized. i.e, for all the = auto-variables that=20 do not have an explicit initializer in source code level, we should = insert initializer for them.=20 This is the current behavior of LLVM with -ftrivial-auto-var-init=3Dzero = -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang.=20= I believe that GCC should do the same thing for the security benefit.=20 > With that approach I'd expect to > diagnose all the same instances of uninitialized uses as the two > passes do today (actually, I'd expect to diagnose more of them, > including those Richard referred to above whose uninitialized > state may have been made use of for optimization decisions(*)). > Also with this approach the two existing warning passes would > cease to serve their current purpose of hunting down uninitialized > variables because by the time they ran all their uses would have > been initialized (and warnings issued). Inserting the zero-initializer before pass_late_warn_uninitialized will = invalid the current uninitialized variable analysis, which is = unacceptable based on my current understanding.=20 > One question in my mind is what to do with -Wmaybe-uninitialized. > Should those also be initialized, even though they're not necessarily > used? Or are you only hoping to tackle -Wuninitialized? All the auto-variables that might not be initialized should be = initialized with the new option.=20 The decision on which auto-variable should be initialized should based = on the source code level initializer: If an auto-variable does not have a source code level initializer, the = compiler should add a zero-initializer=20 for it.=20 Qing >=20 > Martin >=20 > [*] With the initialization approach I'd expect concerns about > the cost of losing those optimization opportunities. Although > those could be addressed by making the initialization optional > (i.e., opt-in).