From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by sourceware.org (Postfix) with ESMTPS id 25D18386F802 for ; Wed, 9 Dec 2020 16:18:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 25D18386F802 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B9G9cW9134323; Wed, 9 Dec 2020 16:18:34 GMT Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 35825m91hg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 09 Dec 2020 16:18:34 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B9GAiLh111866; Wed, 9 Dec 2020 16:18:33 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3020.oracle.com with ESMTP id 358kyuwq1j-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 09 Dec 2020 16:18:33 +0000 Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0B9GIVG7015991; Wed, 9 Dec 2020 16:18:32 GMT Received: from dhcp-10-39-205-147.vpn.oracle.com (/10.39.205.147) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 09 Dec 2020 08:18:31 -0800 From: Qing Zhao Message-Id: <9127AAB9-92C8-4A1B-BAD5-2F5F8762DCF9@ORACLE.COM> Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: Re: How to traverse all the local variables that declared in the current routine? Date: Wed, 9 Dec 2020 10:18:30 -0600 In-Reply-To: Cc: Richard Sandiford , Richard Biener via Gcc-patches To: Richard Biener References: <217BE64F-A623-4453-B45B-D38B66B71B72@ORACLE.COM> <33955130-9D2D-43D5-818D-1DCC13FC1988@ORACLE.COM> <89D58812-0F3E-47AE-95A5-0A07B66EED8C@ORACLE.COM> <9585CBB2-0082-4B9A-AC75-250F54F0797C@ORACLE.COM> <51911859-45D5-4566-B588-F828B9D7313B@ORACLE.COM> X-Mailer: Apple Mail (2.3608.120.23.2.4) X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 spamscore=0 mlxscore=0 malwarescore=0 suspectscore=0 mlxlogscore=999 bulkscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012090113 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 adultscore=0 bulkscore=0 phishscore=0 mlxlogscore=999 clxscore=1015 priorityscore=1501 mlxscore=0 spamscore=0 lowpriorityscore=0 malwarescore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012090113 X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Dec 2020 16:18:43 -0000 > On Dec 9, 2020, at 9:12 AM, Richard Biener = wrote: >=20 > On Wed, Dec 9, 2020 at 4:04 PM Qing Zhao > wrote: >>=20 >>=20 >>=20 >> On Dec 9, 2020, at 2:23 AM, Richard Biener = wrote: >>=20 >> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao = wrote: >>=20 >>=20 >>=20 >>=20 >> On Dec 8, 2020, at 1:40 AM, Richard Biener = wrote: >>=20 >> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao = wrote: >>=20 >>=20 >>=20 >>=20 >> On Dec 7, 2020, at 1:12 AM, Richard Biener = wrote: >>=20 >> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao = wrote: >>=20 >>=20 >>=20 >>=20 >> On Dec 4, 2020, at 2:50 AM, Richard Biener = wrote: >>=20 >> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford >> wrote: >>=20 >>=20 >> Richard Biener via Gcc-patches writes: >>=20 >> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao = wrote: >>=20 >> Another issue is, in order to check whether an auto-variable has = initializer, I plan to add a new bit in =E2=80=9Cdecl_common=E2=80=9D = as: >> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >> unsigned decl_is_initialized :1; >>=20 >> /* IN VAR_DECL, set when the decl is initialized at the declaration. = */ >> #define DECL_IS_INITIALIZED(NODE) \ >> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>=20 >> set this bit when setting DECL_INITIAL for the variables in FE. then = keep it >> even though DECL_INITIAL might be NULLed. >>=20 >>=20 >> For locals it would be more reliable to set this flag during = gimplification. >>=20 >> Do you have any comment and suggestions? >>=20 >>=20 >> As said above - do you want to cover registers as well as locals? = I'd do >> the actual zeroing during RTL expansion instead since otherwise you >> have to figure youself whether a local is actually used (see = expand_stack_vars) >>=20 >> Note that optimization will already made have use of "uninitialized" = state >> of locals so depending on what the actual goal is here "late" may be = too late. >>=20 >>=20 >> Haven't thought about this much, so it might be a daft idea, but = would a >> compromise be to use a const internal function: >>=20 >> X1 =3D .DEFERRED_INIT (X0, INIT) >>=20 >> where the X0 argument is an uninitialised value and the INIT argument >> describes the initialisation pattern? So for a decl we'd have: >>=20 >> X =3D .DEFERRED_INIT (X, INIT) >>=20 >> and for an SSA name we'd have: >>=20 >> X_2 =3D .DEFERRED_INIT (X_1(D), INIT) >>=20 >> with all other uses of X_1(D) being replaced by X_2. The idea is = that: >>=20 >> * Having the X0 argument would keep the uninitialised use of the >> variable around for the later warning passes. >>=20 >> * Using a const function should still allow the UB to be deleted as = dead >> if X1 isn't needed. >>=20 >> * Having a function in the way should stop passes from taking = advantage >> of direct uninitialised uses for optimisation. >>=20 >> This means we won't be able to optimise based on the actual init >> value at the gimple level, but that seems like a fair trade-off. >> AIUI this is really a security feature or anti-UB hardening feature >> (in the sense that users are more likely to see predictable behaviour >> =E2=80=9Cin the field=E2=80=9D even if the program has UB). >>=20 >>=20 >> The question is whether it's in line of peoples expectation that >> explicitely zero-initialized code behaves differently from >> implicitely zero-initialized code with respect to optimization >> and secondary side-effects (late diagnostics, latent bugs, etc.). >>=20 >> Introducing a new concept like .DEFERRED_INIT is much more >> heavy-weight than an explicit zero initializer. >>=20 >>=20 >> What exactly you mean by =E2=80=9Cheavy-weight=E2=80=9D? More = difficult to implement or much more run-time overhead or both? Or = something else? >>=20 >> The major benefit of the approach of =E2=80=9C.DEFERRED_INIT=E2=80=9D = is to enable us keep the current -Wuninitialized analysis untouched and = also pass >> the =E2=80=9Cuninitialized=E2=80=9D info from source code level to = =E2=80=9Cpass_expand=E2=80=9D. >>=20 >>=20 >> Well, "untouched" is a bit oversimplified. You do need to handle >> .DEFERRED_INIT as not >> being an initialization which will definitely get interesting. >>=20 >>=20 >> Yes, during uninitialized variable analysis pass, we should specially = handle the defs with =E2=80=9C.DEFERRED_INIT=E2=80=9D, to treat them as = uninitializations. >>=20 >> If we want to keep the current -Wuninitialized analysis untouched, = this is a quite reasonable approach. >>=20 >> However, if it=E2=80=99s not required to keep the current = -Wuninitialized analysis untouched, adding zero-initializer directly = during gimplification should >> be much easier and simpler, and also smaller run-time overhead. >>=20 >>=20 >> As for optimization I fear you'll get a load of redundant zero-init >> actually emitted if you can just rely on RTL DSE/DCE to remove it. >>=20 >>=20 >> Runtime overhead for -fauto-init=3Dzero is one important = consideration for the whole feature, we should minimize the runtime = overhead for zero >> Initialization since it will be used in production build. >> We can do some run-time performance evaluation when we have an = implementation ready. >>=20 >>=20 >> Note there will be other passes "confused" by .DEFERRED_INIT. Note >> that there's going to be other >> considerations - namely where to emit the .DEFERRED_INIT - when >> emitting it during gimplification >> you can emit it at the start of the block of block-scope variables. >> When emitting after gimplification >> you have to emit at function start which will probably make stack = slot >> sharing inefficient because >> the deferred init will cause overlapping lifetimes. With emitting at >> block boundary the .DEFERRED_INIT >> will act as code-motion barrier (and it itself likely cannot be = moved) >> so for example invariant motion >> will no longer happen. Likewise optimizations like SRA will be >> confused by .DEFERRED_INIT which >> again will lead to bigger stack usage (and less optimization). >>=20 >>=20 >> Yes, looks like that the inserted =E2=80=9C.DEFERRED_INIT=E2=80=9D = function calls will negatively impact tree optimizations. >>=20 >>=20 >> But sure, you can try implement a few variants but definitely >> .DEFERRED_INIT will be the most >> work. >>=20 >>=20 >> How about implement the following two approaches and compare the = run-time cost: >>=20 >> A. Insert the real initialization during gimplification phase. >> B. Insert the .DEFERRED_INIT during gimplification phase, and then = expand this call to real initialization during expand phase. >>=20 >> The Approach A will have less run-time overhead, but will mess up the = current uninitialized variable analysis in GCC. >> The Approach B will have more run-time overhead, but will keep the = current uninitialized variable analysis in GCC. >>=20 >> And then decide which approach we will go with? >>=20 >> What=E2=80=99s your opinion on this? >>=20 >>=20 >> Well, in the end you have to try. Note for the purpose of stack slot >> sharing you do want the >> instrumentation to happen during gimplification. >>=20 >> Another possibility is to materialize .DEFERRED_INIT earlier than >> expand, for example shortly >> after IPA optimizations to avoid pessimizing loop transforms and = allow >> SRA. At the point you >> materialize the inits you could run the late uninit warning pass >> (which would then be earlier >> than regular but would still see the .DEFERRED_INIT). >>=20 >>=20 >> If we put the =E2=80=9Cmaterializing .DEFERRED_INIT=E2=80=9D phase = earlier as you suggested above, >> the late uninitialized warning pass has to be moved earlier in order = to utilize the =E2=80=9C.DEFERRED_INIT=E2=80=9D. >> Then we might miss some opportunities for the late uninitialized = warning. I think that this is not we really >> want. >>=20 >>=20 >> While users may be happy to pay some performance stack usage is >> probably more critical >>=20 >>=20 >> So, which pass is for computing the stack usage? >>=20 >>=20 >> There is no pass doing that, stack slot assignment and sharing (when >> lifetimes do >> not overlap) is done by RTL expansion. >>=20 >>=20 >> Okay. I see. >>=20 >>=20 >> (just thinking of the kernel) so not regressing there should be as >> important as preserving >> uninit warnings (which I for practical purposes see not important at >> all - people can do >> "debug" builds without -fzero-init). >>=20 >>=20 >> Looks like that the major issue with the =E2=80=9C.DERERRED_INIT=E2=80=9D= approach is: the new inserted calls to internal const function >> might inhibit some important tree optimizations. >>=20 >> So, I am thinking again the following another approach I raised in = the very beginning: >>=20 >> During gimplification phase, mark the DECL for an auto variable = without initialization as =E2=80=9Cno_explicit_init=E2=80=9D, then = maintain this >> =E2=80=9Cno_explicit_init=E2=80=9D bit till after = pass_late_warn_uninitialized, or till pass_expand, add = zero-iniitiazation for all DECLs that are >> marked with =E2=80=9Cno_explicit_init=E2=80=9D. >>=20 >> This approach will not have the issue to interrupt tree = optimizations, however, I guess that =E2=80=9Cmaintaining this = =E2=80=9Cno_explicit_init=E2=80=9D bit >> might be very difficult? >>=20 >> Do you have any comments on this approach? >>=20 >>=20 >> As said earlier you'll still get optimistic propagation bypassing the >> still missing >> implicit zero init. Maybe that's OK - you don't get "garbage" but = you'll get >> some other defined value. >>=20 >>=20 >> There is another approach: >>=20 >> During gimplification phase, adding the real initialization to the = uninitialized variables, but mark these initializations as = =E2=80=9Cartificial_init=E2=80=9D. >> Then update the uninitialized analysis phase to handle these = initializations marked with =E2=80=9Cartificial_init=E2=80=9D specially = as Non-initialization to >> keep the uninitialized warnings. >>=20 >> Then we should be able to get the maximum optimization and also keep = the uninitialized warning at the same time. >>=20 >> Microsoft compiler seems used this approach: = (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-me= mory-on-windows/) >>=20 >> " >> Does InitAll Break Static Analysis? >>=20 >> Static analysis is incredibly useful at letting developers know they = forgot to initialize before use. >>=20 >> The InitAll feature indicates if a variable assignment was caused by = InitAll to both PREfast and the compiler backend (both of which have = uninitialized warnings). This allows the analysis tools to ignore = InitAll variable assignments for the purposes of these warnings. With = InitAll enabled, a developer will still receive static analysis warnings = if they forget to initialize a variable even if InitAll forcibly = initializes it for them. >>=20 >> =E2=80=9C >>=20 >> Any comment on this? >=20 > You have to try. Bits to implement are adjusting the uninit pass and > maining the annotation > as well as making sure to not elide the real init because there's a > 'fake' init (we have redundant > store elimination which works in this direction for example just to = name one). Okay, I see.=20 >=20 > Richard. >=20 >> As said, you have to implement a few options and compare. >>=20 >>=20 >> Yes, I will do that, just make sure which approaches we should = implement and compare first. The following are the approaches I will implement and compare: Our final goal is to keep the uninitialized warning and minimize the = run-time performance cost. A. Adding real initialization during gimplification, not maintain the = uninitialized warnings. B. Adding real initialization during gimplification, marking them with = =E2=80=9Cartificial_init=E2=80=9D.=20 Adjusting uninitialized pass, maintaining the annotation, making = sure the real init not Deleted from the fake init.=20 C. Marking the DECL for an uninitialized auto variable as = =E2=80=9Cno_explicit_init=E2=80=9D during gimplification, maintain this =E2=80=9Cno_explicit_init=E2=80=9D bit till after = pass_late_warn_uninitialized, or till pass_expand,=20 add real initialization for all DECLs that are marked with = =E2=80=9Cno_explicit_init=E2=80=9D. D. Adding .DEFFERED_INIT during gimplification, expand the = .DEFFERED_INIT during expand to real initialization. Adjusting uninitialized pass with the new refs = with =E2=80=9C.DEFFERED_INIT=E2=80=9D. In the above, approach A will be the one that have the minimum run-time = cost, will be the base for the performance comparison.=20 I will implement approach D then, this one is expected to have the most = run-time overhead among the above list, but Implementation should be the cleanest among B, C, D. Let=E2=80=99s see = how much more performance overhead this approach will be. If the data is good, maybe we can avoid the effort to implement = B, and C.=20 If the performance of D is not good, I will implement B or C at that = time. Let me know if you have any comment or suggestions. Thanks. Qing