From: Qing Zhao
Subject: Re: How to traverse all the local variables that declared in the current routine?
Date: Tue, 8 Dec 2020 13:54:29 -0600
To: Richard Biener, Richard Sandiford
Cc: Richard Biener via Gcc-patches
Message-Id: <9585CBB2-0082-4B9A-AC75-250F54F0797C@ORACLE.COM>
List-Id: Gcc-patches mailing list

> On Dec 8, 2020, at 1:40 AM, Richard Biener wrote:
>
> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao wrote:
>>
>> On Dec 7, 2020, at 1:12 AM, Richard Biener wrote:
>>
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao wrote:
>>
>> On Dec 4, 2020, at 2:50 AM, Richard Biener wrote:
>>
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford wrote:
>>
>> Richard Biener via Gcc-patches writes:
>>
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao wrote:
>>
>> Another issue: in order to check whether an auto variable has an
>> initializer, I plan to add a new bit in "decl_common":
>>
>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>   unsigned decl_is_initialized : 1;
>>
>>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>>   #define DECL_IS_INITIALIZED(NODE) \
>>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>
>> Set this bit when setting DECL_INITIAL for the variable in the FE, then
>> keep it even though DECL_INITIAL might be NULLed.
>>
>> For locals it would be more reliable to set this flag during gimplification.
>>
>> Do you have any comments or suggestions?
>>
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead, since otherwise you have
>> to figure out yourself whether a local is actually used (see
>> expand_stack_vars).
>>
>> Note that optimization will already have made use of the "uninitialized"
>> state of locals, so depending on what the actual goal is here, "late" may
>> be too late.
>>
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>>
>>   X1 = .DEFERRED_INIT (X0, INIT)
>>
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>>
>>   X = .DEFERRED_INIT (X, INIT)
>>
>> and for an SSA name we'd have:
>>
>>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>
>> * Having the X0 argument would keep the uninitialised use of the
>>   variable around for the later warning passes.
>>
>> * Using a const function should still allow the UB to be deleted as dead
>>   if X1 isn't needed.
>>
>> * Having a function in the way should stop passes from taking advantage
>>   of direct uninitialised uses for optimisation.
>>
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> "in the field" even if the program has UB).
>>
>> The question is whether it's in line with people's expectations that
>> explicitly zero-initialized code behaves differently from implicitly
>> zero-initialized code with respect to optimization and secondary
>> side effects (late diagnostics, latent bugs, etc.).
>>
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>>
>> What exactly do you mean by "heavy-weight"?  More difficult to implement,
>> or much more run-time overhead, or both?  Or something else?
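
(As an aside, to make that contrast concrete - this is only a rough
illustration, not actual GIMPLE dump output, and the "0" below just stands
in for whatever INIT pattern is chosen.  For a local "int v;" with no
initializer in the source, the two shapes being compared are roughly:

   v = 0;                      /* explicit zero init: an ordinary store,
                                  indistinguishable from user code */

versus

   v = .DEFERRED_INIT (v, 0);  /* marker call: keeps the fact that v had
                                  no initializer visible to later passes */

The first form is trivial for the rest of the compiler but hides the
"was uninitialized" information; the second preserves that information at
the cost of a new IL construct that every later pass has to understand.)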
>>
>> The major benefit of the ".DEFERRED_INIT" approach is that it enables us
>> to keep the current -Wuninitialized analysis untouched and also to pass
>> the "uninitialized" info from the source-code level down to "pass_expand".
>>
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not being an initialization, which will definitely get
>> interesting.
>>
>> Yes, during the uninitialized variable analysis pass we should handle
>> defs by ".DEFERRED_INIT" specially and treat them as uninitialized.
>>
>> If we want to keep the current -Wuninitialized analysis untouched, this
>> is quite a reasonable approach.
>>
>> However, if it's not required to keep the current -Wuninitialized
>> analysis untouched, adding the zero initializer directly during
>> gimplification should be much easier and simpler, and should also have
>> smaller run-time overhead.
>>
>> As for optimization, I fear you'll get a load of redundant zero-init
>> actually emitted if you just rely on RTL DSE/DCE to remove it.
>>
>> Runtime overhead for -fauto-init=zero is one important consideration for
>> the whole feature; we should minimize the runtime overhead of zero
>> initialization since it will be used in production builds.
>> We can do some run-time performance evaluation when we have an
>> implementation ready.
>>
>> Note there will be other passes "confused" by .DEFERRED_INIT.  Note that
>> there are going to be other considerations - namely where to emit the
>> .DEFERRED_INIT.  When emitting it during gimplification you can emit it
>> at the start of the block for block-scope variables.  When emitting it
>> after gimplification you have to emit it at function start, which will
>> probably make stack slot sharing inefficient because the deferred init
>> will cause overlapping lifetimes.  When emitting at the block boundary,
>> the .DEFERRED_INIT will act as a code-motion barrier (and it itself
>> likely cannot be moved), so for example invariant motion will no longer
>> happen.  Likewise, optimizations like SRA will be confused by
>> .DEFERRED_INIT, which again will lead to bigger stack usage (and less
>> optimization).
>>
>> Yes, it looks like the inserted ".DEFERRED_INIT" function calls will
>> negatively impact tree optimizations.
>>
>> But sure, you can try implementing a few variants, but definitely
>> .DEFERRED_INIT will be the most work.
>>
>> How about implementing the following two approaches and comparing the
>> run-time cost:
>>
>> A. Insert the real initialization during the gimplification phase.
>> B. Insert the .DEFERRED_INIT during the gimplification phase, and then
>>    expand this call to the real initialization during the expand phase.
>>
>> Approach A will have less run-time overhead, but will mess up the
>> current uninitialized variable analysis in GCC.
>> Approach B will have more run-time overhead, but will keep the current
>> uninitialized variable analysis in GCC.
>>
>> And then decide which approach we will go with?
>>
>> What's your opinion on this?
>
> Well, in the end you have to try.  Note that for the purpose of stack
> slot sharing you do want the instrumentation to happen during
> gimplification.
>
> Another possibility is to materialize .DEFERRED_INIT earlier than
> expand, for example shortly after IPA optimizations, to avoid
> pessimizing loop transforms and allow SRA.  At the point you
> materialize the inits you could run the late uninit warning pass
> (which would then be earlier than regular but would still see the
> .DEFERRED_INIT).
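
(Just to illustrate the ordering being suggested here - the pass placement
below is my reading of the discussion above, not an actual patch or real
pass names:

   gimplification:  v = .DEFERRED_INIT (v, 0);  /* emitted at block scope,
                                                   so stack slot sharing
                                                   still works */
   ... IPA optimizations ...
   run the late uninit warning pass (the marker is still in the IL)
   materialization: v = 0;                      /* marker lowered to a
                                                   real store */
   ... remaining GIMPLE passes, loop transforms, SRA, expand ...

so loop transforms and SRA would only ever see the plain store, not the
marker.)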

If we put the "materializing .DEFERRED_INIT" phase earlier, as you
suggested above, the late uninitialized warning pass has to be moved
earlier as well in order to utilize the ".DEFERRED_INIT".  Then we might
miss some opportunities for the late uninitialized warning.  I think that
this is not what we really want.

> While users may be happy to pay some performance, stack usage is
> probably more critical

So, which pass computes the stack usage?

> (just thinking of the kernel), so not regressing there should be as
> important as preserving uninit warnings (which, for practical purposes,
> I see as not important at all - people can do "debug" builds without
> -fzero-init).

It looks like the major issue with the ".DEFERRED_INIT" approach is that
the newly inserted calls to the internal const function might inhibit
some important tree optimizations.

So, I am thinking again about the approach I raised in the very beginning:

During the gimplification phase, mark the DECL for an auto variable
without initialization as "no_explicit_init", then maintain this
"no_explicit_init" bit until after pass_late_warn_uninitialized, or until
pass_expand, and at that point add zero-initialization for all DECLs that
are marked with "no_explicit_init".

This approach will not have the issue of interrupting tree optimizations;
however, I guess that maintaining this "no_explicit_init" bit might be
very difficult?

Do you have any comments on this approach?

thanks.

Qing

> Richard.
>
>> Btw, I don't think there's any reason to cling onto clang's semantics
>> for a particular switch.  We'll never be able to emulate 1:1 behavior,
>> and our -Wuninit behavior is probably vastly different already.
>>
>> From my study so far, yes, the current behavior of -Wuninit for Clang
>> and GCC is not exactly the same.
>>
>> For example, for the following small test case:
>>
>>   void blah(int);
>>
>>   int foo_2 (int n, int l, int m, int r)
>>   {
>>     int v;
>>
>>     if ( (n > 10) && (m != 100) && (r < 20) )
>>       v = r;
>>
>>     if (l > 100)
>>       if ( (n <= 8) && (m < 102) && (r < 19) )
>>         blah(v); /* { dg-warning "uninitialized" "real warning" } */
>>
>>     return 0;
>>   }
>>
>> GCC is able to report a maybe-uninitialized warning here, but Clang
>> cannot.  It looks like GCC's uninitialized analysis relies on more
>> analysis and optimization information than Clang's.
>>
>> I am really curious how Clang implements its uninitialized analysis.
>>
>> Actually, I studied a little bit last Friday how Clang implements its
>> uninitialized analysis, and noticed that Clang has a data-flow analysis
>> phase based on Clang's AST:
>> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>>
>> Clang's uninitialized analysis is based on this data-flow analysis.
>>
>> Therefore, adding initialization AFTER Clang's uninitialized analysis
>> phase is straightforward.
>>
>> However, GCC does not have data-flow analysis in the FE; the
>> uninitialized variable analysis is done in the tree optimization phase.
>> Therefore, it is much more difficult to implement this feature in GCC
>> than in Clang.
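
(To make that interaction concrete with the foo_2 example above - again
purely an illustration: if the gimplifier simply synthesized a plain store
for the missing initializer, as in Approach A,

   int v;
   v = 0;   /* synthesized zero init; looks exactly like user code */

   if ( (n > 10) && (m != 100) && (r < 20) )
     v = r;
   ...
   blah(v);

then by the time GCC's middle-end uninitialized analysis runs, v is
initialized on every path, so the dg-warning above would silently
disappear.  Clang avoids this problem because its analysis runs on the
AST, before any such instrumentation would be added; for GCC this is
exactly the tension between Approach A and the .DEFERRED_INIT marker.)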
>>
>> Qing
>>
>> Qing
>>
>> Richard.
>>
>> Thanks,
>> Richard