From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 3EA943851C33 for ; Mon, 7 Dec 2020 18:05:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 3EA943851C33 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E1BD81042; Mon, 7 Dec 2020 10:05:54 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 519F63F66B; Mon, 7 Dec 2020 10:05:54 -0800 (PST) From: Richard Sandiford To: Qing Zhao Mail-Followup-To: Qing Zhao , Qing Zhao via Gcc-patches , Richard Biener , richard.sandiford@arm.com Cc: Qing Zhao via Gcc-patches , Richard Biener Subject: Re: How to traverse all the local variables that declared in the current routine? References: <217BE64F-A623-4453-B45B-D38B66B71B72@ORACLE.COM> <33955130-9D2D-43D5-818D-1DCC13FC1988@ORACLE.COM> <89D58812-0F3E-47AE-95A5-0A07B66EED8C@ORACLE.COM> <23B74D40-BB4F-46AF-94A5-E490585CCCC0@ORACLE.COM> Date: Mon, 07 Dec 2020 18:05:52 +0000 In-Reply-To: <23B74D40-BB4F-46AF-94A5-E490585CCCC0@ORACLE.COM> (Qing Zhao's message of "Mon, 7 Dec 2020 11:36:09 -0600") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.5 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Dec 2020 18:05:56 -0000 Qing Zhao writes: >> On Dec 7, 2020, at 11:10 AM, Richard Sandiford wrote: >>>>>=20 >>>>> Another issue is, in order to check whether an auto-variable has init= ializer, I plan to add a new bit in =E2=80=9Cdecl_common=E2=80=9D as: >>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED. */ >>>>> unsigned decl_is_initialized :1; >>>>>=20 >>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration. = */ >>>>> #define DECL_IS_INITIALIZED(NODE) \ >>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized) >>>>>=20 >>>>> set this bit when setting DECL_INITIAL for the variables in FE. then = keep it >>>>> even though DECL_INITIAL might be NULLed. >>>>>=20 >>>>>=20 >>>>> For locals it would be more reliable to set this flag during gimplifi= cation. >>>>>=20 >>>>> Do you have any comment and suggestions? >>>>>=20 >>>>>=20 >>>>> As said above - do you want to cover registers as well as locals? I'= d do >>>>> the actual zeroing during RTL expansion instead since otherwise you >>>>> have to figure youself whether a local is actually used (see expand_s= tack_vars) >>>>>=20 >>>>> Note that optimization will already made have use of "uninitialized" = state >>>>> of locals so depending on what the actual goal is here "late" may be = too late. >>>>>=20 >>>>>=20 >>>>> Haven't thought about this much, so it might be a daft idea, but woul= d a >>>>> compromise be to use a const internal function: >>>>>=20 >>>>> X1 =3D .DEFERRED_INIT (X0, INIT) >>>>>=20 >>>>> where the X0 argument is an uninitialised value and the INIT argument >>>>> describes the initialisation pattern? So for a decl we'd have: >>>>>=20 >>>>> X =3D .DEFERRED_INIT (X, INIT) >>>>>=20 >>>>> and for an SSA name we'd have: >>>>>=20 >>>>> X_2 =3D .DEFERRED_INIT (X_1(D), INIT) >>>>>=20 >>>>> with all other uses of X_1(D) being replaced by X_2. The idea is tha= t: >>>>>=20 >>>>> * Having the X0 argument would keep the uninitialised use of the >>>>> variable around for the later warning passes. >>>>>=20 >>>>> * Using a const function should still allow the UB to be deleted as d= ead >>>>> if X1 isn't needed. >>>>>=20 >>>>> * Having a function in the way should stop passes from taking advanta= ge >>>>> of direct uninitialised uses for optimisation. >>>>>=20 >>>>> This means we won't be able to optimise based on the actual init >>>>> value at the gimple level, but that seems like a fair trade-off. >>>>> AIUI this is really a security feature or anti-UB hardening feature >>>>> (in the sense that users are more likely to see predictable behaviour >>>>> =E2=80=9Cin the field=E2=80=9D even if the program has UB). >>>>>=20 >>>>>=20 >>>>> The question is whether it's in line of peoples expectation that >>>>> explicitely zero-initialized code behaves differently from >>>>> implicitely zero-initialized code with respect to optimization >>>>> and secondary side-effects (late diagnostics, latent bugs, etc.). >>>>>=20 >>>>> Introducing a new concept like .DEFERRED_INIT is much more >>>>> heavy-weight than an explicit zero initializer. >>>>>=20 >>>>>=20 >>>>> What exactly you mean by =E2=80=9Cheavy-weight=E2=80=9D? More difficu= lt to implement or much more run-time overhead or both? Or something else? >>>>>=20 >>>>> The major benefit of the approach of =E2=80=9C.DEFERRED_INIT=E2=80=9D= is to enable us keep the current -Wuninitialized analysis untouched and a= lso pass >>>>> the =E2=80=9Cuninitialized=E2=80=9D info from source code level to = =E2=80=9Cpass_expand=E2=80=9D. >>>>=20 >>>> Well, "untouched" is a bit oversimplified. You do need to handle >>>> .DEFERRED_INIT as not >>>> being an initialization which will definitely get interesting. >>>=20 >>> Yes, during uninitialized variable analysis pass, we should specially h= andle the defs with =E2=80=9C.DEFERRED_INIT=E2=80=9D, to treat them as unin= itializations. >>=20 >> Are you sure we need to do that? The point of having the first argument >> to .DEFERRED_INIT was that that argument would still provide an >> uninitialised use of the variable. And the values are passed and >> returned by value, so the lack of initialisation is explicit in >> the gcall itself, without knowing what the target function does. >>=20 >> The idea is that we can essentially treat .DEFERRED_INIT as a normal >> (const) function call. I'd be surprised if many passes needed to >> handle it specially. >>=20 > > Just checked with a small testing case (to emulate the .DEFERRED_INIT app= roach): > > qinzhao@gcc10:~/Bugs/auto-init$ cat t.c > extern int DEFFERED_INIT (int, int) __attribute__ ((const)); > > int foo (int n, int r) > { > int v; > > v =3D DEFFERED_INIT (v, 0); > if (n < 10)=20 > v =3D r; > > return v; > } > qinzhao@gcc10:~/Bugs/auto-init$ sh t > /home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree= -all -S t.c > t.c: In function =E2=80=98foo=E2=80=99: > t.c:7:7: warning: =E2=80=98v=E2=80=99 is used uninitialized [-Wuninitiali= zed] > 7 | v =3D DEFFERED_INIT (v, 0); > | ^~~~~~~~~~~~~~~~~~~~ > > We can see that the current uninitialized variable analysis treats the ne= w added artificial initialization as the first use of the uninialized varia= ble. Therefore report the warning there. > However, we should report warning at =E2=80=9Creturn v=E2=80=9D.=20 Ah, OK, so this is about the quality of the warning, rather than about whether we report a warning or not? > So, I think that we still need to specifically handle the new added artif= icial initialization during uninitialized analysis phase. Yeah, that sounds like one approach. But if we're adding .DEFERRED_INIT in response to known uninitialised uses, two other approaches might be: (1) Give the call the same source location as one of the uninitialised uses. (2) Pass the locations of all uninitialised uses as additional arguments. The uninit pass would then be picking the source location differently from normal, but I don't know what effect it would have on the quality of diagnostics. One obvious problem is that if there are multiple uninitialised uses, some of them might get optimised away later. On the other hand, using early source locations might give better results in some cases. I guess it will depend. Thanks, Richard