From: Qing Zhao
Subject: Re: How to traverse all the local variables that declared in the current routine?
Date: Tue, 8 Dec 2020 13:54:29 -0600
To: Richard Biener, Richard Sandiford
Cc: Richard Biener via Gcc-patches
Message-Id: <9585CBB2-0082-4B9A-AC75-250F54F0797C@ORACLE.COM>
List-Id: Gcc-patches mailing list

> On Dec 8, 2020, at 1:40 AM, Richard Biener wrote:
>
> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao wrote:
>>
>> On Dec 7, 2020, at 1:12 AM, Richard Biener wrote:
>>
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao wrote:
>>
>> On Dec 4, 2020, at 2:50 AM, Richard Biener wrote:
>>
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford wrote:
>>
>> Richard Biener via Gcc-patches writes:
>>
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao wrote:
>>
>> Another issue: in order to check whether an auto variable has an
>> initializer, I plan to add a new bit in "decl_common":
>>
>>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>   unsigned decl_is_initialized : 1;
>>
>>   /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>>   #define DECL_IS_INITIALIZED(NODE) \
>>     (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>
>> Set this bit when setting DECL_INITIAL for the variable in the FE, then
>> keep it even though DECL_INITIAL might be NULLed.
>>
>> For locals it would be more reliable to set this flag during gimplification.
>>
>> Do you have any comments or suggestions?
>>
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead, since otherwise you have
>> to figure out yourself whether a local is actually used (see
>> expand_stack_vars).
>>
>> Note that optimization will already have made use of the "uninitialized"
>> state of locals, so depending on what the actual goal is here, "late" may
>> be too late.
>>
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>>
>>   X1 = .DEFERRED_INIT (X0, INIT)
>>
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>>
>>   X = .DEFERRED_INIT (X, INIT)
>>
>> and for an SSA name we'd have:
>>
>>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>
>> * Having the X0 argument would keep the uninitialised use of the
>>   variable around for the later warning passes.
>>
>> * Using a const function should still allow the UB to be deleted as dead
>>   if X1 isn't needed.
>>
>> * Having a function in the way should stop passes from taking advantage
>>   of direct uninitialised uses for optimisation.
>>
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> "in the field" even if the program has UB).
>>
>> The question is whether it's in line with people's expectations that
>> explicitly zero-initialized code behaves differently from implicitly
>> zero-initialized code with respect to optimization and secondary
>> side effects (late diagnostics, latent bugs, etc.).
>>
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>>
>> What exactly do you mean by "heavy-weight"?  More difficult to implement,
>> or much more run-time overhead, or both?  Or something else?
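
(As an aside, to make that contrast concrete - this is only a rough
illustration, not actual GIMPLE dump output, and the "0" below just stands
in for whatever INIT pattern is chosen.  For a local "int v;" with no
initializer in the source, the two shapes being compared are roughly:

   v = 0;                      /* explicit zero init: an ordinary store,
                                  indistinguishable from user code */

versus

   v = .DEFERRED_INIT (v, 0);  /* marker call: keeps the fact that v had
                                  no initializer visible to later passes */

The first form is trivial for the rest of the compiler but hides the
"was uninitialized" information; the second preserves that information at
the cost of a new IL construct that every later pass has to understand.)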
>>
>> The major benefit of the ".DEFERRED_INIT" approach is that it enables us
>> to keep the current -Wuninitialized analysis untouched and also to pass
>> the "uninitialized" info from the source-code level down to "pass_expand".
>>
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not being an initialization, which will definitely get
>> interesting.
>>
>> Yes, during the uninitialized variable analysis pass we should handle
>> defs by ".DEFERRED_INIT" specially and treat them as uninitialized.
>>
>> If we want to keep the current -Wuninitialized analysis untouched, this
>> is quite a reasonable approach.
>>
>> However, if it's not required to keep the current -Wuninitialized
>> analysis untouched, adding the zero initializer directly during
>> gimplification should be much easier and simpler, and should also have
>> smaller run-time overhead.
>>
>> As for optimization, I fear you'll get a load of redundant zero-init
>> actually emitted if you just rely on RTL DSE/DCE to remove it.
>>
>> Runtime overhead for -fauto-init=zero is one important consideration for
>> the whole feature; we should minimize the runtime overhead of zero
>> initialization since it will be used in production builds.
>> We can do some run-time performance evaluation when we have an
>> implementation ready.
>>
>> Note there will be other passes "confused" by .DEFERRED_INIT.  Note that
>> there are going to be other considerations - namely where to emit the
>> .DEFERRED_INIT.  When emitting it during gimplification you can emit it
>> at the start of the block for block-scope variables.  When emitting it
>> after gimplification you have to emit it at function start, which will
>> probably make stack slot sharing inefficient because the deferred init
>> will cause overlapping lifetimes.  When emitting at the block boundary,
>> the .DEFERRED_INIT will act as a code-motion barrier (and it itself
>> likely cannot be moved), so for example invariant motion will no longer
>> happen.  Likewise, optimizations like SRA will be confused by
>> .DEFERRED_INIT, which again will lead to bigger stack usage (and less
>> optimization).
>>
>> Yes, it looks like the inserted ".DEFERRED_INIT" function calls will
>> negatively impact tree optimizations.
>>
>> But sure, you can try implementing a few variants, but definitely
>> .DEFERRED_INIT will be the most work.
>>
>> How about implementing the following two approaches and comparing the
>> run-time cost:
>>
>> A. Insert the real initialization during the gimplification phase.
>> B. Insert the .DEFERRED_INIT during the gimplification phase, and then
>>    expand this call to the real initialization during the expand phase.
>>
>> Approach A will have less run-time overhead, but will mess up the
>> current uninitialized variable analysis in GCC.
>> Approach B will have more run-time overhead, but will keep the current
>> uninitialized variable analysis in GCC.
>>
>> And then decide which approach we will go with?
>>
>> What's your opinion on this?
>
> Well, in the end you have to try.  Note that for the purpose of stack
> slot sharing you do want the instrumentation to happen during
> gimplification.
>
> Another possibility is to materialize .DEFERRED_INIT earlier than
> expand, for example shortly after IPA optimizations, to avoid
> pessimizing loop transforms and allow SRA.  At the point you
> materialize the inits you could run the late uninit warning pass
> (which would then be earlier than regular but would still see the
> .DEFERRED_INIT).
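
(Just to illustrate the ordering being suggested here - the pass placement
below is my reading of the discussion above, not an actual patch or real
pass names:

   gimplification:  v = .DEFERRED_INIT (v, 0);  /* emitted at block scope,
                                                   so stack slot sharing
                                                   still works */
   ... IPA optimizations ...
   run the late uninit warning pass (the marker is still in the IL)
   materialization: v = 0;                      /* marker lowered to a
                                                   real store */
   ... remaining GIMPLE passes, loop transforms, SRA, expand ...

so loop transforms and SRA would only ever see the plain store, not the
marker.)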

If we put the "materializing .DEFERRED_INIT" phase earlier, as you
suggested above, the late uninitialized warning pass has to be moved
earlier as well in order to utilize the ".DEFERRED_INIT".  Then we might
miss some opportunities for the late uninitialized warning.  I think that
this is not what we really want.

> While users may be happy to pay some performance, stack usage is
> probably more critical

So, which pass computes the stack usage?

> (just thinking of the kernel), so not regressing there should be as
> important as preserving uninit warnings (which, for practical purposes,
> I see as not important at all - people can do "debug" builds without
> -fzero-init).

It looks like the major issue with the ".DEFERRED_INIT" approach is that
the newly inserted calls to the internal const function might inhibit
some important tree optimizations.

So, I am thinking again about the approach I raised in the very beginning:

During the gimplification phase, mark the DECL for an auto variable
without initialization as "no_explicit_init", then maintain this
"no_explicit_init" bit until after pass_late_warn_uninitialized, or until
pass_expand, and at that point add zero-initialization for all DECLs that
are marked with "no_explicit_init".

This approach will not have the issue of interrupting tree optimizations;
however, I guess that maintaining this "no_explicit_init" bit might be
very difficult?

Do you have any comments on this approach?

thanks.

Qing

> Richard.
>
>> Btw, I don't think there's any reason to cling onto clang's semantics
>> for a particular switch.  We'll never be able to emulate 1:1 behavior,
>> and our -Wuninit behavior is probably vastly different already.
>>
>> From my study so far, yes, the current behavior of -Wuninit for Clang
>> and GCC is not exactly the same.
>>
>> For example, for the following small test case:
>>
>>   void blah(int);
>>
>>   int foo_2 (int n, int l, int m, int r)
>>   {
>>     int v;
>>
>>     if ( (n > 10) && (m != 100) && (r < 20) )
>>       v = r;
>>
>>     if (l > 100)
>>       if ( (n <= 8) && (m < 102) && (r < 19) )
>>         blah(v); /* { dg-warning "uninitialized" "real warning" } */
>>
>>     return 0;
>>   }
>>
>> GCC is able to report a maybe-uninitialized warning here, but Clang
>> cannot.  It looks like GCC's uninitialized analysis relies on more
>> analysis and optimization information than Clang's.
>>
>> I am really curious how Clang implements its uninitialized analysis.
>>
>> Actually, I studied a little bit last Friday how Clang implements its
>> uninitialized analysis, and noticed that Clang has a data-flow analysis
>> phase based on Clang's AST:
>> http://clang-developers.42468.n3.nabble.com/A-survey-of-dataflow-analyses-in-Clang-td4069644.html
>>
>> Clang's uninitialized analysis is based on this data-flow analysis.
>>
>> Therefore, adding initialization AFTER Clang's uninitialized analysis
>> phase is straightforward.
>>
>> However, GCC does not have data-flow analysis in the FE; the
>> uninitialized variable analysis is done in the tree optimization phase.
>> Therefore, it is much more difficult to implement this feature in GCC
>> than in Clang.
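
(To make that interaction concrete with the foo_2 example above - again
purely an illustration: if the gimplifier simply synthesized a plain store
for the missing initializer, as in Approach A,

   int v;
   v = 0;   /* synthesized zero init; looks exactly like user code */

   if ( (n > 10) && (m != 100) && (r < 20) )
     v = r;
   ...
   blah(v);

then by the time GCC's middle-end uninitialized analysis runs, v is
initialized on every path, so the dg-warning above would silently
disappear.  Clang avoids this problem because its analysis runs on the
AST, before any such instrumentation would be added; for GCC this is
exactly the tension between Approach A and the .DEFERRED_INIT marker.)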
>>
>> Qing
>>
>> Qing
>>
>> Richard.
>>
>> Thanks,
>> Richard