From: Richard Biener <rguenther@suse.de>
To: Qing Zhao
Cc: Richard Sandiford, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Date: Fri, 15 Jan 2021 18:22:36 +0100
Subject: Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao wrote:
>
>> On Jan 15, 2021, at 2:11 AM, Richard Biener wrote:
>>
>> On Thu, 14 Jan 2021, Qing Zhao wrote:
>>
>>> Hi,
>>>
>>> More data on code size and compilation time with CPU2017:
>>>
>>> ******** Compilation time data: the numbers are the slowdown against
>>> the default "no":
>>>
>>> benchmarks         A/no     D/no
>>>
>>> 500.perlbench_r    5.19%    1.95%
>>> 502.gcc_r          0.46%   -0.23%
>>> 505.mcf_r          0.00%    0.00%
>>> 520.omnetpp_r      0.85%    0.00%
>>> 523.xalancbmk_r    0.79%   -0.40%
>>> 525.x264_r        -4.48%    0.00%
>>> 531.deepsjeng_r   16.67%   16.67%
>>> 541.leela_r        0.00%    0.00%
>>> 557.xz_r           0.00%    0.00%
>>>
>>> 507.cactuBSSN_r    1.16%    0.58%
>>> 508.namd_r         9.62%    8.65%
>>> 510.parest_r       0.48%    1.19%
>>> 511.povray_r       3.70%    3.70%
>>> 519.lbm_r          0.00%    0.00%
>>> 521.wrf_r          0.05%    0.02%
>>> 526.blender_r      0.33%    1.32%
>>> 527.cam4_r        -0.93%   -0.93%
>>> 538.imagick_r      1.32%    3.95%
>>> 544.nab_r          0.00%    0.00%
>>>
>>> From the above data, it looks like the compilation-time impact of
>>> implementations A and D is almost the same.
>>>
>>> ******** Code size data: the numbers are the code size increase
>>> against the default "no":
>>>
>>> benchmarks         A/no     D/no
>>>
>>> 500.perlbench_r    2.84%    0.34%
>>> 502.gcc_r          2.59%    0.35%
>>> 505.mcf_r          3.55%    0.39%
>>> 520.omnetpp_r      0.54%    0.03%
>>> 523.xalancbmk_r    0.36%    0.39%
>>> 525.x264_r         1.39%    0.13%
>>> 531.deepsjeng_r    2.15%   -1.12%
>>> 541.leela_r        0.50%   -0.20%
>>> 557.xz_r           0.31%    0.13%
>>>
>>> 507.cactuBSSN_r    5.00%   -0.01%
>>> 508.namd_r         3.64%   -0.07%
>>> 510.parest_r       1.12%    0.33%
>>> 511.povray_r       4.18%    1.16%
>>> 519.lbm_r          8.83%    6.44%
>>> 521.wrf_r          0.08%    0.02%
>>> 526.blender_r      1.63%    0.45%
>>> 527.cam4_r         0.16%    0.06%
>>> 538.imagick_r      3.18%   -0.80%
>>> 544.nab_r          5.76%   -1.11%
>>>
>>> Avg                2.52%    0.36%
>>>
>>> From the above data, the implementation D
>>> is always better than A for code size; that is surprising to me, and
>>> I am not sure what the reason for this is.
>>
>> D probably inhibits most interesting loop transforms (check SPEC FP
>> performance).
>
> The call to .DEFERRED_INIT is marked as ECF_CONST:
>
> /* A function to represent an artificial initialization to an uninitialized
>    automatic variable.  The first argument is the variable itself, the
>    second argument is the initialization type.  */
> DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>
> So, I assume that such a const call should minimize the impact on loop
> optimizations.  But yes, it will still inhibit some of the loop
> transformations.
>
>> It will also most definitely disallow SRA which, when
>> an aggregate is not completely elided, tends to grow code.
>
> Makes sense to me.
>
> The run-time performance data for D and A are actually very similar, as
> I posted in the previous email (listed here again for convenience).
>
> Run-time performance overhead with A and D:
>
> benchmarks         A/no     D/no
>
> 500.perlbench_r    1.25%    1.25%
> 502.gcc_r          0.68%    1.80%
> 505.mcf_r          0.68%    0.14%
> 520.omnetpp_r      4.83%    4.68%
> 523.xalancbmk_r    0.18%    1.96%
> 525.x264_r         1.55%    2.07%
> 531.deepsjeng_r   11.57%   11.85%
> 541.leela_r        0.64%    0.80%
> 557.xz_r          -0.41%   -0.41%
>
> 507.cactuBSSN_r    0.44%    0.44%
> 508.namd_r         0.34%    0.34%
> 510.parest_r       0.17%    0.25%
> 511.povray_r      56.57%   57.27%
> 519.lbm_r          0.00%    0.00%
> 521.wrf_r         -0.28%   -0.37%
> 526.blender_r     16.96%   17.71%
> 527.cam4_r         0.70%    0.53%
> 538.imagick_r      2.40%    2.40%
> 544.nab_r          0.00%   -0.65%
>
> avg                5.17%    5.37%
>
> Especially for the SPEC FP benchmarks, I didn't see much performance
> difference between A and D.
> I guess the RTL optimizations might be enough to get rid of most of the
> overhead introduced by the additional initialization.
>
>>
>>> ******** Stack usage data: I added -fstack-usage to the compilation
>>> line when compiling the CPU2017 benchmarks, so a *.su file was
>>> generated for each of the modules.
>>> Since there are a lot of such files, and the stack size information
>>> is embedded in each of them, I just picked one benchmark,
>>> 511.povray, to check; it is the one with the most runtime overhead
>>> when adding initialization (both A and D).
>>> I identified all the *.su files that differ between A and D and
>>> diffed them; it looks like the stack size is much higher with D than
>>> with A, for example:
>>>
>>> $ diff build_base_auto_init.D.0000/bbox.su build_base_auto_init.A.0000/bbox.su
>>> 5c5
>>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)	160	static
>>> ---
>>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)	96	static
>>>
>>> $ diff build_base_auto_init.D.0000/image.su build_base_auto_init.A.0000/image.su
>>> 9c9
>>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)	624	static
>>> ---
>>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)	272	static
>>> …
>>>
>>> It looks like implementation D has more stack size impact than A.
>>> Do you have any insight into the reason for this?
>>
>> D will keep all initialized aggregates as aggregates and live, which
>> means stack will be allocated for them.  With A the usual
>> optimizations to reduce stack usage can be applied.
>
> I checked the routine pov::bump_map in 511.povray_r, since it has a
> large stack increase due to implementation D, by examining the IR
> immediately before the RTL expansion phase
> (image.cpp.244t.optimized).  I found the following additional
> statements for the array elements:
>
> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double * normal)
> {
> …
>   double p3[3];
>   double p2[3];
>   double p1[3];
>   float colour3[5];
>   float colour2[5];
>   float colour1[5];
> …
>   # DEBUG BEGIN_STMT
>   colour1 = .DEFERRED_INIT (colour1, 2);
>   colour2 = .DEFERRED_INIT (colour2, 2);
>   colour3 = .DEFERRED_INIT (colour3, 2);
>   # DEBUG BEGIN_STMT
>   MEM [(double[3] *)&p1] = p1$0_144(D);
>   MEM [(double[3] *)&p1 + 8B] = p1$1_135(D);
>   MEM [(double[3] *)&p1 + 16B] = p1$2_138(D);
>   p1 = .DEFERRED_INIT (p1, 2);
>   # DEBUG D#12 => MEM [(double[3] *)&p1]
>   # DEBUG p1$0 => D#12
>   # DEBUG D#11 => MEM [(double[3] *)&p1 + 8B]
>   # DEBUG p1$1 => D#11
>   # DEBUG D#10 => MEM [(double[3] *)&p1 + 16B]
>   # DEBUG p1$2 => D#10
>   MEM [(double[3] *)&p2] = p2$0_109(D);
>   MEM [(double[3] *)&p2 + 8B] = p2$1_111(D);
>   MEM [(double[3] *)&p2 + 16B] = p2$2_254(D);
>   p2 = .DEFERRED_INIT (p2, 2);
>   # DEBUG D#9 => MEM [(double[3] *)&p2]
>   # DEBUG p2$0 => D#9
>   # DEBUG D#8 => MEM [(double[3] *)&p2 + 8B]
>   # DEBUG p2$1 => D#8
>   # DEBUG D#7 => MEM [(double[3] *)&p2 + 16B]
>   # DEBUG p2$2 => D#7
>   MEM [(double[3] *)&p3] = p3$0_256(D);
>   MEM [(double[3] *)&p3 + 8B] = p3$1_258(D);
>   MEM [(double[3] *)&p3 + 16B] = p3$2_260(D);
>   p3 = .DEFERRED_INIT (p3, 2);
>   …
> }
>
> I guess the above "MEM … = …" stores are the ones that make the
> difference.
Which phase introduced them?  It looks like SRA.  But you can just dump
all passes and grep for the first occurrence.

>>
>>> Let me know if you have any comments and suggestions.
>>
>> First of all I would check whether the prototype implementations
>> work as expected.
>
> I have done such checks with small test cases already, checking the IR
> generated with implementation A or D, focusing mainly on
> *.c.006t.gimple and *.c.*t.expand; all worked as expected.
>
> For CPU2017, for example as above, I also checked the IR for both A
> and D, and it looks like all worked as expected.
>
> Thanks.
>
> Qing
>
>> Richard.
>>
>>> thanks.
>>> Qing
>>>
>>> On Jan 13, 2021, at 1:39 AM, Richard Biener wrote:
>>>
>>> On Tue, 12 Jan 2021, Qing Zhao wrote:
>>>
>>> Hi,
>>>
>>> Just checking in to see whether you have any comments and
>>> suggestions on this:
>>>
>>> FYI, I have been continuing with the Approach D implementation since
>>> last week:
>>>
>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding
>>>    the .DEFERRED_INIT during expand to real initialization, and
>>>    adjusting the uninitialized pass to handle the new refs with
>>>    ".DEFERRED_INIT".
>>>
>>> For the remaining work of Approach D:
>>>
>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>> ** complete the uninitialized-warnings maintenance work for D.
>>>
>>> I have completed the uninitialized-warnings maintenance work for D,
>>> and finished part of the -ftrivial-auto-var-init=pattern
>>> implementation.
>>>
>>> The following is the remaining work for Approach D:
>>>
>>> ** -ftrivial-auto-var-init=pattern for VLAs;
>>> ** add a new attribute for variables, __attribute__((uninitialized)):
>>>    a marked variable is intentionally left uninitialized for
>>>    performance reasons;
>>> ** add complete test cases.
>>>
>>> Please let me know if you have any objection to my current decision
>>> to implement approach D.
>>>
>>> Did you do any analysis on how stack usage and code size change
>>> with approach D?  How does compile time behave (we could gobble up
>>> lots of .DEFERRED_INIT calls I guess)?
>>>
>>> Richard.
>>>
>>> Thanks a lot for your help.
>>>
>>> Qing
>>>
>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches wrote:
>>>
>>> Hi,
>>>
>>> This is an update on our previous discussion.
>>>
>>> 1. I implemented the following two different implementations in the
>>>    latest upstream gcc:
>>>
>>> A. Adding real initialization during gimplification, not maintaining
>>>    the uninitialized warnings.
>>>
>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding
>>>    the .DEFERRED_INIT during expand to real initialization, and
>>>    adjusting the uninitialized pass to handle the new refs with
>>>    ".DEFERRED_INIT".
>>>
>>> Note, in this initial implementation:
>>>
>>> ** I ONLY implemented -ftrivial-auto-var-init=zero; the
>>>    implementation of -ftrivial-auto-var-init=pattern is not done
>>>    yet.  Therefore, the performance data below is only about
>>>    -ftrivial-auto-var-init=zero.
>>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
>>>    choose implementation A or D for the runtime performance study.
>>> ** I didn't finish the uninitialized-warnings maintenance work for D
>>>    (that might take more time than I expected).
>>>
>>> 2. I collected runtime data for CPU2017 on an x86 machine with this
>>>    new gcc for the following three cases:
>>>
>>> no: default (-g -O2 -march=native)
>>> A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
>>> D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
>>>
>>> And then computed the slowdown data for both A and D as follows:
>>>
>>> benchmarks         A/no     D/no
>>>
>>> 500.perlbench_r    1.25%    1.25%
>>> 502.gcc_r          0.68%    1.80%
>>> 505.mcf_r          0.68%    0.14%
>>> 520.omnetpp_r      4.83%    4.68%
>>> 523.xalancbmk_r    0.18%    1.96%
>>> 525.x264_r         1.55%    2.07%
>>> 531.deepsjeng_r   11.57%   11.85%
>>> 541.leela_r        0.64%    0.80%
>>> 557.xz_r          -0.41%   -0.41%
>>>
>>> 507.cactuBSSN_r    0.44%    0.44%
>>> 508.namd_r         0.34%    0.34%
>>> 510.parest_r       0.17%    0.25%
>>> 511.povray_r      56.57%   57.27%
>>> 519.lbm_r          0.00%    0.00%
>>> 521.wrf_r         -0.28%   -0.37%
>>> 526.blender_r     16.96%   17.71%
>>> 527.cam4_r         0.70%    0.53%
>>> 538.imagick_r      2.40%    2.40%
>>> 544.nab_r          0.00%   -0.65%
>>>
>>> avg                5.17%    5.37%
>>>
>>> From the above data, we can see that in general the runtime slowdown
>>> for implementations A and D is similar for individual benchmarks.
>>>
>>> There are several benchmarks that have a significant slowdown with
>>> the newly added initialization for both A and D, for example
>>> 511.povray_r, 526.blender_r, and 531.deepsjeng_r; I will try to
>>> study a bit more what kind of new initializations introduced such
>>> slowdown.
>>>
>>> From the current study so far, I think approach D should be good
>>> enough for our final implementation.  So I will try to finish
>>> approach D with the following remaining work:
>>>
>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>> ** complete the uninitialized-warnings maintenance work for D.
>>>
>>> Let me know if you have any comments and suggestions on my current
>>> and future work.
>>>
>>> Thanks a lot for your help.
>>>
>>> Qing
>>>
>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches wrote:
>>>
>>> The following are the approaches I will implement and compare:
>>>
>>> Our final goal is to keep the uninitialized warnings and minimize
>>> the run-time performance cost.
>>>
>>> A. Adding real initialization during gimplification, not maintaining
>>>    the uninitialized warnings.
>>> B. Adding real initialization during gimplification, marking them
>>>    with "artificial_init".  Adjusting the uninitialized pass,
>>>    maintaining the annotation, making sure the real init is not
>>>    deleted from the fake init.
>>> C. Marking the DECL for an uninitialized auto variable as
>>>    "no_explicit_init" during gimplification, maintaining this
>>>    "no_explicit_init" bit until after pass_late_warn_uninitialized,
>>>    or until pass_expand, then adding real initialization for all
>>>    DECLs that are marked with "no_explicit_init".
>>> D. Adding .DEFERRED_INIT during gimplification, expanding the
>>>    .DEFERRED_INIT during expand to real initialization, and
>>>    adjusting the uninitialized pass to handle the new refs with
>>>    ".DEFERRED_INIT".
>>>
>>> Of the above, approach A will be the one with the minimum run-time
>>> cost and will be the base for the performance comparison.
>>>
>>> I will implement approach D first; it is expected to have the most
>>> run-time overhead among the above list, but its implementation
>>> should be the cleanest among B, C, and D.  Let's see how much
>>> performance overhead this approach has.  If the data is good, maybe
>>> we can avoid the effort to implement B and C.
>>>
>>> If the performance of D is not good, I will implement B or C at that
>>> time.
>>>
>>> Let me know if you have any comments or suggestions.
>>>
>>> Thanks.
>>>
>>> Qing
>>>
>>> --
>>> Richard Biener
>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)