From: Qing Zhao
To: Richard Biener
Cc: Richard Sandiford, Richard Biener via Gcc-patches
Subject: Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
Date: Fri, 15 Jan 2021 11:57:51 -0600
Message-Id: <4BD13E0F-A561-4429-AEB8-5BDEA6987D21@ORACLE.COM>
List-Id: Gcc-patches mailing list

> On Jan 15, 2021, at 11:22 AM, Richard Biener wrote:
>
> On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao wrote:
>>
>>> On Jan 15, 2021, at 2:11 AM, Richard Biener wrote:
>>>
>>> On Thu, 14 Jan 2021, Qing Zhao wrote:
>>>
>>>> Hi,
>>>>
>>>> More data on code size and compilation time with CPU2017:
>>>>
>>>> ******** Compilation time data: the numbers are the slowdown against the
>>>> default "no":
>>>>
>>>> benchmarks         A/no     D/no
>>>>
>>>> 500.perlbench_r    5.19%    1.95%
>>>> 502.gcc_r          0.46%   -0.23%
>>>> 505.mcf_r          0.00%    0.00%
>>>> 520.omnetpp_r      0.85%    0.00%
>>>> 523.xalancbmk_r    0.79%   -0.40%
>>>> 525.x264_r        -4.48%    0.00%
>>>> 531.deepsjeng_r   16.67%   16.67%
>>>> 541.leela_r        0.00%    0.00%
>>>> 557.xz_r           0.00%    0.00%
>>>>
>>>> 507.cactuBSSN_r    1.16%    0.58%
>>>> 508.namd_r         9.62%    8.65%
>>>> 510.parest_r       0.48%    1.19%
>>>> 511.povray_r       3.70%    3.70%
>>>> 519.lbm_r          0.00%    0.00%
>>>> 521.wrf_r          0.05%    0.02%
>>>> 526.blender_r      0.33%    1.32%
>>>> 527.cam4_r        -0.93%   -0.93%
>>>> 538.imagick_r      1.32%    3.95%
>>>> 544.nab_r          0.00%    0.00%
>>>>
>>>> From the above data, it looks like the compilation-time impact of
>>>> implementations A and D is almost the same.
>>>>
>>>> ******** Code size data: the numbers are the code size increase against the
>>>> default "no":
>>>>
>>>> benchmarks         A/no     D/no
>>>>
>>>> 500.perlbench_r    2.84%    0.34%
>>>> 502.gcc_r          2.59%    0.35%
>>>> 505.mcf_r          3.55%    0.39%
>>>> 520.omnetpp_r      0.54%    0.03%
>>>> 523.xalancbmk_r    0.36%    0.39%
>>>> 525.x264_r         1.39%    0.13%
>>>> 531.deepsjeng_r    2.15%   -1.12%
>>>> 541.leela_r        0.50%   -0.20%
>>>> 557.xz_r           0.31%    0.13%
>>>>
>>>> 507.cactuBSSN_r    5.00%   -0.01%
>>>> 508.namd_r         3.64%   -0.07%
>>>> 510.parest_r       1.12%    0.33%
>>>> 511.povray_r       4.18%    1.16%
>>>> 519.lbm_r          8.83%    6.44%
>>>> 521.wrf_r          0.08%    0.02%
>>>> 526.blender_r      1.63%    0.45%
>>>> 527.cam4_r         0.16%    0.06%
>>>> 538.imagick_r      3.18%   -0.80%
>>>> 544.nab_r          5.76%   -1.11%
>>>>
>>>> Avg                2.52%    0.36%
>>>>
>>>> From the above data, implementation D is always better than A on code
>>>> size, which is surprising to me; I'm not sure what the reason for this is.
>>>
>>> D probably inhibits most interesting loop transforms (check SPEC FP
>>> performance).
>>
>> The call to .DEFERRED_INIT is marked as ECF_CONST:
>>
>> /* A function to represent an artificial initialization to an uninitialized
>>    automatic variable. The first argument is the variable itself, the
>>    second argument is the initialization type. */
>> DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>>
>> So, I assume that such a const call should minimize the impact on loop
>> optimizations. But yes, it will still inhibit some of the loop
>> transformations.
>>
>>> It will also most definitely disallow SRA which, when
>>> an aggregate is not completely elided, tends to grow code.
>>
>> Makes sense to me.
>>
>> The run-time performance data for D and A are actually very similar, as I
>> posted in the previous email (listed here again for convenience).
>>
>> Run-time performance overhead with A and D:
>>
>> benchmarks         A/no     D/no
>>
>> 500.perlbench_r    1.25%    1.25%
>> 502.gcc_r          0.68%    1.80%
>> 505.mcf_r          0.68%    0.14%
>> 520.omnetpp_r      4.83%    4.68%
>> 523.xalancbmk_r    0.18%    1.96%
>> 525.x264_r         1.55%    2.07%
>> 531.deepsjeng_r   11.57%   11.85%
>> 541.leela_r        0.64%    0.80%
>> 557.xz_r          -0.41%   -0.41%
>>
>> 507.cactuBSSN_r    0.44%    0.44%
>> 508.namd_r         0.34%    0.34%
>> 510.parest_r       0.17%    0.25%
>> 511.povray_r      56.57%   57.27%
>> 519.lbm_r          0.00%    0.00%
>> 521.wrf_r         -0.28%   -0.37%
>> 526.blender_r     16.96%   17.71%
>> 527.cam4_r         0.70%    0.53%
>> 538.imagick_r      2.40%    2.40%
>> 544.nab_r          0.00%   -0.65%
>>
>> avg                5.17%    5.37%
>>
>> Especially for the SPEC FP benchmarks, I didn't see much performance
>> difference between A and D.
>> I guess that the RTL optimizations might be enough to get rid of most of
>> the overhead introduced by the additional initialization.
>>
>>>
>>>> ******** Stack usage data: I added -fstack-usage to the compilation line
>>>> when compiling the CPU2017 benchmarks, so the *.su files were generated
>>>> for each of the modules.
>>>> Since there are a lot of such files, and the stack size information is
>>>> embedded in each of them, I just picked one benchmark, 511.povray, to
>>>> check, since it is the one that has the most runtime overhead when adding
>>>> initialization (both A and D).
>>>> I identified all the *.su files that differ between A and D and ran a
>>>> diff on those *.su files; it looks like the stack size is much higher
>>>> with D than with A, for example:
>>>>
>>>> $ diff build_base_auto_init.D.0000/bbox.su build_base_auto_init.A.0000/bbox.su
>>>> 5c5
>>>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 160 static
>>>> ---
>>>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 96 static
>>>>
>>>> $ diff build_base_auto_init.D.0000/image.su build_base_auto_init.A.0000/image.su
>>>> 9c9
>>>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624 static
>>>> ---
>>>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272 static
>>>> ….
>>>>
>>>> It looks like implementation D has more stack size impact than A.
>>>> Do you have any insight into the reason for this?
>>>
>>> D will keep all initialized aggregates as aggregates and live, which
>>> means stack will be allocated for them. With A the usual optimizations
>>> to reduce stack usage can be applied.
>>
>> I checked the routine "pov::bump_map" in 511.povray_r, since it has a
>> large stack increase due to implementation D, by examining the IR
>> immediately before the RTL expansion phase
>> (image.cpp.244t.optimized). I found the following additional statements
>> for the array elements:
>>
>> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double * normal)
>> {
>>   …
>>   double p3[3];
>>   double p2[3];
>>   double p1[3];
>>   float colour3[5];
>>   float colour2[5];
>>   float colour1[5];
>>   …
>>   # DEBUG BEGIN_STMT
>>   colour1 = .DEFERRED_INIT (colour1, 2);
>>   colour2 = .DEFERRED_INIT (colour2, 2);
>>   colour3 = .DEFERRED_INIT (colour3, 2);
>>   # DEBUG BEGIN_STMT
>>   MEM [(double[3] *)&p1] = p1$0_144(D);
>>   MEM [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>   MEM [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>   p1 = .DEFERRED_INIT (p1, 2);
>>   # DEBUG D#12 => MEM [(double[3] *)&p1]
>>   # DEBUG p1$0 => D#12
>>   # DEBUG D#11 => MEM [(double[3] *)&p1 + 8B]
>>   # DEBUG p1$1 => D#11
>>   # DEBUG D#10 => MEM [(double[3] *)&p1 + 16B]
>>   # DEBUG p1$2 => D#10
>>   MEM [(double[3] *)&p2] = p2$0_109(D);
>>   MEM [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>   MEM [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>   p2 = .DEFERRED_INIT (p2, 2);
>>   # DEBUG D#9 => MEM [(double[3] *)&p2]
>>   # DEBUG p2$0 => D#9
>>   # DEBUG D#8 => MEM [(double[3] *)&p2 + 8B]
>>   # DEBUG p2$1 => D#8
>>   # DEBUG D#7 => MEM [(double[3] *)&p2 + 16B]
>>   # DEBUG p2$2 => D#7
>>   MEM [(double[3] *)&p3] = p3$0_256(D);
>>   MEM [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>   MEM [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>   p3 = .DEFERRED_INIT (p3, 2);
>>   ….
>> }
>>
>> I guess that the above "MEM ….. = …" statements are the ones that make
>> the difference. Which phase introduced them?
>
> Looks like SRA.
> But you can just dump all and grep for the first occurrence.

Yes, it looks like SRA is the one:

image.cpp.035t.esra:   MEM [(double[3] *)&p1] = p1$0_195(D);
image.cpp.035t.esra:   MEM [(double[3] *)&p1 + 8B] = p1$1_182(D);
image.cpp.035t.esra:   MEM [(double[3] *)&p1 + 16B] = p1$2_185(D);

Qing

>
>
>>>
>>>> Let me know if you have any comments and suggestions.
>>>
>>> First of all I would check whether the prototype implementations
>>> work as expected.
>>
>> I have done such checks with small test cases already, checking the IR
>> generated with implementation A or D, mainly focusing on *.c.006t.gimple
>> and *.c.*t.expand; all worked as expected.
>>
>> For CPU2017, for example as above, I also checked the IR for both A and
>> D, and it looks like all worked as expected.
>>
>> Thanks.
>>
>> Qing
>>>
>>> Richard.
>>>
>>>
>>>> thanks.
>>>> Qing
>>>>
>>>> On Jan 13, 2021, at 1:39 AM, Richard Biener wrote:
>>>>
>>>> On Tue, 12 Jan 2021, Qing Zhao wrote:
>>>>
>>>> Hi,
>>>>
>>>> Just checking in to see whether you have any comments and suggestions
>>>> on this:
>>>>
>>>> FYI, I have been continuing with the Approach D implementation since
>>>> last week:
>>>>
>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>>>> .DEFERRED_INIT during expand to real initialization. Adjusting the
>>>> uninitialized pass to handle the new refs with ".DEFERRED_INIT".
>>>>
>>>> For the remaining work of Approach D:
>>>>
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the uninitialized warnings maintenance work for D.
>>>>
>>>> I have completed the uninitialized warnings maintenance work for D.
>>>> And I have finished part of the -ftrivial-auto-var-init=pattern
>>>> implementation.
>>>>
>>>> The following is the remaining work for Approach D:
>>>>
>>>> ** -ftrivial-auto-var-init=pattern for VLAs;
>>>> ** add a new attribute for variables: __attribute__((uninitialized)),
>>>>    meaning the marked variable is intentionally left uninitialized for
>>>>    performance reasons;
>>>> ** add complete test cases.
>>>>
>>>> Please let me know if you have any objection to my current decision to
>>>> implement Approach D.
>>>>
>>>> Did you do any analysis on how stack usage and code size change
>>>> with approach D? How does compile-time behave (we could gobble up
>>>> lots of .DEFERRED_INIT calls I guess)?
>>>>
>>>> Richard.
>>>>
>>>> Thanks a lot for your help.
>>>>
>>>> Qing
>>>>
>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches wrote:
>>>>
>>>> Hi,
>>>>
>>>> This is an update for our previous discussion.
>>>>
>>>> 1. I implemented the following two different implementations in the
>>>> latest upstream gcc:
>>>>
>>>> A. Adding real initialization during gimplification, not maintaining
>>>> the uninitialized warnings.
>>>>
>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>>>> .DEFERRED_INIT during expand to real initialization. Adjusting the
>>>> uninitialized pass to handle the new refs with ".DEFERRED_INIT".
>>>>
>>>> Note, in this initial implementation,
>>>> ** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation
>>>>    of -ftrivial-auto-var-init=pattern is not done yet. Therefore, the
>>>>    performance data is only about -ftrivial-auto-var-init=zero.
>>>>
>>>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
>>>>    choose implementation A or D for the runtime performance study.
>>>> ** I didn't finish the uninitialized warnings maintenance work for D.
>>>>    (That might take more time than I expected.)
>>>>
>>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new
>>>> gcc for the following 3 cases:
>>>>
>>>> no: default (-g -O2 -march=native)
>>>> A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
>>>> D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
>>>>
>>>> And then computed the slowdown data for both A and D as follows:
>>>>
>>>> benchmarks         A/no     D/no
>>>>
>>>> 500.perlbench_r    1.25%    1.25%
>>>> 502.gcc_r          0.68%    1.80%
>>>> 505.mcf_r          0.68%    0.14%
>>>> 520.omnetpp_r      4.83%    4.68%
>>>> 523.xalancbmk_r    0.18%    1.96%
>>>> 525.x264_r         1.55%    2.07%
>>>> 531.deepsjeng_r   11.57%   11.85%
>>>> 541.leela_r        0.64%    0.80%
>>>> 557.xz_r          -0.41%   -0.41%
>>>>
>>>> 507.cactuBSSN_r    0.44%    0.44%
>>>> 508.namd_r         0.34%    0.34%
>>>> 510.parest_r       0.17%    0.25%
>>>> 511.povray_r      56.57%   57.27%
>>>> 519.lbm_r          0.00%    0.00%
>>>> 521.wrf_r         -0.28%   -0.37%
>>>> 526.blender_r     16.96%   17.71%
>>>> 527.cam4_r         0.70%    0.53%
>>>> 538.imagick_r      2.40%    2.40%
>>>> 544.nab_r          0.00%   -0.65%
>>>>
>>>> avg                5.17%    5.37%
>>>>
>>>> From the above data, we can see that in general the runtime performance
>>>> slowdowns for implementations A and D are similar for individual
>>>> benchmarks.
>>>>
>>>> There are several benchmarks that have significant slowdown with the
>>>> newly added initialization for both A and D, for example 511.povray_r,
>>>> 526.blender_r, and 531.deepsjeng_r. I will try to study a little more
>>>> what kinds of new initializations introduced such slowdown.
>>>>
>>>> From the current study so far, I think that approach D should be good
>>>> enough for our final implementation.
>>>> So, I will try to finish approach D with the following remaining work:
>>>>
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the uninitialized warnings maintenance work for D.
>>>>
>>>> Let me know if you have any comments and suggestions on my current and
>>>> future work.
>>>>
>>>> Thanks a lot for your help.
>>>>
>>>> Qing
>>>>
>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches wrote:
>>>>
>>>> The following are the approaches I will implement and compare:
>>>>
>>>> Our final goal is to keep the uninitialized warnings and minimize the
>>>> run-time performance cost.
>>>>
>>>> A. Adding real initialization during gimplification, not maintaining
>>>>    the uninitialized warnings.
>>>> B. Adding real initialization during gimplification, marking it with
>>>>    "artificial_init". Adjusting the uninitialized pass, maintaining the
>>>>    annotation, making sure the real init is not deleted as a fake init.
>>>> C. Marking the DECL for an uninitialized auto variable as
>>>>    "no_explicit_init" during gimplification, maintaining this
>>>>    "no_explicit_init" bit until after pass_late_warn_uninitialized, or
>>>>    until pass_expand, then adding real initialization for all DECLs
>>>>    that are marked with "no_explicit_init".
>>>> D.
Adding .DEFERRED_INIT calls during gimplification,
>>>>    expanding the .DEFERRED_INIT during expand to real initialization.
>>>>    Adjusting the uninitialized pass to handle the new refs with
>>>>    ".DEFERRED_INIT".
>>>>
>>>> In the above, approach A will be the one that has the minimum run-time
>>>> cost; it will be the base for the performance comparison.
>>>>
>>>> I will implement approach D first. This one is expected to have the
>>>> most run-time overhead among the above list, but its implementation
>>>> should be the cleanest among B, C, and D. Let's see how much more
>>>> performance overhead this approach will have. If the data is good,
>>>> maybe we can avoid the effort to implement B and C.
>>>>
>>>> If the performance of D is not good, I will implement B or C at that
>>>> time.
>>>>
>>>> Let me know if you have any comments or suggestions.
>>>>
>>>> Thanks.
>>>>
>>>> Qing
>>>>
>>>> --
>>>> Richard Biener
>>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>>>> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)