From: Qing Zhao
To: Richard Biener
Cc: Richard Sandiford, Richard Biener via Gcc-patches
Subject: Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
Date: Wed, 13 Jan 2021 09:35:59 -0600
> On Jan 13, 2021, at 9:10 AM, Richard Biener wrote:
>
> On Wed, 13 Jan 2021, Qing Zhao wrote:
>
>>
>>
>>> On Jan 13, 2021, at 1:39 AM, Richard Biener wrote:
>>>
>>> On Tue, 12 Jan 2021, Qing Zhao wrote:
>>>
>>>> Hi,
>>>>
>>>> Just checking in to see whether you have any comments and suggestions on this:
>>>>
>>>> FYI, I have been continuing with the Approach D implementation since last week:
>>>>
>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the .DEFERRED_INIT calls into
>>>> real initialization during expand, and adjusting the uninitialized pass for the new refs to “.DEFERRED_INIT”.
>>>>
>>>> For the remaining work of Approach D:
>>>>
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the uninitialized warnings maintenance work for D.
>>>>
>>>> I have completed the uninitialized warnings maintenance work for D,
>>>> and finished part of the -ftrivial-auto-var-init=pattern implementation.
>>>>
>>>> The following is the remaining work of Approach D:
>>>>
>>>> ** -ftrivial-auto-var-init=pattern for VLAs;
>>>> ** add a new attribute for variables:
>>>>    __attribute__((uninitialized))
>>>>    the marked variable is intentionally left uninitialized for performance purposes;
>>>> ** add complete test cases.
>>>>
>>>>
>>>> Please let me know if you have any objection to my current decision to implement approach D.
>>>
>>> Did you do any analysis on how stack usage and code size are changed
>>> with approach D?
>>
>> I did the code size comparison (I will provide the data in another email). Based on this data, D works better than A in general. (This actually surprised me.)
>>
>> But I did not do the stack usage comparison. I am not sure how to collect the stack usage data;
>> do you have any suggestions on this?
>
> There is -fstack-usage you could use, then of course watching
> the stack segment at runtime.

I can do this for CPU2017 to collect the stack usage data and report back.

> I'm mostly concerned about
> stack-limited "processes" such as the linux kernel which I think
> is a primary target of your work.
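The two open items in the remaining-work list (pattern initialization for VLAs and the opt-out attribute) can be pictured at the source level. The C sketch below is illustrative only.

- The pattern byte 0xFE is an assumption; the message does not say which pattern =pattern uses.
- The memset is a hand-written stand-in for the store the compiler would insert.
- sum_pattern_vla, copy_and_checksum and MAYBE_UNINIT are hypothetical names.

For a VLA the size is only known at run time, so the fill has to be a runtime memset over sizeof buf bytes rather than a fixed-size store. The attribute is guarded with __has_attribute, so the file still builds with compilers that do not recognize it.

```c
#include <stddef.h>
#include <string.h>

/* Assumed pattern byte, for illustration only. */
#define AUTO_INIT_PATTERN_BYTE 0xFE

/* Apply the attribute only where the compiler knows it; otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(uninitialized)
#    define MAYBE_UNINIT __attribute__((uninitialized))
#  endif
#endif
#ifndef MAYBE_UNINIT
#  define MAYBE_UNINIT
#endif

/* Pattern-initialize a VLA by hand: n is a run-time value, so the fill
   must cover sizeof buf bytes computed at run time. Requires n > 0. */
static unsigned sum_pattern_vla(size_t n)
{
    unsigned char buf[n];                             /* VLA */
    memset(buf, AUTO_INIT_PATTERN_BYTE, sizeof buf);  /* stand-in for the inserted init */
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* Opt-out case: only the bytes of tmp that were just written are read,
   so automatic initialization would add cost without adding safety. */
static int copy_and_checksum(const unsigned char *src, size_t len)
{
    unsigned char tmp[256] MAYBE_UNINIT;
    if (len > sizeof tmp)
        len = sizeof tmp;             /* never copy more than tmp can hold */
    memcpy(tmp, src, len);
    int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += tmp[i];
    return sum;
}
```

A buffer like tmp, written before it is read, is the kind of variable the proposed attribute targets: initializing it would cost time without adding safety.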
I don’t have any experience with building the Linux kernel.
Do we have to collect data for the Linux kernel at this time? Is the CPU2017 data not enough?

Qing

>
> Richard.
>
>>
>>> How does compile-time behave (we could gobble up
>>> lots of .DEFERRED_INIT calls I guess)?
>> I can collect this data too and report it later.
>>
>> Thanks.
>>
>> Qing
>>>
>>> Richard.
>>>
>>>> Thanks a lot for your help.
>>>>
>>>> Qing
>>>>
>>>>
>>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This is an update on our previous discussion.
>>>>>
>>>>> 1. I implemented the following two approaches in the latest upstream gcc:
>>>>>
>>>>> A. Adding real initialization during gimplification, without maintaining the uninitialized warnings.
>>>>>
>>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the .DEFERRED_INIT calls into
>>>>> real initialization during expand, and adjusting the uninitialized pass for the new refs to “.DEFERRED_INIT”.
>>>>>
>>>>> Note, in this initial implementation:
>>>>> ** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of -ftrivial-auto-var-init=pattern
>>>>>    is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero.
>>>>>
>>>>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for
>>>>>    the runtime performance study.
>>>>> ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected.)
>>>>>
>>>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new gcc for the following 3 cases:
>>>>>
>>>>> no: default (-g -O2 -march=native)
>>>>> A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
>>>>> D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
>>>>>
>>>>> And then computed the slowdown for both A and D as follows:
>>>>>
>>>>> benchmark        A / no    D / no
>>>>>
>>>>> 500.perlbench_r   1.25%     1.25%
>>>>> 502.gcc_r         0.68%     1.80%
>>>>> 505.mcf_r         0.68%     0.14%
>>>>> 520.omnetpp_r     4.83%     4.68%
>>>>> 523.xalancbmk_r   0.18%     1.96%
>>>>> 525.x264_r        1.55%     2.07%
>>>>> 531.deepsjeng_r  11.57%    11.85%
>>>>> 541.leela_r       0.64%     0.80%
>>>>> 557.xz_          -0.41%    -0.41%
>>>>>
>>>>> 507.cactuBSSN_r   0.44%     0.44%
>>>>> 508.namd_r        0.34%     0.34%
>>>>> 510.parest_r      0.17%     0.25%
>>>>> 511.povray_r     56.57%    57.27%
>>>>> 519.lbm_r         0.00%     0.00%
>>>>> 521.wrf_r        -0.28%    -0.37%
>>>>> 526.blender_r    16.96%    17.71%
>>>>> 527.cam4_r        0.70%     0.53%
>>>>> 538.imagick_r     2.40%     2.40%
>>>>> 544.nab_r         0.00%    -0.65%
>>>>>
>>>>> avg               5.17%     5.37%
>>>>>
>>>>> From the above data, we can see that, in general, the runtime slowdowns of
>>>>> implementations A and D are similar for individual benchmarks.
>>>>>
>>>>> Several benchmarks show a significant slowdown from the newly added initialization with both
>>>>> A and D, for example 511.povray_r, 526.blender_r, and 531.deepsjeng_r. I will study a little more
>>>>> which kinds of new initializations introduce such slowdowns.
>>>>>
>>>>> From the study so far, I think that approach D should be good enough for our final implementation.
>>>>> So, I will try to finish approach D with the following remaining work:
>>>>>
>>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>>> ** complete the uninitialized warnings maintenance work for D.
>>>>>
>>>>>
>>>>> Let me know if you have any comments and suggestions on my current and future work.
>>>>>
>>>>> Thanks a lot for your help.
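The avg row of the slowdown table is the plain arithmetic mean of all 19 per-benchmark slowdowns, with the integer and floating-point rows pooled. The short C sketch below recomputes it from the table's values. It also shows how strongly one outlier drives the mean. The array values are copied from the table; mean_skip and POVRAY_INDEX are illustrative names.

```c
#include <stddef.h>

/* Per-benchmark slowdowns (%) copied from the table, in table order:
   9 integer rows followed by 10 floating-point rows. */
static const double slowdown_a[] = {
    1.25, 0.68, 0.68, 4.83, 0.18, 1.55, 11.57, 0.64, -0.41,
    0.44, 0.34, 0.17, 56.57, 0.00, -0.28, 16.96, 0.70, 2.40, 0.00
};
static const double slowdown_d[] = {
    1.25, 1.80, 0.14, 4.68, 1.96, 2.07, 11.85, 0.80, -0.41,
    0.44, 0.34, 0.25, 57.27, 0.00, -0.37, 17.71, 0.53, 2.40, -0.65
};

#define N_BENCH (sizeof slowdown_a / sizeof slowdown_a[0])
#define POVRAY_INDEX 12   /* position of 511.povray_r in both arrays */

/* Arithmetic mean of v[0..n), skipping index `skip` (pass n to skip nothing). */
static double mean_skip(const double *v, size_t n, size_t skip)
{
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == skip)
            continue;
        sum += v[i];
        count++;
    }
    return count ? sum / (double)count : 0.0;
}
```

mean_skip(slowdown_a, N_BENCH, N_BENCH) gives about 5.17 and the D array gives about 5.37, matching the avg row. Leaving out 511.povray_r alone brings the means down to roughly 2.32% for A and 2.49% for D.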
>>>>>
>>>>> Qing
>>>>>
>>>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches wrote:
>>>>>>
>>>>>> The following are the approaches I will implement and compare:
>>>>>>
>>>>>> Our final goal is to keep the uninitialized warnings and minimize the run-time performance cost.
>>>>>>
>>>>>> A. Adding real initialization during gimplification, without maintaining the uninitialized warnings.
>>>>>> B. Adding real initialization during gimplification, marking it with “artificial_init”.
>>>>>>    Adjusting the uninitialized pass and maintaining the annotation, making sure the real init is not
>>>>>>    deleted from the fake init.
>>>>>> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
>>>>>>    maintaining this “no_explicit_init” bit until after pass_late_warn_uninitialized, or until pass_expand,
>>>>>>    and adding real initialization for all DECLs that are marked with “no_explicit_init”.
>>>>>> D. Adding .DEFERRED_INIT during gimplification, expanding the .DEFERRED_INIT calls into
>>>>>>    real initialization during expand, and adjusting the uninitialized pass for the new refs to “.DEFERRED_INIT”.
>>>>>>
>>>>>>
>>>>>> Of the above, approach A should have the minimum run-time cost, so it will be the baseline for the performance
>>>>>> comparison.
>>>>>>
>>>>>> I will then implement approach D. It is expected to have the most run-time overhead in the above list, but
>>>>>> its implementation should be the cleanest among B, C, and D. Let’s see how much more performance overhead this approach
>>>>>> has. If the data is good, maybe we can avoid the effort of implementing B and C.
>>>>>>
>>>>>> If the performance of D is not good, I will implement B or C at that time.
>>>>>>
>>>>>> Let me know if you have any comments or suggestions.
>>>>>>
>>>>>> Thanks.
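At the source level, the approaches differ mainly in which program the uninitialized-variable warning gets to see. Here is a minimal sketch, assuming -ftrivial-auto-var-init=zero; pick_as_if_zero_init is a hypothetical name.

- The commented-out pick function is ordinary C that reads x uninitialized when flag is 0, which is the case -Wmaybe-uninitialized targets.
- pick_as_if_zero_init is how the program behaves once the option zeroes x.
- Approach A materializes that zero store during gimplification, so the later warning pass has nothing left to report. That is why A "does not maintain the uninitialized warnings".
- Approach D instead emits a .DEFERRED_INIT call that the uninitialized pass is taught to recognize, and only turns it into a real store at expand time.

```c
/* As written by the programmer (kept in a comment, because calling it
   with flag == 0 would read an indeterminate value):

       int pick(int flag) { int x; if (flag) x = 42; return x; }

   This is the pattern the uninitialized-variable warnings are meant to catch. */

/* Behaviour under -ftrivial-auto-var-init=zero: the compiler supplies the
   "= 0" that the programmer omitted. */
static int pick_as_if_zero_init(int flag)
{
    int x = 0;   /* stand-in for the compiler-inserted initialization */
    if (flag)
        x = 42;
    return x;
}
```

Both approaches produce this run-time behaviour. The choice between them is about when the store becomes visible to the optimizers and to the warning passes.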
>>>>>>
>>>>>> Qing
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Richard Biener
>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>>
>>
>
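Richard's suggestion to use -fstack-usage can be turned into a per-function comparison between the baseline build and the -ftrivial-auto-var-init build. GCC writes one .su file per compiled translation unit.

The sketch below makes two assumptions:

- Each line has the form `file:line:col:function`, then whitespace, the byte count, whitespace, and a qualifier such as static or dynamic,bounded. An example is `foo.c:3:5:main 16 static`.
- Function names contain no spaces. This holds for C but not for C++ signatures.

parse_su_line and struct su_record are hypothetical helpers. The code parses a single line; reading whole files and diffing the two builds is left out.

```c
#include <stdio.h>
#include <string.h>

/* One parsed record from a GCC -fstack-usage (.su) file. */
struct su_record {
    char function[128];  /* function name (assumed to contain no spaces) */
    long bytes;          /* stack bytes reported for the function */
    char kind[32];       /* qualifier, e.g. "static" or "dynamic,bounded" */
};

/* Parse a line such as "foo.c:3:5:main\t16\tstatic".
   Returns 1 on success, 0 if the line does not match the assumed format. */
static int parse_su_line(const char *line, struct su_record *out)
{
    /* Length of the "file:line:col:function" token. */
    size_t loc_len = strcspn(line, " \t");
    if (loc_len == 0 || line[loc_len] == '\0')
        return 0;

    /* The function name follows the last ':' inside that token. */
    const char *name = line;
    for (size_t i = 0; i < loc_len; i++)
        if (line[i] == ':')
            name = line + i + 1;
    size_t name_len = (size_t)(line + loc_len - name);
    if (name_len == 0 || name_len >= sizeof out->function)
        return 0;
    memcpy(out->function, name, name_len);
    out->function[name_len] = '\0';

    /* The rest of the line holds the byte count and the qualifier. */
    if (sscanf(line + loc_len, "%ld %31s", &out->bytes, out->kind) != 2)
        return 0;
    return 1;
}
```

Run this over the .su files of both CPU2017 builds and match records by function name. The diff would show which functions' frames change once the initialization is added.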