Date: Wed, 13 Jan 2021 16:10:01 +0100 (CET)
From: Richard Biener
To: Qing Zhao
Cc: Richard Sandiford, Richard Biener via Gcc-patches
Subject: Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
In-Reply-To: <2C0218A8-0D9F-4C49-8293-EF0D19E00288@ORACLE.COM>

On Wed, 13 Jan 2021, Qing Zhao wrote:

>
>
> > On Jan 13, 2021, at 1:39 AM, Richard Biener wrote:
> >
> > On Tue, 12 Jan 2021, Qing Zhao wrote:
> >
> >> Hi,
> >>
> >> Just checking in to see whether you have any comments or suggestions on this:
> >>
> >> FYI, I have been continuing with the Approach D implementation since last week:
> >>
> >> D. Adding calls to .DEFERRED_INIT during gimplification, expanding .DEFERRED_INIT during expand to
> >> real initialization. Adjusting the uninitialized pass for the new refs with “.DEFERRED_INIT”.
> >>
> >> For the remaining work of Approach D:
> >>
> >> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >> ** complete the implementation of uninitialized warnings maintenance work for D.
> >>
> >> I have completed the uninitialized warnings maintenance work for D,
> >> and finished part of the -ftrivial-auto-var-init=pattern implementation.
> >>
> >> The following is the remaining work for Approach D:
> >>
> >> ** -ftrivial-auto-var-init=pattern for VLA;
> >> ** add a new attribute for variables:
> >> __attribute__((uninitialized))
> >> the marked variable is intentionally left uninitialized for performance purposes.
> >> ** add complete test cases;
> >>
> >>
> >> Please let me know if you have any objection to my current decision to implement approach D.
> >
> > Did you do any analysis on how stack usage and code size are changed
> > with approach D?
>
> I did the code size change comparison (I will provide the data in another email). And with this data, D works better than A in general. (This is actually a surprise to me.)
>
> But not the stack usage. Not sure how to collect the stack usage data,
> do you have any suggestion on this?

There is -fstack-usage you could use, then of course watching
the stack segment at runtime.  I'm mostly concerned about stack-limited
"processes" such as the Linux kernel which I think is a primary target
of your work.

Richard.
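
A minimal sketch of how the -fstack-usage suggestion could be turned into a per-function comparison. It assumes the `.su` files GCC writes next to each object file contain one tab-separated entry per function (function name, size in bytes, qualifiers such as `static` or `dynamic,bounded`). The helper names `parse_su` and `stack_growth`, and the sample entries, are made up for illustration and are not data from this thread.

```python
from typing import Dict, List, NamedTuple, Tuple


class StackEntry(NamedTuple):
    function: str                 # e.g. "demo.c:9:5:with_vla"
    size: int                     # stack bytes reported by the compiler
    qualifiers: Tuple[str, ...]   # e.g. ("dynamic", "bounded")


def parse_su(text: str) -> Dict[str, StackEntry]:
    """Parse the contents of one .su file into {function: StackEntry}."""
    entries = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip anything that is not a three-field entry with a numeric size.
        if len(fields) != 3 or not fields[1].isdigit():
            continue
        name, size, quals = fields
        entries[name] = StackEntry(name, int(size), tuple(quals.split(",")))
    return entries


def stack_growth(base: Dict[str, StackEntry],
                 new: Dict[str, StackEntry]) -> List[Tuple[str, int]]:
    """Return (function, byte delta) for functions present in both builds,
    largest growth first."""
    common = base.keys() & new.keys()
    deltas = [(name, new[name].size - base[name].size) for name in common]
    return sorted(deltas, key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: a baseline build and a build with
# -ftrivial-auto-var-init=zero. The numbers are invented.
BASE_SU = "demo.c:3:5:leaf\t16\tstatic\ndemo.c:9:5:with_vla\t48\tdynamic,bounded\n"
NEW_SU = "demo.c:3:5:leaf\t16\tstatic\ndemo.c:9:5:with_vla\t80\tdynamic,bounded\n"

if __name__ == "__main__":
    for name, delta in stack_growth(parse_su(BASE_SU), parse_su(NEW_SU)):
        print(f"{delta:+6d}  {name}")
```

Sorting by growth puts the functions whose frames got larger at the top, which is the part that matters for stack-limited targets.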
>
> > How does compile-time behave (we could gobble up
> > lots of .DEFERRED_INIT calls I guess)?
> I can collect this data too and report it later.
>
> Thanks.
>
> Qing
> >
> > Richard.
> >
> >> Thanks a lot for your help.
> >>
> >> Qing
> >>
> >>
> >>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches wrote:
> >>>
> >>> Hi,
> >>>
> >>> This is an update for our previous discussion.
> >>>
> >>> 1. I implemented the following two different implementations in the latest upstream gcc:
> >>>
> >>> A. Adding real initialization during gimplification, not maintaining the uninitialized warnings.
> >>>
> >>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding .DEFERRED_INIT during expand to
> >>> real initialization. Adjusting the uninitialized pass for the new refs with “.DEFERRED_INIT”.
> >>>
> >>> Note, in this initial implementation,
> >>> ** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of -ftrivial-auto-var-init=pattern
> >>> is not done yet. Therefore, the performance data is only about -ftrivial-auto-var-init=zero.
> >>>
> >>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to choose implementation A or D for
> >>> the runtime performance study.
> >>> ** I didn’t finish the uninitialized warnings maintenance work for D. (That might take more time than I expected.)
> >>>
> >>> 2. I collected runtime data for CPU2017 on an x86 machine with this new gcc for the following 3 cases:
> >>>
> >>> no: default. (-g -O2 -march=native)
> >>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
> >>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
> >>>
> >>> And then computed the slowdown data for both A and D as follows:
> >>>
> >>> benchmarks         A / no    D / no
> >>>
> >>> 500.perlbench_r     1.25%     1.25%
> >>> 502.gcc_r           0.68%     1.80%
> >>> 505.mcf_r           0.68%     0.14%
> >>> 520.omnetpp_r       4.83%     4.68%
> >>> 523.xalancbmk_r     0.18%     1.96%
> >>> 525.x264_r          1.55%     2.07%
> >>> 531.deepsjeng_r    11.57%    11.85%
> >>> 541.leela_r         0.64%     0.80%
> >>> 557.xz_r           -0.41%    -0.41%
> >>>
> >>> 507.cactuBSSN_r     0.44%     0.44%
> >>> 508.namd_r          0.34%     0.34%
> >>> 510.parest_r        0.17%     0.25%
> >>> 511.povray_r       56.57%    57.27%
> >>> 519.lbm_r           0.00%     0.00%
> >>> 521.wrf_r          -0.28%    -0.37%
> >>> 526.blender_r      16.96%    17.71%
> >>> 527.cam4_r          0.70%     0.53%
> >>> 538.imagick_r       2.40%     2.40%
> >>> 544.nab_r           0.00%    -0.65%
> >>>
> >>> avg                 5.17%     5.37%
> >>>
> >>> From the above data, we can see that in general, the runtime performance slowdown for
> >>> implementations A and D is similar for individual benchmarks.
> >>>
> >>> There are several benchmarks that have a significant slowdown with the newly added initialization for both
> >>> A and D, for example, 511.povray_r, 526.blender_r, and 531.deepsjeng_r. I will try to study a little bit
> >>> more what kind of new initializations introduced such slowdown.
> >>>
> >>> From the current study so far, I think that approach D should be good enough for our final implementation.
> >>> So, I will try to finish approach D with the following remaining work:
> >>>
> >>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>> ** complete the implementation of uninitialized warnings maintenance work for D.
> >>>
> >>>
> >>> Let me know if you have any comments or suggestions on my current and future work.
> >>>
> >>> Thanks a lot for your help.
> >>>
> >>> Qing
> >>>
> >>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches wrote:
> >>>>
> >>>> The following are the approaches I will implement and compare:
> >>>>
> >>>> Our final goal is to keep the uninitialized warning and minimize the run-time performance cost.
> >>>>
> >>>> A. Adding real initialization during gimplification, not maintaining the uninitialized warnings.
> >>>> B. Adding real initialization during gimplification, marking them with “artificial_init”.
> >>>> Adjusting the uninitialized pass, maintaining the annotation, making sure the real init is not
> >>>> deleted from the fake init.
> >>>> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
> >>>> maintaining this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, then
> >>>> adding real initialization for all DECLs that are marked with “no_explicit_init”.
> >>>> D. Adding .DEFERRED_INIT during gimplification, expanding .DEFERRED_INIT during expand to
> >>>> real initialization. Adjusting the uninitialized pass for the new refs with “.DEFERRED_INIT”.
> >>>>
> >>>>
> >>>> In the above, approach A will be the one that has the minimum run-time cost, and will be the base for the performance
> >>>> comparison.
> >>>>
> >>>> I will implement approach D next. This one is expected to have the most run-time overhead among the above list, but
> >>>> its implementation should be the cleanest among B, C, D. Let’s see how much more performance overhead this approach
> >>>> adds. If the data is good, maybe we can avoid the effort to implement B and C.
> >>>>
> >>>> If the performance of D is not good, I will implement B or C at that time.
> >>>>
> >>>> Let me know if you have any comments or suggestions.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Qing
> >>>
> >>
> >>
> >
> > --
> > Richard Biener
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
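
The "avg" row in the quoted CPU2017 slowdown table can be re-derived from the per-benchmark figures. The sketch below assumes the average is a plain arithmetic mean over all 19 benchmarks, which reproduces the quoted 5.17% and 5.37%. The percentages are copied from the January 5 message; the names `SLOWDOWN`, `mean_slowdown` and `largest_gap` are illustrative only.

```python
# Slowdown in percent relative to the default build: (A / no, D / no).
SLOWDOWN = {
    "500.perlbench_r": (1.25, 1.25),
    "502.gcc_r": (0.68, 1.80),
    "505.mcf_r": (0.68, 0.14),
    "520.omnetpp_r": (4.83, 4.68),
    "523.xalancbmk_r": (0.18, 1.96),
    "525.x264_r": (1.55, 2.07),
    "531.deepsjeng_r": (11.57, 11.85),
    "541.leela_r": (0.64, 0.80),
    "557.xz_r": (-0.41, -0.41),
    "507.cactuBSSN_r": (0.44, 0.44),
    "508.namd_r": (0.34, 0.34),
    "510.parest_r": (0.17, 0.25),
    "511.povray_r": (56.57, 57.27),
    "519.lbm_r": (0.00, 0.00),
    "521.wrf_r": (-0.28, -0.37),
    "526.blender_r": (16.96, 17.71),
    "527.cam4_r": (0.70, 0.53),
    "538.imagick_r": (2.40, 2.40),
    "544.nab_r": (0.00, -0.65),
}


def mean_slowdown(rows, column):
    """Arithmetic mean of one column (0 for A, 1 for D) over all benchmarks."""
    return sum(values[column] for values in rows.values()) / len(rows)


def largest_gap(rows):
    """Benchmark where A and D differ most, with the absolute difference."""
    name = max(rows, key=lambda n: abs(rows[n][1] - rows[n][0]))
    return name, abs(rows[name][1] - rows[name][0])


if __name__ == "__main__":
    print(f"A: {mean_slowdown(SLOWDOWN, 0):.2f}%")  # 5.17%
    print(f"D: {mean_slowdown(SLOWDOWN, 1):.2f}%")  # 5.37%
    print("largest A/D gap:", largest_gap(SLOWDOWN)[0])
```

The mean is dominated by 511.povray_r, 526.blender_r and 531.deepsjeng_r, so the 0.2 point difference between A and D says little about the other benchmarks; the largest per-benchmark gap is on 523.xalancbmk_r.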