From: Martin Jambor
Date: Tue, 24 Aug 2021 13:48:39 +0200
Subject: [PATCH 0/4] IPA-CP profile feedback handling fixes
To: GCC Patches
Cc: Jan Hubicka, Xionghu Luo

Hi,

this patch set addresses a number of shortcomings of IPA-CP when it has
profile feedback data at its disposal.  At this point it is mostly RFC
material, because I expect Honza will correct many of the places where I
use the wrong profile_count method and should be using a slightly
different one.  Still, I do want to turn it into something I can push to
master rather quickly.

Most of the changes were motivated by the SPEC 2017 exchange2 benchmark,
which exposes the problems nicely: it is currently 22% slower with
profile feedback, and this patch set fixes that.  Overall, the patch set
does not have any effect on SPEC 2017 FPrate.  SPEC 2017 INTrate
results, as quickly gathered on my znver2 desktop overnight (1 run
only), are:

PGO only:

| Benchmark       | Trunk | Rate | Patch |      % | Rate |
|-----------------+-------+------+-------+--------+------|
| 500.perlbench_r |   236 | 6.74 |   239 |  +1.27 | 6.67 |
| 502.gcc_r       |   160 | 8.85 |   159 |  -0.62 | 8.89 |
| 505.mcf_r       |   227 | 7.11 |   228 |  +0.44 | 7.08 |
| 520.omnetpp_r   |   314 | 4.18 |   311 |  -0.96 | 4.21 |
| 523.xalancbmk_r |   195 | 5.41 |   199 |  +2.05 | 5.32 |
| 525.x264_r      |   129 | 13.6 |   131 |  +1.55 | 13.4 |
| 531.deepsjeng_r |   230 | 4.98 |   230 |  +0.00 | 4.98 |
| 541.leela_r     |   353 | 4.70 |   353 |  +0.00 | 4.69 |
| 548.exchange2_r |   249 | 10.5 |   189 | -24.10 | 13.8 |
| 557.xz_r        |   246 | 4.39 |   248 |  +0.81 | 4.36 |
|-----------------+-------+------+-------+--------+------|
| Geomean         |       | 6.53 |       |        | 6.68 |

I have re-run 523.xalancbmk_r and the regression seems to be noise.
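
For anyone who wants to double-check the tables, here is a minimal sketch of how the % and Geomean figures appear to be derived.  It assumes (this is not stated in the email) that the Trunk and Patch columns are run times, that % is the relative change of the patched figure against trunk, and that Geomean is the geometric mean of the Rate column.  The function and variable names are made up for illustration.

```python
import math


def percent_change(trunk, patch):
    """Relative change of the patched figure against trunk, in percent."""
    return 100.0 * (patch - trunk) / trunk


def geomean(values):
    """Geometric mean, computed via logarithms to avoid a huge product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Trunk "Rate" column of the PGO-only table, copied from the email.
PGO_TRUNK_RATES = [6.74, 8.85, 7.11, 4.18, 5.41, 13.6, 4.98, 4.70, 10.5, 4.39]

if __name__ == "__main__":
    # 548.exchange2_r, PGO only: Trunk 249, Patch 189.
    print(f"{percent_change(249, 189):+.2f}")
    # Geomean row, trunk side, PGO only.
    print(f"{geomean(PGO_TRUNK_RATES):.2f}")
```

Under those assumptions the sketch gives -24.10 for exchange2 and 6.53 for the trunk geomean, which agrees with the PGO-only table.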

PGO+LTO:

| Benchmark       | Trunk | Rate | Patch |      % |  Rate |
|-----------------+-------+------+-------+--------+-------|
| 500.perlbench_r |   231 | 6.88 |   230 |  -0.43 |  6.93 |
| 502.gcc_r       |   149 | 9.51 |   149 |  +0.00 |  9.53 |
| 505.mcf_r       |   208 | 7.76 |   202 |  -2.88 |  7.98 |
| 520.omnetpp_r   |   282 | 4.64 |   282 |  +0.00 |  4.65 |
| 523.xalancbmk_r |   185 | 5.70 |   188 |  +1.62 |  5.63 |
| 525.x264_r      |   133 | 13.1 |   134 |  +0.75 | 13.00 |
| 531.deepsjeng_r |   190 | 6.04 |   185 |  -2.63 |  6.20 |
| 541.leela_r     |   298 | 5.56 |   298 |  +0.00 |  5.57 |
| 548.exchange2_r |   247 | 10.6 |   193 | -21.86 | 13.60 |
| 557.xz_r        |   250 | 4.32 |   251 |  +0.40 |  4.31 |
|-----------------+-------+------+-------+--------+-------|
| Geomean         |       | 6.97 |       |        |  7.18 |

I have re-run 531.deepsjeng_r and 505.mcf_r.  While the former
improvement seems to be noise, the latter is consistent and even
explainable by more cloning of spec_qsort, which is the result of the
last patch and of saner updates of counts of call graph edges from these
clones.

In both cases the exchange2 improvement is achieved by:

1) The second patch, which makes sure that IPA-CP creates a clone for
   the first value, even though the non-recursive edge bringing the
   value is quite cold, because it enables specializing for much hotter
   contexts, and

2) the third patch, which changes how values resulting from arithmetic
   jump functions on self-recursive edges are evaluated and then
   modifies the profile count of the whole resulting call graph part.

The final patch is not necessary to address the exchange2 regression.

I have bootstrapped and LTO-profile-bootstrapped and tested the whole
patch series on x86_64-linux without any issues.  As written above, I'll
be happy to address any comments/concerns so that something like this
can be pushed to master soon.
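
To make points 1) and 2) above a little more concrete, here is a toy model (not GCC code) of a self-recursive function whose recursive call passes its argument plus a constant, which is the kind of arithmetic jump function on a self-recursive edge mentioned above.  The seed value, the depth limit and all helper names are invented for illustration.  The model only shows why a clone for the first value, which arrives over a cold non-recursive edge, is a prerequisite for entering the clones that serve the recursive calls.

```python
def recursion_generated_values(seed, step, max_depth):
    """Constants obtained by repeatedly applying 'arg + step' to the seed."""
    return [seed + i * step for i in range(max_depth)]


def clone_chain(seed, step, max_depth):
    """Map each specialized value to the value its recursive call passes on.

    The last clone maps to None, standing in for a call that falls back to
    the unspecialized function (a hypothetical simplification).
    """
    values = recursion_generated_values(seed, step, max_depth)
    return dict(zip(values, values[1:] + [None]))


def reachable_clones(chain, seed, have_first_clone):
    """Clones that can be entered starting from the non-recursive caller."""
    if not have_first_clone:
        # Without a clone for the seed, the caller enters the generic
        # function, where the recursive argument is no longer a constant.
        return []
    reached = []
    value = seed
    while value is not None:
        reached.append(value)
        value = chain[value]
    return reached


if __name__ == "__main__":
    chain = clone_chain(seed=1, step=1, max_depth=4)
    print(reachable_clones(chain, 1, have_first_clone=False))
    print(reachable_clones(chain, 1, have_first_clone=True))
```

In this simplified picture, skipping the clone for the seed because its incoming edge is cold leaves every later clone unreachable, however hot the recursive edges are.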

Thanks,

Martin


Martin Jambor (4):
  cgraph: Do not warn about caller count mismatches of removed functions
  ipa-cp: Propagation boost for recursion generated values
  ipa-cp: Fix updating of profile counts and self-gen value evaluation
  ipa-cp: Select saner profile count to base heuristics on

 gcc/cgraph.c   |   4 +-
 gcc/ipa-cp.c   | 786 ++++++++++++++++++++++++++++++++++++++-----------
 gcc/params.opt |   8 +
 3 files changed, 621 insertions(+), 177 deletions(-)

-- 
2.32.0