public inbox for gcc-regression@sourceware.org
* [TCWG CI] 459.GemsFDTD grew in size by 2% after gcc: Limit inlining functions called once
@ 2021-12-11  7:51 ci_notify
From: ci_notify @ 2021-12-11  7:51 UTC
  To: Jan Hubicka; +Cc: gcc-regression

After gcc commit f157c5362b4844f7676cae2aba81a4cf75bd68d5
Author: Jan Hubicka <jh@suse.cz>

    Limit inlining functions called once

the following benchmarks grew in size by more than 1%:
- 459.GemsFDTD grew in size by 2% from 245916 to 249636 bytes
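
The arithmetic behind the figure above can be checked with a small sketch. The function names below are hypothetical and are not part of the CI scripts; only the two byte counts come from the report. The exact growth is about 1.51%, which is consistent with the reported 2% if the CI rounds to the nearest whole percent (the rounding rule is an assumption).

```cpp
#include <cstdint>

// Absolute growth in bytes between two measured binary sizes.
constexpr std::int64_t growth_bytes(std::int64_t before, std::int64_t after) {
    return after - before;
}

// Relative growth in percent, as a floating-point value.
constexpr double growth_percent(std::int64_t before, std::int64_t after) {
    return 100.0 * static_cast<double>(after - before)
           / static_cast<double>(before);
}

// Sizes reported for 459.GemsFDTD: 245916 bytes before, 249636 bytes after.
static_assert(growth_bytes(245916, 249636) == 3720,
              "the benchmark grew by 3720 bytes");
static_assert(growth_percent(245916, 249636) > 1.5
              && growth_percent(245916, 249636) < 1.52,
              "relative growth is roughly 1.51%");
```

Either way the growth is above the 1% threshold that triggers this notification.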

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/build-f157c5362b4844f7676cae2aba81a4cf75bd68d5/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/build-243a980437b5e7fca56587bf86667005bdf343a7/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org .  Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/build-f157c5362b4844f7676cae2aba81a4cf75bd68d5/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/build-243a980437b5e7fca56587bf86667005bdf343a7/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-gcc-f157c5362b4844f7676cae2aba81a4cf75bd68d5
cd investigate-gcc-f157c5362b4844f7676cae2aba81a4cf75bd68d5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os_LTO/10/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach f157c5362b4844f7676cae2aba81a4cf75bd68d5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 243a980437b5e7fca56587bf86667005bdf343a7
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit f157c5362b4844f7676cae2aba81a4cf75bd68d5
Author: Jan Hubicka <jh@suse.cz>
Date:   Thu Dec 9 21:02:17 2021 +0100

    Limit inlining functions called once
    
    As discussed in PR ipa/103454, there are several benchmarks that regress
    for -finline-functions-called-once. Runtimes:
     - tramp3d with -Ofast. 31%
     - exchange2 with -Ofast 11-21%
     - roms O2 9%-10%
     - tonto 2.5-3.5% with LTO
    Build times:
     - specfp2006 41% (mostly wrf that builds 71% faster)
     - specint2006 1.5-3%
     - specfp2017 64% (again mostly wrf)
     - specint2017 2.5-3.5%
    
    This patch adds two params to tweak the behaviour:
     1) max-inline-functions-called-once-loop-depth limiting the loop depth
        (this is useful primarily for exchange where the inlined function is in
         loop depth 9)
     2) max-inline-functions-called-once-insns
        We already have large-function-insns/growth parameters, but these are
        limiting also inlining small functions, so reducing them will regress
        very large functions that are hot.
    
        Because inlining functions called once is meant just as a cleanup pass
        I think it makes sense to have separate limit for it.
    gcc/ChangeLog:
    
    2021-12-09  Jan Hubicka  <hubicka@ucw.cz>
    
            * doc/invoke.texi (max-inline-functions-called-once-loop-depth,
            max-inline-functions-called-once-insns): New parameters.
            * ipa-inline.c (check_callers): Handle
            param_inline_functions_called_once_loop_depth and
            param_inline_functions_called_once_insns.
            (edge_badness): Fix linebreaks.
            * params.opt (param=max-inline-functions-called-once-loop-depth,
            param=max-inline-functions-called-once-insn): New params.
---
 gcc/doc/invoke.texi |  8 ++++++++
 gcc/ipa-inline.c    | 47 ++++++++++++++++++++++++++++++-----------------
 gcc/params.opt      |  8 ++++++++
 3 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8a70adaeb28..9b4371b9213 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13605,6 +13605,14 @@ The maximum number of backtrack attempts the scheduler should make
 when modulo scheduling a loop.  Larger values can exponentially increase
 compilation time.
 
+@item max-inline-functions-called-once-loop-depth
+Maximal loop depth of a call considered by inline heuristics that tries to
+inline all functions called once.
+
+@item max-inline-functions-called-once-insns
+Maximal estimated size of functions produced while inlining functions called
+once.
+
 @item max-inline-insns-single
 Several parameters control the tree inliner used in GCC@.  This number sets the
 maximum number of instructions (counted in GCC's internal representation) in a
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 012b326b5e9..54cd085a84d 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1091,20 +1091,30 @@ static bool
 check_callers (struct cgraph_node *node, void *has_hot_call)
 {
   struct cgraph_edge *e;
-   for (e = node->callers; e; e = e->next_caller)
-     {
-       if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
-	   || !opt_for_fn (e->caller->decl, optimize))
-	 return true;
-       if (!can_inline_edge_p (e, true))
-         return true;
-       if (e->recursive_p ())
-	 return true;
-       if (!can_inline_edge_by_limits_p (e, true))
-         return true;
-       if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
-	 *(bool *)has_hot_call = true;
-     }
+  for (e = node->callers; e; e = e->next_caller)
+    {
+      if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once)
+	  || !opt_for_fn (e->caller->decl, optimize))
+	return true;
+      if (!can_inline_edge_p (e, true))
+	return true;
+      if (e->recursive_p ())
+	return true;
+      if (!can_inline_edge_by_limits_p (e, true))
+	return true;
+      /* Inlining large functions to large loop depth is often harmful because
+	 of register pressure it implies.  */
+      if ((int)ipa_call_summaries->get (e)->loop_depth
+	  > param_inline_functions_called_once_loop_depth)
+	return true;
+      /* Do not produce gigantic functions.  */
+      if (estimate_size_after_inlining (e->caller->inlined_to ?
+					e->caller->inlined_to : e->caller, e)
+	  > param_inline_functions_called_once_insns)
+	return true;
+      if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
+	*(bool *)has_hot_call = true;
+    }
   return false;
 }
 
@@ -1327,9 +1337,12 @@ edge_badness (struct cgraph_edge *edge, bool dump)
 		   " %i (compensated)\n",
 		   badness.to_double (),
 		   freq.to_double (),
-		   edge->count.ipa ().initialized_p () ? edge->count.ipa ().to_gcov_type () : -1,
-		   caller->count.ipa ().initialized_p () ? caller->count.ipa ().to_gcov_type () : -1,
-		   inlining_speedup (edge, freq, unspec_edge_time, edge_time).to_double (),
+		   edge->count.ipa ().initialized_p ()
+		   ? edge->count.ipa ().to_gcov_type () : -1,
+		   caller->count.ipa ().initialized_p ()
+		   ? caller->count.ipa ().to_gcov_type () : -1,
+		   inlining_speedup (edge, freq, unspec_edge_time,
+				     edge_time).to_double (),
 		   estimate_growth (callee),
 		   callee_info->growth, overall_growth);
 	}
diff --git a/gcc/params.opt b/gcc/params.opt
index e725c99e5e4..f1b5757461c 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -545,6 +545,14 @@ The maximum expansion factor when copying basic blocks.
 Common Joined UInteger Var(param_max_hoist_depth) Init(30) Param Optimization
 Maximum depth of search in the dominator tree for expressions to hoist.
 
+-param=max-inline-functions-called-once-loop-depth=
+Common Joined UInteger Var(param_inline_functions_called_once_loop_depth) Init(6) Optimization Param
+Maximum loop depth of a call which is considered for inlining functions called once
+
+-param=max-inline-functions-called-once-insns=
+Common Joined UInteger Var(param_inline_functions_called_once_insns) Init(4000) Optimization Param
+Maximum combinaed size of caller and callee wich is inlined if callee is called once.
+
 -param=max-inline-insns-auto=
 Common Joined UInteger Var(param_max_inline_insns_auto) Init(15) Optimization Param
 The maximum number of instructions when automatically inlining.
</cut>
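
The two limits that the patch adds to check_callers can be modelled outside GCC as a minimal sketch. The struct and function names here are hypothetical stand-ins, not GCC's real types. Only the default values (6 and 4000) and the "greater than" comparisons are taken from the diff above, and the sketch leaves out the other checks in check_callers.

```cpp
// Hypothetical, simplified view of the data the new checks look at.
// These are not GCC's cgraph_edge or summary types.
struct CallEdge {
    int loop_depth;           // loop nesting depth of the call site
    int size_after_inlining;  // estimated caller size once the callee is inlined
};

// Defaults mirror the Init(...) values in the params.opt hunk.
struct OnceLimits {
    int max_loop_depth = 6;   // max-inline-functions-called-once-loop-depth
    int max_insns = 4000;     // max-inline-functions-called-once-insns
};

// Returns true when the edge should NOT be inlined by the called-once
// heuristic, following the "return true means give up" convention that
// check_callers uses in the patch.
bool blocked_by_once_limits(const CallEdge &e,
                            const OnceLimits &lim = OnceLimits{}) {
    // Calls nested too deeply in loops are rejected (register pressure).
    if (e.loop_depth > lim.max_loop_depth)
        return true;
    // Calls that would produce a very large function are rejected.
    if (e.size_after_inlining > lim.max_insns)
        return true;
    return false;
}
```

With the defaults, a call at loop depth 9 (the exchange case mentioned in the commit message) is rejected, while one at depth 6 with an estimated size of 4000 is still allowed because both comparisons are strict. In GCC itself the limits would presumably be adjusted with --param, using the parameter names declared in params.opt.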

From: "H.J. Lu" <hjl@sc.intel.com>
Date: Sat, 11 Dec 2021 00:59:57 -0800
To: skpgkp2@gmail.com, hjl.tools@gmail.com, gcc-regression@gcc.gnu.org
Subject: Regressions on releases/gcc-11 at commit r11-9375 vs commit r11-9371 on Linux/x86_64
Message-Id: <20211211085957.A10AE65CAB@gnu-34.sc.intel.com>

New failures:

New passes:
FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 14 y == 2
FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 16 y == 2
FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 16 y == 2

