public inbox for gcc-regression@sourceware.org
* [TCWG CI] 482.sphinx3:[.] subvq_mgau_shortlist slowed down by 18% after gcc: reassoc: Do not bias loop-carried PHIs early
@ 2021-10-04  9:39 ci_notify
  2021-10-04 10:44 ` Ilya Leoshkevich
  0 siblings, 1 reply; 3+ messages in thread
From: ci_notify @ 2021-10-04  9:39 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: gcc-regression

After gcc commit 99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e
Author: Ilya Leoshkevich <iii@linux.ibm.com>

    reassoc: Do not bias loop-carried PHIs early

the following hot functions slowed down by more than 10% (but their benchmarks slowed down by less than 2%):
- 482.sphinx3:[.] subvq_mgau_shortlist slowed down by 18% from 1628 to 1914 perf samples
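
The 18% figure follows from the two sample counts quoted above. A minimal sketch of the arithmetic, assuming the slowdown is the relative increase in perf samples (the helper name `slowdown_pct` is made up for illustration and is not part of the CI scripts):

```shell
# Hypothetical helper: relative increase from $1 to $2, in percent.
slowdown_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) * 100 / before }'
}

slowdown_pct 1628 1914   # prints 17.6, which the report rounds to 18%
```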

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-3b7041e8345c2f1030e58620f28e22d64b2c196b/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org.  We plan to add support for SPEC CPU2017 benchmarks and to provide "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-3b7041e8345c2f1030e58620f28e22d64b2c196b/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-gcc-99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e
cd investigate-gcc-99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 3b7041e8345c2f1030e58620f28e22d64b2c196b
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Tue Sep 14 20:41:18 2021 +0200

    reassoc: Do not bias loop-carried PHIs early
    
    Biasing loop-carried PHIs during the 1st reassociation pass interferes
    with reduction chains and does not bring measurable benefits, so do it
    only during the 2nd reassociation pass.
    
    gcc/ChangeLog:
    
            * passes.def (pass_reassoc): Rename parameter to early_p.
            * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
            New variable.
            (phi_rank): Don't bias loop-carried phi ranks
            before vectorization pass.
            (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
            (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
            initializer.
            (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
            value.
            (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
            execute_reassoc.
            (pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
---
 gcc/passes.def         |  4 ++--
 gcc/tree-ssa-reassoc.c | 16 ++++++++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 9115da7beb6..c11c237f6d2 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -243,7 +243,7 @@ along with GCC; see the file COPYING3.  If not see
       /* Identify paths that should never be executed in a conforming
 	 program and isolate those paths.  */
       NEXT_PASS (pass_isolate_erroneous_paths);
-      NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
+      NEXT_PASS (pass_reassoc, true /* early_p */);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt, false /* early_p */);
@@ -326,7 +326,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_lower_switch);
       NEXT_PASS (pass_cse_reciprocals);
-      NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
+      NEXT_PASS (pass_reassoc, false /* early_p */);
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_split_paths);
       NEXT_PASS (pass_tracer);
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 8498cfc7aa8..420c14e8cf5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -180,6 +180,10 @@ along with GCC; see the file COPYING3.  If not see
    point 3a in the pass header comment.  */
 static bool reassoc_insert_powi_p;
 
+/* Enable biasing ranks of loop accumulators.  We don't want this before
+   vectorization, since it interferes with reduction chains.  */
+static bool reassoc_bias_loop_carried_phi_ranks_p;
+
 /* Statistics */
 static struct
 {
@@ -269,6 +273,9 @@ phi_rank (gimple *stmt)
   use_operand_p use;
   gimple *use_stmt;
 
+  if (!reassoc_bias_loop_carried_phi_ranks_p)
+    return bb_rank[bb->index];
+
   /* We only care about real loops (those with a latch).  */
   if (!father->latch)
     return bb_rank[bb->index];
@@ -6940,9 +6947,10 @@ fini_reassoc (void)
    optimization of a gimple conditional.  Otherwise returns zero.  */
 
 static unsigned int
-execute_reassoc (bool insert_powi_p)
+execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p)
 {
   reassoc_insert_powi_p = insert_powi_p;
+  reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p;
 
   init_reassoc ();
 
@@ -6983,15 +6991,19 @@ public:
     {
       gcc_assert (n == 0);
       insert_powi_p = param;
+      bias_loop_carried_phi_ranks_p = !param;
     }
   virtual bool gate (function *) { return flag_tree_reassoc != 0; }
   virtual unsigned int execute (function *)
-    { return execute_reassoc (insert_powi_p); }
+  {
+    return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p);
+  }
 
  private:
   /* Enable insertion of __builtin_powi calls during execute_reassoc.  See
      point 3a in the pass header comment.  */
   bool insert_powi_p;
+  bool bias_loop_carried_phi_ranks_p;
 }; // class pass_reassoc
 
 } // anon namespace
</cut>

From: "H.J. Lu" <hjl@sc.intel.com>
Date: Mon, 04 Oct 2021 02:49:18 -0700
To: skpgkp2@gmail.com, hjl.tools@gmail.com, gcc-regression@gcc.gnu.org
Subject: Regressions on releases/gcc-10 at commit r10-10166 vs commit r10-10151 on Linux/x86_64

New failures:

New passes:
FAIL: 30_threads/future/members/poll.cc execution test
FAIL: 30_threads/future/members/poll.cc execution test
FAIL: 30_threads/future/members/poll.cc execution test



* Re: [TCWG CI] 482.sphinx3:[.] subvq_mgau_shortlist slowed down by 18% after gcc: reassoc: Do not bias loop-carried PHIs early
  2021-10-04  9:39 [TCWG CI] 482.sphinx3:[.] subvq_mgau_shortlist slowed down by 18% after gcc: reassoc: Do not bias loop-carried PHIs early ci_notify
@ 2021-10-04 10:44 ` Ilya Leoshkevich
  2021-10-04 12:30   ` Maxim Kuvyrkov
  0 siblings, 1 reply; 3+ messages in thread
From: Ilya Leoshkevich @ 2021-10-04 10:44 UTC (permalink / raw)
  To: ci_notify; +Cc: gcc-regression

On Mon, 2021-10-04 at 09:39 +0000, ci_notify@linaro.org wrote:
> After gcc commit 99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e
> Author: Ilya Leoshkevich <iii@linux.ibm.com>
> 
>     reassoc: Do not bias loop-carried PHIs early
> 
> the following hot functions slowed down by more than 10% (but their
> benchmarks slowed down by less than 2%):
> - 482.sphinx3:[.] subvq_mgau_shortlist slowed down by 18% from 1628
> to 1914 perf samples
> 
> Below reproducer instructions can be used to re-build both
> "first_bad" and "last_good" cross-toolchains used in this bisection. 
> Naturally, the scripts will fail when triggering benchmarking jobs
> if you don't have access to Linaro TCWG CI.
> 
> For your convenience, we have uploaded tarballs with pre-processed
> source and assembly files at:
> - First_bad save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-99c106e695bd8f1de580c4ff0b1d3fe59c9a4f1e/save-temps/
> - Last_good save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-3b7041e8345c2f1030e58620f28e22d64b2c196b/save-temps/
> - Baseline save-temps:  
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-baseline/save-temps/

...

The asm files in "Last_good save-temps" and "First_bad save-temps" are
identical.  Furthermore, subvq.s is identical across all three
builds.  Can this be a benchmarking setup issue?
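
One way to repeat this check locally, sketched under the assumption that the
save-temps artifacts have been downloaded and unpacked into two directories
(the directory names and the helper `diff_asm` are placeholders, not part of
the CI):

```shell
# Hypothetical helper: list .s files that differ between two unpacked
# save-temps trees.  Prints nothing when every file found in the first
# tree has an identical counterpart in the second.
diff_asm() {
    good_dir=$1
    bad_dir=$2
    (cd "$good_dir" && find . -name '*.s' -type f) | sort |
    while IFS= read -r f; do
        if [ ! -f "$bad_dir/$f" ]; then
            echo "missing in $bad_dir: $f"
        elif ! cmp -s "$good_dir/$f" "$bad_dir/$f"; then
            echo "differs: $f"
        fi
    done
}

# Example call with placeholder paths:
# diff_asm ./last_good/save-temps ./first_bad/save-temps
```

Empty output means no difference was found in the generated assembly, which
points away from the compiler change and toward the measurement setup.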

Best regards,
Ilya




* Re: [TCWG CI] 482.sphinx3:[.] subvq_mgau_shortlist slowed down by 18% after gcc: reassoc: Do not bias loop-carried PHIs early
  2021-10-04 10:44 ` Ilya Leoshkevich
@ 2021-10-04 12:30   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 3+ messages in thread
From: Maxim Kuvyrkov @ 2021-10-04 12:30 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: gcc-regression

Hi Ilya,

Thanks for looking into this!

This is an interesting corner case (one of many!) in automated bisection of benchmarking CI.  Looking at the detailed logs, the last-good build is actually bad as well, so this regression is due to some other commit.

We track performance of the functions that make up 90% of the total runtime profile.  The function in question, subvq_mgau_shortlist(), is just at the boundary of that 90%, so it shows up in the first-bad summary [1] but not in the last-good summary [2].  Checking the full comparison for last-good [3], it is indeed there, with a 17% regression.
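
To illustrate why a function near the cutoff can appear in one summary and not the other, here is a rough sketch.  It assumes a simplified "name,samples" input sorted by descending sample count; that format, the helper name `hot_functions`, and the sample numbers are invented for illustration and do not reflect the actual results.csv layout or the CI's implementation.

```shell
# Hypothetical helper: print functions, hottest first, until their
# cumulative share of all samples reaches the given percentage.
# Input on stdin: "name,samples" lines sorted by descending samples.
hot_functions() {
    awk -F, -v pct="$1" '
        { name[NR] = $1; count[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                if (seen * 100 >= total * pct) break
                print name[i]
                seen += count[i]
            }
        }'
}

# Made-up profiles: the same function lands on either side of the cutoff.
printf 'f1,8000\nf2,1050\nsubvq_mgau_shortlist,500\nf3,450\n' | hot_functions 90
printf 'f1,8000\nf2,950\nsubvq_mgau_shortlist,600\nf3,450\n' | hot_functions 90
```

In the first made-up profile f1 and f2 already cover 90.5% of the samples, so the list stops before subvq_mgau_shortlist; in the second they cover only 89.5%, so it is included.  A small shift in the hotter functions is enough to move a borderline function in or out of the tracked set.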

This 18% regression in subvq_mgau_shortlist() translates into a 2% regression in 482.sphinx3, so this is something I’ll continue to chase.

[1] https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-first_bad/11-check_regression/results.csv/*view*/
[2] https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-last_good/11-check_regression/results.csv/*view*/
[3] https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/29/artifact/artifacts/build-last_good/11-check_regression/results-full.csv/*view*/

--
Maxim Kuvyrkov
https://www.linaro.org
