From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ci_notify@linaro.org>
Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com
 [IPv6:2a00:1450:4864:20::434])
 by sourceware.org (Postfix) with ESMTPS id B97D33858408
 for <gcc-regression@gcc.gnu.org>; Thu,  7 Oct 2021 03:30:44 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B97D33858408
Received: by mail-wr1-x434.google.com with SMTP id t8so14685207wri.1
 for <gcc-regression@gcc.gnu.org>; Wed, 06 Oct 2021 20:30:44 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:date:to:cc:message-id:subject:mime-version;
 bh=aKHH5Gq04E2szYP34ML1b74yMoxgdvbVQCHkF/hwFaM=;
 b=OgUV8wSuzagkjKNMSgVkU0zpHiGPJMcGqQwaCqDTSvHndibwYuP3GfGh09jKeDwK1k
 LItjvHLl2klnUiHq39N3FzmxBH+yCrTlGU6KFVWlhbm9NiauXWoOTS4FAYMwb12lTIXH
 6HqQGdoligFXNXyTb6BaatD+f0iHu97pgsCUtL+9KdmgR60sB+59jKgLtyLUOCI+Fwmn
 hRXVW4/sop9mF3gSAgYEuaFlILFU/RisqJNNzqD0vrqiHh4knzc3IPjyKQ4uwEyL3Ska
 HWqEkBOTdqk/XH8WbghEFRRyOsfDTCaZ+2kqvhFAjc3dQPvA2QYHwPeEVB62ObLhPMCD
 Ha8A==
X-Gm-Message-State: AOAM531EiV4DgvyxgNuN+ryCFLWDJ70UpG7PuZjMypi76zmFShxbmZE/
 9hse3bXrmEH6DWIvrB8vV2CzGA==
X-Google-Smtp-Source: ABdhPJxuNvtjaOuQKhPbG/mHoePNWieQ4ePBAzwIenClX8o34SmG1HtDXDQc7XY2R7xqzAV93nt1BA==
X-Received: by 2002:a1c:a712:: with SMTP id q18mr1943537wme.23.1633577443504; 
 Wed, 06 Oct 2021 20:30:43 -0700 (PDT)
Received: from 172.17.0.5 (ci.linaro.org. [88.99.136.175])
 by smtp.gmail.com with ESMTPSA id p11sm4055477wmi.0.2021.10.06.20.30.42
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 06 Oct 2021 20:30:42 -0700 (PDT)
From: ci_notify@linaro.org
X-Google-Original-From: linaro-infrastructure-errors@lists.linaro.org
Date: Thu, 7 Oct 2021 03:30:42 +0000 (UTC)
To: Aldy Hernandez <aldyh@redhat.com>
Cc: gcc-regression@gcc.gnu.org
Message-ID: <1886148038.9700.1633577442959@localhost>
Subject: [TCWG CI] 482.sphinx3 slowed down by 4% after gcc: Loosen loop
 crossing restriction in threader.
MIME-Version: 1.0
X-Jenkins-Job: TCWG Bisect tcwg_bmk_tx1/gnu-master-aarch64-spec2k6-O3
X-Jenkins-Result: SUCCESS
X-Spam-Status: No, score=-13.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-regression@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-regression mailing list <gcc-regression.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-regression/>
List-Post: <mailto:gcc-regression@gcc.gnu.org>
List-Help: <mailto:gcc-regression-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Oct 2021 03:30:47 -0000

After gcc commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez <aldyh@redhat.com>

    Loosen loop crossing restriction in threader.

the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 21091 to 21983 perf samples

the following hot functions slowed down by more than 10% (but their benchmarks slowed down by less than 2%):
- 471.omnetpp:[.] _ZN12cMessageHeap8getFirstEv slowed down by 1397% from 146 to 2185 perf samples
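
The percentages above appear to be the relative increase in perf samples, rounded to the nearest integer. That is an assumption inferred from the numbers in this report, not taken from the CI scripts, and the helper name `slowdown_percent` is hypothetical. A minimal sketch of that arithmetic:

```python
def slowdown_percent(before: int, after: int) -> float:
    """Relative increase in perf samples, in percent (assumed formula)."""
    return (after - before) / before * 100.0


# Sample counts copied from the report above.
sphinx3 = slowdown_percent(21091, 21983)
get_first = slowdown_percent(146, 2185)

print(f"482.sphinx3: {sphinx3:.1f}%")
print(f"471.omnetpp cMessageHeap::getFirst: {get_first:.1f}%")
```

Under this assumption the unrounded figures are about 4.2% for 482.sphinx3 and about 1396.6% for the 471.omnetpp hot function, which round to the 4% and 1397% reported.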

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org.  Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
cd investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1f51e9af7b615838424214e6aaea0de793cb10fe
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Tue Oct 5 15:03:34 2021 +0200

    Loosen loop crossing restriction in threader.
    
    Crossing loops is generally discouraged from the threader, but we can
    make an exception when we don't cross the latch or enter another loop,
    since this is just an early exit out of the loop.
    
    In fact, the whole threaded path is logically outside the loop.  This
    has nice secondary effects.  For example, objects on the threaded path
    will no longer necessarily be live throughout the loop, so we can get
    register allocation improvements.  The threaded path can physically
    move outside the loop resulting in better icache efficiency, etc.
    
    Tested on x86-64 Linux, and on a visium-elf cross making sure that the
    following tests do not have an abort in the final assembly:
    
    gcc.c-torture/execute/960218-1.c
    gcc.c-torture/execute/visium-pending-4.c
    gcc.c-torture/execute/pr58209.c
    
    gcc/ChangeLog:
    
            * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
            Loosen restrictions
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-ssa/ssa-thread-valid.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c | 39 +++++++++++++++++++++++
 gcc/tree-ssa-threadupdate.c                      | 40 +++++++++++++++++-------
 2 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
new file mode 100644
index 00000000000..7adca97cc2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
@@ -0,0 +1,39 @@
+// { dg-do compile }
+// { dg-options "-O2 -fgimple -fdump-statistics" }
+
+// This is a collection of threadable paths.  To simplify maintenance,
+// there should only be one threadable path per function.
+
+int global;
+
+// The thread from 3->4->5 crosses loops but is allowed because it
+// never crosses the latch (BB3) and is just an early exit out of the
+// loop.
+int __GIMPLE (ssa)
+foo1 (int x)
+{
+  int D_1420;
+  int a;
+
+  __BB(2):
+  a_4 = ~x_3(D);
+  goto __BB4;
+
+  // Latch.
+  __BB(3):
+  global = a_1;
+  goto __BB4;
+
+  __BB(4,loop_header(1)):
+  a_1 = __PHI (__BB2: a_4, __BB3: 0);
+  if (a_1 != 0)
+    goto __BB3;
+  else
+    goto __BB5;
+
+  __BB(5):
+  return;
+
+}
+
+// { dg-final { scan-tree-dump "Jumps threaded\" \"foo1\" 1" "statistics" } }
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index dcabfdb30d2..32ce1e3af40 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2766,10 +2766,17 @@ bool
 jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 {
   gcc_checking_assert (!path.is_empty ());
-  edge taken_edge = path[path.length () - 1]->e;
-  loop_p loop = taken_edge->src->loop_father;
+  edge entry = path[0]->e;
+  edge exit = path[path.length () - 1]->e;
   bool seen_latch = false;
-  bool path_crosses_loops = false;
+  int loops_crossed = 0;
+  bool crossed_latch = false;
+  // Use ->dest here instead of ->src to ignore the first block.  The
+  // first block is allowed to be in a different loop, since it'll be
+  // redirected.  See similar comment in profitable_path_p: "we don't
+  // care about that block...".
+  loop_p loop = entry->dest->loop_father;
+  loop_p curr_loop = loop;
 
   for (unsigned int i = 0; i < path.length (); i++)
     {
@@ -2784,19 +2791,30 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 	}
 
       if (loop->latch == e->src || loop->latch == e->dest)
-	seen_latch = true;
+	{
+	  seen_latch = true;
+	  // Like seen_latch, but excludes the first block.
+	  if (e->src != entry->src)
+	    crossed_latch = true;
+	}
 
-      // The first entry represents the block with an outgoing edge
-      // that we will redirect to the jump threading path.  Thus we
-      // don't care about that block's loop father.
-      if ((i > 0 && e->src->loop_father != loop)
-	  || e->dest->loop_father != loop)
-	path_crosses_loops = true;
+      if (e->dest->loop_father != curr_loop)
+	{
+	  curr_loop = e->dest->loop_father;
+	  ++loops_crossed;
+	}
 
       if (flag_checking && !m_backedge_threads)
 	gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
     }
 
+  // If we crossed a loop into an outer loop without crossing the
+  // latch, this is just an early exit from the loop.
+  if (loops_crossed == 1
+      && !crossed_latch
+      && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father))
+    return false;
+
   if (cfun->curr_properties & PROP_loop_opts_done)
     return false;
 
@@ -2806,7 +2824,7 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 		     "would create non-empty latch");
       return true;
     }
-  if (path_crosses_loops)
+  if (loops_crossed)
     {
       cancel_thread (&path, "Path crosses loops");
       return true;
</cut>
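
The early-exit condition added by the commit above can be modelled outside GCC. The following is a minimal sketch only: it covers just the part of cancel_invalid_paths visible in the diff, and the `Loop`, `Block`, `Edge`, `strictly_nested` and `is_early_exit` names are hypothetical stand-ins, not GCC APIs. `strictly_nested` plays the role of flow_loop_nested_p, assuming that function tests strict nesting.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Optional


@dataclass(frozen=True)
class Loop:
    name: str
    latch: Optional[str] = None       # name of the latch block, if any
    parent: Optional["Loop"] = None   # enclosing loop; None for the function body


@dataclass(frozen=True)
class Block:
    name: str
    loop: Loop                        # innermost loop containing the block


class Edge(NamedTuple):
    src: Block
    dest: Block


def strictly_nested(outer: Loop, inner: Loop) -> bool:
    """Hypothetical stand-in for flow_loop_nested_p (outer, inner)."""
    parent = inner.parent
    while parent is not None:
        if parent is outer:
            return True
        parent = parent.parent
    return False


def is_early_exit(path: List[Edge]) -> bool:
    """True when the path would take the new 'return false' in the diff."""
    entry, exit_edge = path[0], path[-1]
    loop = entry.dest.loop            # ->dest, so the first block is ignored
    curr_loop = loop
    loops_crossed = 0
    crossed_latch = False
    for e in path:
        touches_latch = loop.latch in (e.src.name, e.dest.name)
        # Like seen_latch in the patch, but excluding the first block.
        if touches_latch and e.src is not entry.src:
            crossed_latch = True
        if e.dest.loop is not curr_loop:
            curr_loop = e.dest.loop
            loops_crossed += 1
    return (loops_crossed == 1
            and not crossed_latch
            and strictly_nested(exit_edge.dest.loop, exit_edge.src.loop))


# CFG of foo1 from the new test: BB4 is the header of loop 1, BB3 its latch.
body = Loop("function body")
loop1 = Loop("loop 1", latch="BB3", parent=body)
bb3, bb4, bb5 = Block("BB3", loop1), Block("BB4", loop1), Block("BB5", body)

print(is_early_exit([Edge(bb3, bb4), Edge(bb4, bb5)]))
print(is_early_exit([Edge(bb4, bb3), Edge(bb3, bb4)]))
```

In this model the 3->4->5 path from the test counts as an early exit: the latch is only the first block, and the path leaves loop 1 exactly once, into the enclosing function body. The 4->3->4 path does not, because it passes through the latch after the first block and never leaves the loop.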
>>From ci_notify@linaro.org  Thu Oct  7 06:38:21 2021
Return-Path: <ci_notify@linaro.org>
X-Original-To: gcc-regression@gcc.gnu.org
Delivered-To: gcc-regression@gcc.gnu.org
Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com
 [IPv6:2a00:1450:4864:20::433])
 by sourceware.org (Postfix) with ESMTPS id B3E783858C39
 for <gcc-regression@gcc.gnu.org>; Thu,  7 Oct 2021 06:38:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B3E783858C39
Received: by mail-wr1-x433.google.com with SMTP id r7so15595639wrc.10
 for <gcc-regression@gcc.gnu.org>; Wed, 06 Oct 2021 23:38:18 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:date:to:cc:message-id:subject:mime-version;
 bh=4sbzjAxtUhfgZXv72EubxUVgs7hrTwMpeVrOpMGGWR4=;
 b=3NmQ2Z+rL3UxqHPd19FONKH56gLxxilMt8nPZ1q6QF5Wi+im8FJBfccCyWwv0sBvbA
 WmVvhHJ4ZJX67At7OfWdevSb0B6wvbsbutqZs6DCaooyzhpszPhmy2WK24Sao1Gx9GMx
 iuNIVz3oXh2LZ5WYDTXOS75BGZKPRhZQy5fKzEw82qoMKoAgg/dP5OVy8a/L/DVamu1j
 OXk2qj52toM4konxrb+ynGDhb/1Pc/sPTq0UcveryRIZXfRpn88hh9TlOuLtidvQJ+tb
 JncGHOy7NhlDaV/U9OIryXC9qmmN6fsCz7vJO7jy6Owej/aQ/1oZPCkE6z6cQAoG/cfc
 5g7A==
X-Gm-Message-State: AOAM5331H/Bf+/jCrIuCEBcNOOY6DhxsfSaI5bpNuabGo8jirnFlETHU
 I/8Dpwtr0Ue+Jk7G/Pq2vDN/ag==
X-Google-Smtp-Source: ABdhPJw9p9r7ZRpTeQfLMU/UqgVpf1Tk3iG7VwdEBslnbUK8B+tw1Bj+Zb2aNS0+nrwX2iFWENJQ3A==
X-Received: by 2002:adf:bc48:: with SMTP id a8mr3174683wrh.397.1633588697696; 
 Wed, 06 Oct 2021 23:38:17 -0700 (PDT)
Received: from 172.17.0.5 (ci.linaro.org. [88.99.136.175])
 by smtp.gmail.com with ESMTPSA id z16sm7598741wmk.6.2021.10.06.23.38.17
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 06 Oct 2021 23:38:17 -0700 (PDT)
From: ci_notify@linaro.org
X-Google-Original-From: linaro-infrastructure-errors@lists.linaro.org
Date: Thu, 7 Oct 2021 06:38:16 +0000 (UTC)
To: Aldy Hernandez <aldyh@redhat.com>
Cc: gcc-regression@gcc.gnu.org
Message-ID: <2142114940.9731.1633588697156@localhost>
Subject: [TCWG CI] 429.mcf slowed down by 9% after gcc: Loosen loop crossing
 restriction in threader.
MIME-Version: 1.0
X-Jenkins-Job: TCWG Bisect tcwg_bmk_tk1/gnu-master-arm-spec2k6-O2_LTO
X-Jenkins-Result: SUCCESS
X-Spam-Status: No, score=-13.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-regression@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-regression mailing list <gcc-regression.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-regression/>
List-Post: <mailto:gcc-regression@gcc.gnu.org>
List-Help: <mailto:gcc-regression-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Oct 2021 06:38:21 -0000

After gcc commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez <aldyh@redhat.com>

    Loosen loop crossing restriction in threader.

the following benchmarks slowed down by more than 2%:
- 429.mcf slowed down by 9% from 9961 to 10815 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O2 -flto -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org.  Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
cd investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1f51e9af7b615838424214e6aaea0de793cb10fe
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Tue Oct 5 15:03:34 2021 +0200

    Loosen loop crossing restriction in threader.
    
    Crossing loops is generally discouraged from the threader, but we can
    make an exception when we don't cross the latch or enter another loop,
    since this is just an early exit out of the loop.
    
    In fact, the whole threaded path is logically outside the loop.  This
    has nice secondary effects.  For example, objects on the threaded path
    will no longer necessarily be live throughout the loop, so we can get
    register allocation improvements.  The threaded path can physically
    move outside the loop resulting in better icache efficiency, etc.
    
    Tested on x86-64 Linux, and on a visium-elf cross making sure that the
    following tests do not have an abort in the final assembly:
    
    gcc.c-torture/execute/960218-1.c
    gcc.c-torture/execute/visium-pending-4.c
    gcc.c-torture/execute/pr58209.c
    
    gcc/ChangeLog:
    
            * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
            Loosen restrictions
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-ssa/ssa-thread-valid.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c | 39 +++++++++++++++++++++++
 gcc/tree-ssa-threadupdate.c                      | 40 +++++++++++++++++-------
 2 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
new file mode 100644
index 00000000000..7adca97cc2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
@@ -0,0 +1,39 @@
+// { dg-do compile }
+// { dg-options "-O2 -fgimple -fdump-statistics" }
+
+// This is a collection of threadable paths.  To simplify maintenance,
+// there should only be one threadable path per function.
+
+int global;
+
+// The thread from 3->4->5 crosses loops but is allowed because it
+// never crosses the latch (BB3) and is just an early exit out of the
+// loop.
+int __GIMPLE (ssa)
+foo1 (int x)
+{
+  int D_1420;
+  int a;
+
+  __BB(2):
+  a_4 = ~x_3(D);
+  goto __BB4;
+
+  // Latch.
+  __BB(3):
+  global = a_1;
+  goto __BB4;
+
+  __BB(4,loop_header(1)):
+  a_1 = __PHI (__BB2: a_4, __BB3: 0);
+  if (a_1 != 0)
+    goto __BB3;
+  else
+    goto __BB5;
+
+  __BB(5):
+  return;
+
+}
+
+// { dg-final { scan-tree-dump "Jumps threaded\" \"foo1\" 1" "statistics" } }
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index dcabfdb30d2..32ce1e3af40 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2766,10 +2766,17 @@ bool
 jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 {
   gcc_checking_assert (!path.is_empty ());
-  edge taken_edge = path[path.length () - 1]->e;
-  loop_p loop = taken_edge->src->loop_father;
+  edge entry = path[0]->e;
+  edge exit = path[path.length () - 1]->e;
   bool seen_latch = false;
-  bool path_crosses_loops = false;
+  int loops_crossed = 0;
+  bool crossed_latch = false;
+  // Use ->dest here instead of ->src to ignore the first block.  The
+  // first block is allowed to be in a different loop, since it'll be
+  // redirected.  See similar comment in profitable_path_p: "we don't
+  // care about that block...".
+  loop_p loop = entry->dest->loop_father;
+  loop_p curr_loop = loop;
 
   for (unsigned int i = 0; i < path.length (); i++)
     {
@@ -2784,19 +2791,30 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 	}
 
       if (loop->latch == e->src || loop->latch == e->dest)
-	seen_latch = true;
+	{
+	  seen_latch = true;
+	  // Like seen_latch, but excludes the first block.
+	  if (e->src != entry->src)
+	    crossed_latch = true;
+	}
 
-      // The first entry represents the block with an outgoing edge
-      // that we will redirect to the jump threading path.  Thus we
-      // don't care about that block's loop father.
-      if ((i > 0 && e->src->loop_father != loop)
-	  || e->dest->loop_father != loop)
-	path_crosses_loops = true;
+      if (e->dest->loop_father != curr_loop)
+	{
+	  curr_loop = e->dest->loop_father;
+	  ++loops_crossed;
+	}
 
       if (flag_checking && !m_backedge_threads)
 	gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
     }
 
+  // If we crossed a loop into an outer loop without crossing the
+  // latch, this is just an early exit from the loop.
+  if (loops_crossed == 1
+      && !crossed_latch
+      && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father))
+    return false;
+
   if (cfun->curr_properties & PROP_loop_opts_done)
     return false;
 
@@ -2806,7 +2824,7 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 		     "would create non-empty latch");
       return true;
     }
-  if (path_crosses_loops)
+  if (loops_crossed)
     {
       cancel_thread (&path, "Path crosses loops");
       return true;
</cut>
>>From hjl@sc.intel.com  Thu Oct  7 08:40:22 2021
Return-Path: <hjl@sc.intel.com>
X-Original-To: gcc-regression@gcc.gnu.org
Delivered-To: gcc-regression@gcc.gnu.org
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by sourceware.org (Postfix) with ESMTPS id CC9883858C39
 for <gcc-regression@gcc.gnu.org>; Thu,  7 Oct 2021 08:40:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CC9883858C39
X-IronPort-AV: E=McAfee;i="6200,9189,10129"; a="226150586"
X-IronPort-AV: E=Sophos;i="5.85,354,1624345200"; d="scan'208";a="226150586"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 07 Oct 2021 01:40:19 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.85,354,1624345200"; d="scan'208";a="560496749"
Received: from scymds01.sc.intel.com ([10.148.94.138])
 by FMSMGA003.fm.intel.com with ESMTP; 07 Oct 2021 01:40:19 -0700
Received: from gnu-ivb-1.sc.intel.com (gnu-ivb-1.sc.intel.com [172.25.70.227])
 by scymds01.sc.intel.com with ESMTP id 1978eJHg016438;
 Thu, 7 Oct 2021 01:40:19 -0700
Received: by gnu-ivb-1.sc.intel.com (Postfix, from userid 1000)
 id 30F8D180B6E; Thu,  7 Oct 2021 01:40:19 -0700 (PDT)
Date: Thu, 07 Oct 2021 01:40:19 -0700
To: skpgkp2@gmail.com, hjl.tools@gmail.com, gcc-regression@gcc.gnu.org
Subject: Regressions on master at commit r12-4218 vs commit r12-4202 on
 Linux/i686
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20211007084019.30F8D180B6E@gnu-ivb-1.sc.intel.com>
From: "H.J. Lu" <hjl@sc.intel.com>
X-Spam-Status: No, score=-3471.5 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_LAZY_DOMAIN_SECURITY, KAM_NUMSUBJECT, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-regression@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-regression mailing list <gcc-regression.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-regression/>
List-Post: <mailto:gcc-regression@gcc.gnu.org>
List-Help: <mailto:gcc-regression-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Oct 2021 08:40:22 -0000

New failures:

New passes:
FAIL: libgomp.c/../libgomp.c-c++-common/atomic-21.c execution test