From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by sourceware.org (Postfix) with ESMTPS id B97D33858408 for ; Thu, 7 Oct 2021 03:30:44 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B97D33858408
Received: by mail-wr1-x434.google.com with SMTP id t8so14685207wri.1 for ; Wed, 06 Oct 2021 20:30:44 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:date:to:cc:message-id:subject:mime-version; bh=aKHH5Gq04E2szYP34ML1b74yMoxgdvbVQCHkF/hwFaM=; b=OgUV8wSuzagkjKNMSgVkU0zpHiGPJMcGqQwaCqDTSvHndibwYuP3GfGh09jKeDwK1k LItjvHLl2klnUiHq39N3FzmxBH+yCrTlGU6KFVWlhbm9NiauXWoOTS4FAYMwb12lTIXH 6HqQGdoligFXNXyTb6BaatD+f0iHu97pgsCUtL+9KdmgR60sB+59jKgLtyLUOCI+Fwmn hRXVW4/sop9mF3gSAgYEuaFlILFU/RisqJNNzqD0vrqiHh4knzc3IPjyKQ4uwEyL3Ska HWqEkBOTdqk/XH8WbghEFRRyOsfDTCaZ+2kqvhFAjc3dQPvA2QYHwPeEVB62ObLhPMCD Ha8A==
X-Gm-Message-State: AOAM531EiV4DgvyxgNuN+ryCFLWDJ70UpG7PuZjMypi76zmFShxbmZE/ 9hse3bXrmEH6DWIvrB8vV2CzGA==
X-Google-Smtp-Source: ABdhPJxuNvtjaOuQKhPbG/mHoePNWieQ4ePBAzwIenClX8o34SmG1HtDXDQc7XY2R7xqzAV93nt1BA==
X-Received: by 2002:a1c:a712:: with SMTP id q18mr1943537wme.23.1633577443504; Wed, 06 Oct 2021 20:30:43 -0700 (PDT)
Received: from 172.17.0.5 (ci.linaro.org. [88.99.136.175]) by smtp.gmail.com with ESMTPSA id p11sm4055477wmi.0.2021.10.06.20.30.42 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Oct 2021 20:30:42 -0700 (PDT)
From: ci_notify@linaro.org
X-Google-Original-From: linaro-infrastructure-errors@lists.linaro.org
Date: Thu, 7 Oct 2021 03:30:42 +0000 (UTC)
To: Aldy Hernandez
Cc: gcc-regression@gcc.gnu.org
Message-ID: <1886148038.9700.1633577442959@localhost>
Subject: [TCWG CI] 482.sphinx3 slowed down by 4% after gcc: Loosen loop crossing restriction in threader.
MIME-Version: 1.0
X-Jenkins-Job: TCWG Bisect tcwg_bmk_tx1/gnu-master-aarch64-spec2k6-O3
X-Jenkins-Result: SUCCESS
X-Spam-Status: No, score=-13.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-regression@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-regression mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Oct 2021 03:30:47 -0000

After gcc commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez

    Loosen loop crossing restriction in threader.

the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 21091 to 21983 perf samples

the following hot functions slowed down by more than 10% (but their benchmarks slowed down by less than 2%):
- 471.omnetpp:[.] _ZN12cMessageHeap8getFirstEv slowed down by 1397% from 146 to 2185 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
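
The slowdown percentages reported above can be cross-checked against the raw perf sample counts. The sketch below assumes the figure is the relative increase in samples rounded to the nearest whole percent; the report does not state the CI's actual formula, and the function name `slowdown_percent` is hypothetical.

```python
def slowdown_percent(before: int, after: int) -> int:
    """Relative increase in perf samples, rounded to a whole percent.

    Assumption: the CI rounds to the nearest integer; the email does
    not say how its percentages are computed.
    """
    return round((after - before) / before * 100)


# Sample counts taken from the report above.
sphinx3 = slowdown_percent(21091, 21983)   # 482.sphinx3
get_first = slowdown_percent(146, 2185)    # cMessageHeap::getFirst() in 471.omnetpp

print(f"482.sphinx3: {sphinx3}%")
print(f"cMessageHeap::getFirst(): {get_first}%")
```

Under that assumption both values agree with the figures quoted in the report (4% and 1397%).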
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/

Reproduce builds:
mkdir investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
cd investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O3/36/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1f51e9af7b615838424214e6aaea0de793cb10fe
../artifacts/test.sh

cd ..

Full commit (up to 1000 lines):
commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez
Date:   Tue Oct 5 15:03:34 2021 +0200

    Loosen loop crossing restriction in threader.

    Crossing loops is generally discouraged from the threader, but we can
    make an exception when we don't cross the latch or enter another loop,
    since this is just an early exit out of the loop.

    In fact, the whole threaded path is logically outside the loop. This
    has nice secondary effects. For example, objects on the threaded path
    will no longer necessarily be live throughout the loop, so we can get
    register allocation improvements. The threaded path can physically
    move outside the loop resulting in better icache efficiency, etc.

    Tested on x86-64 Linux, and on a visium-elf cross making sure that the
    following tests do not have an abort in the final assembly:

    gcc.c-torture/execute/960218-1.c
    gcc.c-torture/execute/visium-pending-4.c
    gcc.c-torture/execute/pr58209.c

    gcc/ChangeLog:

            * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
            Loosen restrictions

    gcc/testsuite/ChangeLog:

            * gcc.dg/tree-ssa/ssa-thread-valid.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c | 39 +++++++++++++++++++++++
 gcc/tree-ssa-threadupdate.c                      | 40 +++++++++++++++++-------
 2 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
new file mode 100644
index 00000000000..7adca97cc2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
@@ -0,0 +1,39 @@
+// { dg-do compile }
+// { dg-options "-O2 -fgimple -fdump-statistics" }
+
+// This is a collection of threadable paths. To simplify maintenance,
+// there should only be one threadable path per function.
+
+int global;
+
+// The thread from 3->4->5 crosses loops but is allowed because it
+// never crosses the latch (BB3) and is just an early exit out of the
+// loop.
+int __GIMPLE (ssa)
+foo1 (int x)
+{
+  int D_1420;
+  int a;
+
+  __BB(2):
+  a_4 = ~x_3(D);
+  goto __BB4;
+
+  // Latch.
+  __BB(3):
+  global = a_1;
+  goto __BB4;
+
+  __BB(4,loop_header(1)):
+  a_1 = __PHI (__BB2: a_4, __BB3: 0);
+  if (a_1 != 0)
+    goto __BB3;
+  else
+    goto __BB5;
+
+  __BB(5):
+  return;
+
+}
+
+// { dg-final { scan-tree-dump "Jumps threaded\" \"foo1\" 1" "statistics" } }
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index dcabfdb30d2..32ce1e3af40 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2766,10 +2766,17 @@ bool
 jt_path_registry::cancel_invalid_paths (vec &path)
 {
   gcc_checking_assert (!path.is_empty ());
-  edge taken_edge = path[path.length () - 1]->e;
-  loop_p loop = taken_edge->src->loop_father;
+  edge entry = path[0]->e;
+  edge exit = path[path.length () - 1]->e;
   bool seen_latch = false;
-  bool path_crosses_loops = false;
+  int loops_crossed = 0;
+  bool crossed_latch = false;
+  // Use ->dest here instead of ->src to ignore the first block. The
+  // first block is allowed to be in a different loop, since it'll be
+  // redirected. See similar comment in profitable_path_p: "we don't
+  // care about that block...".
+  loop_p loop = entry->dest->loop_father;
+  loop_p curr_loop = loop;
 
   for (unsigned int i = 0; i < path.length (); i++)
     {
@@ -2784,19 +2791,30 @@ jt_path_registry::cancel_invalid_paths (vec &path)
         }
 
       if (loop->latch == e->src || loop->latch == e->dest)
-        seen_latch = true;
+        {
+          seen_latch = true;
+          // Like seen_latch, but excludes the first block.
+          if (e->src != entry->src)
+            crossed_latch = true;
+        }
 
-      // The first entry represents the block with an outgoing edge
-      // that we will redirect to the jump threading path. Thus we
-      // don't care about that block's loop father.
-      if ((i > 0 && e->src->loop_father != loop)
-          || e->dest->loop_father != loop)
-        path_crosses_loops = true;
+      if (e->dest->loop_father != curr_loop)
+        {
+          curr_loop = e->dest->loop_father;
+          ++loops_crossed;
+        }
 
       if (flag_checking && !m_backedge_threads)
         gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
     }
 
+  // If we crossed a loop into an outer loop without crossing the
+  // latch, this is just an early exit from the loop.
+  if (loops_crossed == 1
+      && !crossed_latch
+      && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father))
+    return false;
+
   if (cfun->curr_properties & PROP_loop_opts_done)
     return false;
 
@@ -2806,7 +2824,7 @@ jt_path_registry::cancel_invalid_paths (vec &path)
                      "would create non-empty latch");
       return true;
     }
-  if (path_crosses_loops)
+  if (loops_crossed)
     {
       cancel_thread (&path, "Path crosses loops");
       return true;

>>From ci_notify@linaro.org Thu Oct 7 06:38:21 2021
Return-Path:
X-Original-To: gcc-regression@gcc.gnu.org
Delivered-To: gcc-regression@gcc.gnu.org
Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by sourceware.org (Postfix) with ESMTPS id B3E783858C39 for ; Thu, 7 Oct 2021 06:38:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B3E783858C39
Received: by mail-wr1-x433.google.com with SMTP id r7so15595639wrc.10 for ; Wed, 06 Oct 2021 23:38:18 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:date:to:cc:message-id:subject:mime-version; bh=4sbzjAxtUhfgZXv72EubxUVgs7hrTwMpeVrOpMGGWR4=; b=3NmQ2Z+rL3UxqHPd19FONKH56gLxxilMt8nPZ1q6QF5Wi+im8FJBfccCyWwv0sBvbA WmVvhHJ4ZJX67At7OfWdevSb0B6wvbsbutqZs6DCaooyzhpszPhmy2WK24Sao1Gx9GMx iuNIVz3oXh2LZ5WYDTXOS75BGZKPRhZQy5fKzEw82qoMKoAgg/dP5OVy8a/L/DVamu1j OXk2qj52toM4konxrb+ynGDhb/1Pc/sPTq0UcveryRIZXfRpn88hh9TlOuLtidvQJ+tb JncGHOy7NhlDaV/U9OIryXC9qmmN6fsCz7vJO7jy6Owej/aQ/1oZPCkE6z6cQAoG/cfc 5g7A==
X-Gm-Message-State: AOAM5331H/Bf+/jCrIuCEBcNOOY6DhxsfSaI5bpNuabGo8jirnFlETHU I/8Dpwtr0Ue+Jk7G/Pq2vDN/ag==
X-Google-Smtp-Source: ABdhPJw9p9r7ZRpTeQfLMU/UqgVpf1Tk3iG7VwdEBslnbUK8B+tw1Bj+Zb2aNS0+nrwX2iFWENJQ3A==
X-Received: by 2002:adf:bc48:: with SMTP id a8mr3174683wrh.397.1633588697696; Wed, 06 Oct 2021 23:38:17 -0700 (PDT)
Received: from 172.17.0.5 (ci.linaro.org. [88.99.136.175]) by smtp.gmail.com with ESMTPSA id z16sm7598741wmk.6.2021.10.06.23.38.17 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Oct 2021 23:38:17 -0700 (PDT)
From: ci_notify@linaro.org
X-Google-Original-From: linaro-infrastructure-errors@lists.linaro.org
Date: Thu, 7 Oct 2021 06:38:16 +0000 (UTC)
To: Aldy Hernandez
Cc: gcc-regression@gcc.gnu.org
Message-ID: <2142114940.9731.1633588697156@localhost>
Subject: [TCWG CI] 429.mcf slowed down by 9% after gcc: Loosen loop crossing restriction in threader.
MIME-Version: 1.0
X-Jenkins-Job: TCWG Bisect tcwg_bmk_tk1/gnu-master-arm-spec2k6-O2_LTO
X-Jenkins-Result: SUCCESS
X-Spam-Status: No, score=-13.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-regression@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-regression mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Oct 2021 06:38:21 -0000

After gcc commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez

    Loosen loop crossing restriction in threader.

the following benchmarks slowed down by more than 2%:
- 429.mcf slowed down by 9% from 9961 to 10815 perf samples

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O2 -flto -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-1f51e9af7b615838424214e6aaea0de793cb10fe/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/

Reproduce builds:
mkdir investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
cd investigate-gcc-ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O2_LTO/34/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 1f51e9af7b615838424214e6aaea0de793cb10fe
../artifacts/test.sh

cd ..

Full commit (up to 1000 lines):
commit ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc
Author: Aldy Hernandez
Date:   Tue Oct 5 15:03:34 2021 +0200

    Loosen loop crossing restriction in threader.

    Crossing loops is generally discouraged from the threader, but we can
    make an exception when we don't cross the latch or enter another loop,
    since this is just an early exit out of the loop.

    In fact, the whole threaded path is logically outside the loop. This
    has nice secondary effects. For example, objects on the threaded path
    will no longer necessarily be live throughout the loop, so we can get
    register allocation improvements. The threaded path can physically
    move outside the loop resulting in better icache efficiency, etc.

    Tested on x86-64 Linux, and on a visium-elf cross making sure that the
    following tests do not have an abort in the final assembly:

    gcc.c-torture/execute/960218-1.c
    gcc.c-torture/execute/visium-pending-4.c
    gcc.c-torture/execute/pr58209.c

    gcc/ChangeLog:

            * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
            Loosen restrictions

    gcc/testsuite/ChangeLog:

            * gcc.dg/tree-ssa/ssa-thread-valid.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c | 39 +++++++++++++++++++++++
 gcc/tree-ssa-threadupdate.c                      | 40 +++++++++++++++++-------
 2 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
new file mode 100644
index 00000000000..7adca97cc2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c
@@ -0,0 +1,39 @@
+// { dg-do compile }
+// { dg-options "-O2 -fgimple -fdump-statistics" }
+
+// This is a collection of threadable paths. To simplify maintenance,
+// there should only be one threadable path per function.
+
+int global;
+
+// The thread from 3->4->5 crosses loops but is allowed because it
+// never crosses the latch (BB3) and is just an early exit out of the
+// loop.
+int __GIMPLE (ssa)
+foo1 (int x)
+{
+  int D_1420;
+  int a;
+
+  __BB(2):
+  a_4 = ~x_3(D);
+  goto __BB4;
+
+  // Latch.
+  __BB(3):
+  global = a_1;
+  goto __BB4;
+
+  __BB(4,loop_header(1)):
+  a_1 = __PHI (__BB2: a_4, __BB3: 0);
+  if (a_1 != 0)
+    goto __BB3;
+  else
+    goto __BB5;
+
+  __BB(5):
+  return;
+
+}
+
+// { dg-final { scan-tree-dump "Jumps threaded\" \"foo1\" 1" "statistics" } }
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index dcabfdb30d2..32ce1e3af40 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2766,10 +2766,17 @@ bool
 jt_path_registry::cancel_invalid_paths (vec &path)
 {
   gcc_checking_assert (!path.is_empty ());
-  edge taken_edge = path[path.length () - 1]->e;
-  loop_p loop = taken_edge->src->loop_father;
+  edge entry = path[0]->e;
+  edge exit = path[path.length () - 1]->e;
   bool seen_latch = false;
-  bool path_crosses_loops = false;
+  int loops_crossed = 0;
+  bool crossed_latch = false;
+  // Use ->dest here instead of ->src to ignore the first block. The
+  // first block is allowed to be in a different loop, since it'll be
+  // redirected. See similar comment in profitable_path_p: "we don't
+  // care about that block...".
+  loop_p loop = entry->dest->loop_father;
+  loop_p curr_loop = loop;
 
   for (unsigned int i = 0; i < path.length (); i++)
     {
@@ -2784,19 +2791,30 @@ jt_path_registry::cancel_invalid_paths (vec &path)
         }
 
       if (loop->latch == e->src || loop->latch == e->dest)
-        seen_latch = true;
+        {
+          seen_latch = true;
+          // Like seen_latch, but excludes the first block.
+          if (e->src != entry->src)
+            crossed_latch = true;
+        }
 
-      // The first entry represents the block with an outgoing edge
-      // that we will redirect to the jump threading path. Thus we
-      // don't care about that block's loop father.
-      if ((i > 0 && e->src->loop_father != loop)
-          || e->dest->loop_father != loop)
-        path_crosses_loops = true;
+      if (e->dest->loop_father != curr_loop)
+        {
+          curr_loop = e->dest->loop_father;
+          ++loops_crossed;
+        }
 
       if (flag_checking && !m_backedge_threads)
         gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
     }
 
+  // If we crossed a loop into an outer loop without crossing the
+  // latch, this is just an early exit from the loop.
+  if (loops_crossed == 1
+      && !crossed_latch
+      && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father))
+    return false;
+
   if (cfun->curr_properties & PROP_loop_opts_done)
     return false;
 
@@ -2806,7 +2824,7 @@ jt_path_registry::cancel_invalid_paths (vec &path)
                      "would create non-empty latch");
       return true;
     }
-  if (path_crosses_loops)
+  if (loops_crossed)
     {
       cancel_thread (&path, "Path crosses loops");
       return true;

>>From hjl@sc.intel.com Thu Oct 7 08:40:22 2021
Return-Path:
X-Original-To: gcc-regression@gcc.gnu.org
Delivered-To: gcc-regression@gcc.gnu.org
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by sourceware.org (Postfix) with ESMTPS id CC9883858C39 for ; Thu, 7 Oct 2021 08:40:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CC9883858C39
X-IronPort-AV: E=McAfee;i="6200,9189,10129"; a="226150586"
X-IronPort-AV: E=Sophos;i="5.85,354,1624345200"; d="scan'208";a="226150586"
Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Oct 2021 01:40:19 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.85,354,1624345200"; d="scan'208";a="560496749"
Received: from scymds01.sc.intel.com ([10.148.94.138]) by FMSMGA003.fm.intel.com with ESMTP; 07 Oct 2021 01:40:19 -0700
Received: from gnu-ivb-1.sc.intel.com (gnu-ivb-1.sc.intel.com [172.25.70.227]) by scymds01.sc.intel.com with ESMTP id 1978eJHg016438; Thu, 7 Oct 2021 01:40:19 -0700
Received: by gnu-ivb-1.sc.intel.com (Postfix, from userid 1000) id 30F8D180B6E; Thu, 7 Oct 2021 01:40:19 -0700 (PDT)
Date: Thu, 07 Oct 2021 01:40:19 -0700
To: skpgkp2@gmail.com, hjl.tools@gmail.com, gcc-regression@gcc.gnu.org
Subject: Regressions on master at commit r12-4218 vs commit r12-4202 on Linux/i686
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20211007084019.30F8D180B6E@gnu-ivb-1.sc.intel.com>
From: "H.J. Lu"
X-Spam-Status: No, score=-3471.5 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_NUMSUBJECT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: gcc-regression@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-regression mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Oct 2021 08:40:22 -0000

New failures:

New passes:
FAIL: libgomp.c/../libgomp.c-c++-common/atomic-21.c execution test