Subject: Re: More aggressive threading causing loop-interchange-9.c regression
To: Michael Matz
Cc: gcc-patches
From: Aldy Hernandez
Date: Fri, 10 Sep 2021 15:53:33 +0200
Message-ID: <8998611b-c229-e1d2-9a7a-770f6d45e582@redhat.com>
List-Id: Gcc-patches mailing list
On 9/10/21 3:16 PM, Michael Matz wrote:
> Hi,
>
> On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:
>
>>  	    }
>> +
>> +	  /* Threading through a non-empty latch would cause code to be added
>
> "through an *empty* latch".  The test in code is correct, though.

Whoops.

>
> And for the before/after loops flag you added: we have a
> cfun->curr_properties field which can be used.  We even already have a
> PROP_loops flag but that is set throughout compilation from CFG
> construction until the RTL loop optimizers, so can't be re-used for what
> is needed here.  But you still could invent another PROP_ value instead of
> adding a new field in struct function.

Oooo, even better.  No inline functions.

Like this?
Aldy

[Attachment: 0001-Disable-threading-through-latches-until-after-loop-o.patch]

From ff25faa8dd8721da9bb4715706c662fc09fd4e8c Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Thu, 9 Sep 2021 20:30:28 +0200
Subject: [PATCH] Disable threading through latches until after loop
 optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain loop properties to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions
to the forward threader.  At some point I'd like to combine the cost
models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion of the limitations of jump
threading involving loops here:

https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

	* tree-pass.h (PROP_loop_opts_done): New.
	* gimple-range-path.cc (path_range_query::internal_range_of_expr):
	Intersect with global range.
	* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Disable
	threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
	threading through latches.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
Co-authored-by: Michael Matz
---
 gcc/gimple-range-path.cc                      |  3 ++
 .../gcc.dg/tree-ssa/ssa-dom-thread-2b.c       |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c        | 37 +------------------
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        | 17 +--------
 gcc/tree-pass.h                               |  2 +
 gcc/tree-ssa-loop.c                           |  2 +-
 gcc/tree-ssa-threadbackward.c                 | 28 +++++++++++++-
 7 files changed, 37 insertions(+), 56 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
   basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
   if (stmt && range_defined_in_block (r, name, bb))
     {
+      if (TREE_CODE (name) == SSA_NAME)
+	r.intersect (gimple_range_global (name));
+
       set_cache (r, name);
       return true;
     }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
 
 void foo();
 void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
    case.  And we want to thread through the header as well.  These
    are both caught by threading in DOM.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
 
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
-   be seen in the dump:
-
-     Registering FSM jump thread: (x, 12) incoming edge;  ...
-     etc
-     etc
-
-   Before the new evrp, we were threading paths that started at the
-   following edges:
-
-     Registering FSM jump thread: (10, 12) incoming edge
-     Registering FSM jump thread: (6, 12) incoming edge
-     Registering FSM jump thread: (9, 12) incoming edge
-
-   This was because the PHI at BB12 had constant values coming in from
-   BB10, BB6, and BB9:
-
-   # state_10 = PHI
-
-   Now with the new evrp, we get:
-
-   # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
-   Thus, we have 3 more paths that are known to be constant and can be
-   threaded.  Which means that by the second threading pass, we can
-   only find one profitable path.
-
-   For the record, all these extra constants are better paths coming
-   out of switches.  For example:
-
-     SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
-   We now know the value of the switch index at PHI.  */
 /* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
 
 int sum0, sum1, sum2, sum3;
 int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
-   The PHI coming into the threader has a lot more constants, so the
-   threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2
-252c252
-<   # s_50 = PHI
----
->   # s_50 = PHI
-272a273
-
-   I spot checked a few and they all have the same pattern.  We are
-   basically tracking the switch index better through multiple
-   paths.  */
-
 /* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
 
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 83941bc0cee..eb75eb17951 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -225,6 +225,8 @@ protected:
 						   been optimized.  */
 #define PROP_gimple_lomp_dev	(1 << 16)	/* done omp_device_lower */
 #define PROP_rtl_split_insns	(1 << 17)	/* RTL has insns split.  */
+#define PROP_loop_opts_done	(1 << 18)	/* SSA loop optimizations
+						   have completed.  */
 
 #define PROP_gimple \
   (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..1bbf2f1fb2c 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done =
   OPTGROUP_LOOP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
-  0, /* properties_provided */
+  PROP_loop_opts_done, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   TODO_cleanup_cfg, /* todo_flags_finish */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..e72992328de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ssa.h"
 #include "tree-cfgcleanup.h"
 #include "tree-pretty-print.h"
+#include "cfghooks.h"
 
 // Path registry for the backwards threader.  After all paths have been
 // registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
    TAKEN_EDGE, otherwise it is NULL.
 
    CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
-   would create an irreducible loop.  */
+   would create an irreducible loop.
+
+   ?? It seems we should be able to loosen some of the restrictions in
+   this function after loop optimizations have run.  */
 
 bool
 back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	 the last entry in the array when determining if we thread
 	 through the loop latch.  */
       if (loop->latch == bb)
-	threaded_through_latch = true;
+	{
+	  threaded_through_latch = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, " (latch)");
+	}
     }
 
   gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 		 "a multiway branch.\n");
       return false;
     }
+
+  /* Threading through an empty latch would cause code to be added to
+     the latch.  This could alter the loop form sufficiently to cause
+     loop optimizations to fail.  Disable these threads until after
+     loop optimizations have run.  */
+  if ((threaded_through_latch
+       || (taken_edge && taken_edge->dest == loop->latch))
+      && !(cfun->curr_properties & PROP_loop_opts_done)
+      && empty_block_p (loop->latch))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "  FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+      return false;
+
+    }
   return true;
 }
-- 
2.31.1
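(For readers outside the GCC tree: the new guard boils down to testing one
bit in cfun->curr_properties, which pass_tree_loop_done now provides.  Below
is a minimal, self-contained C sketch of that property-gating pattern.  Only
the names PROP_loop_opts_done and curr_properties are taken from the patch;
the toy struct function, the helper and main are invented for illustration
and are not GCC code.)

#include <stdbool.h>
#include <stdio.h>

/* Toy property bits, modeled on the PROP_* bitmask in gcc/tree-pass.h.
   Only the names are borrowed; everything else in this sketch is
   invented for illustration.  */
#define PROP_cfg	    (1 << 3)
#define PROP_loop_opts_done (1 << 18)	/* SSA loop optimizations done.  */

/* Stand-in for GCC's struct function / cfun.  */
struct function
{
  unsigned int curr_properties;
};

static struct function fn = { PROP_cfg };
static struct function *cfun = &fn;

/* Stand-in for the latch test in profitable_path_p: refuse to thread
   through an empty latch until some pass has provided
   PROP_loop_opts_done.  */
static bool
latch_threading_allowed_p (bool threaded_through_latch, bool latch_is_empty)
{
  if (threaded_through_latch
      && !(cfun->curr_properties & PROP_loop_opts_done)
      && latch_is_empty)
    return false;
  return true;
}

int
main (void)
{
  /* Before the loop pipeline: threading through an empty latch is rejected.  */
  printf ("before loop opts: %d\n", latch_threading_allowed_p (true, true));

  /* Roughly what pass_tree_loop_done's properties_provided now does.  */
  cfun->curr_properties |= PROP_loop_opts_done;

  /* Afterwards the same request is accepted.  */
  printf ("after loop opts:  %d\n", latch_threading_allowed_p (true, true));
  return 0;
}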