From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 4/5] if-conv: Apply VN to hoisted conversions
Date: Mon, 15 Nov 2021 13:59:57 +0000 [thread overview]
Message-ID: <mptilwtldsy.fsf@arm.com> (raw)
In-Reply-To: <CAFiYyc3ifGhTNFO+x-8h0fpyyPReRNgyXQBnAUKtSaFTpmg9hw@mail.gmail.com> (Richard Biener via Gcc-patches's message of "Mon, 15 Nov 2021 13:47:15 +0100")
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> This patch is a prerequisite for a later one. At the moment,
>> if-conversion converts predicated POINTER_PLUS_EXPRs into
>> non-wrapping forms, which for:
>>
>> … = base + offset
>>
>> becomes:
>>
>> tmp = (unsigned long) base
>> … = tmp + offset
>>
>> It then hoists these conversions out of the loop where possible.
>>
>> However, because “base” is a valid gimple operand, there can be
>> multiple POINTER_PLUS_EXPRs with the same base, which can in turn
>> lead to multiple instances of the same conversion. The later VN pass
>> is (and I think needs to be) restricted to the new if-converted code,
>> whereas here we're deliberately inserting the conversions before the
>> .LOOP_VECTORIZED condition:
>>
>> /* If we versioned loop then make sure to insert invariant
>> stmts before the .LOOP_VECTORIZED check since the vectorizer
>> will re-use that for things like runtime alias versioning
>> whose condition can end up using those invariants. */
>>
>> We can therefore enter the vectoriser with redundant conversions.
>>
>> The easiest fix seemed to be to defer the hoisting until after VN.
>> This catches other hoisting opportunities too.
>>
>> Hoisting the code from the (artificial) loop in pr99102.c means
>> that it's no longer worth vectorising. The patch forces vectorisation
>> instead of relying on the cost model.
>>
>> The patch also reverts pr87007-4.c and pr87007-5.c back to their
>> original forms, undoing changes in 783dc66f9ccb0019c3dad.
>> The code at the time the tests were added was:
>>
>> testl %edi, %edi
>> je .L10
>> vxorps %xmm1, %xmm1, %xmm1
>> vsqrtsd d3(%rip), %xmm1, %xmm0
>> vsqrtsd d2(%rip), %xmm1, %xmm1
>> ...
>> .L10:
>> ret
>>
>> with the operations being hoisted, and the vxorps was specifically
>> wanted (compared to the previous code). This patch restores the code
>> to that form, with the hoisted operations and the vxorps.
>>
>> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>>
>> Richard
>>
>>
>> gcc/
>> * tree-if-conv.c: Include tree-eh.h.
>> (predicate_statements): Remove pe argument. Don't hoist
>> statements here.
>> (combine_blocks): Remove pe argument.
>> (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
>> (ifcvt_hoist_invariants): Likewise.
>> (tree_if_conversion): Update call to combine_blocks. Call
>> ifcvt_hoist_invariants after VN.
>>
>> gcc/testsuite/
>> * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.
>>
>> Revert:
>>
>> 2020-09-09 Richard Biener <rguenther@suse.de>
>>
>> * gcc.target/i386/pr87007-4.c: Adjust.
>> * gcc.target/i386/pr87007-5.c: Likewise.
>> ---
>> gcc/testsuite/gcc.dg/vect/pr99102.c | 2 +-
>> gcc/testsuite/gcc.target/i386/pr87007-4.c | 2 +-
>> gcc/testsuite/gcc.target/i386/pr87007-5.c | 2 +-
>> gcc/tree-if-conv.c | 122 ++++++++++++++++++++--
>> 4 files changed, 114 insertions(+), 14 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> index 6c1a13f0783..0d030d15c86 100644
>> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c
>> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
>> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>> /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
>> long a[44];
>> short d, e = -7;
>> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> index 9c4b8005af3..e91bdcbac44 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
>> @@ -15,4 +15,4 @@ foo (int n, int k)
>> d1 = ceil (d3);
>> }
>>
>> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
>> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> index e4d956a5d7f..20d13cf650b 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
>> @@ -15,4 +15,4 @@ foo (int n, int k)
>> d1 = sqrt (d3);
>> }
>>
>> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
>> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
>> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> index e88ddc9f788..0ad557a2f4d 100644
>> --- a/gcc/tree-if-conv.c
>> +++ b/gcc/tree-if-conv.c
>> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3. If not see
>> #include "tree-cfgcleanup.h"
>> #include "tree-ssa-dse.h"
>> #include "tree-vectorizer.h"
>> +#include "tree-eh.h"
>>
>> /* Only handle PHIs with no more arguments unless we are asked to by
>> simd pragma. */
>> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
>> */
>>
>> static void
>> -predicate_statements (loop_p loop, edge pe)
>> +predicate_statements (loop_p loop)
>> {
>> unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>> auto_vec<int, 1> vect_sizes;
>> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe)
>> {
>> gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
>> gsi_remove (&gsi2, false);
>> - /* Make sure to move invariant conversions out of the
>> - loop. */
>> - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
>> - && expr_invariant_in_loop_p (loop,
>> - gimple_assign_rhs1 (stmt2)))
>> - gsi_insert_on_edge_immediate (pe, stmt2);
>> - else if (first)
>> + if (first)
>> {
>> gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
>> first = false;
>> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop)
>> blocks. Replace PHI nodes with conditional modify expressions. */
>>
>> static void
>> -combine_blocks (class loop *loop, edge pe)
>> +combine_blocks (class loop *loop)
>> {
>> basic_block bb, exit_bb, merge_target_bb;
>> unsigned int orig_loop_num_nodes = loop->num_nodes;
>> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe)
>> predicate_all_scalar_phis (loop);
>>
>> if (need_to_predicate || need_to_rewrite_undefined)
>> - predicate_statements (loop, pe);
>> + predicate_statements (loop);
>>
>> /* Merge basic blocks. */
>> exit_bb = NULL;
>> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop)
>> }
>> }
>>
>> +/* Return true if STMT can be hoisted from if-converted loop LOOP. */
>> +
>> +static bool
>> +ifcvt_can_hoist (class loop *loop, gimple *stmt)
>> +{
>> + if (auto *call = dyn_cast<gcall *> (stmt))
>> + {
>> + if (gimple_call_internal_p (call)
>> + && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
>> + return false;
>> + }
>> + else if (auto *assign = dyn_cast<gassign *> (stmt))
>> + {
>> + if (gimple_assign_rhs_code (assign) == COND_EXPR)
>> + return false;
>> + }
>> + else
>> + return false;
>> +
>> + if (gimple_has_side_effects (stmt)
>> + || gimple_could_trap_p (stmt)
>> + || stmt_could_throw_p (cfun, stmt)
>> + || gimple_vdef (stmt)
>> + || gimple_vuse (stmt))
>> + return false;
>> +
>> + int num_args = gimple_num_args (stmt);
>> + for (int i = 0; i < num_args; ++i)
>> + if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i)))
>> + return false;
>> +
>> + return true;
>> +}
>> +
>> +/* PE is the preferred hoisting edge selected by tree_if_conversion, which
>> + s known to be different from (and to dominate) the preheader edge of the
>> + if-converted loop. We already know that STMT can be inserted on the loop
>> + preheader edge. Return true if we prefer to insert it on PE instead. */
>> +
>> +static bool
>> +ifcvt_can_hoist_further (edge pe, gimple *stmt)
>> +{
>> + /* As explained in tree_if_conversion, we want to hoist invariant
>> + conversions further so that they can be reused by alias analysis. */
>> + auto *assign = dyn_cast<gassign *> (stmt);
>> + if (assign
>> + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)))
>> + {
>> + tree rhs = gimple_assign_rhs1 (assign);
>> + if (is_gimple_min_invariant (rhs))
>> + return true;
>> +
>> + if (TREE_CODE (rhs) == SSA_NAME)
>> + {
>> + basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs));
>> + if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb))
>> + return true;
>> + }
>> + }
>> + return false;
>> +}
>> +
>> +/* Hoist invariant statements from LOOP. PE is the preferred edge for
>> + hoisting conversions, as selected by tree_if_conversion; see there
>> + for details. */
>> +
>> +static void
>> +ifcvt_hoist_invariants (class loop *loop, edge pe)
>> +{
>> + gimple_stmt_iterator hoist_gsi = {};
>> + gimple_stmt_iterator hoist_gsi_pe = {};
>> + unsigned int num_blocks = loop->num_nodes;
>> + basic_block *body = get_loop_body (loop);
>> + for (unsigned int i = 0; i < num_blocks; ++i)
>> + for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
>> + {
>> + gimple *stmt = gsi_stmt (gsi);
>> + if (ifcvt_can_hoist (loop, stmt))
>> + {
>> + /* Once we've hoisted one statement, insert other statements
>> + after it. */
>> + edge e = loop_preheader_edge (loop);
>> + gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi;
>> + if (e != pe && ifcvt_can_hoist_further (pe, stmt))
>
> One issue with hoisting to loop_preheader_edge instead of 'pe'
> is that eventually the loop versioning code will pick up the defs
> through the versioning condition and that will break because the
> versioning condition will be inserted in place of the
> IFN_VECTORIZED_LOOP, which is _before_ the preheader.
> The code basically expects the preheader to be empty.
Do you mean that the versioning-for-aliases code assumes that all
loop-invariant definitions are available in the versioning condition?
I hadn't realised that's what the comment was saying. It sounded more
like an optimisation.
Thanks,
Richard
> I don't see how ifcvt_can_hoist_further avoids this for
> non-CONVERT_EXPRs.
>
> So I don't see how this is actually safe? It's probably moderately
> easy to trigger issues when you disable both PRE and
> loop invariant motion before if-conversion/vectorization.
>
> Maybe doing something similar as cse_and_gimplify_to_preheader
> done in the vectorizer helps for the original problem. Or alternatively
> do the hoisting during loop versioning ...
>
> Richard.
>
>> + {
>> + e = pe;
>> + hoist_gsi_ptr = &hoist_gsi_pe;
>> + }
>> + gsi_remove (&gsi, false);
>> + if (hoist_gsi_ptr->ptr)
>> + gsi_insert_after (hoist_gsi_ptr, stmt, GSI_NEW_STMT);
>> + else
>> + {
>> + gsi_insert_on_edge_immediate (e, stmt);
>> + *hoist_gsi_ptr = gsi_for_stmt (stmt);
>> + }
>> + continue;
>> + }
>> + gsi_next (&gsi);
>> + }
>> + free (body);
>> +}
>> +
>> /* If-convert LOOP when it is legal. For the moment this pass has no
>> profitability analysis. Returns non-zero todo flags when something
>> changed. */
>> @@ -3275,7 +3373,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>> /* Now all statements are if-convertible. Combine all the basic
>> blocks into one huge basic block doing the if-conversion
>> on-the-fly. */
>> - combine_blocks (loop, pe);
>> + combine_blocks (loop);
>>
>> /* Perform local CSE, this esp. helps the vectorizer analysis if loads
>> and stores are involved. CSE only the loop body, not the entry
>> @@ -3297,6 +3395,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>> ifcvt_local_dce (loop);
>> BITMAP_FREE (exit_bbs);
>>
>> + ifcvt_hoist_invariants (loop, pe);
>> +
>> todo |= TODO_cleanup_cfg;
>>
>> cleanup:
>> --
>> 2.25.1
>>
next prev parent reply other threads:[~2021-11-15 13:59 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-12 18:04 Richard Sandiford
2021-11-15 12:47 ` Richard Biener
2021-11-15 13:59 ` Richard Sandiford [this message]
2021-11-15 14:11 ` Richard Biener
2021-11-16 18:05 ` Richard Sandiford
2021-11-19 10:09 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mptilwtldsy.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).