From: Alex Coplan <alex.coplan@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford <richard.sandiford@arm.com>,
Kyrylo Tkachov <kyrylo.tkachov@arm.com>,
Richard Earnshaw <richard.earnshaw@arm.com>
Subject: Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]
Date: Mon, 15 Jan 2024 15:39:42 +0000 [thread overview]
Message-ID: <ZaVRvrUYv2qm41Jb@arm.com> (raw)
In-Reply-To: <ZaKwRyX/Ka3+xiqI@arm.com>
On 13/01/2024 15:46, Alex Coplan wrote:
> As the PR shows (specifically #c7) we are missing updating uses of mem
> when inserting an stp in the aarch64 load/store pair fusion pass. This
> patch fixes that.
>
> RTL-SSA has a simple view of memory and by default doesn't allow stores
> to be re-ordered w.r.t. other stores. In the ldp fusion pass, we do our
> own alias analysis and so can re-order stores over other accesses when
> we deem this is safe. If neither store can be re-purposed (moved into
> the required position to form the stp while respecting the RTL-SSA
> constraints), then we turn both the candidate stores into "tombstone"
> insns (logically delete them) and insert a new stp insn.
>
> As it stands, we implement the insert case separately (after dealing
> with the candidate stores) in fuse_pair by inserting into the middle of
> the vector of changes. This is OK when we only have to insert one
> change, but with this fix we would need to insert the change for the new
> stp plus multiple changes to fix up uses of mem (note the number of
> fix-ups is naturally bounded by the alias limit param to prevent
> quadratic behaviour). If we kept the code structured as is and inserted
> into the middle of the vector, that would lead to repeated moving of
> elements in the vector which seems inefficient. The structure of the
> code would also be a little unwieldy.
>
> To improve on that situation, this patch introduces a helper class,
> stp_change_builder, which implements a state machine that helps to build
> the required changes directly in program order. That state machine is
> reponsible for deciding what changes need to be made in what order, and
> the code in fuse_pair then simply follows those steps.
>
> Together with the fix in the previous patch for installing new defs
> correctly in RTL-SSA, this fixes PR113070.
>
> We take the opportunity to rename the function decide_stp_strategy to
> try_repurpose_store, as that seems more descriptive of what it actually
> does, since stp_change_builder is now responsible for the overall change
> strategy.
>
> Bootstrapped/regtested as a series with/without the passes enabled on
> aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
> PR target/113070
> * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> (decide_stp_strategy): Reanme to ...
> (try_repurpose_store): ... this.
> (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> construct stp changes. Fix up uses when inserting new stp insns.
> ---
> gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++++++++++++++++++-----
> 1 file changed, 194 insertions(+), 54 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..703cfb1228c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> return range;
> }
>
> +// Class that implements a state machine for building the changes needed to form
> +// a store pair instruction. This allows us to easily build the changes in
> +// program order, as required by rtl-ssa.
> +struct stp_change_builder
> +{
> + enum class state
> + {
> + FIRST,
> + INSERT,
> + FIXUP_USE,
> + LAST,
> + DONE
> + };
> +
> + enum class action
> + {
> + TOMBSTONE,
> + CHANGE,
> + INSERT,
> + FIXUP_USE
> + };
> +
> + struct change
> + {
> + action type;
> + insn_info *insn;
> + };
> +
> + bool done () const { return m_state == state::DONE; }
> +
> + stp_change_builder (insn_info *insns[2],
> + insn_info *repurpose,
> + insn_info *dest)
> + : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> + m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> +
> + change get_change () const
> + {
> + switch (m_state)
> + {
> + case state::FIRST:
> + return {
> + m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> + m_insns[0]
> + };
> + case state::LAST:
> + return {
> + m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> + m_insns[1]
> + };
> + case state::INSERT:
> + return { action::INSERT, m_dest };
> + case state::FIXUP_USE:
> + return { action::FIXUP_USE, m_use->insn () };
> + case state::DONE:
> + break;
> + }
> +
> + gcc_unreachable ();
> + }
> +
> + // Transition to the next state.
> + void advance ()
> + {
> + switch (m_state)
> + {
> + case state::FIRST:
> + if (m_repurpose)
> + m_state = state::LAST;
> + else
> + m_state = state::INSERT;
> + break;
> + case state::INSERT:
> + {
> + def_info *def = memory_access (m_insns[0]->defs ());
> + while (*def->next_def ()->insn () <= *m_dest)
> + def = def->next_def ();
> +
> + // Now we know DEF feeds the insertion point for the new stp.
> + // Look for any uses of DEF that will consume the new stp.
> + gcc_assert (*def->insn () <= *m_dest
> + && *def->next_def ()->insn () > *m_dest);
> +
> + if (auto set = dyn_cast<set_info *> (def))
> + for (auto use : set->nondebug_insn_uses ())
> + if (*use->insn () > *m_dest)
> + {
> + m_use = use;
> + break;
> + }
> +
> + if (m_use)
> + m_state = state::FIXUP_USE;
> + else
> + m_state = state::LAST;
> + break;
> + }
> + case state::FIXUP_USE:
> + m_use = m_use->next_nondebug_insn_use ();
> + if (!m_use)
> + m_state = state::LAST;
> + break;
> + case state::LAST:
> + m_state = state::DONE;
> + break;
> + case state::DONE:
> + gcc_unreachable ();
> + }
> + }
> +
> +private:
> + state m_state;
> +
> + // Original candidate stores.
> + insn_info *m_insns[2];
> +
> + // If non-null, this is a candidate insn to change into an stp. Otherwise we
> + // are deleting both original insns and inserting a new insn for the stp.
> + insn_info *m_repurpose;
> +
> + // Destionation of the stp, it will be placed immediately after m_dest.
> + insn_info *m_dest;
> +
> + // Current nondebug use that needs updating due to stp insertion.
> + use_info *m_use;
> +};
> +
> // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
> // of them (together with its def of memory) for the stp insn. If so, return
> // that insn. Otherwise, return null.
> static insn_info *
> -decide_stp_strategy (insn_info *first,
> +try_repurpose_store (insn_info *first,
> insn_info *second,
> const insn_range_info &move_range)
> {
> @@ -1253,7 +1380,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>
> insn_info *insns[2] = { first, second };
>
> - auto_vec<insn_change *, 4> changes (4);
> + auto_vec<insn_change *> changes;
> auto_vec<int, 2> tombstone_uids (2);
>
> rtx pats[2] = {
> @@ -1455,9 +1582,9 @@ ldp_bb_info::fuse_pair (bool load_p,
>
> if (load_p)
> {
> - changes.quick_push (make_delete (first));
> + changes.safe_push (make_delete (first));
> pair_change = make_change (second);
> - changes.quick_push (pair_change);
> + changes.safe_push (pair_change);
>
> pair_change->move_range = move_range;
> pair_change->new_defs = merge_access_arrays (attempt,
> @@ -1474,18 +1601,22 @@ ldp_bb_info::fuse_pair (bool load_p,
> }
> else
> {
> - insn_info *store_to_change = decide_stp_strategy (first, second,
> + using Action = stp_change_builder::action;
> + insn_info *store_to_change = try_repurpose_store (first, second,
> move_range);
> -
> - if (store_to_change && dump_file)
> - fprintf (dump_file, " stp: re-purposing store %d\n",
> - store_to_change->uid ());
> -
> + insn_info *stp_dest = move_range.singleton ();
> + gcc_assert (stp_dest);
> + stp_change_builder builder (insns, store_to_change, stp_dest);
> insn_change *change;
> - for (int i = 0; i < 2; i++)
> + set_info *new_set = nullptr;
> + for (; !builder.done (); builder.advance ())
> {
> - change = make_change (insns[i]);
> - if (insns[i] == store_to_change)
> + auto action = builder.get_change ();
> + change = (action.type == Action::INSERT)
> + ? nullptr : make_change (action.insn);
> + switch (action.type)
> + {
> + case Action::CHANGE:
> {
> set_pair_pat (change);
> change->new_uses = merge_access_arrays (attempt,
> @@ -1495,67 +1626,76 @@ ldp_bb_info::fuse_pair (bool load_p,
> auto d2 = drop_memory_access (input_defs[1]);
> change->new_defs = merge_access_arrays (attempt, d1, d2);
> gcc_assert (change->new_defs.is_valid ());
> - def_info *stp_def = memory_access (store_to_change->defs ());
> + def_info *stp_def = memory_access (change->insn ()->defs ());
> change->new_defs = insert_access (attempt,
> stp_def,
> change->new_defs);
> gcc_assert (change->new_defs.is_valid ());
> change->move_range = move_range;
> pair_change = change;
> + break;
> }
> - else
> + case Action::TOMBSTONE:
> {
> - // Note that we are turning this insn into a tombstone,
> - // we need to keep track of these if we go ahead with the
> - // change.
> - tombstone_uids.quick_push (insns[i]->uid ());
> - rtx_insn *rti = insns[i]->rtl ();
> + tombstone_uids.quick_push (change->insn ()->uid ());
> + rtx_insn *rti = change->insn ()->rtl ();
> validate_change (rti, &PATTERN (rti), gen_tombstone (), true);
> validate_change (rti, ®_NOTES (rti), NULL_RTX, true);
> change->new_uses = use_array (nullptr, 0);
> + break;
> }
> - gcc_assert (change->new_uses.is_valid ());
> - changes.quick_push (change);
> - }
> + case Action::INSERT:
> + {
> + if (dump_file)
> + fprintf (dump_file,
> + " stp: cannot re-purpose candidate stores\n");
>
> - if (!store_to_change)
> - {
> - // Tricky case. Cannot re-purpose existing insns for stp.
> - // Need to insert new insn.
> - if (dump_file)
> - fprintf (dump_file,
> - " stp fusion: cannot re-purpose candidate stores\n");
> -
> - auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> - change = make_change (new_insn);
> - change->move_range = move_range;
> - change->new_uses = merge_access_arrays (attempt,
> - input_uses[0],
> - input_uses[1]);
> - gcc_assert (change->new_uses.is_valid ());
> -
> - auto d1 = drop_memory_access (input_defs[0]);
> - auto d2 = drop_memory_access (input_defs[1]);
> - change->new_defs = merge_access_arrays (attempt, d1, d2);
> - gcc_assert (change->new_defs.is_valid ());
> -
> - auto new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> - change->new_defs = insert_access (attempt, new_set,
> - change->new_defs);
> - gcc_assert (change->new_defs.is_valid ());
> - changes.safe_insert (1, change);
> - pair_change = change;
> + auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> + change = make_change (new_insn);
> + change->move_range = move_range;
> + change->new_uses = merge_access_arrays (attempt,
> + input_uses[0],
> + input_uses[1]);
> + gcc_assert (change->new_uses.is_valid ());
> +
> + auto d1 = drop_memory_access (input_defs[0]);
> + auto d2 = drop_memory_access (input_defs[1]);
> + change->new_defs = merge_access_arrays (attempt, d1, d2);
> + gcc_assert (change->new_defs.is_valid ());
> +
> + new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> + change->new_defs = insert_access (attempt, new_set,
> + change->new_defs);
> + gcc_assert (change->new_defs.is_valid ());
> + pair_change = change;
> + break;
> + }
> + case Action::FIXUP_USE:
> + {
> + // This use now needs to consume memory from our stp.
> + if (dump_file)
> + fprintf (dump_file,
> + " stp: changing i%d to use mem from new stp "
> + "(after i%d)\n",
> + action.insn->uid (), stp_dest->uid ());
> + change->new_uses = drop_memory_access (change->new_uses);
> + gcc_assert (new_set);
> + auto new_use = crtl->ssa->create_use (attempt, action.insn,
> + new_set);
> + change->new_uses = insert_access (attempt, new_use,
> + change->new_uses);
> + break;
> + }
> + }
> + changes.safe_push (change);
> }
> }
>
> if (trailing_add)
> changes.quick_push (make_delete (trailing_add));
Reviewing my own patch here, but just a note that this call to
quick_push should have been updated to safe_push with this change.
Will fix it locally and include in a v2 (if needed) when the patch gets
reviewed for real.
>
> - auto n_changes = changes.length ();
> - gcc_checking_assert (n_changes >= 2 && n_changes <= 4);
> -
> auto is_changing = insn_is_changing (changes);
> - for (unsigned i = 0; i < n_changes; i++)
> + for (unsigned i = 0; i < changes.length (); i++)
> gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], is_changing));
>
> // Check the pair pattern is recog'd.
next prev parent reply other threads:[~2024-01-15 15:40 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-13 15:46 Alex Coplan
2024-01-15 15:39 ` Alex Coplan [this message]
2024-01-22 15:59 ` Richard Sandiford
2024-01-22 21:50 ` Alex Coplan
2024-01-23 13:30 ` Alex Coplan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZaVRvrUYv2qm41Jb@arm.com \
--to=alex.coplan@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=kyrylo.tkachov@arm.com \
--cc=richard.earnshaw@arm.com \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).