public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Alex Coplan <alex.coplan@arm.com>
To: gcc-patches@gcc.gnu.org, Kyrylo Tkachov <kyrylo.tkachov@arm.com>,
	Richard Earnshaw <richard.earnshaw@arm.com>,
	richard.sandiford@arm.com
Subject: Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]
Date: Tue, 23 Jan 2024 13:30:18 +0000	[thread overview]
Message-ID: <Za+/aiqRg+ZUOXLh@arm.com> (raw)
In-Reply-To: <Za7jNoY/KlM3PxOq@arm.com>

On 22/01/2024 21:50, Alex Coplan wrote:
> On 22/01/2024 15:59, Richard Sandiford wrote:
> > Alex Coplan <alex.coplan@arm.com> writes:
> > > As the PR shows (specifically #c7) we are missing updating uses of mem
> > > when inserting an stp in the aarch64 load/store pair fusion pass.  This
> > > patch fixes that.
> > >
> > > RTL-SSA has a simple view of memory and by default doesn't allow stores
> > > to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> > > own alias analysis and so can re-order stores over other accesses when
> > > we deem this is safe.  If neither store can be re-purposed (moved into
> > > the required position to form the stp while respecting the RTL-SSA
> > > constraints), then we turn both the candidate stores into "tombstone"
> > > insns (logically delete them) and insert a new stp insn.
> > >
> > > As it stands, we implement the insert case separately (after dealing
> > > with the candidate stores) in fuse_pair by inserting into the middle of
> > > the vector of changes.  This is OK when we only have to insert one
> > > change, but with this fix we would need to insert the change for the new
> > > stp plus multiple changes to fix up uses of mem (note the number of
> > > fix-ups is naturally bounded by the alias limit param to prevent
> > > quadratic behaviour).  If we kept the code structured as is and inserted
> > > into the middle of the vector, that would lead to repeated moving of
> > > elements in the vector which seems inefficient.  The structure of the
> > > code would also be a little unwieldy.
> > >
> > > To improve on that situation, this patch introduces a helper class,
> > > stp_change_builder, which implements a state machine that helps to build
> > > the required changes directly in program order.  That state machine is
> > > reponsible for deciding what changes need to be made in what order, and
> > > the code in fuse_pair then simply follows those steps.
> > >
> > > Together with the fix in the previous patch for installing new defs
> > > correctly in RTL-SSA, this fixes PR113070.
> > >
> > > We take the opportunity to rename the function decide_stp_strategy to
> > > try_repurpose_store, as that seems more descriptive of what it actually
> > > does, since stp_change_builder is now responsible for the overall change
> > > strategy.
> > >
> > > Bootstrapped/regtested as a series with/without the passes enabled on
> > > aarch64-linux-gnu, OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR target/113070
> > > 	* config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> > > 	(decide_stp_strategy): Reanme to ...
> > > 	(try_repurpose_store): ... this.
> > > 	(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> > > 	construct stp changes.  Fix up uses when inserting new stp insns.
> > > ---
> > >  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++++++++++++++++++-----
> > >  1 file changed, 194 insertions(+), 54 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > index 689a8c884bd..703cfb1228c 100644
> > > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> > >    return range;
> > >  }
> > >  
> > > +// Class that implements a state machine for building the changes needed to form
> > > +// a store pair instruction.  This allows us to easily build the changes in
> > > +// program order, as required by rtl-ssa.
> > > +struct stp_change_builder
> > > +{
> > > +  enum class state
> > > +  {
> > > +    FIRST,
> > > +    INSERT,
> > > +    FIXUP_USE,
> > > +    LAST,
> > > +    DONE
> > > +  };
> > > +
> > > +  enum class action
> > > +  {
> > > +    TOMBSTONE,
> > > +    CHANGE,
> > > +    INSERT,
> > > +    FIXUP_USE
> > > +  };
> > > +
> > > +  struct change
> > > +  {
> > > +    action type;
> > > +    insn_info *insn;
> > > +  };
> > > +
> > > +  bool done () const { return m_state == state::DONE; }
> > > +
> > > +  stp_change_builder (insn_info *insns[2],
> > > +		      insn_info *repurpose,
> > > +		      insn_info *dest)
> > > +    : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> > > +      m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> > 
> > Just to make sure I understand: is it the case that
> > 
> >   *insns[0] <= *dest <= *insns[1]
> > 
> > ?
> 
> Yes, that is my understanding.  I thought about asserting it somewhere in
> stp_change_builder, but it seemed a bit gratuitous.
> 
> > 
> > > +
> > > +  change get_change () const
> > > +  {
> > > +    switch (m_state)
> > > +      {
> > > +      case state::FIRST:
> > > +	return {
> > > +	  m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > > +	  m_insns[0]
> > > +	};
> > > +      case state::LAST:
> > > +	return {
> > > +	  m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > > +	  m_insns[1]
> > > +	};
> > > +      case state::INSERT:
> > > +	return { action::INSERT, m_dest };
> > > +      case state::FIXUP_USE:
> > > +	return { action::FIXUP_USE, m_use->insn () };
> > > +      case state::DONE:
> > > +	break;
> > > +      }
> > > +
> > > +    gcc_unreachable ();
> > > +  }
> > > +
> > > +  // Transition to the next state.
> > > +  void advance ()
> > > +  {
> > > +    switch (m_state)
> > > +      {
> > > +      case state::FIRST:
> > > +	if (m_repurpose)
> > > +	  m_state = state::LAST;
> > > +	else
> > > +	  m_state = state::INSERT;
> > > +	break;
> > > +      case state::INSERT:
> > > +      {
> > > +	def_info *def = memory_access (m_insns[0]->defs ());
> > > +	while (*def->next_def ()->insn () <= *m_dest)
> > > +	  def = def->next_def ();
> > > +
> > > +	// Now we know DEF feeds the insertion point for the new stp.
> > > +	// Look for any uses of DEF that will consume the new stp.
> > > +	gcc_assert (*def->insn () <= *m_dest
> > > +		    && *def->next_def ()->insn () > *m_dest);
> > > +
> > > +	if (auto set = dyn_cast<set_info *> (def))
> > 
> > I think this should be an unconditional as_a.  Clobbers of memory
> > shouldn't be a thing, and it's not obvious that doing nothing here
> > would be the correct behaviour.
> 
> Agreed, thanks.
> 
> > 
> > The patch looks good to me with that change and the one that you
> > pointed out in your reply.
> > 
> > However, as a follow-on, we should probably also handle debug
> > instructions.  The conservatively correct thing to do would be
> > to reset all debug uses that occur after *m_dest, since we (rightly)
> > haven't checked them for aliases before attempting the change.
> > A more expensive alternative would be to check each use for aliases
> > and only reset where necessary.
> > 
> > Unfortunately, that would apply to subsequent defs too, up to the
> > later of the two original instructions.
> > 
> > I'm not sure how good other passes are doing this kind of update though.
> 
> Yeah, should be handled by the PR113089 fixes (as you realised in later
> reviews).
> 
> Thanks a lot for the reviews!  I'll re-test the series with those
> changes.

Testing went OK, so I've pushed the series (with the requested changes)
as:

ef86659da9d aarch64: Fix up uses of mem following stp insert [PR113070]
6dd613df590 rtl-ssa: Ensure new defs get inserted [PR113070]
fce3994d04f rtl-ssa: Support for creating new uses [PR113070]
e0374b028a6 rtl-ssa: Run finalize_new_accesses forwards [PR113070]

Thanks,
Alex

> 
> Alex
> 
> > 
> > Thanks,
> > Richard
> > 
> > > +	  for (auto use : set->nondebug_insn_uses ())
> > > +	    if (*use->insn () > *m_dest)
> > > +	      {
> > > +		m_use = use;
> > > +		break;
> > > +	      }
> > > +
> > > +	if (m_use)
> > > +	  m_state = state::FIXUP_USE;
> > > +	else
> > > +	  m_state = state::LAST;
> > > +	break;
> > > +      }
> > > +      case state::FIXUP_USE:
> > > +	m_use = m_use->next_nondebug_insn_use ();
> > > +	if (!m_use)
> > > +	  m_state = state::LAST;
> > > +	break;
> > > +      case state::LAST:
> > > +	m_state = state::DONE;
> > > +	break;
> > > +      case state::DONE:
> > > +	gcc_unreachable ();
> > > +      }
> > > +  }
> > > +
> > > +private:
> > > +  state m_state;
> > > +
> > > +  // Original candidate stores.
> > > +  insn_info *m_insns[2];
> > > +
> > > +  // If non-null, this is a candidate insn to change into an stp.  Otherwise we
> > > +  // are deleting both original insns and inserting a new insn for the stp.
> > > +  insn_info *m_repurpose;
> > > +
> > > +  // Destionation of the stp, it will be placed immediately after m_dest.
> > > +  insn_info *m_dest;
> > > +
> > > +  // Current nondebug use that needs updating due to stp insertion.
> > > +  use_info *m_use;
> > > +};
> > > +
> > >  // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
> > >  // of them (together with its def of memory) for the stp insn.  If so, return
> > >  // that insn.  Otherwise, return null.
> > >  static insn_info *
> > > -decide_stp_strategy (insn_info *first,
> > > +try_repurpose_store (insn_info *first,
> > >  		     insn_info *second,
> > >  		     const insn_range_info &move_range)
> > >  {
> > > @@ -1253,7 +1380,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >  
> > >    insn_info *insns[2] = { first, second };
> > >  
> > > -  auto_vec<insn_change *, 4> changes (4);
> > > +  auto_vec<insn_change *> changes;
> > >    auto_vec<int, 2> tombstone_uids (2);
> > >  
> > >    rtx pats[2] = {
> > > @@ -1455,9 +1582,9 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >  
> > >    if (load_p)
> > >      {
> > > -      changes.quick_push (make_delete (first));
> > > +      changes.safe_push (make_delete (first));
> > >        pair_change = make_change (second);
> > > -      changes.quick_push (pair_change);
> > > +      changes.safe_push (pair_change);
> > >  
> > >        pair_change->move_range = move_range;
> > >        pair_change->new_defs = merge_access_arrays (attempt,
> > > @@ -1474,18 +1601,22 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >      }
> > >    else
> > >      {
> > > -      insn_info *store_to_change = decide_stp_strategy (first, second,
> > > +      using Action = stp_change_builder::action;
> > > +      insn_info *store_to_change = try_repurpose_store (first, second,
> > >  							move_range);
> > > -
> > > -      if (store_to_change && dump_file)
> > > -	fprintf (dump_file, "  stp: re-purposing store %d\n",
> > > -		 store_to_change->uid ());
> > > -
> > > +      insn_info *stp_dest = move_range.singleton ();
> > > +      gcc_assert (stp_dest);
> > > +      stp_change_builder builder (insns, store_to_change, stp_dest);
> > >        insn_change *change;
> > > -      for (int i = 0; i < 2; i++)
> > > +      set_info *new_set = nullptr;
> > > +      for (; !builder.done (); builder.advance ())
> > >  	{
> > > -	  change = make_change (insns[i]);
> > > -	  if (insns[i] == store_to_change)
> > > +	  auto action = builder.get_change ();
> > > +	  change = (action.type == Action::INSERT)
> > > +	    ? nullptr : make_change (action.insn);
> > > +	  switch (action.type)
> > > +	    {
> > > +	    case Action::CHANGE:
> > >  	    {
> > >  	      set_pair_pat (change);
> > >  	      change->new_uses = merge_access_arrays (attempt,
> > > @@ -1495,67 +1626,76 @@ ldp_bb_info::fuse_pair (bool load_p,
> > >  	      auto d2 = drop_memory_access (input_defs[1]);
> > >  	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> > >  	      gcc_assert (change->new_defs.is_valid ());
> > > -	      def_info *stp_def = memory_access (store_to_change->defs ());
> > > +	      def_info *stp_def = memory_access (change->insn ()->defs ());
> > >  	      change->new_defs = insert_access (attempt,
> > >  						stp_def,
> > >  						change->new_defs);
> > >  	      gcc_assert (change->new_defs.is_valid ());
> > >  	      change->move_range = move_range;
> > >  	      pair_change = change;
> > > +	      break;
> > >  	    }
> > > -	  else
> > > +	    case Action::TOMBSTONE:
> > >  	    {
> > > -	      // Note that we are turning this insn into a tombstone,
> > > -	      // we need to keep track of these if we go ahead with the
> > > -	      // change.
> > > -	      tombstone_uids.quick_push (insns[i]->uid ());
> > > -	      rtx_insn *rti = insns[i]->rtl ();
> > > +	      tombstone_uids.quick_push (change->insn ()->uid ());
> > > +	      rtx_insn *rti = change->insn ()->rtl ();
> > >  	      validate_change (rti, &PATTERN (rti), gen_tombstone (), true);
> > >  	      validate_change (rti, &REG_NOTES (rti), NULL_RTX, true);
> > >  	      change->new_uses = use_array (nullptr, 0);
> > > +	      break;
> > >  	    }
> > > -	  gcc_assert (change->new_uses.is_valid ());
> > > -	  changes.quick_push (change);
> > > -	}
> > > +	    case Action::INSERT:
> > > +	    {
> > > +	      if (dump_file)
> > > +		fprintf (dump_file,
> > > +			 "  stp: cannot re-purpose candidate stores\n");
> > >  
> > > -      if (!store_to_change)
> > > -	{
> > > -	  // Tricky case.  Cannot re-purpose existing insns for stp.
> > > -	  // Need to insert new insn.
> > > -	  if (dump_file)
> > > -	    fprintf (dump_file,
> > > -		     "  stp fusion: cannot re-purpose candidate stores\n");
> > > -
> > > -	  auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> > > -	  change = make_change (new_insn);
> > > -	  change->move_range = move_range;
> > > -	  change->new_uses = merge_access_arrays (attempt,
> > > -						  input_uses[0],
> > > -						  input_uses[1]);
> > > -	  gcc_assert (change->new_uses.is_valid ());
> > > -
> > > -	  auto d1 = drop_memory_access (input_defs[0]);
> > > -	  auto d2 = drop_memory_access (input_defs[1]);
> > > -	  change->new_defs = merge_access_arrays (attempt, d1, d2);
> > > -	  gcc_assert (change->new_defs.is_valid ());
> > > -
> > > -	  auto new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> > > -	  change->new_defs = insert_access (attempt, new_set,
> > > -					    change->new_defs);
> > > -	  gcc_assert (change->new_defs.is_valid ());
> > > -	  changes.safe_insert (1, change);
> > > -	  pair_change = change;
> > > +	      auto new_insn = crtl->ssa->create_insn (attempt, INSN, pair_pat);
> > > +	      change = make_change (new_insn);
> > > +	      change->move_range = move_range;
> > > +	      change->new_uses = merge_access_arrays (attempt,
> > > +						      input_uses[0],
> > > +						      input_uses[1]);
> > > +	      gcc_assert (change->new_uses.is_valid ());
> > > +
> > > +	      auto d1 = drop_memory_access (input_defs[0]);
> > > +	      auto d2 = drop_memory_access (input_defs[1]);
> > > +	      change->new_defs = merge_access_arrays (attempt, d1, d2);
> > > +	      gcc_assert (change->new_defs.is_valid ());
> > > +
> > > +	      new_set = crtl->ssa->create_set (attempt, new_insn, memory);
> > > +	      change->new_defs = insert_access (attempt, new_set,
> > > +						change->new_defs);
> > > +	      gcc_assert (change->new_defs.is_valid ());
> > > +	      pair_change = change;
> > > +	      break;
> > > +	    }
> > > +	    case Action::FIXUP_USE:
> > > +	    {
> > > +	      // This use now needs to consume memory from our stp.
> > > +	      if (dump_file)
> > > +		fprintf (dump_file,
> > > +			 "  stp: changing i%d to use mem from new stp "
> > > +			 "(after i%d)\n",
> > > +			 action.insn->uid (), stp_dest->uid ());
> > > +	      change->new_uses = drop_memory_access (change->new_uses);
> > > +	      gcc_assert (new_set);
> > > +	      auto new_use = crtl->ssa->create_use (attempt, action.insn,
> > > +						    new_set);
> > > +	      change->new_uses = insert_access (attempt, new_use,
> > > +						change->new_uses);
> > > +	      break;
> > > +	    }
> > > +	    }
> > > +	  changes.safe_push (change);
> > >  	}
> > >      }
> > >  
> > >    if (trailing_add)
> > >      changes.quick_push (make_delete (trailing_add));
> > >  
> > > -  auto n_changes = changes.length ();
> > > -  gcc_checking_assert (n_changes >= 2 && n_changes <= 4);
> > > -
> > >    auto is_changing = insn_is_changing (changes);
> > > -  for (unsigned i = 0; i < n_changes; i++)
> > > +  for (unsigned i = 0; i < changes.length (); i++)
> > >      gcc_assert (rtl_ssa::restrict_movement_ignoring (*changes[i], is_changing));
> > >  
> > >    // Check the pair pattern is recog'd.

      reply	other threads:[~2024-01-23 13:30 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-13 15:46 Alex Coplan
2024-01-15 15:39 ` Alex Coplan
2024-01-22 15:59 ` Richard Sandiford
2024-01-22 21:50   ` Alex Coplan
2024-01-23 13:30     ` Alex Coplan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Za+/aiqRg+ZUOXLh@arm.com \
    --to=alex.coplan@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kyrylo.tkachov@arm.com \
    --cc=richard.earnshaw@arm.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).