public inbox for gcc-patches@gcc.gnu.org
From: Alex Coplan <alex.coplan@arm.com>
To: Ajit Agarwal <aagarwa1@linux.ibm.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
	"Kewen.Lin" <linkw@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Michael Meissner <meissner@linux.ibm.com>,
	David Edelsohn <dje.gcc@gmail.com>,
	Peter Bergner <bergner@linux.ibm.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file
Date: Fri, 12 Apr 2024 15:45:38 +0100	[thread overview]
Message-ID: <ZhlJEoAW9izedHqA@arm.com> (raw)
In-Reply-To: <595cd863-3ba5-492b-84d1-ab470411507e@linux.ibm.com>

On 12/04/2024 20:02, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 11/04/24 7:55 pm, Alex Coplan wrote:
> > On 10/04/2024 23:48, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 10/04/2024 15:31, Ajit Agarwal wrote:
> >>>> Hello Alex:
> >>>>
> >>>> On 10/04/24 1:42 pm, Alex Coplan wrote:
> >>>>> Hi Ajit,
> >>>>>
> >>>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
> >>>>>> Hello Alex:
> >>>>>>
> >>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
> >>>>>>>> Hello Alex:
> >>>>>>>>
> >>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>>>>>>>> Hello Alex/Richard:
> >>>>>>>>>>>>
> >>>>>>>>>>>> All review comments are incorporated.
> >>> <snip>
> >>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
> >>>>>>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, call
> >>>>>>>>>>>>  // merge_pairs.
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>> -				 access_group &group)
> >>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>> +					 access_group &group)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    const auto lfs = decode_lfs (encoded_lfs);
> >>>>>>>>>>>>    const unsigned access_size = lfs.size;
> >>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>  		       access.cand_insns,
> >>>>>>>>>>>>  		       lfs.load_p,
> >>>>>>>>>>>>  		       access_size);
> >>>>>>>>>>>> -	  skip_next = access.cand_insns.empty ();
> >>>>>>>>>>>> +	  skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >>>>>>>>>>>
> >>>>>>>>>>> As above, why is this needed?
> >>>>>>>>>>
> >>>>>>>>>> For rs6000 we want to always return true, as the load/store pairs
> >>>>>>>>>> to be merged occur at offsets 8/16, 16/32 and 32/64 for rs6000,
> >>>>>>>>>> and we only want to pair 8/16 and 32/64.  That's why we want to
> >>>>>>>>>> always return true for rs6000, to skip pairs as above.
> >>>>>>>>>
> >>>>>>>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 you have
> >>>>>>>>> load/store pair instructions where the two arms of the access are storing
> >>>>>>>>> operands of different sizes?  Or something else?
> >>>>>>>>>
> >>>>>>>>> As it stands the logic is to skip the next iteration only if we
> >>>>>>>>> exhausted all the candidate insns for the current access.  In the case
> >>>>>>>>> that we didn't exhaust all such candidates, then the idea is that when
> >>>>>>>>> access becomes prev_access, we can attempt to use those candidates as
> >>>>>>>>> the "left-hand side" of a pair in the next iteration since we failed to
> >>>>>>>>> use them as the "right-hand side" of a pair in the current iteration.
> >>>>>>>>> I don't see why you wouldn't want that behaviour.  Please can you
> >>>>>>>>> explain?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> In merge_pair we get the 2 load candidates: one load from offset 0 and
> >>>>>>>> the other load from offset 16.  Then in the next iteration we get a load
> >>>>>>>> from offset 16 and the other load from offset 32.  In the next iteration
> >>>>>>>> we get a load from offset 32 and the other load from offset 48.
> >>>>>>>>
> >>>>>>>> For example:
> >>>>>>>>
> >>>>>>>> Currently we get the load candidates as follows.
> >>>>>>>>
> >>>>>>>> pairs:
> >>>>>>>>
> >>>>>>>> load from 0th offset.
> >>>>>>>> load from 16th offset.
> >>>>>>>>
> >>>>>>>> next pairs:
> >>>>>>>>
> >>>>>>>> load from 16th offset.
> >>>>>>>> load from 32nd offset.
> >>>>>>>>
> >>>>>>>> next pairs:
> >>>>>>>>
> >>>>>>>> load from 32nd offset
> >>>>>>>> load from 48th offset.
> >>>>>>>>
> >>>>>>>> Instead in rs6000 we should get:
> >>>>>>>>
> >>>>>>>> pairs:
> >>>>>>>>
> >>>>>>>> load from 0th offset
> >>>>>>>> load from 16th offset.
> >>>>>>>>
> >>>>>>>> next pairs:
> >>>>>>>>
> >>>>>>>> load from 32nd offset
> >>>>>>>> load from 48th offset.
> >>>>>>>
> >>>>>>> Hmm, so then I guess my question is: why wouldn't you consider merging
> >>>>>>> the pair with offsets (16,32) for rs6000?  Is it because you have a
> >>>>>>> stricter alignment requirement on the base pair offsets (i.e. they have
> >>>>>>> to be a multiple of 32 when the operand size is 16)?  So the pair
> >>>>>>> offsets have to be a multiple of the entire pair size rather than a
> >>>>>>> single operand size?
> >>>>>>
> >>>>>> We get a load pair at a certain program point with offsets (0,16), and
> >>>>>> at another program point we get a load pair (32,48).
> >>>>>>
> >>>>>> The current implementation takes the loads at offsets (0,16),
> >>>>>> (16,32), (32,48).
> >>>>>>
> >>>>>> But for rs6000 we want the load pairs to be merged at different points,
> >>>>>> as (0,16) and (32,48): for (0,16) we want to replace the loads with an
> >>>>>> lxvp at offset 0, and the other loads (32,48) with an lxvp at offset 32.
> >>>>>>
> >>>>>> In the current case it would merge into an lxvp at offset 0 and an lxvp
> >>>>>> at offset 16, then an lxvp at offset 32 and an lxvp at offset 48, which
> >>>>>> is incorrect in our case, as in the (16,32) case the offset-16 load will
> >>>>>> not load from an even register and breaks for rs6000.
> >>>>>
> >>>>> Sorry, I think I'm still missing something here.  Why does the address offset
> >>>>> affect the parity of the transfer register?  ISTM they needn't be related at
> >>>>> all (and indeed we can't even know the parity of the transfer register before
> >>>>> RA, but perhaps you're only intending on running the pass after RA?)
> >>>>>
> >>>>
> >>>> We have a load pair with (0,16), wherein these loads are adjacent and
> >>>> replaced with lxvp.
> >>>>
> >>>> The semantics of the lxvp instruction are that it loads the adjacent
> >>>> load pair into an even register and even_register + 1.
> >>>>
> >>>> We replace the above load pair with the lxvp instruction, and then we
> >>>> don't need to merge (16,32) as (0,16) is already merged; instead we
> >>>> merge (32,48).
> >>>
> >>> Ok, but the existing logic should already account for this.  I.e. if we
> >>> successfully merge (0,16), then we don't attempt to merge (16,32).  We'd only
> >>> attempt to merge (16,32) if the merge of (0,16) failed (for whatever reason).
> >>> So I don't see that there's anything specific to lxvp that requires this logic
> >>> to change, _unless_ you have a stricter alignment requirement on the offsets as
> >>> I mentioned before.
> >>>
> >>
> >> Thanks for the suggestion.  It worked for rs6000 too with the current
> >> changes.  Sorry for the confusion.
> > 
> > Alright, glad we got to the bottom of this!
> 
> Thanks.
> > 
> >>
> >>>>
> >>>> Yes, you are correct: the address offset doesn't affect the parity of
> >>>> the register transfer, as we are doing the fusion pass before RA.
> >>>>
> >>>>>>> If that is the case then I think it would be better to introduce a
> >>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
> >>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I guess
> >>>>>>> it would also take access_size as a parameter and for aarch64 it should
> >>>>>>> check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size)
> >>>>>>>
> >>>>>>> and for rs6000 it could check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size * 2)
> >>>>>>>
> >>>>>>> and we would then incorporate a call to that predicate in the else if
> >>>>>>> condition of transform_for_base.
> >>>>>>>
> >>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset
> >>>>>>> alignment is a good proxy for RTL offset alignment, but we already
> >>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
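> >>>>>>>
> >>>>>>> Very roughly, I'm imagining something along these lines (untested,
> >>>>>>> and the name and exact signature are of course up for discussion):
> >>>>>>>
> >>>>>>>   // Return true if OFFSET is a suitable base offset for a pair
> >>>>>>>   // whose individual accesses are ACCESS_SIZE bytes wide.
> >>>>>>>   virtual bool pair_offset_alignment_ok_p (poly_int64 offset,
> >>>>>>>                                            unsigned access_size)
> >>>>>>>   {
> >>>>>>>     return multiple_p (offset, access_size);
> >>>>>>>   }
> >>>>>>>
> >>>>>>> with the rs6000 override instead checking
> >>>>>>> multiple_p (offset, access_size * 2).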
> >>>>>>>
> >>>> I have addressed the above hooks and it worked fine with both rs6000
> >>>> and aarch64.  I am sending a subsequent patch shortly that addresses
> >>>> the above.
> >>>>
> >>>>> How do you plan on handling this even-odd requirement for rs6000?
> >>>>>
> >>>>
> >>>> We plan to handle this with V16QI subreg:0 and V16QI subreg:16 to
> >>>> generate the register pair, and that's what we generate and implement
> >>>> in the rs6000 target code.
> >>>
> >>> Ah, this is coming back to me now.  Sorry, I should have remembered this from
> >>> the previous discussion with Richard S.
> >>>
> >>> Apologies for going on a slight tangent here, but if you're running
> >>> before RA are you planning to create a new OImode pseudo register for
> >>> the lxvp insn and then somehow update uses of the old transfer registers
> >>> to replace them with subregs of that OImode pseudo?  
> >>
> >> Yes, I do the same as you have mentioned.  We generate register pairs
> >> with a 256-bit mode, with two subregs of 128-bit modes at offsets 0
> >> and 16.
> >>
> >>> Or do you just plan
> >>> on replacing the individual loads with moves (instead of deleting them)?
> >>> I guess the latter would be simpler and might work better in the
> >>> presence of hard regs.
> >>>
> >>
> >> Would you mind explaining how to generate register pairs with lxvp by
> >> replacing loads with moves?
> > 
> > Yeah, so suppose you have something like:
> > 
> > (set (reg:V16QI v1) (mem:V16QI addr))
> > (set (reg:V16QI v2) (mem:V16QI addr+16))
> > 
> > then when you insert the lxvp you can (logically) replace the
> > original load instructions with moves from the appropriate subreg, as
> > follows:
> > 
> > (set (reg:OI pair-pseudo) (mem:OI addr)) ; lxvp
> > (set (reg:V16QI v1) (subreg:V16QI (reg:OI pair-pseudo) 0))
> > (set (reg:V16QI v2) (subreg:V16QI (reg:OI pair-pseudo) 16))
> > 
> 
> Any pseudo created with gen_rtx_REG, like
> gen_rtx_REG (OOmode, REGNO (dest_exp)), gets errored out
> with an unrecognized insn by LRA.

I'm not surprised that goes wrong: you can't just create a new REG
rtx in a different mode and reuse the regno of an existing pseudo.
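
To illustrate (a rough sketch only, not from your patch; dest_exp here
stands for whatever V16QI destination rtx you currently have):

  /* Reusing an existing pseudo's regno in a different mode is invalid:  */
  rtx bad  = gen_rtx_REG (OOmode, REGNO (dest_exp));
  /* Instead, allocate a brand new pseudo in the pair mode:  */
  rtx pair = gen_reg_rtx (OOmode);

and then the individual V16QI values become subregs of that new pseudo,
along the lines of the quoted example above.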

> 
> If I create the pseudo with gen_reg_rtx (OOmode), it errors out
> because the new pseudo's register is not found in
> change->new_defs.

Yeah, I suppose you'd need to add an RTL-SSA def for the new pseudo.

> 
> Also, sequential register pairs are not generated by the
> register allocator.

So how do you get the sequential pairs with your current approach?  My
understanding was that what I suggested above doesn't really change what
you're doing w.r.t the lxvp insn itself, but maybe I didn't properly
understand the approach taken in your initial patchset.

Thanks,
Alex

> 
> That's why I haven't used the above method, although I had also
> thought it through.
> 
> Please let me know what you think.
> 
> Thanks & Regards
> Ajit 
> > now I'm not sure off-hand if this exact combination of subregs and mode
> > changes is valid, but hopefully you get the idea.  The benefit of this
> > approach is that it keeps the transformation local and is more
> > compile-time efficient (we don't have to look through all mentions of
> > v1/v2 and replace them with subregs).  I'd hope that the RA can then
> > clean up any redundant moves (especially in the case that v1/v2 are
> > pseudos).
> > 
> > That would mean that you don't need all the grubbing around with DF
> > looking for occurrences of the transfer regs.  Instead we'd just need
> > some logic to insert the extra moves after the lxvp for rs6000, and I
> > think this might fit more cleanly into the current pass.
> > 
> > Does that make sense?  In any case, this shouldn't really affect the
> > preparatory aarch64 patch because we were planning to defer adding any
> > hooks that are only needed for rs6000 from the initial aarch64/generic
> > split.
> > 
> > Thanks,
> > Alex
> > 
> >>
> >> Thanks & Regards
> >> Ajit
> >>  
> >>> Thanks,
> >>> Alex
> >>>
> >>>>
> >>>> Thanks & Regards
> >>>> Ajit 
> >>>>> Thanks,
> >>>>> Alex
> >>>>>
> >>>>>>
> >>>>>> lxvp should load from even registers, and then the loaded value will
> >>>>>> be in the even register and even register + 1 (which is odd).
> >>>>>>
> >>>>>> Thanks & Regards
> >>>>>> Ajit
> >>>>>>> If that is the case then I think it would be better to introduce a
> >>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
> >>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I guess
> >>>>>>> it would also take access_size as a parameter and for aarch64 it should
> >>>>>>> check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size)
> >>>>>>>
> >>>>>>> and for rs6000 it could check:
> >>>>>>>
> >>>>>>>   multiple_p (offset, access_size * 2)
> >>>>>>>
> >>>>>>> and we would then incorporate a call to that predicate in the else if
> >>>>>>> condition of transform_for_base.
> >>>>>>>
> >>>>>>> It would have the same limitation whereby we assume that MEM_EXPR offset
> >>>>>>> alignment is a good proxy for RTL offset alignment, but we already
> >>>>>>> depend on that with the multiple_p check in track_via_mem_expr.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Alex
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks & Regards
> >>>>>>>> Ajit
> >>>>>>>>> Thanks,
> >>>>>>>>> Alex
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>  	}
> >>>>>>>>>>>>        prev_access = &access;
> >>>>>>>>>>>>      }
> >>>>>>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>  // and remove all the tombstone insns, being sure to reparent any uses
> >>>>>>>>>>>>  // of mem to previous defs when we do this.
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>> +pair_fusion_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    // No need to do anything if we didn't emit a tombstone insn for this BB.
> >>>>>>>>>>>>    if (!m_emitted_tombstone)
> >>>>>>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
> >>>>>>>>>>>>  
> >>>>>>>>>>>>  template<typename Map>
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    for (auto kv : map)
> >>>>>>>>>>>>      {
> >>>>>>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>  
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::transform ()
> >>>>>>>>>>>> +pair_fusion_bb_info::transform ()
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    traverse_base_map (expr_map);
> >>>>>>>>>>>>    traverse_base_map (def_map);
> >>>>>>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
> >>>>>>>>>>>>  // for load/store candidates.  If running after RA, also try and promote
> >>>>>>>>>>>>  // non-writeback pairs to use writeback addressing.  Then try to fuse
> >>>>>>>>>>>>  // candidates into pairs.
> >>>>>>>>>>>> -void ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>> -  const bool track_loads
> >>>>>>>>>>>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>>>>>>>>>> -  const bool track_stores
> >>>>>>>>>>>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>>>>>>>>>> +  const bool track_loads = track_load_p ();
> >>>>>>>>>>>> +  const bool track_stores = track_store_p ();
> >>>>>>>>>>>>  
> >>>>>>>>>>>> -  ldp_bb_info bb_state (bb);
> >>>>>>>>>>>
> >>>>>>>>>>> This:
> >>>>>>>>>>>
> >>>>>>>>>>>> +  aarch64_pair_fusion derived;
> >>>>>>>>>>>
> >>>>>>>>>>> can be deleted and then:
> >>>>>>>>>>>
> >>>>>>>>>>>> +  pair_fusion_bb_info bb_info (bb, &derived);
> >>>>>>>>>>>
> >>>>>>>>>>> can just be:
> >>>>>>>>>>>
> >>>>>>>>>>>   pair_fusion_bb_info bb_info (bb, this);
> >>>>>>>>>>>
> >>>>>>>>>>> (or you can pass *this if you make bb_info take a reference).
> >>>>>>>>>>>
> >>>>>>>>>>> I don't think there's a particular need to change the variable name
> >>>>>>>>>>> (bb_state -> bb_info).  I chose the former because it doesn't clash
> >>>>>>>>>>> with the RTL-SSA structure of the same name as the latter.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>>>  
> >>>>>>>>>>>>    for (auto insn : bb->nondebug_insns ())
> >>>>>>>>>>>>      {
> >>>>>>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
> >>>>>>>>>>>>  	continue;
> >>>>>>>>>>>>  
> >>>>>>>>>>>>        rtx pat = PATTERN (rti);
> >>>>>>>>>>>> -      if (reload_completed
> >>>>>>>>>>>> -	  && aarch64_ldp_writeback > 1
> >>>>>>>>>>>> -	  && GET_CODE (pat) == PARALLEL
> >>>>>>>>>>>> -	  && XVECLEN (pat, 0) == 2)
> >>>>>>>>>>>> +      if (pair_mem_promote_writeback_p (pat))
> >>>>>>>>>>>>  	try_promote_writeback (insn);
> >>>>>>>>>>>
> >>>>>>>>>>> It looks like try_promote_writeback itself will need some further work
> >>>>>>>>>>> to make it target-independent.  I suppose this check:
> >>>>>>>>>>>
> >>>>>>>>>>>   auto rti = insn->rtl ();
> >>>>>>>>>>>   const auto attr = get_attr_ldpstp (rti);
> >>>>>>>>>>>   if (attr == LDPSTP_NONE)
> >>>>>>>>>>>     return;
> >>>>>>>>>>>
> >>>>>>>>>>>   bool load_p = (attr == LDPSTP_LDP);
> >>>>>>>>>>>   gcc_checking_assert (load_p || attr == LDPSTP_STP);
> >>>>>>>>>>>
> >>>>>>>>>>> will need to become part of the pair_mem_promote_writeback_p hook that you
> >>>>>>>>>>> added, potentially changing it to return a boolean for load_p.
> >>>>>>>>>>>
> >>>>>>>>>>> Then I guess we will need hooks for destructuring the pair insn and
> >>>>>>>>>>> another hook to wrap aarch64_gen_writeback_pair.
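> >>>>>>>>>>>
> >>>>>>>>>>> (Just to sketch the sort of thing I mean; completely untested and
> >>>>>>>>>>> the signature is only illustrative:
> >>>>>>>>>>>
> >>>>>>>>>>>   // Return true if INSN is a pair candidate for promotion to
> >>>>>>>>>>>   // writeback addressing, setting LOAD_P according to whether
> >>>>>>>>>>>   // it is a load pair or a store pair.
> >>>>>>>>>>>   virtual bool pair_mem_promote_writeback_p (insn_info *insn,
> >>>>>>>>>>>                                              bool &load_p) = 0;
> >>>>>>>>>>>
> >>>>>>>>>>> with the aarch64 implementation doing the get_attr_ldpstp check
> >>>>>>>>>>> quoted above.)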
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>>>  
> >>>>>>>>>>>>        if (GET_CODE (pat) != SET)
> >>>>>>>>>>>>  	continue;
> >>>>>>>>>>>>  
> >>>>>>>>>>>>        if (track_stores && MEM_P (XEXP (pat, 0)))
> >>>>>>>>>>>> -	bb_state.track_access (insn, false, XEXP (pat, 0));
> >>>>>>>>>>>> +	bb_info.track_access (insn, false, XEXP (pat, 0));
> >>>>>>>>>>>>        else if (track_loads && MEM_P (XEXP (pat, 1)))
> >>>>>>>>>>>> -	bb_state.track_access (insn, true, XEXP (pat, 1));
> >>>>>>>>>>>> +	bb_info.track_access (insn, true, XEXP (pat, 1));
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>  
> >>>>>>>>>>>> -  bb_state.transform ();
> >>>>>>>>>>>> -  bb_state.cleanup_tombstones ();
> >>>>>>>>>>>> +  bb_info.transform ();
> >>>>>>>>>>>> +  bb_info.cleanup_tombstones ();
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>  
> >>>>>>>>>>>>  void ldp_fusion ()
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>    ldp_fusion_init ();
> >>>>>>>>>>>> +  pair_fusion *pfuse;
> >>>>>>>>>>>> +  aarch64_pair_fusion derived;
> >>>>>>>>>>>> +  pfuse = &derived;
> >>>>>>>>>>>
> >>>>>>>>>>> This is indeed the one place where I think it is acceptable to
> >>>>>>>>>>> instantiate aarch64_pair_fusion.  But again there's no need to create a
> >>>>>>>>>>> pointer to the parent class, just call any function you like directly.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>>>  
> >>>>>>>>>>>>    for (auto bb : crtl->ssa->bbs ())
> >>>>>>>>>>>> -    ldp_fusion_bb (bb);
> >>>>>>>>>>>> +    pfuse->ldp_fusion_bb (bb);
> >>>>>>>>>>>
> >>>>>>>>>>> I think even the code to iterate over bbs should itself be a member
> >>>>>>>>>>> function of pair_fusion (say "run") and then that becomes part of the
> >>>>>>>>>>> generic code.
> >>>>>>>>>>>
> >>>>>>>>>>> So this function would just become:
> >>>>>>>>>>>
> >>>>>>>>>>> aarch64_pair_fusion pass;
> >>>>>>>>>>> pass.run ();
> >>>>>>>>>>>
> >>>>>>>>>>> and could be inlined into the caller.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>> Perhaps you could also add an early return like:
> >>>>>>>>>>>
> >>>>>>>>>>>   if (!track_loads_p () && !track_stores_p ())
> >>>>>>>>>>>     return;
> >>>>>>>>>>>
> >>>>>>>>>>> in pair_fusion::run () and then remove the corresponding code from
> >>>>>>>>>>> pass_ldp_fusion::gate?
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Addressed.
> >>>>>>>>>>
> >>>>>>>>>>>>  
> >>>>>>>>>>>>    ldp_fusion_destroy ();
> >>>>>>>>>>>>  }
> >>>>>>>>>>>> -- 
> >>>>>>>>>>>> 2.39.3
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Alex
> >>>>>>>>>>
> >>>>>>>>>> Thanks & Regards
> >>>>>>>>>> Ajit


Thread overview: 18+ messages
2024-04-05  8:23 Ajit Agarwal
2024-04-05 16:33 ` Alex Coplan
2024-04-09 12:00   ` Ajit Agarwal
2024-04-09 13:59     ` Alex Coplan
2024-04-09 14:31       ` Ajit Agarwal
2024-04-09 15:09         ` Alex Coplan
2024-04-09 15:29           ` Ajit Agarwal
2024-04-10  8:12             ` Alex Coplan
2024-04-10 10:01               ` Ajit Agarwal
2024-04-10 14:22                 ` Alex Coplan
2024-04-10 18:18                   ` Ajit Agarwal
2024-04-11 14:25                     ` Alex Coplan
2024-04-12 14:32                       ` Ajit Agarwal
2024-04-12 14:45                         ` Alex Coplan [this message]
2024-04-12 17:32                           ` Ajit Agarwal
2024-04-14 16:59                             ` Ajit Agarwal
2024-04-22  7:31                               ` Ajit Agarwal
2024-05-03  8:55                                 ` Alex Coplan
