From: Alex Coplan <alex.coplan@arm.com>
To: Ajit Agarwal <aagarwa1@linux.ibm.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
"Kewen.Lin" <linkw@linux.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
Michael Meissner <meissner@linux.ibm.com>,
David Edelsohn <dje.gcc@gmail.com>,
Peter Bergner <bergner@linux.ibm.com>,
gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file
Date: Tue, 9 Apr 2024 16:09:32 +0100 [thread overview]
Message-ID: <ZhVaLOLg8my3VA6c@arm.com> (raw)
In-Reply-To: <a1a614fb-e670-44f2-8821-40e0289ef159@linux.ibm.com>
On 09/04/2024 20:01, Ajit Agarwal wrote:
> Hello Alex:
>
> On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>> Hello Alex/Richard:
> >>>>
> >>>> All review comments are incorporated.
> >>>
> >>> Thanks, I was kind-of expecting you to also send the renaming patch as a
> >>> preparatory patch as we discussed.
> >>>
> >>> Sorry for another meta comment, but: I think the reason that the Linaro
> >>> CI isn't running tests on your patches is actually because you're
> >>> sending 1/3 of a series but not sending the rest of the series.
> >>>
> >>> So please can you either send this as an individual preparatory patch
> >>> (not marked as a series) or if you're going to send a series (e.g. with
> >>> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> >>> series when you make updates. That way the CI should test your patches,
> >>> which would be helpful.
> >>>
> >>
> >> Addressed.
> >>
> >>>>
> >>>> Common infrastructure of load store pair fusion is divided into target
> >>>> independent and target dependent changed code.
> >>>>
> >>>> Target independent code is the Generic code with pure virtual function
> >>>> to interface between target independent and dependent code.
> >>>>
> >>>> Target dependent code is the implementation of pure virtual function for
> >>>> aarch64 target and the call to target independent code.
> >>>>
> >>>> Thanks & Regards
> >>>> Ajit
> >>>>
> >>>>
> >>>> aarch64: Place target independent and dependent changed code in one file
> >>>>
> >>>> Common infrastructure of load store pair fusion is divided into target
> >>>> independent and target dependent changed code.
> >>>>
> >>>> Target independent code is the Generic code with pure virtual function
> >>>> to interface between target independent and dependent code.
> >>>>
> >>>> Target dependent code is the implementation of pure virtual function for
> >>>> aarch64 target and the call to target independent code.
> >>>>
> >>>> 2024-04-06 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>> * config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>>> independent and dependent changed code.
> >>>
> >>> You're going to need a proper ChangeLog eventually, but I guess there's
> >>> no need for that right now.
> >>>
> >>>> ---
> >>>> gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++++++++++++++--------
> >>>> 1 file changed, 249 insertions(+), 122 deletions(-)
> >>>>
> >>>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> index 22ed95eb743..cb21b514ef7 100644
> >>>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> @@ -138,8 +138,122 @@ struct alt_base
> >>>> poly_int64 offset;
> >>>> };
> >>>>
> >>>> +// Virtual base class for load/store walkers used in alias analysis.
> >>>> +struct alias_walker
> >>>> +{
> >>>> + virtual bool conflict_p (int &budget) const = 0;
> >>>> + virtual insn_info *insn () const = 0;
> >>>> + virtual bool valid () const = 0;
> >>>
> >>> Heh, looking at this made me realise there is a whitespace bug here in
> >>> the existing code (double space after const). Sorry about that! I'll
> >>> push an obvious fix for that.
> >>>
> >>>> + virtual void advance () = 0;
> >>>> +};
> >>>> +
> >>>> +struct pair_fusion {
> >>>> +
> >>>> + pair_fusion () {};
> >>>
> >>> This ctor looks pointless at the moment. Perhaps instead we could put
> >>> the contents of ldp_fusion_init in here and then delete that function?
> >>>
> >>
> >> Addressed.
> >>
> >>>> + virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >>>> + bool load_p) = 0;
> >>>
> >>> Please can we have comments above each of these virtual functions
> >>> describing any parameters, what the purpose of the hook is, and the
> >>> interpretation of the return value? This will serve as the
> >>> documentation for other targets that want to make use of the pass.
> >>>
> >>> It might make sense to have a default-false implementation for
> >>> fpsimd_op_p, especially if you don't want to make use of this bit for
> >>> rs6000.
> >>>
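To make that concrete, here is a rough sketch of a default-false hook (illustrative types and names only, not the actual pass interface):

```cpp
#include <cassert>

// Sketch of a hook with a default-false implementation: targets that
// don't segregate FP/SIMD accesses (e.g. rs6000) simply inherit the
// default instead of having to override a pure virtual function.
struct pair_fusion_sketch
{
  virtual ~pair_fusion_sketch () {}

  // Default: no operand is treated as an FP/SIMD operand.
  virtual bool fpsimd_op_p (int /*reg_op*/, int /*mem_mode*/, bool /*load_p*/)
  {
    return false;
  }
};

struct aarch64_like_sketch : pair_fusion_sketch
{
  // A target that does care overrides the default.
  bool fpsimd_op_p (int, int, bool) override { return true; }
};
```

The point is just that the base class can pick the conservative answer, so only targets that need the distinction carry any code for it.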
> >>
> >> Addressed.
> >>
> >>>> +
> >>>> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> >>>> + virtual bool pair_trailing_writeback_p () = 0;
> >>>
> >>> Sorry for the run-around, but: I think this and
> >>> handle_writeback_opportunities () should be the same function, either
> >>> returning an enum or taking an enum and returning a boolean.
> >>>
> >>> At a minimum they should have more similar sounding names.
> >>>
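For the enum option, I had in mind something along these lines (hypothetical names; the member variable stands in for the aarch64_ldp_writeback param):

```cpp
#include <cassert>

// Sketch: a single hook returning an enum instead of two boolean hooks,
// so the "never / fold existing / also form new" writeback policies are
// expressed in one place.
enum class writeback { NEVER, EXISTING, ALL };

struct pass_sketch
{
  virtual ~pass_sketch () {}
  virtual writeback handle_writeback_opportunities () = 0;
};

struct aarch64_sketch : pass_sketch
{
  int ldp_writeback_param = 2;  // stand-in for aarch64_ldp_writeback

  writeback handle_writeback_opportunities () override
  {
    if (ldp_writeback_param > 1)
      return writeback::ALL;
    return ldp_writeback_param ? writeback::EXISTING : writeback::NEVER;
  }
};
```

Callers that currently test aarch64_ldp_writeback > 1 would then compare against writeback::ALL, and the boolean uses against writeback::NEVER.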
> >>
> >> Addressed.
> >>
> >>>> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> >>>> + machine_mode mem_mode) = 0;
> >>>> + virtual int pair_mem_alias_check_limit () = 0;
> >>>> + virtual bool handle_writeback_opportunities () = 0 ;
> >>>> + virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> >>>> + machine_mode mode) = 0;
> >>>> + virtual rtx gen_mem_pair (rtx *pats, rtx writeback,
> >>>
> >>> Nit: excess whitespace after pats,
> >>>
> >>>> + bool load_p) = 0;
> >>>> + virtual bool pair_mem_promote_writeback_p (rtx pat) = 0;
> >>>> + virtual bool track_load_p () = 0;
> >>>> + virtual bool track_store_p () = 0;
> >>>
> >>> I think it would probably make more sense for these two to have
> >>> default-true implementations rather than being pure virtual functions.
> >>>
> >>> Also, sorry for the bikeshedding, but please can we keep the plural
> >>> names (so track_loads_p and track_stores_p).
> >>>
> >>>> + virtual bool cand_insns_empty_p (std::list<insn_info *> &insns) = 0;
> >>>
> >>> Why does this need to be virtualised? I would expect it to
> >>> just be insns.empty () on all targets.
> >>>
> >>>> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT off) = 0;
> >>>> + void ldp_fusion_bb (bb_info *bb);
> >>>
> >>> Just to check, are you planning to rename this function in a separate
> >>> patch?
> >>>
> >>
> >> Yes.
> >>>> +
> >>>> + ~pair_fusion() { }
> >>>
> >>> As with the ctor, perhaps we could put the contents of ldp_fusion_destroy
> >>> in here and then delete that function?
> >>>
> >>
> >> Addressed.
> >>>> +};
> >>>> +
> >>>> +struct aarch64_pair_fusion : public pair_fusion
> >>>> +{
> >>>> +public:
> >>>
> >>
> >> Addressed.
> >>> Nit: there shouldn't be a need for this as public is already the default
> >>> for structs.
> >>>
> >>>> + aarch64_pair_fusion () : pair_fusion () {};
> >>>
> >>> This ctor looks pointless, the default one should do, so can we remove
> >>> this?
> >>>
> >>
> >> Addressed.
> >>>> + bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >>>> + bool load_p) override final
> >>>> + {
> >>>> + const bool fpsimd_op_p
> >>>> + = reload_completed
> >>>> + ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> >>>> + : (GET_MODE_CLASS (mem_mode) != MODE_INT
> >>>> + && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> >>>> + return fpsimd_op_p;
> >>>
> >>> Can this just directly return the boolean expression instead of storing
> >>> it into a variable first?
> >>>
> >> Addressed.
> >>>> + }
> >>>> +
> >>>> + bool pair_mem_promote_writeback_p (rtx pat)
> >>>> + {
> >>>> + if (reload_completed
> >>>> + && aarch64_ldp_writeback > 1
> >>>> + && GET_CODE (pat) == PARALLEL
> >>>> + && XVECLEN (pat, 0) == 2)
> >>>> + return true;
> >>>> +
> >>>> + return false;
> >>>> + }
> >>>> +
> >>>> + bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> >>>> + machine_mode mode)
> >>>> + {
> >>>> + return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
> >>>> + load_p,
> >>>> + mode);
> >>>> + }
> >>>> + bool pair_operand_mode_ok_p (machine_mode mode);
> >>>
> >>> Why have you implemented this (one-line function) out-of-line but the
> >>> others are implemented inline? Please be consistent.
> >>>
> >>
> >> Addressed.
> >>>> +
> >>>> + rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
> >>>> +
> >>>> + bool pair_trailing_writeback_p ()
> >>>> + {
> >>>> + return aarch64_ldp_writeback > 1;
> >>>> + }
> >>>> + bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> >>>> + machine_mode mem_mode)
> >>>> + {
> >>>> + return (load_p
> >>>> + ? aarch64_ldp_reg_operand (reg_op, mem_mode)
> >>>> + : aarch64_stp_reg_operand (reg_op, mem_mode));
> >>>> + }
> >>>> + int pair_mem_alias_check_limit ()
> >>>> + {
> >>>> + return aarch64_ldp_alias_check_limit;
> >>>> + }
> >>>> + bool handle_writeback_opportunities ()
> >>>> + {
> >>>> + return aarch64_ldp_writeback;
> >>>> + }
> >>>> + bool track_load_p ()
> >>>> + {
> >>>> + const bool track_loads
> >>>> + = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> + return track_loads;
> >>>> + }
> >>>
> >>
> >> Addressed.
> >>> As above, can this just return the boolean expression directly without
> >>> the temporary?
> >>>
> >>>> + bool track_store_p ()
> >>>> + {
> >>>> + const bool track_stores
> >>>> + = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> + return track_stores;
> >>>> + }
> >>>
> >>
> >> Addressed.
> >>> Likewise.
> >>>
> >>>> + bool cand_insns_empty_p (std::list<insn_info *> &insns)
> >>>> + {
> >>>> + return insns.empty();
> >>>> + }
> >>>
> >>> As mentioned on the declaration, I don't see why this needs to be
> >>> virtualised.
> >>>
> >>>> + bool pair_mem_in_range_p (HOST_WIDE_INT off)
> >>>> + {
> >>>> + return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
> >>>> + }
> >>>
> >>> Hmm, this looks to test exactly the opposite condition from what the
> >>> name suggests, i.e. the implementation looks like what I would expect
> >>> for a function called pair_mem_out_of_range_p.
> >>>
> >>> Please can you invert the condition and fix up callers appropriately?
> >>>
> >>> This suggests you're not really thinking about the code you're writing,
> >>> please try to think a bit more carefully before posting patches.
> >>>
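That is, something like the following, where the bounds just stand in for LDP_MIN_IMM/LDP_MAX_IMM and the predicate's sense matches its name:

```cpp
#include <cassert>

// Sketch: in-range predicate whose result agrees with its name.
// The bounds here are placeholders, not the real LDP immediate range.
const long MIN_IMM = -64;
const long MAX_IMM = 63;

bool pair_mem_in_range_p (long off)
{
  return off >= MIN_IMM && off <= MAX_IMM;
}
```

Callers that want to punt on out-of-range offsets then test !pair_mem_in_range_p (off).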
> >>
> >> Sorry for my mistake; I overlooked it.
> >>>> +};
> >>>> +
> >>>> +
> >>>> // State used by the pass for a given basic block.
> >>>> -struct ldp_bb_info
> >>>> +struct pair_fusion_bb_info
> >>>> {
> >>>> using def_hash = nofree_ptr_hash<def_info>;
> >>>> using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>;
> >>>> @@ -160,14 +274,17 @@ struct ldp_bb_info
> >>>>
> >>>> static const size_t obstack_alignment = sizeof (void *);
> >>>> bb_info *m_bb;
> >>>> + pair_fusion *bb_state;
> >>>
> >>
> >> Addressed.
> >>
> >>> Can we call this m_pass instead of bb_state? The pair_fusion class is
> >>> explicitly at the pass level rather than the bb level.
> >>>
> >>> Also I think this (and indeed m_bb, although that's pre-existing) could
> >>> be private members. Not sure there's a good reason for making them
> >>> public.
> >>>
> >>
> >> Addressed.
> >>
> >>> It might be slightly nicer to use a reference for m_pass unless there's
> >>> a good reason to use a pointer, although I don't feel strongly about
> >>> this.
> >>>
> >>>>
> >>>> - ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> >>>> + pair_fusion_bb_info (bb_info *bb,
> >>>> + aarch64_pair_fusion *d) : m_bb (bb),
> >>>
> >>> This generic class needs to take the generic pass type, i.e. just a
> >>> pair_fusion * rather than an aarch64_pair_fusion *.
> >>>
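A minimal sketch of what I mean (placeholder names, not the real classes): the bb-level state should only ever see the generic interface.

```cpp
#include <cassert>

// Sketch: the bb-level state stores a pointer to the generic pass type,
// so any derived pass (aarch64 or another target) can be passed in.
struct pass_iface
{
  virtual ~pass_iface () {}
  virtual bool track_loads_p () { return true; }
};

struct target_pass : pass_iface {};

struct bb_state_sketch
{
  pass_iface *m_pass;  // generic type, not target_pass *

  bb_state_sketch (pass_iface *p) : m_pass (p) {}
  bool loads_p () { return m_pass->track_loads_p (); }
};
```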
> >>>> + bb_state (d), m_emitted_tombstone (false)
> >>>> {
> >>>> obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
> >>>> obstack_alignment, obstack_chunk_alloc,
> >>>> obstack_chunk_free);
> >>>> }
> >>>> - ~ldp_bb_info ()
> >>>> + ~pair_fusion_bb_info ()
> >>>> {
> >>>> obstack_free (&m_obstack, nullptr);
> >>>>
> >>>> @@ -177,10 +294,32 @@ struct ldp_bb_info
> >>>> bitmap_obstack_release (&m_bitmap_obstack);
> >>>> }
> >>>> }
> >>>> + void track_access (insn_info *, bool load, rtx mem);
> >>>> + void transform ();
> >>>> + void cleanup_tombstones ();
> >>>
> >>> I think most (or indeed all?) of the changes here shouldn't be needed.
> >>> AFAICS, there shouldn't be a need to:
> >>> - drop inline from these functions.
> >>> - change the accessibility of any members.
> >>>
> >>
> >> Addressed.
> >>> Please can you drop these changes or explain why they're needed (from
> >>> here until the end of the class).
> >>>
> >>>> + void merge_pairs (insn_list_t &, insn_list_t &,
> >>>> + bool load_p, unsigned access_size);
> >>>> + void transform_for_base (int load_size, access_group &group);
> >>>> +
> >>>> + bool try_fuse_pair (bool load_p, unsigned access_size,
> >>>> + insn_info *i1, insn_info *i2);
> >>>> +
> >>>> + bool fuse_pair (bool load_p, unsigned access_size,
> >>>> + int writeback,
> >>>> + insn_info *i1, insn_info *i2,
> >>>> + base_cand &base,
> >>>> + const insn_range_info &move_range);
> >>>>
> >>>> - inline void track_access (insn_info *, bool load, rtx mem);
> >>>> - inline void transform ();
> >>>> - inline void cleanup_tombstones ();
> >>>> + void do_alias_analysis (insn_info *alias_hazards[4],
> >>>> + alias_walker *walkers[4],
> >>>> + bool load_p);
> >>>> +
> >>>> + void track_tombstone (int uid);
> >>>> +
> >>>> + bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> >>>> +
> >>>> + template<typename Map>
> >>>> + void traverse_base_map (Map &map);
> >>>>
> >>>> private:
> >>>> obstack m_obstack;
> >>>> @@ -191,30 +330,32 @@ private:
> >>>> bool m_emitted_tombstone;
> >>>>
> >>>> inline splay_tree_node<access_record *> *node_alloc (access_record *);
> >>>> +};
> >>>>
> >>>> - template<typename Map>
> >>>> - inline void traverse_base_map (Map &map);
> >>>> - inline void transform_for_base (int load_size, access_group &group);
> >>>> -
> >>>> - inline void merge_pairs (insn_list_t &, insn_list_t &,
> >>>> - bool load_p, unsigned access_size);
> >>>> -
> >>>> - inline bool try_fuse_pair (bool load_p, unsigned access_size,
> >>>> - insn_info *i1, insn_info *i2);
> >>>> -
> >>>> - inline bool fuse_pair (bool load_p, unsigned access_size,
> >>>> - int writeback,
> >>>> - insn_info *i1, insn_info *i2,
> >>>> - base_cand &base,
> >>>> - const insn_range_info &move_range);
> >>>> -
> >>>> - inline void track_tombstone (int uid);
> >>>> +rtx aarch64_pair_fusion::gen_mem_pair (rtx *pats,
> >>>
> >>> Style nit: newline after rtx.
> >>>
> >> Addressed.
> >>
> >>>> + rtx writeback,
> >>>> + bool load_p)
> >>>> + {
> >>>> + rtx pair_pat;
> >>>>
> >>>> - inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> >>>> -};
> >>>> + if (writeback)
> >>>> + {
> >>>> + auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
> >>>> + pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
> >>>> + }
> >>>> + else if (load_p)
> >>>> + pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
> >>>> + XEXP (pats[1], 0),
> >>>> + XEXP (pats[0], 1));
> >>>> + else
> >>>> + pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
> >>>> + XEXP (pats[0], 1),
> >>>> + XEXP (pats[1], 1));
> >>>> + return pair_pat;
> >>>
> >>> Can the assignments just be direct returns here, please? Then
> >>> get rid of the temporary.
> >>>
> >>
> >> Addressed.
> >>
> >>>> + }
> >>>>
> >>>> splay_tree_node<access_record *> *
> >>>> -ldp_bb_info::node_alloc (access_record *access)
> >>>> +pair_fusion_bb_info::node_alloc (access_record *access)
> >>>> {
> >>>> using T = splay_tree_node<access_record *>;
> >>>> void *addr = obstack_alloc (&m_obstack, sizeof (T));
> >>>> @@ -262,7 +403,7 @@ drop_writeback (rtx mem)
> >>>> // RTX_AUTOINC addresses. The interface is like strip_offset except we take a
> >>>> // MEM so that we know the mode of the access.
> >>>> static rtx
> >>>> -ldp_strip_offset (rtx mem, poly_int64 *offset)
> >>>> +pair_mem_strip_offset (rtx mem, poly_int64 *offset)
> >>>
> >>> I thought we discussed splitting the renaming into a separate patch?
> >>>
> >>>> {
> >>>> rtx addr = XEXP (mem, 0);
> >>>>
> >>>> @@ -332,6 +473,12 @@ ldp_operand_mode_ok_p (machine_mode mode)
> >>>> return reload_completed || mode != TImode;
> >>>> }
> >>>>
> >>>> +bool
> >>>> +aarch64_pair_fusion::pair_operand_mode_ok_p (machine_mode mode)
> >>>> +{
> >>>> + return ldp_operand_mode_ok_p (mode);
> >>>> +}
> >>>> +
> >>>> // Given LFS (load_p, fpsimd_p, size) fields in FIELDS, encode these
> >>>> // into an integer for use as a hash table key.
> >>>> static int
> >>>> @@ -396,7 +543,7 @@ access_group::track (Alloc alloc_node, poly_int64 offset, insn_info *insn)
> >>>> // MEM_EXPR base (i.e. a tree decl) relative to which we can track the access.
> >>>> // LFS is used as part of the key to the hash table, see track_access.
> >>>> bool
> >>>> -ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>> +pair_fusion_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>
> >>> Again, please do any renaming in a separate patch.
> >>>
> >>
> >> Addressed.
> >>>> {
> >>>> if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem))
> >>>> return false;
> >>>> @@ -412,9 +559,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>> const machine_mode mem_mode = GET_MODE (mem);
> >>>> const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
> >>>>
> >>>> - // Punt on misaligned offsets. LDP/STP instructions require offsets to be a
> >>>> - // multiple of the access size, and we believe that misaligned offsets on
> >>>> - // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
> >>>> + // Punt on misaligned offsets. Paired memory access instructions require
> >>>> + // offsets to be a multiple of the access size, and we believe that
> >>>> + // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
> >>>> + // offsets w.r.t. RTL bases.
> >>>> if (!multiple_p (offset, mem_size))
> >>>> return false;
> >>>>
> >>>> @@ -438,46 +586,38 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>>> }
> >>>>
> >>>> // Main function to begin pair discovery. Given a memory access INSN,
> >>>> -// determine whether it could be a candidate for fusing into an ldp/stp,
> >>>> +// determine whether it could be a candidate for fusing into an pair mem,
> >>>
> >>> s/an pair mem/a paired access/
> >>>
> >>>> // and if so, track it in the appropriate data structure for this basic
> >>>> // block. LOAD_P is true if the access is a load, and MEM is the mem
> >>>> // rtx that occurs in INSN.
> >>>> void
> >>>> -ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> +pair_fusion_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> {
> >>>> // We can't combine volatile MEMs, so punt on these.
> >>>> if (MEM_VOLATILE_P (mem))
> >>>> return;
> >>>>
> >>>> - // Ignore writeback accesses if the param says to do so.
> >>>> - if (!aarch64_ldp_writeback
> >>>> + // Ignore writeback accesses if the param says to do so
> >>>
> >>
> >> Addressed.
> >>> This comment needs updating, I guess something like "if the hook says to
> >>> do so". But also please don't delete the full stop.
> >>>
> >>>> + if (!bb_state->handle_writeback_opportunities ()
> >>>> && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
> >>>> return;
> >>>>
> >>>> const machine_mode mem_mode = GET_MODE (mem);
> >>>> - if (!ldp_operand_mode_ok_p (mem_mode))
> >>>> +
> >>>
> >>> I don't think we need the extra whitespace here.
> >>>
> >>>> + if (!bb_state->pair_operand_mode_ok_p (mem_mode))
> >>>> return;
> >>>>
> >>>> rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
> >>>>
> >>>> - // Ignore the access if the register operand isn't suitable for ldp/stp.
> >>>> - if (load_p
> >>>> - ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
> >>>> - : !aarch64_stp_reg_operand (reg_op, mem_mode))
> >>>> + if (!bb_state->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
> >>>> return;
> >>>> -
> >>>> // We want to segregate FP/SIMD accesses from GPR accesses.
> >>>> //
> >>>> // Before RA, we use the modes, noting that stores of constant zero
> >>>> // operands use GPRs (even in non-integer modes). After RA, we use
> >>>> // the hard register numbers.
> >>>> - const bool fpsimd_op_p
> >>>> - = reload_completed
> >>>> - ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> >>>> - : (GET_MODE_CLASS (mem_mode) != MODE_INT
> >>>> - && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> >>>> -
> >>>> - // Note ldp_operand_mode_ok_p already rejected VL modes.
> >>>> + const bool fpsimd_op_p = bb_state->fpsimd_op_p (reg_op, mem_mode, load_p);
> >>>> + // Note pair_operand_mode_ok_p already rejected VL modes.
> >>>> const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
> >>>> const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
> >>>>
> >>>> @@ -487,7 +627,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> poly_int64 mem_off;
> >>>> rtx addr = XEXP (mem, 0);
> >>>> const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
> >>>> - rtx base = ldp_strip_offset (mem, &mem_off);
> >>>> + rtx base = pair_mem_strip_offset (mem, &mem_off);
> >>>
> >>> Again, let's split the renaming off into a separate patch.
> >>>
> >>
> >> Addressed.
> >>>> if (!REG_P (base))
> >>>> return;
> >>>>
> >>>> @@ -506,8 +646,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> // elimination offset pre-RA, we should postpone forming pairs on such
> >>>> // accesses until after RA.
> >>>> //
> >>>> - // As it stands, addresses with offsets in range for LDR but not
> >>>> - // in range for LDP/STP are currently reloaded inefficiently,
> >>>> + // As it stands, addresses in range for an individual load/store but not
> >>>> + // for a paired access are currently reloaded inefficiently,
> >>>> // ending up with a separate base register for each pair.
> >>>> //
> >>>> // In theory LRA should make use of
> >>>> @@ -519,8 +659,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> // that calls targetm.legitimize_address_displacement.
> >>>> //
> >>>> // So for now, it's better to punt when we can't be sure that the
> >>>> - // offset is in range for LDP/STP. Out-of-range cases can then be
> >>>> - // handled after RA by the out-of-range LDP/STP peepholes. Eventually, it
> >>>> + // offset is in range for paired access. Out-of-range cases can then be
> >>>> + // handled after RA by the out-of-range PAIR MEM peepholes. Eventually, it
> >>>> // would be nice to handle known out-of-range opportunities in the
> >>>> // pass itself (for stack accesses, this would be in the post-RA pass).
> >>>> if (!reload_completed
> >>>> @@ -573,8 +713,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>>> gcc_unreachable (); // Base defs should be unique.
> >>>> }
> >>>>
> >>>> - // Punt on misaligned offsets. LDP/STP require offsets to be a multiple of
> >>>> - // the access size.
> >>>> + // Punt on misaligned offsets. Paired memory accesses require offsets
> >>>> + // to be a multiple of the access size.
> >>>> if (!multiple_p (mem_off, mem_size))
> >>>> return;
> >>>>
> >>>> @@ -614,7 +754,7 @@ static bool no_ignore (insn_info *) { return false; }
> >>>> // making use of alias disambiguation.
> >>>> static insn_info *
> >>>> latest_hazard_before (insn_info *insn, rtx *ignore,
> >>>> - insn_info *ignore_insn = nullptr)
> >>>> + insn_info *ignore_insn = 0)
> >>>
> >>> Is this change needed?
> >>>
> >>
> >> Addressed.
> >>>> {
> >>>> insn_info *result = nullptr;
> >>>>
> >>>> @@ -1150,7 +1290,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
> >>>> const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
> >>>>
> >>>> poly_int64 offset;
> >>>> - rtx this_base = ldp_strip_offset (mem, &offset);
> >>>> + rtx this_base = pair_mem_strip_offset (mem, &offset);
> >>>> gcc_assert (REG_P (this_base));
> >>>> if (base_reg)
> >>>> gcc_assert (rtx_equal_p (base_reg, this_base));
> >>>> @@ -1286,7 +1426,11 @@ find_trailing_add (insn_info *insns[2],
> >>>>
> >>>> off_hwi /= access_size;
> >>>>
> >>>> - if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
> >>>> + pair_fusion *pfuse;
> >>>> + aarch64_pair_fusion derived;
> >>>> + pfuse = &derived;
> >>>
> >>> Huh? We'll want to use the same instance of the pass structure (pair_fusion)
> >>> that gets created at the top level, so that should get passed around everywhere.
> >>>
> >>> In the case of find_trailing_add we probably don't want to add any more
> >>> parameters (as Segher noted there are already too many), so perhaps the
> >>> easiest thing would be to make find_trailing_add itself a (non-virtual)
> >>> member function of pair_fusion.
> >>>
> >>> Then here we can just query pair_mem_in_range_p directly.
> >>>
> >>> For try_promote_writeback itself you can either make it a non-virtual
> >>> member function of pair_fusion or just pass a reference to the pass
> >>> object in when you call it.
> >>>
> >>> Note also that you don't have to create a pointer of the parent type in
> >>> order to call a virtual function on a derived object, you can just call
> >>> it directly.
> >>>
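To spell that last point out with a toy example (illustrative names only): the virtual call resolves the same way whether it goes through a base pointer or directly through the derived object.

```cpp
#include <cassert>

// Sketch: no base-class pointer is needed to call a virtual member on a
// derived object; dynamic dispatch also works through a base reference.
struct base_sketch
{
  virtual ~base_sketch () {}
  virtual bool in_range_p (long off) { return off >= -64 && off <= 63; }
};

struct derived_sketch : base_sketch
{
  bool in_range_p (long off) override { return off >= -512 && off <= 511; }
};

// Generic code takes the base type by reference and still gets the
// derived behaviour.
bool query (base_sketch &p, long off) { return p.in_range_p (off); }
```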
> >>
> >> Addressed.
> >>>> +
> >>>> + if (pfuse->pair_mem_in_range_p (off_hwi))
> >>>> return nullptr;
> >>>
> >>> Note of course this condition needs inverting when you invert the
> >>> implementation.
> >>>
> >>
> >> Addressed.
> >>>>
> >>>> auto dump_prefix = [&]()
> >>>> @@ -1328,7 +1472,7 @@ find_trailing_add (insn_info *insns[2],
> >>>> // We just emitted a tombstone with uid UID, track it in a bitmap for
> >>>> // this BB so we can easily identify it later when cleaning up tombstones.
> >>>> void
> >>>> -ldp_bb_info::track_tombstone (int uid)
> >>>> +pair_fusion_bb_info::track_tombstone (int uid)
> >>>
> >>> Again, please split the renaming off. Same with other instances below.
> >>>
> >>>> {
> >>>> if (!m_emitted_tombstone)
> >>>> {
> >>>> @@ -1528,7 +1672,7 @@ fixup_debug_uses (obstack_watermark &attempt,
> >>>> gcc_checking_assert (GET_RTX_CLASS (GET_CODE (XEXP (mem, 0)))
> >>>> == RTX_AUTOINC);
> >>>>
> >>>> - base = ldp_strip_offset (mem, &offset);
> >>>> + base = pair_mem_strip_offset (mem, &offset);
> >>>> gcc_checking_assert (REG_P (base) && REGNO (base) == base_regno);
> >>>> }
> >>>> fixup_debug_use (attempt, use, def, base, offset);
> >>>> @@ -1664,7 +1808,7 @@ fixup_debug_uses (obstack_watermark &attempt,
> >>>> // BASE gives the chosen base candidate for the pair and MOVE_RANGE is
> >>>> // a singleton range which says where to place the pair.
> >>>> bool
> >>>> -ldp_bb_info::fuse_pair (bool load_p,
> >>>> +pair_fusion_bb_info::fuse_pair (bool load_p,
> >>>> unsigned access_size,
> >>>> int writeback,
> >>>> insn_info *i1, insn_info *i2,
> >>>> @@ -1800,7 +1944,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>>> {
> >>>> if (dump_file)
> >>>> fprintf (dump_file,
> >>>> - " ldp: i%d has wb but subsequent i%d has non-wb "
> >>>> + " load pair: i%d has wb but subsequent i%d has non-wb "
> >>>> "update of base (r%d), dropping wb\n",
> >>>> insns[0]->uid (), insns[1]->uid (), base_regno);
> >>>> gcc_assert (writeback_effect);
> >>>> @@ -1823,7 +1967,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>>> }
> >>>>
> >>>> // If either of the original insns had writeback, but the resulting pair insn
> >>>> - // does not (can happen e.g. in the ldp edge case above, or if the writeback
> >>>> + // does not (can happen e.g. in the load pair edge case above, or if the writeback
> >>>> // effects cancel out), then drop the def(s) of the base register as
> >>>> // appropriate.
> >>>> //
> >>>> @@ -1842,7 +1986,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>>> // update of the base register and try and fold it in to make this into a
> >>>> // writeback pair.
> >>>> insn_info *trailing_add = nullptr;
> >>>> - if (aarch64_ldp_writeback > 1
> >>>> + if (bb_state->pair_trailing_writeback_p ()
> >>>> && !writeback_effect
> >>>> && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
> >>>> XEXP (pats[0], 0), nullptr)
> >>>> @@ -1863,14 +2007,14 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>>> }
> >>>>
> >>>> // Now that we know what base mem we're going to use, check if it's OK
> >>>> - // with the ldp/stp policy.
> >>>> + // with the pair mem policy.
> >>>> rtx first_mem = XEXP (pats[0], load_p);
> >>>> - if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
> >>>> - load_p,
> >>>> - GET_MODE (first_mem)))
> >>>> + if (!bb_state->pair_mem_ok_with_policy (first_mem,
> >>>> + load_p,
> >>>> + GET_MODE (first_mem)))
> >>>> {
> >>>> if (dump_file)
> >>>> - fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
> >>>> + fprintf (dump_file, "punting on pair (%d,%d), pair mem policy says no\n",
> >>>> i1->uid (), i2->uid ());
> >>>> return false;
> >>>> }
> >>>> @@ -1878,21 +2022,10 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>>> rtx reg_notes = combine_reg_notes (first, second, load_p);
> >>>>
> >>>> rtx pair_pat;
> >>>> - if (writeback_effect)
> >>>> - {
> >>>> - auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
> >>>> - pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
> >>>> - }
> >>>> - else if (load_p)
> >>>> - pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
> >>>> - XEXP (pats[1], 0),
> >>>> - XEXP (pats[0], 1));
> >>>> - else
> >>>> - pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
> >>>> - XEXP (pats[0], 1),
> >>>> - XEXP (pats[1], 1));
> >>>>
> >>>> + pair_pat = bb_state->gen_mem_pair (pats, writeback_effect, load_p);
> >>>> insn_change *pair_change = nullptr;
> >>>> +
> >>>> auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
> >>>> rtx_insn *rti = change->insn ()->rtl ();
> >>>> validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
> >>>> @@ -2133,15 +2266,6 @@ load_modified_by_store_p (insn_info *load,
> >>>> return false;
> >>>> }
> >>>>
> >>>> -// Virtual base class for load/store walkers used in alias analysis.
> >>>> -struct alias_walker
> >>>> -{
> >>>> - virtual bool conflict_p (int &budget) const = 0;
> >>>> - virtual insn_info *insn () const = 0;
> >>>> - virtual bool valid () const = 0;
> >>>> - virtual void advance () = 0;
> >>>> -};
> >>>> -
> >>>> // Implement some common functionality used by both store_walker
> >>>> // and load_walker.
> >>>> template<bool reverse>
> >>>> @@ -2259,13 +2383,13 @@ public:
> >>>> //
> >>>> // We try to maintain the invariant that if a walker becomes invalid, we
> >>>> // set its pointer to null.
> >>>> -static void
> >>>> -do_alias_analysis (insn_info *alias_hazards[4],
> >>>> +void
> >>>> +pair_fusion_bb_info::do_alias_analysis (insn_info *alias_hazards[4],
> >>>
> >>> It would probably make more sense for this to be a (non-virtual) member
> >>> function of the pair_fusion class, since it doesn't need to access the
> >>> BB-specific state.
> >>>
> >>
> >> Addressed.
> >>>> alias_walker *walkers[4],
> >>>> bool load_p)
> >>>> {
> >>>> const int n_walkers = 2 + (2 * !load_p);
> >>>> - int budget = aarch64_ldp_alias_check_limit;
> >>>> + int budget = bb_state->pair_mem_alias_check_limit ();
> >>>>
> >>>> auto next_walker = [walkers,n_walkers](int current) -> int {
> >>>> for (int j = 1; j <= n_walkers; j++)
> >>>> @@ -2365,7 +2489,7 @@ get_viable_bases (insn_info *insns[2],
> >>>> {
> >>>> const bool is_lower = (i == reversed);
> >>>> poly_int64 poly_off;
> >>>> - rtx base = ldp_strip_offset (cand_mems[i], &poly_off);
> >>>> + rtx base = pair_mem_strip_offset (cand_mems[i], &poly_off);
> >>>> if (GET_RTX_CLASS (GET_CODE (XEXP (cand_mems[i], 0))) == RTX_AUTOINC)
> >>>> writeback |= (1 << i);
> >>>>
> >>>> @@ -2373,7 +2497,7 @@ get_viable_bases (insn_info *insns[2],
> >>>> continue;
> >>>>
> >>>> // Punt on accesses relative to eliminable regs. See the comment in
> >>>> - // ldp_bb_info::track_access for a detailed explanation of this.
> >>>> + // pair_fusion_bb_info::track_access for a detailed explanation of this.
> >>>> if (!reload_completed
> >>>> && (REGNO (base) == FRAME_POINTER_REGNUM
> >>>> || REGNO (base) == ARG_POINTER_REGNUM))
> >>>> @@ -2397,7 +2521,11 @@ get_viable_bases (insn_info *insns[2],
> >>>> if (!is_lower)
> >>>> base_off--;
> >>>>
> >>>> - if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
> >>>> + pair_fusion *pfuse;
> >>>> + aarch64_pair_fusion derived;
> >>>> + pfuse = &derived;
> >>>
> >>> Again, please don't do this but instead use the same instance
> >>> everywhere.
> >>>
> >>
> >> Addressed.
> >>>> +
> >>>> + if (pfuse->pair_mem_in_range_p (base_off))
> >>>> continue;
> >>>>
> >>>> use_info *use = find_access (insns[i]->uses (), REGNO (base));
> >>>> @@ -2454,12 +2582,12 @@ get_viable_bases (insn_info *insns[2],
> >>>> }
> >>>>
> >>>> // Given two adjacent memory accesses of the same size, I1 and I2, try
> >>>> -// and see if we can merge them into a ldp or stp.
> >>>> +// and see if we can merge them into a paired accesses load and store.
> >>>
> >>> "into a load pair or store pair" or "into a paired access" if you
> >>> prefer.
> >>>
> >>
> >> Addressed.
> >>>> //
> >>>> // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
> >>>> // if the accesses are both loads, otherwise they are both stores.
> >>>> bool
> >>>> -ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >>>> +pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >>>> insn_info *i1, insn_info *i2)
> >>>> {
> >>>> if (dump_file)
> >>>> @@ -2494,7 +2622,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >>>> {
> >>>> if (dump_file)
> >>>> fprintf (dump_file,
> >>>> - "punting on ldp due to reg conflcits (%d,%d)\n",
> >>>> + "punting on pair mem load due to reg conflcits (%d,%d)\n",
> >>>> insns[0]->uid (), insns[1]->uid ());
> >>>> return false;
> >>>> }
> >>>> @@ -2843,7 +2971,7 @@ debug (const insn_list_t &l)
> >>>> // we can't re-order them anyway, so provided earlier passes have cleaned up
> >>>> // redundant loads, we shouldn't miss opportunities by doing this.
> >>>> void
> >>>> -ldp_bb_info::merge_pairs (insn_list_t &left_list,
> >>>> +pair_fusion_bb_info::merge_pairs (insn_list_t &left_list,
> >>>> insn_list_t &right_list,
> >>>> bool load_p,
> >>>> unsigned access_size)
> >>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
> >>>> // of accesses. If we find two sets of adjacent accesses, call
> >>>> // merge_pairs.
> >>>> void
> >>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>> - access_group &group)
> >>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>> + access_group &group)
> >>>> {
> >>>> const auto lfs = decode_lfs (encoded_lfs);
> >>>> const unsigned access_size = lfs.size;
> >>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>> access.cand_insns,
> >>>> lfs.load_p,
> >>>> access_size);
> >>>> - skip_next = access.cand_insns.empty ();
> >>>> + skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >>>
> >>> As above, why is this needed?
> >>
> >> For rs6000 we always want to return true: load/store pairs to be
> >> merged occur at offsets 8/16, 16/32 and 32/64 for rs6000, but we
> >> only want the 8/16 and 32/64 pairs. That's why cand_insns_empty_p
> >> always returns true for rs6000, so that the overlapping pairs above
> >> are skipped.
> >
> > Hmm, sorry, I'm not sure I follow. Are you saying that for rs6000 you have
> > load/store pair instructions where the two arms of the access are storing
> > operands of different sizes? Or something else?
> >
> > As it stands the logic is to skip the next iteration only if we
> > exhausted all the candidate insns for the current access. In the case
> > that we didn't exhaust all such candidates, then the idea is that when
> > access becomes prev_access, we can attempt to use those candidates as
> > the "left-hand side" of a pair in the next iteration since we failed to
> > use them as the "right-hand side" of a pair in the current iteration.
> > I don't see why you wouldn't want that behaviour. Please can you
> > explain?
> >
>
> In merge_pairs we get the 2 load candidates, one load from offset 0 and
> the other load from offset 16. Then in the next iteration we get a
> load from offset 16 and the other from offset 32, and in the iteration
> after that a load from offset 32 and the other from offset 48.
>
> For example:
>
> Currently we get the load candidates as follows.
>
> pairs:
>
> load from offset 0.
> load from offset 16.
>
> next pairs:
>
> load from offset 16.
> load from offset 32.
>
> next pairs:
>
> load from offset 32.
> load from offset 48.
>
> Instead, for rs6000 we should get:
>
> pairs:
>
> load from offset 0.
> load from offset 16.
>
> next pairs:
>
> load from offset 32.
> load from offset 48.
Hmm, so then I guess my question is: why wouldn't you consider merging
the pair with offsets (16,32) for rs6000? Is it because you have a
stricter alignment requirement on the base pair offsets (i.e. they have
to be a multiple of 32 when the operand size is 16)? So the pair
offsets have to be a multiple of the entire pair size rather than a
single operand size?
If that is the case then I think it would be better to introduce a
virtual function (say pair_offset_alignment_ok_p) that vets the base
offset of the pair (prev_access->offset in transform_for_base). I guess
it would also take access_size as a parameter and for aarch64 it should
check:
multiple_p (offset, access_size)
and for rs6000 it could check:
multiple_p (offset, access_size * 2)
and we would then incorporate a call to that predicate into the else-if
condition of transform_for_base.
It would have the same limitation whereby we assume that MEM_EXPR offset
alignment is a good proxy for RTL offset alignment, but we already
depend on that with the multiple_p check in track_via_mem_expr.
Thanks,
Alex
>
> Thanks & Regards
> Ajit
> > Thanks,
> > Alex
> >
> >>
> >>>
> >>>> }
> >>>> prev_access = &access;
> >>>> }
> >>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>> // and remove all the tombstone insns, being sure to reparent any uses
> >>>> // of mem to previous defs when we do this.
> >>>> void
> >>>> -ldp_bb_info::cleanup_tombstones ()
> >>>> +pair_fusion_bb_info::cleanup_tombstones ()
> >>>> {
> >>>> // No need to do anything if we didn't emit a tombstone insn for this BB.
> >>>> if (!m_emitted_tombstone)
> >>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
> >>>>
> >>>> template<typename Map>
> >>>> void
> >>>> -ldp_bb_info::traverse_base_map (Map &map)
> >>>> +pair_fusion_bb_info::traverse_base_map (Map &map)
> >>>> {
> >>>> for (auto kv : map)
> >>>> {
> >>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
> >>>> }
> >>>>
> >>>> void
> >>>> -ldp_bb_info::transform ()
> >>>> +pair_fusion_bb_info::transform ()
> >>>> {
> >>>> traverse_base_map (expr_map);
> >>>> traverse_base_map (def_map);
> >>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
> >>>> // for load/store candidates. If running after RA, also try and promote
> >>>> // non-writeback pairs to use writeback addressing. Then try to fuse
> >>>> // candidates into pairs.
> >>>> -void ldp_fusion_bb (bb_info *bb)
> >>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
> >>>> {
> >>>> - const bool track_loads
> >>>> - = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> - const bool track_stores
> >>>> - = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >>>> + const bool track_loads = track_load_p ();
> >>>> + const bool track_stores = track_store_p ();
> >>>>
> >>>> - ldp_bb_info bb_state (bb);
> >>>
> >>> This:
> >>>
> >>>> + aarch64_pair_fusion derived;
> >>>
> >>> can be deleted and then:
> >>>
> >>>> + pair_fusion_bb_info bb_info (bb, &derived);
> >>>
> >>> can just be:
> >>>
> >>> pair_fusion_bb_info bb_info (bb, this);
> >>>
> >>> (or you can pass *this if you make bb_info take a reference).
> >>>
> >>> I don't think there's a particular need to change the variable name
> >>> (bb_state -> bb_info). I chose the former because it doesn't clash
> >>> with the RTL-SSA structure of the same name as the latter.
> >>>
> >>
> >> Addressed.
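To spell out the wiring I mean: the per-BB state object takes the pass
by reference, so ldp_fusion_bb can just pass *this. The types below
are stubs and the member names illustrative, not the actual GCC
declarations.

```cpp
#include <cassert>

struct bb_info {};
struct pair_fusion;

// Per-BB state: holds a reference to the one pass object, so there is
// exactly one (derived) pair_fusion instance shared everywhere.
struct pair_fusion_bb_info
{
  pair_fusion_bb_info (bb_info *bb, pair_fusion &pass)
    : m_bb (bb), m_pass (pass) {}

  bb_info *m_bb;
  pair_fusion &m_pass;
};

struct pair_fusion
{
  virtual ~pair_fusion () {}

  // The per-BB driver reuses *this rather than instantiating a second
  // derived pass object on every call.
  void ldp_fusion_bb (bb_info *bb)
  {
    pair_fusion_bb_info bb_state (bb, *this);
    (void) bb_state;
    // ... track accesses, transform, cleanup ...
  }
};

struct aarch64_pair_fusion : pair_fusion {};
```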
> >>>>
> >>>> for (auto insn : bb->nondebug_insns ())
> >>>> {
> >>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
> >>>> continue;
> >>>>
> >>>> rtx pat = PATTERN (rti);
> >>>> - if (reload_completed
> >>>> - && aarch64_ldp_writeback > 1
> >>>> - && GET_CODE (pat) == PARALLEL
> >>>> - && XVECLEN (pat, 0) == 2)
> >>>> + if (pair_mem_promote_writeback_p (pat))
> >>>> try_promote_writeback (insn);
> >>>
> >>> It looks like try_promote_writeback itself will need some further work
> >>> to make it target-independent. I suppose this check:
> >>>
> >>> auto rti = insn->rtl ();
> >>> const auto attr = get_attr_ldpstp (rti);
> >>> if (attr == LDPSTP_NONE)
> >>> return;
> >>>
> >>> bool load_p = (attr == LDPSTP_LDP);
> >>> gcc_checking_assert (load_p || attr == LDPSTP_STP);
> >>>
> >>> will need to become part of the pair_mem_promote_writeback_p hook that you
> >>> added, potentially changing it to return a boolean for load_p.
> >>>
> >>> Then I guess we will need hooks for destructuring the pair insn and
> >>> another hook to wrap aarch64_gen_writeback_pair.
> >>>
> >>
> >> Addressed.
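One way the hook could absorb the get_attr_ldpstp check and hand back
load_p is sketched below. The attribute enum and insn type are
stand-ins for the GCC internals, and the signature is hypothetical;
the real hook would also need the destructuring and
aarch64_gen_writeback_pair wrappers mentioned above.

```cpp
#include <cassert>

// Stand-in for the ldpstp insn attribute; in GCC this comes from
// get_attr_ldpstp on the insn's rtl.
enum ldpstp_attr { LDPSTP_NONE, LDPSTP_LDP, LDPSTP_STP };

struct insn_info { ldpstp_attr attr; };

struct pair_fusion
{
  // Returns true if INSN is a pair candidate for writeback promotion,
  // setting LOAD_P to say which kind of pair it is.
  virtual bool pair_mem_promote_writeback_p (insn_info *insn,
					     bool &load_p) const = 0;
  virtual ~pair_fusion () {}
};

struct aarch64_pair_fusion : pair_fusion
{
  bool pair_mem_promote_writeback_p (insn_info *insn,
				     bool &load_p) const override
  {
    if (insn->attr == LDPSTP_NONE)
      return false;
    load_p = (insn->attr == LDPSTP_LDP);
    return true;
  }
};
```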
> >>>>
> >>>> if (GET_CODE (pat) != SET)
> >>>> continue;
> >>>>
> >>>> if (track_stores && MEM_P (XEXP (pat, 0)))
> >>>> - bb_state.track_access (insn, false, XEXP (pat, 0));
> >>>> + bb_info.track_access (insn, false, XEXP (pat, 0));
> >>>> else if (track_loads && MEM_P (XEXP (pat, 1)))
> >>>> - bb_state.track_access (insn, true, XEXP (pat, 1));
> >>>> + bb_info.track_access (insn, true, XEXP (pat, 1));
> >>>> }
> >>>>
> >>>> - bb_state.transform ();
> >>>> - bb_state.cleanup_tombstones ();
> >>>> + bb_info.transform ();
> >>>> + bb_info.cleanup_tombstones ();
> >>>> }
> >>>>
> >>>> void ldp_fusion ()
> >>>> {
> >>>> ldp_fusion_init ();
> >>>> + pair_fusion *pfuse;
> >>>> + aarch64_pair_fusion derived;
> >>>> + pfuse = &derived;
> >>>
> >>> This is indeed the one place where I think it is acceptable to
> >>> instantiate aarch64_pair_fusion. But again there's no need to create a
> >>> pointer to the parent class, just call any function you like directly.
> >>>
> >>
> >> Addressed.
> >>>>
> >>>> for (auto bb : crtl->ssa->bbs ())
> >>>> - ldp_fusion_bb (bb);
> >>>> + pfuse->ldp_fusion_bb (bb);
> >>>
> >>> I think even the code to iterate over bbs should itself be a member
> >>> function of pair_fusion (say "run") and then that becomes part of the
> >>> generic code.
> >>>
> >>> So this function would just become:
> >>>
> >>> aarch64_pair_fusion pass;
> >>> pass.run ();
> >>>
> >>> and could be inlined into the caller.
> >>>
> >>
> >> Addressed.
> >>> Perhaps you could also add an early return like:
> >>>
> >>> if (!track_loads_p () && !track_stores_p ())
> >>> return;
> >>>
> >>> in pair_fusion::run () and then remove the corresponding code from
> >>> pass_ldp_fusion::gate?
> >>>
> >>
> >> Addressed.
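Putting the last two suggestions together, run () would look roughly
like this. In GCC the iteration is over crtl->ssa->bbs () and the
per-BB work is the existing ldp_fusion_bb logic; both are stubbed here
purely to show the control flow.

```cpp
#include <cassert>
#include <vector>

struct bb_info {};

struct pair_fusion
{
  virtual bool track_loads_p () const = 0;
  virtual bool track_stores_p () const = 0;
  virtual ~pair_fusion () {}

  // Generic driver: bail out early when the tuning tracks nothing,
  // otherwise run the per-BB fusion over every block.
  void run (std::vector<bb_info> &bbs)
  {
    if (!track_loads_p () && !track_stores_p ())
      return;
    for (auto &bb : bbs)
      fusion_bb (&bb);
  }

  // Stub for the per-BB fusion; counts calls so the flow is testable.
  void fusion_bb (bb_info *) { ++bbs_processed; }
  int bbs_processed = 0;
};

struct aarch64_pair_fusion : pair_fusion
{
  bool track_loads_p () const override { return true; }
  bool track_stores_p () const override { return true; }
};

struct disabled_pair_fusion : pair_fusion
{
  bool track_loads_p () const override { return false; }
  bool track_stores_p () const override { return false; }
};
```

The caller then shrinks to constructing the target pass and calling
run (), and the equivalent check can come out of pass_ldp_fusion::gate.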
> >>
> >>>>
> >>>> ldp_fusion_destroy ();
> >>>> }
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>
> >>> Thanks,
> >>> Alex
> >>
> >> Thanks & Regards
> >> Ajit