From: Ajit Agarwal
To: Alex Coplan
Cc: Richard Sandiford, "Kewen.Lin", Segher Boessenkool, Peter Bergner, Michael Meissner, David Edelsohn, gcc-patches
Subject: Re: [PATCH, aarch64] v2: Preparatory patch to place target independent and dependent changed code in one file
Date: Thu, 9 May 2024 15:46:50 +0530
Message-ID: <4b8eb692-1cb8-4d16-b0df-f43fb36ccc5b@linux.ibm.com>
References: <6f43d8ab-4f66-4079-b1dc-e846af95cb3a@linux.ibm.com>

Hello Alex:

On 08/05/24 6:18 pm, Alex Coplan wrote:
> Hi Ajit,
>
> Sorry for the long delay in reviewing this.
>
> This is really getting there now.  I've left a few more comments below.
>
> Apart from minor style things, the main remaining issues are mostly
> around comments.  It's important to have good clear comments for
> functions, with the parameters (and return value, if any) clearly
> described.
> See https://www.gnu.org/prep/standards/standards.html#Comments
>
> Note that this now needs a little rebasing, too.
>

Done.

> On 21/04/2024 13:22, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are addressed and changes are made to the
>> transform_for_base function as per consensus.
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the generic code, with pure virtual functions
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of the pure virtual functions
>> for the aarch64 target and the call to the target independent code.
>>
>> Bootstrapped on aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64: Preparatory patch to place target independent and
>> dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the generic code, with pure virtual functions
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of the pure virtual functions
>> for the aarch64 target and the call to the target independent code.
>>
>> 2024-04-21  Ajit Kumar Agarwal
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64-ldp-fusion.cc: Place target
>> 	independent and dependent changed code.
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 484 +++++++++++++++--------
>>  1 file changed, 325 insertions(+), 159 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 365dcf48b22..83a917e1d20 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,6 +138,189 @@ struct alt_base
>>    poly_int64 offset;
>>  };
>>
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker >> +{ >> + virtual bool conflict_p (int &budget) const = 0; >> + virtual insn_info *insn () const = 0; >> + virtual bool valid () const = 0; >> + virtual void advance () = 0; >> +}; >> + >> +// Forward declaration to be used inside the aarch64_pair_fusion class. >> +bool ldp_operand_mode_ok_p (machine_mode mode); >> +rtx aarch64_destructure_load_pair (rtx regs[2], rtx pattern); >> +rtx aarch64_destructure_store_pair (rtx regs[2], rtx pattern); >> +rtx aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2], >> + bool load_p); > > I don't think we want to change the linkage of these, they should be kept > static. > >> +enum class writeback{ > > Nit: space before '{' > >> + WRITEBACK_PAIR_P, >> + WRITEBACK >> +}; > > We're going to want some more descriptive names here. How about > EXISTING and ALL? Note that the WRITEBACK_ prefix isn't necessary as > you're using an enum class, so uses of the enumerators need to be > prefixed with writeback:: anyway. A comment describing the usage of the > enum as well as comments above the enumerators describing their > interpretation would be good. > Done. >> + >> +struct pair_fusion { >> + > > Nit: excess blank line. > >> + pair_fusion () >> + { >> + calculate_dominance_info (CDI_DOMINATORS); >> + df_analyze (); >> + crtl->ssa = new rtl_ssa::function_info (cfun); >> + }; > > Can we have one blank line between the virtual functions, please? I > think that would be more readable now that there are comments above each > of them. > Done. >> + // Return true if GPR is FP or SIMD accesses, passed >> + // with GPR reg_op rtx, machine mode and load_p. > > It's slightly awkward trying to document this without the parameter > names, but I can see that you're omitting them to avoid unused parameter > warnings. One option would be to introduce names in the comment as you > go. 
How about this instead: > > // Given: > // - an rtx REG_OP, the non-memory operand in a load/store insn, > // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and > // - a boolean LOAD_P (true iff the insn is a load), then: > // return true if the access should be considered an FP/SIMD access. > // Such accesses are segregated from GPR accesses, since we only want to > // form pairs for accesses that use the same register file. > Done. >> + virtual bool fpsimd_op_p (rtx, machine_mode, bool) >> + { >> + return false; >> + } >> + // Return true if pair operand mode is ok. Passed with >> + // machine mode. > > Could you use something closer to the comment that is already above > ldp_operand_mode_ok_p? The purpose of this predicate is really to test > the following: "is it a good idea (for optimization) to form paired > accesses with this operand mode at this stage in compilation?" > Done. >> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; >> + // Return true if reg operand is ok, passed with load_p, >> + // reg_op rtx and machine mode. > > Can you try to use the parameter names in the comment where possible? > The comment needs to be a bit more precise, what does "ok" mean here? > How about: > > // Return true iff REG_OP is a suitable register operand for a paired > // memory access, where LOAD_P is true if we're asking about loads and > // false for stores. MEM_MODE gives the mode of the operand. > > Also ... > >> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, >> + machine_mode mem_mode) = 0; > > ... can we use just mode (instead of mem_mode) as the parameter name > here? We're only using the mem mode because the non-memory operand > might be a const_int, in which case we need to infer the mode of the > register operand from the mode of the memory operand. But really it's > just the mode of REG_OP we're passing here. > Done. >> + // Return alias check limit. 
>> + virtual int pair_mem_alias_check_limit () = 0; > > Can you extend this comment with something like: "This is needed to > avoid unbounded quadratic behaviour when performing alias analysis." > >> + // Return true if there is writeback opportunities. Passed >> + // with enum writeback. > > Not exactly. This returns true if we should try to handle writeback > opportunities (not whether there are any). > >> + virtual bool handle_writeback_opportunities (enum writeback wback) = 0 ; >> + // Return true if mem ok ldp stp policy model passed with >> + // rtx mem, load_p and machine mode. > > This doesn't parse. Please can you write full grammatically-correct > sentences in comments? > Done. > I can't see a need to pass the mode separately here (the current code > just passes GET_MODE (first_mem)), so I think we only need two parameters > here: first_mem and load_p. Let's rename s/first_mem/base_mem/ here, too. > > Then the comment can read something like: > > // Given BASE_MEM, the mem from the lower candidate access for a pair, > // and LOAD_P (true if the access is a load), check if we should proceed > // to form the pair given the target's code generation policy on > // paired accesses. > >> + virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p, >> + machine_mode mode) = 0; >> + // Gen load store mem pair. Return load store rtx passed >> + // with arguments load store pattern, writeback rtx and >> + // load_p. > Done. > As above, let's use the formal parameter names here. This comment could > also do with more detail. How about: > > // Generate the pattern for a paired access. PATS gives the patterns > // for the individual memory accesses (which by this point must share a > // common base register). If WRITEBACK is non-NULL, then this rtx > // describes the update to the base register that should be performed by > // the resulting insn. LOAD_P is true iff the accesses are loads. 
> >> + virtual rtx gen_mem_pair (rtx *pats, rtx writeback, >> + bool load_p) = 0; > > Nit on naming: how about just gen_pair? It is clear from being a member > function of the pass what sort of pair we are talking about. > >> + // Return true if memory writeback can be promoted, passed >> + // with insn, rtx pattern and load_p. load_p is set by this >> + // hook. >> + virtual bool pair_mem_promote_writeback_p (insn_info *, rtx, bool &) >> + { >> + return false; >> + } > > I think we should change the interface here, see the comments in > ldp_fusion_bb below. > >> + // Return true if we track loads. > > I think this reads better if you say "if we _should_ track loads". > Likewise for the store case below. > Done. >> + virtual bool track_loads_p () >> + { >> + return true; >> + } >> + // Return true if we track stores. >> + virtual bool track_stores_p () >> + { >> + return true; >> + } >> + // Return true if offset is out of range. >> + virtual bool pair_mem_out_of_range_p (HOST_WIDE_INT off) = 0; >> + // Return destructure pair. Passed with rtx reg, insn pattern >> + // and load_p. >> + virtual rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p) = 0; > > So firstly this shouldn't be prefixed with gen_ as there is no RTL generation > going on, it's just pulling apart an existing pair insn. > > On the comment: again you should use the formal parameter names and describe > the purpose of each parameter. > Done. >> + // Return writeback pair. Passed with rtx writeback effect, mem rtx >> + // regs rtx and load_p. >> + virtual rtx gen_writeback_pair (rtx wb_effect, rtx mem, >> + rtx regs[2], bool load_p) = 0; > > I realise this is pre-existing, but: I think we want a more descriptive name > for this function. The key bit of the comment above > aarch64_gen_writeback_pair is: > > // This is used when promoting existing non-writeback pairs to writeback > // variants. > > so how about gen_promote_writeback_pair instead? 
Including that part of > the comment above would be good, too. I wasn't sure on a first glance > how this function differed from your gen_mem_pair function, which can > also generate writeback pairs. > Done. >> + void ldp_fusion_bb (bb_info *bb); >> + insn_info * find_trailing_add (insn_info *insns[2], > > Style nit: no space after *. Also this should probably be marked inline > as we want the linkage kept within this file. Likewise with the next > two functions. > >> + const insn_range_info &pair_range, >> + int initial_writeback, >> + rtx *writeback_effect, >> + def_info **add_def, >> + def_info *base_def, >> + poly_int64 initial_offset, >> + unsigned access_size); >> + int get_viable_bases (insn_info *insns[2], >> + vec &base_cands, >> + rtx cand_mems[2], >> + unsigned access_size, >> + bool reversed); >> + void do_alias_analysis (insn_info *alias_hazards[4], >> + alias_walker *walkers[4], >> + bool load_p); >> + void run (); >> + ~pair_fusion() >> + { >> + if (crtl->ssa->perform_pending_updates ()) >> + cleanup_cfg (0); >> + >> + free_dominance_info (CDI_DOMINATORS); >> + >> + delete crtl->ssa; >> + crtl->ssa = nullptr; >> + } >> +}; >> + >> +void >> +pair_fusion::run () > > This function could do with a comment. > Done. >> +{ >> + if (!track_loads_p () && !track_stores_p ()) >> + return; >> + >> + for (auto bb : crtl->ssa->bbs ()) >> + ldp_fusion_bb (bb); >> +} >> + >> +struct aarch64_pair_fusion : public pair_fusion >> +{ >> + bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode, >> + bool load_p) override final >> + { >> + return reload_completed >> + ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op))) >> + : (GET_MODE_CLASS (mem_mode) != MODE_INT >> + && (load_p || !aarch64_const_zero_rtx_p (reg_op))); >> + } >> + bool pair_mem_promote_writeback_p (insn_info *insn, rtx pat, bool &load_p); > > All of these other virtual overrides should be marked with the > override final keywords, too. > Done. 
>> + bool pair_mem_ok_with_policy (rtx first_mem, bool load_p, >> + machine_mode mode) >> + { >> + return aarch64_mem_ok_with_ldpstp_policy_model (first_mem, >> + load_p, >> + mode); >> + } >> + bool pair_operand_mode_ok_p (machine_mode mode) >> + { >> + return ldp_operand_mode_ok_p (mode); >> + } > > Here you can just rename the static function ldp_operand_mode_ok_p > to aarch64_pair_fusion::pair_operand_mode_ok_p. No need for the > wrapper function. > >> + rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p); >> + bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, >> + machine_mode mem_mode) >> + { >> + return (load_p >> + ? aarch64_ldp_reg_operand (reg_op, mem_mode) >> + : aarch64_stp_reg_operand (reg_op, mem_mode)); >> + } >> + int pair_mem_alias_check_limit () >> + { >> + return aarch64_ldp_alias_check_limit; >> + } >> + bool handle_writeback_opportunities (enum writeback wback) > > Just a suggestion: you could call this parameter "which" as it is > talking about which opportunities we want to handle. > Done. >> + { >> + if (wback == writeback::WRITEBACK_PAIR_P) >> + return aarch64_ldp_writeback > 1; >> + else >> + return aarch64_ldp_writeback; >> + } >> + bool track_loads_p () >> + { >> + return >> + aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; > > Style nit: I _think_ this would be better formatted with the LHS of the > expression on the same line as the return keyword, then the != on a new > line. I.e.: > > return aarch64_tune_params.ldp_policy_model > != AARCH64_LDP_STP_POLICY_NEVER; > Done. >> + } >> + bool track_stores_p () >> + { >> + return >> + aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; > > Likewise. > Done. 
>> + } >> + bool pair_mem_out_of_range_p (HOST_WIDE_INT off) >> + { >> + return (off < LDP_MIN_IMM || off > LDP_MAX_IMM); >> + } > > When I said (in > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648878.html): > >> Please can you invert the condition and fix up callers appropriately? > > I was expecting you to keep the function named pair_mem_in_range_p and invert > the condition, then update callers to introduce negations. > > I think it's more natural to have the function test for the positive condition > (is this offset in range) and then callers negate it if they want to check if > an offset is out of range. That matches existing practice elsewhere. > Done. >> + rtx gen_writeback_pair (rtx wb_effect, rtx mem, rtx regs[2], bool load_p) >> + { >> + return aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p); >> + } > Done. > Just rename the static function aarch64_gen_writeback_pair to > aarch64_pair_fusion::gen_writeback_pair, then there is no need for this > wrapper. > >> + rtx gen_destructure_pair (rtx regs[2], rtx rti, bool load_p); >> +}; >> + >> // State used by the pass for a given basic block. >> struct ldp_bb_info >> { >> @@ -160,8 +343,11 @@ struct ldp_bb_info >> >> static const size_t obstack_alignment = sizeof (void *); >> bb_info *m_bb; >> + pair_fusion *m_pass; > > I realise this is pre-existing, but can m_bb and m_pass both be private > members? > Done. >> >> - ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false) >> + ldp_bb_info (bb_info *bb, >> + pair_fusion *d) : m_bb (bb), >> + m_pass (d), m_emitted_tombstone (false) > > I think the preferred style is to have the parameters on the same line > and the initializer list (starting with ':') on a new line. > >> { >> obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE, >> obstack_alignment, obstack_chunk_alloc, >> @@ -177,10 +363,28 @@ struct ldp_bb_info >> bitmap_obstack_release (&m_bitmap_obstack); >> } >> } >> - > > Any need for this whitespace change? > Done. 
>> inline void track_access (insn_info *, bool load, rtx mem); >> inline void transform (); >> inline void cleanup_tombstones (); >> + inline void merge_pairs (insn_list_t &, insn_list_t &, >> + bool load_p, unsigned access_size); >> + inline void transform_for_base (int load_size, access_group &group); >> + >> + inline bool try_fuse_pair (bool load_p, unsigned access_size, >> + insn_info *i1, insn_info *i2); >> + >> + inline bool fuse_pair (bool load_p, unsigned access_size, >> + int writeback, >> + insn_info *i1, insn_info *i2, >> + base_cand &base, >> + const insn_range_info &move_range); >> + >> + inline void track_tombstone (int uid); >> + >> + inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs); >> + >> + template >> + void traverse_base_map (Map &map); > > What's the reason for changing the access of these functions? > Can they be kept private? > Done. >> >> private: >> obstack m_obstack; >> @@ -191,27 +395,60 @@ private: >> bool m_emitted_tombstone; >> >> inline splay_tree_node *node_alloc (access_record *); >> +}; >> >> - template >> - inline void traverse_base_map (Map &map); >> - inline void transform_for_base (int load_size, access_group &group); >> - >> - inline void merge_pairs (insn_list_t &, insn_list_t &, >> - bool load_p, unsigned access_size); >> +bool >> +aarch64_pair_fusion::pair_mem_promote_writeback_p (insn_info *insn, rtx pat, >> + bool &load_p) >> +{ >> + if (reload_completed >> + && aarch64_ldp_writeback > 1 >> + && GET_CODE (pat) == PARALLEL >> + && XVECLEN (pat, 0) == 2) >> + { >> + auto rti = insn->rtl (); >> + const auto attr = get_attr_ldpstp (rti); >> + if (attr == LDPSTP_NONE) >> + return false; >> >> - inline bool try_fuse_pair (bool load_p, unsigned access_size, >> - insn_info *i1, insn_info *i2); >> + load_p = (attr == LDPSTP_LDP); >> + gcc_checking_assert (load_p || attr == LDPSTP_STP); >> + return true; >> + } >> + return false; >> +} >> >> - inline bool fuse_pair (bool load_p, unsigned access_size, >> - int 
writeback, >> - insn_info *i1, insn_info *i2, >> - base_cand &base, >> - const insn_range_info &move_range); >> +rtx >> +aarch64_pair_fusion::gen_destructure_pair (rtx regs[2], rtx rti, bool load_p) >> +{ >> + if (load_p) >> + return aarch64_destructure_load_pair (regs, rti); >> + else >> + return aarch64_destructure_store_pair (regs, rti); >> +} >> >> - inline void track_tombstone (int uid); >> +rtx >> +aarch64_pair_fusion::gen_mem_pair (rtx *pats, >> + rtx writeback, >> + bool load_p) >> +{ >> + rtx pair_pat; >> >> - inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs); >> -}; >> + if (writeback) >> + { >> + auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]); >> + return gen_rtx_PARALLEL (VOIDmode, patvec); >> + } >> + else if (load_p) >> + return aarch64_gen_load_pair (XEXP (pats[0], 0), >> + XEXP (pats[1], 0), >> + XEXP (pats[0], 1)); >> + else >> + return aarch64_gen_store_pair (XEXP (pats[0], 0), >> + XEXP (pats[0], 1), >> + XEXP (pats[1], 1)); >> + return pair_pat; >> +} >> >> splay_tree_node * >> ldp_bb_info::node_alloc (access_record *access) >> @@ -312,7 +549,7 @@ any_post_modify_p (rtx x) >> >> // Return true if we should consider forming ldp/stp insns from memory >> // accesses with operand mode MODE at this stage in compilation. >> -static bool >> +bool >> ldp_operand_mode_ok_p (machine_mode mode) >> { >> const bool allow_qregs >> @@ -412,9 +649,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs) >> const machine_mode mem_mode = GET_MODE (mem); >> const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant (); >> >> - // Punt on misaligned offsets. LDP/STP instructions require offsets to be a >> - // multiple of the access size, and we believe that misaligned offsets on >> - // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases. >> + // Punt on misaligned offsets. 
Paired memory access instructions require >> + // offsets to be a multiple of the access size, and we believe that >> + // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned >> + // offsets w.r.t. RTL bases. >> if (!multiple_p (offset, mem_size)) >> return false; >> >> @@ -438,10 +676,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs) >> } >> >> // Main function to begin pair discovery. Given a memory access INSN, >> -// determine whether it could be a candidate for fusing into an ldp/stp, >> -// and if so, track it in the appropriate data structure for this basic >> -// block. LOAD_P is true if the access is a load, and MEM is the mem >> -// rtx that occurs in INSN. >> +// determine whether it could be a candidate for fusing into a paired >> +// access, and if so, track it in the appropriate data structure for >> +// this basic block. LOAD_P is true if the access is a load, and MEM >> +// is the mem rtx that occurs in INSN. >> void >> ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) >> { >> @@ -449,35 +687,26 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) >> if (MEM_VOLATILE_P (mem)) >> return; >> >> - // Ignore writeback accesses if the param says to do so. >> - if (!aarch64_ldp_writeback >> + // Ignore writeback accesses if the hook says to do so. >> + if (!m_pass->handle_writeback_opportunities (writeback::WRITEBACK) >> && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC) >> return; >> >> const machine_mode mem_mode = GET_MODE (mem); >> - if (!ldp_operand_mode_ok_p (mem_mode)) >> + if (!m_pass->pair_operand_mode_ok_p (mem_mode)) >> return; >> >> rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p); >> >> - // Ignore the access if the register operand isn't suitable for ldp/stp. >> - if (load_p >> - ? 
!aarch64_ldp_reg_operand (reg_op, mem_mode) >> - : !aarch64_stp_reg_operand (reg_op, mem_mode)) >> + if (!m_pass->pair_reg_operand_ok_p (load_p, reg_op, mem_mode)) >> return; >> - > > Let's keep this newline. > >> // We want to segregate FP/SIMD accesses from GPR accesses. >> // >> // Before RA, we use the modes, noting that stores of constant zero >> // operands use GPRs (even in non-integer modes). After RA, we use >> // the hard register numbers. > > The second para of this comment would probably be best moved to the aarch64 > implementation of the hook (with the first sentence duplicated in both places). > >> - const bool fpsimd_op_p >> - = reload_completed >> - ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op))) >> - : (GET_MODE_CLASS (mem_mode) != MODE_INT >> - && (load_p || !aarch64_const_zero_rtx_p (reg_op))); >> - >> - // Note ldp_operand_mode_ok_p already rejected VL modes. >> + const bool fpsimd_op_p = m_pass->fpsimd_op_p (reg_op, mem_mode, load_p); > > Might be more readable with a blank line inserted here. > Done. >> + // Note pair_operand_mode_ok_p already rejected VL modes. >> const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant (); >> const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size }; >> >> @@ -506,8 +735,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) >> // elimination offset pre-RA, we should postpone forming pairs on such >> // accesses until after RA. >> // >> - // As it stands, addresses with offsets in range for LDR but not >> - // in range for LDP/STP are currently reloaded inefficiently, >> + // As it stands, addresses in range for an individual load/store but not >> + // for a paired access are currently reloaded inefficiently, >> // ending up with a separate base register for each pair. >> // >> // In theory LRA should make use of >> @@ -519,8 +748,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) >> // that calls targetm.legitimize_address_displacement. 
>> // >> // So for now, it's better to punt when we can't be sure that the >> - // offset is in range for LDP/STP. Out-of-range cases can then be >> - // handled after RA by the out-of-range LDP/STP peepholes. Eventually, it >> + // offset is in range for paired access. Out-of-range cases can then be >> + // handled after RA by the out-of-range PAIR MEM peepholes. Eventually, it > > Does this part really apply to rs6000? Do you have peephole2 patterns for lxvp? > If not you may want to caveat this part as being specific to aarch64. E.g. "On > aarch64, out-of-range cases ..." > >> // would be nice to handle known out-of-range opportunities in the >> // pass itself (for stack accesses, this would be in the post-RA pass). >> if (!reload_completed >> @@ -573,8 +802,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) >> gcc_unreachable (); // Base defs should be unique. >> } >> >> - // Punt on misaligned offsets. LDP/STP require offsets to be a multiple of >> - // the access size. >> + // Punt on misaligned offsets. Paired memory accesses require offsets >> + // to be a multiple of the access size. >> if (!multiple_p (mem_off, mem_size)) >> return; >> >> @@ -1207,8 +1436,8 @@ extract_writebacks (bool load_p, rtx pats[2], int changed) >> // base register. If there is one, we choose the first such update after >> // PAIR_DST that is still in the same BB as our pair. We return the new def in >> // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT. 
>> -static insn_info * >> -find_trailing_add (insn_info *insns[2], >> +insn_info * >> +pair_fusion::find_trailing_add (insn_info *insns[2], >> const insn_range_info &pair_range, >> int initial_writeback, >> rtx *writeback_effect, >> @@ -1286,7 +1515,7 @@ find_trailing_add (insn_info *insns[2], >> >> off_hwi /= access_size; >> >> - if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM) >> + if (pair_mem_out_of_range_p (off_hwi)) >> return nullptr; >> >> auto dump_prefix = [&]() >> @@ -1800,7 +2029,7 @@ ldp_bb_info::fuse_pair (bool load_p, >> { >> if (dump_file) >> fprintf (dump_file, >> - " ldp: i%d has wb but subsequent i%d has non-wb " >> + " load pair: i%d has wb but subsequent i%d has non-wb " >> "update of base (r%d), dropping wb\n", >> insns[0]->uid (), insns[1]->uid (), base_regno); >> gcc_assert (writeback_effect); >> @@ -1823,7 +2052,7 @@ ldp_bb_info::fuse_pair (bool load_p, >> } >> >> // If either of the original insns had writeback, but the resulting pair insn >> - // does not (can happen e.g. in the ldp edge case above, or if the writeback >> + // does not (can happen e.g. in the load pair edge case above, or if the writeback >> // effects cancel out), then drop the def(s) of the base register as >> // appropriate. >> // >> @@ -1842,7 +2071,7 @@ ldp_bb_info::fuse_pair (bool load_p, >> // update of the base register and try and fold it in to make this into a >> // writeback pair. 
>> insn_info *trailing_add = nullptr; >> - if (aarch64_ldp_writeback > 1 >> + if (m_pass->handle_writeback_opportunities (writeback::WRITEBACK_PAIR_P) >> && !writeback_effect >> && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1, >> XEXP (pats[0], 0), nullptr) >> @@ -1850,7 +2079,7 @@ ldp_bb_info::fuse_pair (bool load_p, >> XEXP (pats[1], 0), nullptr)))) >> { >> def_info *add_def; >> - trailing_add = find_trailing_add (insns, move_range, writeback, >> + trailing_add = m_pass->find_trailing_add (insns, move_range, writeback, >> &writeback_effect, >> &add_def, base.def, offsets[0], >> access_size); >> @@ -1863,14 +2092,14 @@ ldp_bb_info::fuse_pair (bool load_p, >> } >> >> // Now that we know what base mem we're going to use, check if it's OK >> - // with the ldp/stp policy. >> + // with the pair mem policy. >> rtx first_mem = XEXP (pats[0], load_p); >> - if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem, >> - load_p, >> - GET_MODE (first_mem))) >> + if (!m_pass->pair_mem_ok_with_policy (first_mem, >> + load_p, >> + GET_MODE (first_mem))) > > The indentation looks off here. I suppose when you drop the mode parameter > (as mentioned above) this will all fit on one line anyway. > >> { >> if (dump_file) >> - fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n", >> + fprintf (dump_file, "punting on pair (%d,%d), pair mem policy says no\n", > > Nit: excess space after mem. > Done. 
>>  		i1->uid (), i2->uid ());
>>        return false;
>>      }
>> @@ -1878,21 +2107,10 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    rtx reg_notes = combine_reg_notes (first, second, load_p);
>>
>>    rtx pair_pat;
>> -  if (writeback_effect)
>> -    {
>> -      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
>> -      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> -    }
>> -  else if (load_p)
>> -    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> -				      XEXP (pats[1], 0),
>> -				      XEXP (pats[0], 1));
>> -  else
>> -    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> -				       XEXP (pats[0], 1),
>> -				       XEXP (pats[1], 1));
>>
>> +  pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);
>
> Please can you declare pair_pat on the same line it is set, so that we have:
>
>   rtx pair_pat = m_pass->gen_mem_pair (pats, writeback_effect, load_p);
>

Done.

>>    insn_change *pair_change = nullptr;
>> +
>
> Spurious whitespace change?
>
>>    auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
>>      rtx_insn *rti = change->insn ()->rtl ();
>>      validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> @@ -2133,15 +2351,6 @@ load_modified_by_store_p (insn_info *load,
>>    return false;
>>  }
>>
>> -// Virtual base class for load/store walkers used in alias analysis.
>> -struct alias_walker
>> -{
>> -  virtual bool conflict_p (int &budget) const = 0;
>> -  virtual insn_info *insn () const = 0;
>> -  virtual bool valid () const = 0;
>> -  virtual void advance () = 0;
>> -};
>> -
>>  // Implement some common functionality used by both store_walker
>>  // and load_walker.
>>  template
>> @@ -2259,13 +2468,13 @@ public:
>>  //
>>  // We try to maintain the invariant that if a walker becomes invalid, we
>>  // set its pointer to null.
>> -static void
>> -do_alias_analysis (insn_info *alias_hazards[4],
>> +void
>> +pair_fusion::do_alias_analysis (insn_info *alias_hazards[4],
>>  		   alias_walker *walkers[4],
>>  		   bool load_p)
>>  {
>>    const int n_walkers = 2 + (2 * !load_p);
>> -  int budget = aarch64_ldp_alias_check_limit;
>> +  int budget = pair_mem_alias_check_limit ();
>>
>>    auto next_walker = [walkers,n_walkers](int current) -> int {
>>      for (int j = 1; j <= n_walkers; j++)
>> @@ -2350,8 +2559,8 @@ do_alias_analysis (insn_info *alias_hazards[4],
>>  //
>>  // Returns an integer where bit (1 << i) is set if INSNS[i] uses writeback
>>  // addressing.
>> -static int
>> -get_viable_bases (insn_info *insns[2],
>> +int
>> +pair_fusion::get_viable_bases (insn_info *insns[2],
>>  		  vec &base_cands,
>>  		  rtx cand_mems[2],
>>  		  unsigned access_size,
>> @@ -2397,7 +2606,7 @@ get_viable_bases (insn_info *insns[2],
>>        if (!is_lower)
>>  	base_off--;
>>
>> -      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
>> +      if (pair_mem_out_of_range_p (base_off))
>>  	continue;
>>
>>        use_info *use = find_access (insns[i]->uses (), REGNO (base));
>> @@ -2454,7 +2663,7 @@ get_viable_bases (insn_info *insns[2],
>>  }
>>
>>  // Given two adjacent memory accesses of the same size, I1 and I2, try
>> -// and see if we can merge them into a ldp or stp.
>> +// and see if we can merge them into a paired access load and store.
>
> Delete "load and store" here.

Done.

>>  //
>>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>>  // if the accesses are both loads, otherwise they are both stores.
>> @@ -2494,7 +2703,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>      {
>>        if (dump_file)
>>  	fprintf (dump_file,
>> -		 "punting on ldp due to reg conflcits (%d,%d)\n",
>> +		 "punting on pair mem load due to reg conflcits (%d,%d)\n",
>>  		 insns[0]->uid (), insns[1]->uid ());
>>        return false;
>>      }
>> @@ -2512,7 +2721,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>
>>    auto_vec base_cands (2);
>>
>> -  int writeback = get_viable_bases (insns, base_cands, cand_mems,
>> +  int writeback = m_pass->get_viable_bases (insns, base_cands, cand_mems,
>>  				    access_size, reversed);
>>    if (base_cands.is_empty ())
>>      {
>> @@ -2641,7 +2850,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>        walkers[1] = &backward_store_walker;
>>
>>    if (load_p && (mem_defs[0] || mem_defs[1]))
>> -    do_alias_analysis (alias_hazards, walkers, load_p);
>> +    m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>>    else
>>      {
>>        // We want to find any loads hanging off the first store.
>> @@ -2650,7 +2859,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>        load_walker backward_load_walker (mem_defs[1], insns[1], insns[0]);
>>        walkers[2] = &forward_load_walker;
>>        walkers[3] = &backward_load_walker;
>> -      do_alias_analysis (alias_hazards, walkers, load_p);
>> +      m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
>>        // Now consolidate hazards back down.
>>        if (alias_hazards[2]
>>  	  && (!alias_hazards[0] || (*alias_hazards[2] < *alias_hazards[0])))
>> @@ -2891,7 +3100,7 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>>  // merge_pairs.
>>  void
>>  ldp_bb_info::transform_for_base (int encoded_lfs,
>> -				 access_group &group)
>> +			       access_group &group)
>
> Looks like a bad change in indentation.

Done.
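The do_alias_analysis change above keeps the walker-based design but routes the budget through a hook. As a simplified, self-contained model of that pattern — the names `walker_sketch`, `vec_walker`, and `scan_for_hazard` are invented for illustration and the real walkers scan RTL insns, not vectors — the round-robin scan under a shared budget looks like:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative model of the alias_walker interface the patch moves into
// generic code: each walker advances through candidate insns in one
// direction and reports whether its current position conflicts.
struct walker_sketch
{
  virtual ~walker_sketch () = default;
  virtual bool conflict_p () const = 0;
  virtual bool valid () const = 0;
  virtual void advance () = 0;
};

// Toy walker over a precomputed conflict vector.
struct vec_walker final : walker_sketch
{
  std::vector<bool> conflicts;
  std::size_t pos = 0;
  explicit vec_walker (std::vector<bool> c) : conflicts (std::move (c)) {}
  bool conflict_p () const override { return conflicts[pos]; }
  bool valid () const override { return pos < conflicts.size (); }
  void advance () override { ++pos; }
};

// Round-robin the walkers under a shared budget, in the spirit of
// do_alias_analysis with its pair_mem_alias_check_limit () hook.
// Returns the index of the first conflicting walker, or -1 if all
// walkers finish cleanly or the budget runs out.
int
scan_for_hazard (std::vector<walker_sketch *> &walkers, int budget)
{
  bool progress = true;
  while (progress)
    {
      progress = false;
      for (std::size_t i = 0; i < walkers.size (); ++i)
	{
	  if (!walkers[i] || !walkers[i]->valid ())
	    continue;
	  if (walkers[i]->conflict_p ())
	    return (int) i;
	  walkers[i]->advance ();
	  progress = true;
	  if (--budget <= 0)
	    return -1;
	}
    }
  return -1;
}
```

The point of the refactor is that only the budget value is target-specific; the interleaved scan itself is target-independent and can live in the shared pass.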
>>  {
>>    const auto lfs = decode_lfs (encoded_lfs);
>>    const unsigned access_size = lfs.size;
>> @@ -2964,29 +3173,9 @@ ldp_bb_info::transform ()
>>    traverse_base_map (def_map);
>>  }
>>
>> -static void
>> -ldp_fusion_init ()
>> -{
>> -  calculate_dominance_info (CDI_DOMINATORS);
>> -  df_analyze ();
>> -  crtl->ssa = new rtl_ssa::function_info (cfun);
>> -}
>> -
>> -static void
>> -ldp_fusion_destroy ()
>> -{
>> -  if (crtl->ssa->perform_pending_updates ())
>> -    cleanup_cfg (0);
>> -
>> -  free_dominance_info (CDI_DOMINATORS);
>> -
>> -  delete crtl->ssa;
>> -  crtl->ssa = nullptr;
>> -}
>> -
>>  // Given a load pair insn in PATTERN, unpack the insn, storing
>>  // the registers in REGS and returning the mem.
>> -static rtx
>> +rtx
>
> Again, no need to change the linkage here.

Done.

>>  aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
>>  {
>>    rtx mem = NULL_RTX;
>> @@ -3012,7 +3201,7 @@ aarch64_destructure_load_pair (rtx regs[2], rtx pattern)
>>
>>  // Given a store pair insn in PATTERN, unpack the insn, storing
>>  // the register operands in REGS, and returning the mem.
>> -static rtx
>> +rtx
>
> Likewise.

Done.

>>  aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
>>  {
>>    rtx mem = XEXP (pattern, 0);
>> @@ -3030,7 +3219,7 @@ aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
>>  //
>>  // This is used when promoting existing non-writeback pairs to writeback
>>  // variants.
>> -static rtx
>> +rtx
>>  aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>>  			    bool load_p)
>>  {
>> @@ -3068,22 +3257,13 @@ aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
>>  // the base register which we can fold in to make this pair use
>>  // a writeback addressing mode.
>>  static void
>> -try_promote_writeback (insn_info *insn)
>> +try_promote_writeback (insn_info *insn, bool load_p, pair_fusion *pass)
>
> I think this would be better as a member function of pair_fusion.  Then
> there's no need to pass a pointer to the pass structure explicitly.
>
>>  {
>> -  auto rti = insn->rtl ();
>> -  const auto attr = get_attr_ldpstp (rti);
>> -  if (attr == LDPSTP_NONE)
>> -    return;
>> -
>> -  bool load_p = (attr == LDPSTP_LDP);
>> -  gcc_checking_assert (load_p || attr == LDPSTP_STP);
>> -
>>    rtx regs[2];
>>    rtx mem = NULL_RTX;
>> -  if (load_p)
>> -    mem = aarch64_destructure_load_pair (regs, PATTERN (rti));
>> -  else
>> -    mem = aarch64_destructure_store_pair (regs, PATTERN (rti));
>> +
>> +  mem = pass->gen_destructure_pair (regs, PATTERN (insn->rtl ()), load_p);
>> +
>>    gcc_checking_assert (MEM_P (mem));
>>
>>    poly_int64 offset;
>> @@ -3120,9 +3300,10 @@ try_promote_writeback (insn_info *insn)
>>    def_info *add_def;
>>    const insn_range_info pair_range (insn);
>>    insn_info *insns[2] = { nullptr, insn };
>> -  insn_info *trailing_add = find_trailing_add (insns, pair_range, 0, &wb_effect,
>> -					       &add_def, base_def, offset,
>> -					       access_size);
>> +  insn_info *trailing_add
>> +    = pass->find_trailing_add (insns, pair_range, 0, &wb_effect,
>> +			       &add_def, base_def, offset,
>> +			       access_size);
>>    if (!trailing_add)
>>      return;
>>
>> @@ -3132,8 +3313,9 @@ try_promote_writeback (insn_info *insn)
>>    insn_change del_change (trailing_add, insn_change::DELETE);
>>    insn_change *changes[] = { &pair_change, &del_change };
>>
>> -  rtx pair_pat = aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
>> -  validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> +  rtx pair_pat = pass->gen_writeback_pair (wb_effect, mem, regs, load_p);
>> +
>
> No need for this blank line IMO.
>
>> +  validate_unshare_change (insn->rtl(), &PATTERN (insn->rtl()), pair_pat, true);
>>
>>    // The pair must gain the def of the base register from the add.
>>    pair_change.new_defs = insert_access (attempt,
>> @@ -3167,14 +3349,12 @@ try_promote_writeback (insn_info *insn)
>>  // for load/store candidates.  If running after RA, also try and promote
>>  // non-writeback pairs to use writeback addressing.  Then try to fuse
>>  // candidates into pairs.
>> -void ldp_fusion_bb (bb_info *bb)
>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
>>  {
>> -  const bool track_loads
>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> -  const bool track_stores
>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> +  const bool track_loads = track_loads_p ();
>> +  const bool track_stores = track_stores_p ();
>>
>> -  ldp_bb_info bb_state (bb);
>> +  ldp_bb_info bb_state (bb, this);
>>
>>    for (auto insn : bb->nondebug_insns ())
>>      {
>> @@ -3184,11 +3364,9 @@ void ldp_fusion_bb (bb_info *bb)
>>  	continue;
>>
>>        rtx pat = PATTERN (rti);
>> -      if (reload_completed
>> -	  && aarch64_ldp_writeback > 1
>> -	  && GET_CODE (pat) == PARALLEL
>> -	  && XVECLEN (pat, 0) == 2)
>> -	try_promote_writeback (insn);
>> +      bool load_p;
>> +      if (pair_mem_promote_writeback_p (insn, pat, load_p))
>> +	try_promote_writeback (insn, load_p, this);
>
> As above, if we make try_promote_writeback a member function of pair_fusion,
> the this pointer would be implicit, I think that would be a lot cleaner.
>
> Also, sorry for the run-around, but I think the function
> pair_mem_promote_writeback_p is trying to do too many things at once.
>
> I think we should keep these checks (making the latter suitable for generic
> code):
>
>   if (reload_completed
>       && aarch64_ldp_writeback > 1
>
> in ldp_fusion_bb, at which point pair_mem_promote_writeback_p just
> becomes a test for "is this insn a (non-writeback) paired memory
> access? (and if so, is it a load?)"
>
> So ultimately I think we want the following in ldp_fusion_bb:
>
>   bool load_p;
>   if (reload_completed
>       && handle_writeback_opportunities (writeback::ALL)
>       && pair_mem_insn_p (rti, load_p))
>     try_promote_writeback (insn, load_p);
>
>   rtx pat = PATTERN (rti);
>   [...]

Done.
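The shape Alex suggests can be sketched in miniature: try_promote_writeback as a member function (so the pass pointer is implicit) and a classification-only hook guarded by generic checks in the loop. Everything below uses simplified invented stand-ins (`insn_sketch`, `fusion_pass_sketch`, `run_bb`), not the actual GCC interfaces:

```cpp
// Minimal stand-in for an insn: just the two facts the hook reports.
struct insn_sketch
{
  bool paired_mem;  // non-writeback paired memory access?
  bool is_load;
};

struct fusion_pass_sketch
{
  int promoted = 0;

  // Target hook stand-in: "is this insn a (non-writeback) paired memory
  // access? (and if so, is it a load?)" -- classification only.
  bool pair_mem_insn_p (const insn_sketch &insn, bool &load_p) const
  {
    load_p = insn.is_load;
    return insn.paired_mem;
  }

  // Member function: "this" is implicit, no explicit pass pointer.
  void try_promote_writeback (insn_sketch &, bool /*load_p*/)
  {
    ++promoted;  // the real pass would rewrite the addressing mode here
  }

  // Mirrors the suggested ldp_fusion_bb guard: the generic conditions
  // stay in the loop, and the hook only classifies the insn.
  void run_bb (insn_sketch *insns, int n, bool reload_completed,
	       bool writeback_all)
  {
    for (int i = 0; i < n; ++i)
      {
	bool load_p;
	if (reload_completed && writeback_all
	    && pair_mem_insn_p (insns[i], load_p))
	  try_promote_writeback (insns[i], load_p);
      }
  }
};
```

Splitting the guard from the classification this way keeps the target hook small and lets each target answer only the question it is uniquely placed to answer.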
> It probably makes most sense to pass just the rtx_insn * to
> pair_mem_insn_p instead of passing both an insn_info * and an rtx for
> the PATTERN.
>
>>
>>        if (GET_CODE (pat) != SET)
>>  	continue;
>> @@ -3205,12 +3383,8 @@ void ldp_fusion_bb (bb_info *bb)
>>
>>  void ldp_fusion ()
>>  {
>> -  ldp_fusion_init ();
>> -
>> -  for (auto bb : crtl->ssa->bbs ())
>> -    ldp_fusion_bb (bb);
>> -
>> -  ldp_fusion_destroy ();
>> +  aarch64_pair_fusion pass;
>> +  pass.run ();
>>  }
>
> I think we mentioned this before, but since this is now a very simple
> two-line function, can we inline it into pass_ldp_fusion::execute?
> ISTM there's no need for the additional wrapper.

Done.

Thanks & Regards
Ajit

>>
>>  namespace {
>> @@ -3242,14 +3416,6 @@ public:
>>      if (!optimize || optimize_debug)
>>        return false;
>>
>> -    // If the tuning policy says never to form ldps or stps, don't run
>> -    // the pass.
>> -    if ((aarch64_tune_params.ldp_policy_model
>> -	 == AARCH64_LDP_STP_POLICY_NEVER)
>> -	&& (aarch64_tune_params.stp_policy_model
>> -	    == AARCH64_LDP_STP_POLICY_NEVER))
>> -      return false;
>> -
>>      if (reload_completed)
>>        return flag_aarch64_late_ldp_fusion;
>>      else
>> --
>> 2.39.3
>>
>
> Thanks,
> Alex