From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <67efc48f-b98d-495a-b94d-c3197ec179ea@linux.ibm.com>
Date: Mon, 18 Mar 2024 23:41:46 +0530
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: [PING^0][PATCH V3 0/2] aarch64: Place target independent and dependent changed and unchanged code in one file.
Content-Language: en-US
From: Ajit Agarwal
To: Richard Sandiford, Alex Coplan, "Kewen.Lin", Michael Meissner, Segher Boessenkool, Peter Bergner, David Edelsohn, gcc-patches
References: <6a013cf5-a5f2-44df-bdb3-a4985fbae06a@linux.ibm.com>
In-Reply-To: <6a013cf5-a5f2-44df-bdb3-a4985fbae06a@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Hello Richard/Alex:

Ping! Please reply.

Thanks & Regards
Ajit

On 27/02/24 12:33 pm, Ajit Agarwal wrote:
> Hello Richard/Alex:
>
> This patch has a better diff between the changed and the unchanged code.
> The unchanged code, and some of the changed code, will be extracted
> into target-independent headers and sources, while the target-dependent
> changed and unchanged code will live in a target-dependent file such as
> aarch64-ldp-fusion.
>
> Please review.
>
> Thanks & Regards
> Ajit
>
> On 23/02/24 4:41 pm, Ajit Agarwal wrote:
>> Hello Richard/Alex/Segher:
>>
>> This patch adds the changed code for the target-independent and
>> target-dependent parts of load/store fusion.
>>
>> The common infrastructure of load/store pair fusion is
>> divided into target-independent and target-dependent
>> changed code.
>>
>> The target-independent code is the generic code, with
>> pure virtual functions as the interface between the
>> target-independent and target-dependent code.
>>
>> The target-dependent code is the implementation of the pure
>> virtual functions for the aarch64 target, plus the call
>> into the target-independent code.
>>
>> Bootstrapped for aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>> aarch64: Place target independent and dependent changed code in one file.
>>
>> Common infrastructure of load store pair fusion is
>> divided into target independent and target dependent
>> changed code.
>>
>> Target independent code is the generic code with
>> pure virtual functions to interface between target
>> independent and dependent code.
>>
>> Target dependent code is the implementation of the pure
>> virtual functions for the aarch64 target and the call
>> into the target independent code.
>>
>> 2024-02-23  Ajit Kumar Agarwal
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64-ldp-fusion.cc: Place target
>> 	independent and dependent changed code.
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ++++++++++++++++-------
>>  1 file changed, 305 insertions(+), 132 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 22ed95eb743..2ef22ff1e96 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -40,10 +40,10 @@
>>
>>  using namespace rtl_ssa;
>>
>> -static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
>> -static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
>> -static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
>> -static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << (PAIR_MEM_IMM_BITS - 1));
>> +static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;
>>
>>  // We pack these fields (load_p, fpsimd_p, and size) into an integer
>>  // (LFS) which we use as part of the key into the main hash tables.
>> @@ -138,8 +138,18 @@ struct alt_base
>>    poly_int64 offset;
>>  };
>>
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int &budget) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const = 0;
>> +  virtual void advance () = 0;
>> +};
>> +
>> +
>>  // State used by the pass for a given basic block.
>> -struct ldp_bb_info
>> +struct pair_fusion
>>  {
>>    using def_hash = nofree_ptr_hash<def_info>;
>>    using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>;
>> @@ -161,13 +171,13 @@ struct ldp_bb_info
>>    static const size_t obstack_alignment = sizeof (void *);
>>    bb_info *m_bb;
>>
>> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> +  pair_fusion (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>>    {
>>      obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>>                                  obstack_alignment, obstack_chunk_alloc,
>>                                  obstack_chunk_free);
>>    }
>> -  ~ldp_bb_info ()
>> +  ~pair_fusion ()
>>    {
>>      obstack_free (&m_obstack, nullptr);
>>
>> @@ -177,10 +187,50 @@ struct ldp_bb_info
>>        bitmap_obstack_release (&m_bitmap_obstack);
>>      }
>>    }
>> +  void track_access (insn_info *, bool load, rtx mem);
>> +  void transform ();
>> +  void cleanup_tombstones ();
>> +  virtual void set_multiword_subreg (insn_info *i1, insn_info *i2,
>> +                                     bool load_p) = 0;
>> +  virtual rtx gen_load_store_pair (rtx *pats, rtx writeback,
>> +                                   bool load_p) = 0;
>> +  void merge_pairs (insn_list_t &, insn_list_t &,
>> +                    bool load_p, unsigned access_size);
>> +  virtual void transform_for_base (int load_size, access_group &group) = 0;
>> +
>> +  bool try_fuse_pair (bool load_p, unsigned access_size,
>> +                      insn_info *i1, insn_info *i2);
>> +
>> +  bool fuse_pair (bool load_p, unsigned access_size,
>> +                  int writeback,
>> +                  insn_info *i1, insn_info *i2,
>> +                  base_cand &base,
>> +                  const insn_range_info &move_range);
>> +
>> +  void do_alias_analysis (insn_info *alias_hazards[4],
>> +                          alias_walker *walkers[4],
>> +                          bool load_p);
>> +
>> +  void track_tombstone (int uid);
>> +
>> +  bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> +
>> +  virtual bool is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> +                               bool load_p) = 0;
>> +
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +  virtual bool pair_trailing_writeback_p () = 0;
>> +  virtual bool pair_check_register_operand (bool load_p, rtx reg_op,
>> +                                            machine_mode mem_mode) = 0;
>> +  virtual int pair_mem_alias_check_limit () = 0;
>> +  virtual bool pair_is_writeback () = 0;
>> +  virtual bool pair_mem_ok_policy (rtx first_mem, bool load_p,
>> +                                   machine_mode mode) = 0;
>> +  virtual bool fuseable_store_p (insn_info *i1, insn_info *i2) = 0;
>> +  virtual bool fuseable_load_p (insn_info *info) = 0;
>>
>> -  inline void track_access (insn_info *, bool load, rtx mem);
>> -  inline void transform ();
>> -  inline void cleanup_tombstones ();
>> +  template<typename Map>
>> +  void traverse_base_map (Map &map);
>>
>>  private:
>>    obstack m_obstack;
>> @@ -191,30 +241,157 @@ private:
>>    bool m_emitted_tombstone;
>>
>>    inline splay_tree_node<access_record *> *node_alloc (access_record *);
>> +};
>>
>> -  template<typename Map>
>> -  inline void traverse_base_map (Map &map);
>> -  inline void transform_for_base (int load_size, access_group &group);
>> -
>> -  inline void merge_pairs (insn_list_t &, insn_list_t &,
>> -                           bool load_p, unsigned access_size);
>> +struct aarch64_pair_fusion : public pair_fusion
>> +{
>> +public:
>> +  aarch64_pair_fusion (bb_info *bb) : pair_fusion (bb) {}
>> +  bool is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool load_p)
>> +  {
>> +    const bool fpsimd_op_p
>> +      = reload_completed
>> +      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> +      : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> +         && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> +    return fpsimd_op_p;
>> +  }
>>
>> -  inline bool try_fuse_pair (bool load_p, unsigned access_size,
>> -                             insn_info *i1, insn_info *i2);
>> +  bool pair_mem_ok_policy (rtx first_mem, bool load_p, machine_mode mode)
>> +  {
>> +    return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> +                                                    load_p,
>> +                                                    mode);
>> +  }
>> +  bool pair_operand_mode_ok_p (machine_mode mode);
>>
>> -  inline bool fuse_pair (bool load_p, unsigned access_size,
>> -                         int writeback,
>> -                         insn_info *i1, insn_info *i2,
>> -                         base_cand &base,
>> -                         const insn_range_info &move_range);
>> +  void transform_for_base (int encoded_lfs,
>> +                           access_group &group);
>> +  rtx gen_load_store_pair (rtx *pats,
>> +                           rtx writeback,
>> +                           bool load_p)
>> +  {
>> +    rtx pair_pat;
>>
>> -  inline void track_tombstone (int uid);
>> +    if (writeback)
>> +      {
>> +        auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
>> +        pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> +      }
>> +    else if (load_p)
>> +      pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> +                                        XEXP (pats[1], 0),
>> +                                        XEXP (pats[0], 1));
>> +    else
>> +      pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> +                                         XEXP (pats[0], 1),
>> +                                         XEXP (pats[1], 1));
>> +    return pair_pat;
>> +  }
>>
>> -  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> +  void set_multiword_subreg (insn_info *i1, insn_info *i2, bool load_p)
>> +  {
>> +    // Nothing to do for aarch64; mark the parameters as used.
>> +    (void) i1;
>> +    (void) i2;
>> +    (void) load_p;
>> +  }
>> +  bool pair_trailing_writeback_p ()
>> +  {
>> +    return aarch64_ldp_writeback > 1;
>> +  }
>> +  bool pair_check_register_operand (bool load_p, rtx reg_op, machine_mode mem_mode)
>> +  {
>> +    return (load_p
>> +            ? aarch64_ldp_reg_operand (reg_op, mem_mode)
>> +            : aarch64_stp_reg_operand (reg_op, mem_mode));
>> +  }
>> +  int pair_mem_alias_check_limit ()
>> +  {
>> +    return aarch64_ldp_alias_check_limit;
>> +  }
>> +  bool fuseable_store_p (insn_info *i1, insn_info *i2) { return i1 || i2; }
>> +  bool fuseable_load_p (insn_info *insn) { return insn; }
>> +  bool pair_is_writeback ()
>> +  {
>> +    return !aarch64_ldp_writeback;
>> +  }
>> +};
>>
>> +bool
>> +store_modifies_mem_p (rtx mem, insn_info *store_insn, int &budget);
>> +bool load_modified_by_store_p (insn_info *load,
>> +                               insn_info *store,
>> +                               int &budget);
>> +extern insn_info *
>> +try_repurpose_store (insn_info *first,
>> +                     insn_info *second,
>> +                     const insn_range_info &move_range);
>> +
>> +void reset_debug_use (use_info *use);
>> +
>> +extern void
>> +fixup_debug_uses (obstack_watermark &attempt,
>> +                  insn_info *insns[2],
>> +                  rtx orig_rtl[2],
>> +                  insn_info *pair_dst,
>> +                  insn_info *trailing_add,
>> +                  bool load_p,
>> +                  int writeback,
>> +                  rtx writeback_effect,
>> +                  unsigned base_regno);
>> +
>> +void
>> +fixup_debug_uses_trailing_add (obstack_watermark &attempt,
>> +                               insn_info *pair_dst,
>> +                               insn_info *trailing_add,
>> +                               rtx writeback_effect);
>> +
>> +
>> +extern void
>> +fixup_debug_use (obstack_watermark &attempt,
>> +                 use_info *use,
>> +                 def_info *def,
>> +                 rtx base,
>> +                 poly_int64 wb_offset);
>> +
>> +extern insn_info *
>> +find_trailing_add (insn_info *insns[2],
>> +                   const insn_range_info &pair_range,
>> +                   int initial_writeback,
>> +                   rtx *writeback_effect,
>> +                   def_info **add_def,
>> +                   def_info *base_def,
>> +                   poly_int64 initial_offset,
>> +                   unsigned access_size);
>> +
>> +rtx drop_writeback (rtx mem);
>> +rtx pair_mem_strip_offset (rtx mem, poly_int64 *offset);
>> +bool any_pre_modify_p (rtx x);
>> +bool any_post_modify_p (rtx x);
>> +int encode_lfs (lfs_fields fields);
>> +extern insn_info * latest_hazard_before (insn_info *insn, rtx *ignore,
>> +                                         insn_info *ignore_insn = nullptr);
>> +insn_info * first_hazard_after (insn_info *insn, rtx *ignore);
>> +bool ranges_overlap_p (const insn_range_info &r1, const insn_range_info &r2);
>> +insn_range_info get_def_range (def_info *def);
>> +insn_range_info def_downwards_move_range (def_info *def);
>> +insn_range_info def_upwards_move_range (def_info *def);
>> +rtx gen_tombstone (void);
>> +rtx filter_notes (rtx note, rtx result, bool *eh_region, rtx *fr_expr);
>> +rtx combine_reg_notes (insn_info *i1, insn_info *i2, bool load_p);
>> +rtx extract_writebacks (bool load_p, rtx pats[2], int changed);
>> +void do_alias_analysis (insn_info *alias_hazards[4],
>> +                        alias_walker *walkers[4],
>> +                        bool load_p);
>> +int get_viable_bases (insn_info *insns[2],
>> +                      vec<base_cand> &base_cands,
>> +                      rtx cand_mems[2],
>> +                      unsigned access_size,
>> +                      bool reversed);
>> +void dump_insn_list (FILE *f, const insn_list_t &l);
>> +
>>  splay_tree_node<access_record *> *
>> -ldp_bb_info::node_alloc (access_record *access)
>> +pair_fusion::node_alloc (access_record *access)
>>  {
>>    using T = splay_tree_node<access_record *>;
>>    void *addr = obstack_alloc (&m_obstack, sizeof (T));
>> @@ -224,7 +401,7 @@ ldp_bb_info::node_alloc (access_record *access)
>>  // Given a mem MEM, if the address has side effects, return a MEM that accesses
>>  // the same address but without the side effects.  Otherwise, return
>>  // MEM unchanged.
>> -static rtx
>> +rtx
>>  drop_writeback (rtx mem)
>>  {
>>    rtx addr = XEXP (mem, 0);
>> @@ -261,8 +438,8 @@ drop_writeback (rtx mem)
>>  // Convenience wrapper around strip_offset that can also look through
>>  // RTX_AUTOINC addresses.  The interface is like strip_offset except we take a
>>  // MEM so that we know the mode of the access.
>> -static rtx
>> -ldp_strip_offset (rtx mem, poly_int64 *offset)
>> +rtx
>> +pair_mem_strip_offset (rtx mem, poly_int64 *offset)
>>  {
>>    rtx addr = XEXP (mem, 0);
>>
>> @@ -295,7 +472,7 @@ ldp_strip_offset (rtx mem, poly_int64 *offset)
>>  }
>>
>>  // Return true if X is a PRE_{INC,DEC,MODIFY} rtx.
>> -static bool
>> +bool
>>  any_pre_modify_p (rtx x)
>>  {
>>    const auto code = GET_CODE (x);
>> @@ -303,7 +480,7 @@ any_pre_modify_p (rtx x)
>>  }
>>
>>  // Return true if X is a POST_{INC,DEC,MODIFY} rtx.
>> -static bool
>> +bool
>>  any_post_modify_p (rtx x)
>>  {
>>    const auto code = GET_CODE (x);
>> @@ -332,9 +509,15 @@ ldp_operand_mode_ok_p (machine_mode mode)
>>    return reload_completed || mode != TImode;
>>  }
>>
>> +bool
>> +aarch64_pair_fusion::pair_operand_mode_ok_p (machine_mode mode)
>> +{
>> +  return (ldp_operand_mode_ok_p (mode));
>> +}
>> +
>>  // Given LFS (load_p, fpsimd_p, size) fields in FIELDS, encode these
>>  // into an integer for use as a hash table key.
>> -static int
>> +int
>>  encode_lfs (lfs_fields fields)
>>  {
>>    int size_log2 = exact_log2 (fields.size);
>> @@ -396,7 +579,7 @@ access_group::track (Alloc alloc_node, poly_int64 offset, insn_info *insn)
>>  // MEM_EXPR base (i.e. a tree decl) relative to which we can track the access.
>>  // LFS is used as part of the key to the hash table, see track_access.
>>  bool
>> -ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>> +pair_fusion::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>>  {
>>    if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem))
>>      return false;
>> @@ -412,7 +595,7 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>>    const machine_mode mem_mode = GET_MODE (mem);
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>
>> -  // Punt on misaligned offsets.  LDP/STP instructions require offsets to be a
>> +  // Punt on misaligned offsets.  PAIR MEM instructions require offsets to be a
>>    // multiple of the access size, and we believe that misaligned offsets on
>>    // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
>>    if (!multiple_p (offset, mem_size))
>>      return false;
>> @@ -438,46 +621,38 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>>  }
>>
>>  // Main function to begin pair discovery.  Given a memory access INSN,
>> -// determine whether it could be a candidate for fusing into an ldp/stp,
>> +// determine whether it could be a candidate for fusing into a pair mem,
>>  // and if so, track it in the appropriate data structure for this basic
>>  // block.  LOAD_P is true if the access is a load, and MEM is the mem
>>  // rtx that occurs in INSN.
>>  void
>> -ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> +pair_fusion::track_access (insn_info *insn, bool load_p, rtx mem)
>>  {
>>    // We can't combine volatile MEMs, so punt on these.
>>    if (MEM_VOLATILE_P (mem))
>>      return;
>>
>> -  // Ignore writeback accesses if the param says to do so.
>> -  if (!aarch64_ldp_writeback
>> +  // Ignore writeback accesses if the param says to do so.
>> +  if (pair_is_writeback ()
>>        && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
>>      return;
>>
>>    const machine_mode mem_mode = GET_MODE (mem);
>> -  if (!ldp_operand_mode_ok_p (mem_mode))
>> +
>> +  if (!pair_operand_mode_ok_p (mem_mode))
>>      return;
>>
>>    rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
>>
>> -  // Ignore the access if the register operand isn't suitable for ldp/stp.
>> -  if (load_p
>> -      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
>> -      : !aarch64_stp_reg_operand (reg_op, mem_mode))
>> +  if (!pair_check_register_operand (load_p, reg_op, mem_mode))
>>      return;
>> -
>>    // We want to segregate FP/SIMD accesses from GPR accesses.
>>    //
>>    // Before RA, we use the modes, noting that stores of constant zero
>>    // operands use GPRs (even in non-integer modes).  After RA, we use
>>    // the hard register numbers.
>> -  const bool fpsimd_op_p
>> -    = reload_completed
>> -    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> -    : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> -       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> -
>> -  // Note ldp_operand_mode_ok_p already rejected VL modes.
>> +  const bool fpsimd_op_p = is_fpsimd_op_p (reg_op, mem_mode, load_p);
>> +  // Note pair_operand_mode_ok_p already rejected VL modes.
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>    const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
>>
>> @@ -487,7 +662,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    poly_int64 mem_off;
>>    rtx addr = XEXP (mem, 0);
>>    const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
>> -  rtx base = ldp_strip_offset (mem, &mem_off);
>> +  rtx base = pair_mem_strip_offset (mem, &mem_off);
>>    if (!REG_P (base))
>>      return;
>>
>> @@ -507,7 +682,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    // accesses until after RA.
>>    //
>>    // As it stands, addresses with offsets in range for LDR but not
>> -  // in range for LDP/STP are currently reloaded inefficiently,
>> +  // in range for PAIR MEM LOAD STORE are currently reloaded inefficiently,
>>    // ending up with a separate base register for each pair.
>>    //
>>    // In theory LRA should make use of
>> @@ -519,8 +694,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    // that calls targetm.legitimize_address_displacement.
>>    //
>>    // So for now, it's better to punt when we can't be sure that the
>> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
>> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
>> +  // offset is in range for PAIR MEM LOAD STORE.  Out-of-range cases can then be
>> +  // handled after RA by the out-of-range PAIR MEM peepholes.  Eventually, it
>>    // would be nice to handle known out-of-range opportunities in the
>>    // pass itself (for stack accesses, this would be in the post-RA pass).
>>    if (!reload_completed
>> @@ -573,7 +748,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>        gcc_unreachable (); // Base defs should be unique.
>>      }
>>
>> -  // Punt on misaligned offsets.  LDP/STP require offsets to be a multiple of
>> +  // Punt on misaligned offsets.  PAIR MEM require offsets to be a multiple of
>>    // the access size.
>>    if (!multiple_p (mem_off, mem_size))
>>      return;
>> @@ -612,9 +787,9 @@ static bool no_ignore (insn_info *) { return false; }
>>  //
>>  // N.B. we ignore any defs/uses of memory here as we deal with that separately,
>>  // making use of alias disambiguation.
>> -static insn_info *
>> +insn_info *
>>  latest_hazard_before (insn_info *insn, rtx *ignore,
>> -                      insn_info *ignore_insn = nullptr)
>> +                      insn_info *ignore_insn)
>>  {
>>    insn_info *result = nullptr;
>>
>> @@ -698,7 +873,7 @@ latest_hazard_before (insn_info *insn, rtx *ignore,
>>  //
>>  // N.B. we ignore any defs/uses of memory here as we deal with that separately,
>>  // making use of alias disambiguation.
>> -static insn_info *
>> +insn_info *
>>  first_hazard_after (insn_info *insn, rtx *ignore)
>>  {
>>    insn_info *result = nullptr;
>> @@ -787,7 +962,7 @@ first_hazard_after (insn_info *insn, rtx *ignore)
>>  }
>>
>>  // Return true iff R1 and R2 overlap.
>> -static bool
>> +bool
>>  ranges_overlap_p (const insn_range_info &r1, const insn_range_info &r2)
>>  {
>>    // If either range is empty, then their intersection is empty.
>> @@ -801,7 +976,7 @@ ranges_overlap_p (const insn_range_info &r1, const insn_range_info &r2)
>>  }
>>
>>  // Get the range of insns that def feeds.
>> -static insn_range_info get_def_range (def_info *def)
>> +insn_range_info get_def_range (def_info *def)
>>  {
>>    insn_info *last = def->next_def ()->insn ()->prev_nondebug_insn ();
>>    return { def->insn (), last };
>> @@ -809,7 +984,7 @@ static insn_range_info get_def_range (def_info *def)
>>
>>  // Given a def (of memory), return the downwards range within which we
>>  // can safely move this def.
>> -static insn_range_info
>> +insn_range_info
>>  def_downwards_move_range (def_info *def)
>>  {
>>    auto range = get_def_range (def);
>> @@ -827,7 +1002,7 @@ def_downwards_move_range (def_info *def)
>>
>>  // Given a def (of memory), return the upwards range within which we can
>>  // safely move this def.
>> -static insn_range_info
>> +insn_range_info
>>  def_upwards_move_range (def_info *def)
>>  {
>>    def_info *prev = def->prev_def ();
>> @@ -974,7 +1149,7 @@ private:
>>  // Given candidate store insns FIRST and SECOND, see if we can re-purpose one
>>  // of them (together with its def of memory) for the stp insn.  If so, return
>>  // that insn.  Otherwise, return null.
>> -static insn_info *
>> +insn_info *
>>  try_repurpose_store (insn_info *first,
>>                       insn_info *second,
>>                       const insn_range_info &move_range)
>> @@ -1001,7 +1176,7 @@ try_repurpose_store (insn_info *first,
>>  //
>>  // These are deleted at the end of the pass and uses re-parented appropriately
>>  // at this point.
>> -static rtx
>> +rtx
>>  gen_tombstone (void)
>>  {
>>    return gen_rtx_CLOBBER (VOIDmode,
>> @@ -1034,7 +1209,7 @@ aarch64_operand_mode_for_pair_mode (machine_mode mode)
>>  // REG_EH_REGION note in the resulting list.  FR_EXPR is used to return any
>>  // REG_FRAME_RELATED_EXPR note we find, as these can need special handling in
>>  // combine_reg_notes.
>> -static rtx
>> +rtx
>>  filter_notes (rtx note, rtx result, bool *eh_region, rtx *fr_expr)
>>  {
>>    for (; note; note = XEXP (note, 1))
>> @@ -1084,7 +1259,7 @@ filter_notes (rtx note, rtx result, bool *eh_region, rtx *fr_expr)
>>
>>  // Return the notes that should be attached to a combination of I1 and I2, where
>>  // *I1 < *I2.  LOAD_P is true for loads.
>> -static rtx
>> +rtx
>>  combine_reg_notes (insn_info *i1, insn_info *i2, bool load_p)
>>  {
>>    // Temporary storage for REG_FRAME_RELATED_EXPR notes.
>> @@ -1133,7 +1308,7 @@ combine_reg_notes (insn_info *i1, insn_info *i2, bool load_p)
>>  // relative to the initial value of the base register, and output these
>>  // in PATS.  Return an rtx that represents the overall change to the
>>  // base register.
>> -static rtx
>> +rtx
>>  extract_writebacks (bool load_p, rtx pats[2], int changed)
>>  {
>>    rtx base_reg = NULL_RTX;
>> @@ -1150,7 +1325,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
>>        const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
>>
>>        poly_int64 offset;
>> -      rtx this_base = ldp_strip_offset (mem, &offset);
>> +      rtx this_base = pair_mem_strip_offset (mem, &offset);
>>        gcc_assert (REG_P (this_base));
>>        if (base_reg)
>>          gcc_assert (rtx_equal_p (base_reg, this_base));
>> @@ -1207,7 +1382,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
>>  // base register.  If there is one, we choose the first such update after
>>  // PAIR_DST that is still in the same BB as our pair.  We return the new def in
>>  // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
>> -static insn_info *
>> +insn_info *
>>  find_trailing_add (insn_info *insns[2],
>>                     const insn_range_info &pair_range,
>>                     int initial_writeback,
>> @@ -1286,7 +1461,7 @@ find_trailing_add (insn_info *insns[2],
>>
>>    off_hwi /= access_size;
>>
>> -  if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
>> +  if (off_hwi < PAIR_MEM_MIN_IMM || off_hwi > PAIR_MEM_MAX_IMM)
>>      return nullptr;
>>
>>    auto dump_prefix = [&]()
>> @@ -1328,7 +1503,7 @@ find_trailing_add (insn_info *insns[2],
>>  // We just emitted a tombstone with uid UID, track it in a bitmap for
>>  // this BB so we can easily identify it later when cleaning up tombstones.
>>  void
>> -ldp_bb_info::track_tombstone (int uid)
>> +pair_fusion::track_tombstone (int uid)
>>  {
>>    if (!m_emitted_tombstone)
>>      {
>> @@ -1344,7 +1519,7 @@ ldp_bb_info::track_tombstone (int uid)
>>
>>  // Reset the debug insn containing USE (the debug insn has been
>>  // optimized away).
>> -static void
>> +void
>>  reset_debug_use (use_info *use)
>>  {
>>    auto use_insn = use->insn ();
>> @@ -1360,7 +1535,7 @@ reset_debug_use (use_info *use)
>>  // is an update of the register BASE by a constant, given by WB_OFFSET,
>>  // and we can preserve debug info by accounting for the change in side
>>  // effects.
>> -static void
>> +void
>>  fixup_debug_use (obstack_watermark &attempt,
>>                   use_info *use,
>>                   def_info *def,
>> @@ -1463,7 +1638,7 @@ fixup_debug_use (obstack_watermark &attempt,
>>  // is a trailing add insn which is being folded into the pair to make it
>>  // use writeback addressing, and WRITEBACK_EFFECT is the pattern for
>>  // TRAILING_ADD.
>> -static void
>> +void
>>  fixup_debug_uses_trailing_add (obstack_watermark &attempt,
>>                                 insn_info *pair_dst,
>>                                 insn_info *trailing_add,
>> @@ -1500,7 +1675,7 @@ fixup_debug_uses_trailing_add (obstack_watermark &attempt,
>>  // writeback, and WRITEBACK_EFFECT is an rtx describing the overall update to
>>  // the base register in the final pair (if any).  BASE_REGNO gives the register
>>  // number of the base register used in the final pair.
>> -static void
>> +void
>>  fixup_debug_uses (obstack_watermark &attempt,
>>                    insn_info *insns[2],
>>                    rtx orig_rtl[2],
>> @@ -1528,7 +1703,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>>            gcc_checking_assert (GET_RTX_CLASS (GET_CODE (XEXP (mem, 0)))
>>                                 == RTX_AUTOINC);
>>
>> -          base = ldp_strip_offset (mem, &offset);
>> +          base = pair_mem_strip_offset (mem, &offset);
>>            gcc_checking_assert (REG_P (base) && REGNO (base) == base_regno);
>>          }
>>        fixup_debug_use (attempt, use, def, base, offset);
>> @@ -1664,7 +1839,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>>  // BASE gives the chosen base candidate for the pair and MOVE_RANGE is
>>  // a singleton range which says where to place the pair.
>>  bool
>> -ldp_bb_info::fuse_pair (bool load_p,
>> +pair_fusion::fuse_pair (bool load_p,
>>                          unsigned access_size,
>>                          int writeback,
>>                          insn_info *i1, insn_info *i2,
>> @@ -1684,6 +1859,9 @@ ldp_bb_info::fuse_pair (bool load_p,
>>                                  insn_change::DELETE);
>>      };
>>
>> +  if (*i1 > *i2)
>> +    return false;
>> +
>>    insn_info *first = (*i1 < *i2) ? i1 : i2;
>>    insn_info *second = (first == i1) ? i2 : i1;
>>
>> @@ -1800,7 +1978,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      {
>>        if (dump_file)
>>          fprintf (dump_file,
>> -                 "  ldp: i%d has wb but subsequent i%d has non-wb "
>> +                 "  pair_mem: i%d has wb but subsequent i%d has non-wb "
>>                   "update of base (r%d), dropping wb\n",
>>                   insns[0]->uid (), insns[1]->uid (), base_regno);
>>        gcc_assert (writeback_effect);
>> @@ -1823,7 +2001,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>
>>    // If either of the original insns had writeback, but the resulting pair insn
>> -  // does not (can happen e.g. in the ldp edge case above, or if the writeback
>> +  // does not (can happen e.g. in the pair mem edge case above, or if the writeback
>>    // effects cancel out), then drop the def(s) of the base register as
>>    // appropriate.
>>    //
>> @@ -1842,7 +2020,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    // update of the base register and try and fold it in to make this into a
>>    // writeback pair.
>>    insn_info *trailing_add = nullptr;
>> -  if (aarch64_ldp_writeback > 1
>> +  if (pair_trailing_writeback_p ()
>>        && !writeback_effect
>>        && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
>>                                            XEXP (pats[0], 0), nullptr)
>> @@ -1863,14 +2041,14 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>
>>    // Now that we know what base mem we're going to use, check if it's OK
>> -  // with the ldp/stp policy.
>> +  // with the pair mem policy.
>>    rtx first_mem = XEXP (pats[0], load_p);
>> -  if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> -                                                load_p,
>> -                                                GET_MODE (first_mem)))
>> +  if (!pair_mem_ok_policy (first_mem,
>> +                           load_p,
>> +                           GET_MODE (first_mem)))
>>      {
>>        if (dump_file)
>> -        fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
>> +        fprintf (dump_file, "punting on pair (%d,%d), pair mem policy says no\n",
>>                   i1->uid (), i2->uid ());
>>        return false;
>>      }
>> @@ -1878,20 +2056,12 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    rtx reg_notes = combine_reg_notes (first, second, load_p);
>>
>>    rtx pair_pat;
>> -  if (writeback_effect)
>> -    {
>> -      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
>> -      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> -    }
>> -  else if (load_p)
>> -    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> -                                      XEXP (pats[1], 0),
>> -                                      XEXP (pats[0], 1));
>> -  else
>> -    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> -                                       XEXP (pats[0], 1),
>> -                                       XEXP (pats[1], 1));
>>
>> +  set_multiword_subreg (first, second, load_p);
>> +
>> +  pair_pat = gen_load_store_pair (pats, writeback_effect, load_p);
>> +  if (pair_pat == NULL_RTX)
>> +    return false;
>>    insn_change *pair_change = nullptr;
>>    auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
>>      rtx_insn *rti = change->insn ()->rtl ();
>> @@ -2075,7 +2245,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>
>>  // Return true if STORE_INSN may modify mem rtx MEM.  Make sure we keep
>>  // within our BUDGET for alias analysis.
>> -static bool
>> +bool
>>  store_modifies_mem_p (rtx mem, insn_info *store_insn, int &budget)
>>  {
>>    if (!budget)
>> @@ -2098,7 +2268,7 @@ store_modifies_mem_p (rtx mem, insn_info *store_insn, int &budget)
>>
>>  // Return true if LOAD may be modified by STORE.  Make sure we keep
>>  // within our BUDGET for alias analysis.
>> -static bool
>> +bool
>>  load_modified_by_store_p (insn_info *load,
>>                            insn_info *store,
>>                            int &budget)
>> @@ -2133,15 +2303,6 @@ load_modified_by_store_p (insn_info *load,
>>    return false;
>>  }
>>
>> -// Virtual base class for load/store walkers used in alias analysis.
>> -struct alias_walker
>> -{
>> -  virtual bool conflict_p (int &budget) const = 0;
>> -  virtual insn_info *insn () const = 0;
>> -  virtual bool valid () const = 0;
>> -  virtual void advance () = 0;
>> -};
>> -
>>  // Implement some common functionality used by both store_walker
>>  // and load_walker.
>>  template<bool reverse>
>> @@ -2259,13 +2420,13 @@ public:
>>  //
>>  // We try to maintain the invariant that if a walker becomes invalid, we
>>  // set its pointer to null.
>> -static void
>> -do_alias_analysis (insn_info *alias_hazards[4],
>> +void
>> +pair_fusion::do_alias_analysis (insn_info *alias_hazards[4],
>>                     alias_walker *walkers[4],
>>                     bool load_p)
>>  {
>>    const int n_walkers = 2 + (2 * !load_p);
>> -  int budget = aarch64_ldp_alias_check_limit;
>> +  int budget = pair_mem_alias_check_limit ();
>>
>>    auto next_walker = [walkers,n_walkers](int current) -> int {
>>      for (int j = 1; j <= n_walkers; j++)
>> @@ -2350,7 +2511,7 @@ do_alias_analysis (insn_info *alias_hazards[4],
>>  //
>>  // Returns an integer where bit (1 << i) is set if INSNS[i] uses writeback
>>  // addressing.
>> -static int
>> +int
>>  get_viable_bases (insn_info *insns[2],
>>                    vec<base_cand> &base_cands,
>>                    rtx cand_mems[2],
>> @@ -2365,7 +2526,7 @@ get_viable_bases (insn_info *insns[2],
>>      {
>>        const bool is_lower = (i == reversed);
>>        poly_int64 poly_off;
>> -      rtx base = ldp_strip_offset (cand_mems[i], &poly_off);
>> +      rtx base = pair_mem_strip_offset (cand_mems[i], &poly_off);
>>        if (GET_RTX_CLASS (GET_CODE (XEXP (cand_mems[i], 0))) == RTX_AUTOINC)
>>          writeback |= (1 << i);
>>
>> @@ -2373,7 +2534,7 @@ get_viable_bases (insn_info *insns[2],
>>          continue;
>>
>>        // Punt on accesses relative to eliminable regs.  See the comment in
>> -      // ldp_bb_info::track_access for a detailed explanation of this.
>> +      // pair_fusion::track_access for a detailed explanation of this.
>>        if (!reload_completed
>>            && (REGNO (base) == FRAME_POINTER_REGNUM
>>                || REGNO (base) == ARG_POINTER_REGNUM))
>> @@ -2397,7 +2558,7 @@ get_viable_bases (insn_info *insns[2],
>>        if (!is_lower)
>>          base_off--;
>>
>> -      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
>> +      if (base_off < PAIR_MEM_MIN_IMM || base_off > PAIR_MEM_MAX_IMM)
>>          continue;
>>
>>        use_info *use = find_access (insns[i]->uses (), REGNO (base));
>> @@ -2454,12 +2615,12 @@ get_viable_bases (insn_info *insns[2],
>>  }
>>
>>  // Given two adjacent memory accesses of the same size, I1 and I2, try
>> -// and see if we can merge them into a ldp or stp.
>> +// and see if we can merge them into a pair mem load or store.
>>  //
>>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>>  // if the accesses are both loads, otherwise they are both stores.
>>  bool
>> -ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>> +pair_fusion::try_fuse_pair (bool load_p, unsigned access_size,
>>                              insn_info *i1, insn_info *i2)
>>  {
>>    if (dump_file)
>> @@ -2490,11 +2651,21 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>        reg_ops[i] = XEXP (pats[i], !load_p);
>>      }
>>
>> +  if (!load_p && !fuseable_store_p (i1, i2))
>> +    {
>> +      if (dump_file)
>> +        fprintf (dump_file,
>> +                 "punting on store-mem-pairs due to non-fuseable cand (%d,%d)\n",
>> +                 insns[0]->uid (), insns[1]->uid ());
>> +      return false;
>> +    }
>> +
>> +
>>    if (load_p && reg_overlap_mentioned_p (reg_ops[0], reg_ops[1]))
>>      {
>>        if (dump_file)
>>          fprintf (dump_file,
>> -                 "punting on ldp due to reg conflcits (%d,%d)\n",
>> +                 "punting on pair mem load due to reg conflicts (%d,%d)\n",
>>                   insns[0]->uid (), insns[1]->uid ());
>>        return false;
>>      }
>> @@ -2787,7 +2958,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>                       i1, i2, *base, range);
>>  }
>>
>> -static void
>> +void
>>  dump_insn_list (FILE *f, const insn_list_t &l)
>>  {
>>    fprintf (f, "(");
>> @@ -2843,7 +3014,7 @@ debug (const insn_list_t &l)
>>  // we can't re-order them anyway, so provided earlier passes have cleaned up
>>  // redundant loads, we shouldn't miss opportunities by doing this.
>>  void
>> -ldp_bb_info::merge_pairs (insn_list_t &left_list,
>> +pair_fusion::merge_pairs (insn_list_t &left_list,
>>                            insn_list_t &right_list,
>>                            bool load_p,
>>                            unsigned access_size)
>> @@ -2890,8 +3061,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>>  // of accesses.  If we find two sets of adjacent accesses, call
>>  // merge_pairs.
>>  void
>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>> -                                 access_group &group)
>> +aarch64_pair_fusion::transform_for_base (int encoded_lfs,
>> +                                         access_group &group)
>>  {
>>    const auto lfs = decode_lfs (encoded_lfs);
>>    const unsigned access_size = lfs.size;
>> @@ -2919,7 +3090,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
>>  // and remove all the tombstone insns, being sure to reparent any uses
>>  // of mem to previous defs when we do this.
>>  void
>> -ldp_bb_info::cleanup_tombstones ()
>> +pair_fusion::cleanup_tombstones ()
>>  {
>>    // No need to do anything if we didn't emit a tombstone insn for this BB.
>>    if (!m_emitted_tombstone)
>> @@ -2947,7 +3118,7 @@ ldp_bb_info::cleanup_tombstones ()
>>
>>  template<typename Map>
>>  void
>> -ldp_bb_info::traverse_base_map (Map &map)
>> +pair_fusion::traverse_base_map (Map &map)
>>  {
>>    for (auto kv : map)
>>      {
>> @@ -2958,7 +3129,7 @@ ldp_bb_info::traverse_base_map (Map &map)
>>  }
>>
>>  void
>> -ldp_bb_info::transform ()
>> +pair_fusion::transform ()
>>  {
>>    traverse_base_map (expr_map);
>>    traverse_base_map (def_map);
>> @@ -3174,7 +3345,9 @@ void ldp_fusion_bb (bb_info *bb)
>>    const bool track_stores
>>      = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>>
>> -  ldp_bb_info bb_state (bb);
>> +  pair_fusion *bb_state;
>> +  aarch64_pair_fusion derived (bb);
>> +  bb_state = &derived;
>>
>>    for (auto insn : bb->nondebug_insns ())
>>      {
>> @@ -3194,13 +3367,13 @@ void ldp_fusion_bb (bb_info *bb)
>>          continue;
>>
>>        if (track_stores && MEM_P (XEXP (pat, 0)))
>> -        bb_state.track_access (insn, false, XEXP (pat, 0));
>> +        bb_state->track_access (insn, false, XEXP (pat, 0));
>>        else if (track_loads && MEM_P (XEXP (pat, 1)))
>> -        bb_state.track_access (insn, true, XEXP (pat, 1));
>> +        bb_state->track_access (insn, true, XEXP (pat, 1));
>>      }
>>
>> -  bb_state->transform ();
>> -  bb_state->cleanup_tombstones ();
>> +  bb_state->transform ();
>> +  bb_state->cleanup_tombstones ();
>>  }
>>
>>  void ldp_fusion ()
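For readers skimming the thread, the design the cover letter describes reduces to the following shape: a target-independent driver class owns the algorithm and consults pure virtual hooks, while an aarch64 subclass supplies the target-specific answers. The sketch below is a minimal, self-contained illustration with hypothetical names (pair_fusion_sketch and its members are invented for this example; the patch's actual classes are pair_fusion and aarch64_pair_fusion, which carry many more members and real GCC types):

#include <cstdio>

struct pair_fusion_sketch
{
  virtual ~pair_fusion_sketch () = default;

  // Target-dependent policy hooks, pure virtual as in the patch.
  virtual bool pair_operand_mode_ok_p (int mode) = 0;
  virtual int pair_mem_alias_check_limit () = 0;

  // Target-independent driver: written once, talks to the target
  // only through the hooks above.
  void track_access (int mode)
  {
    if (!pair_operand_mode_ok_p (mode))
      return;  // Punt: the target says this mode can't form a pair.
    std::printf ("tracking access, alias budget = %d\n",
                 pair_mem_alias_check_limit ());
  }
};

struct aarch64_pair_fusion_sketch final : pair_fusion_sketch
{
  // Stand-ins for the aarch64-specific answers.
  bool pair_operand_mode_ok_p (int mode) override { return mode != 0; }
  int pair_mem_alias_check_limit () override { return 8; }
};

int
main ()
{
  aarch64_pair_fusion_sketch fusion;
  fusion.track_access (1);  // Prints: tracking access, alias budget = 8
  return 0;
}

The pass then only ever holds a pointer to the base class, just as ldp_fusion_bb above holds bb_state as a pair_fusion *, so another target could slot in by deriving from the same interface and overriding the hooks.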