Date: Tue, 9 Apr 2024 17:30:06 +0530
Subject: Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file
From: Ajit Agarwal
To: Alex Coplan
Cc: Richard Sandiford, "Kewen.Lin", Segher Boessenkool, Michael Meissner, David Edelsohn, Peter Bergner, gcc-patches

On 05/04/24 10:03 pm, Alex Coplan wrote:
> On 05/04/2024 13:53, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are incorporated.
>
> Thanks, I was kind-of expecting you to also send the renaming patch as a
> preparatory patch as we discussed.
>
> Sorry for another meta comment, but: I think the reason that the Linaro
> CI isn't running tests on your patches is actually because you're
> sending 1/3 of a series but not sending the rest of the series.
>
> So please can you either send this as an individual preparatory patch
> (not marked as a series) or if you're going to send a series (e.g. with
> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> series when you make updates. That way the CI should test your patches,
> which would be helpful.

Addressed.

>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the generic code, with pure virtual functions
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of the pure virtual functions
>> for the aarch64 target and the call to target independent code.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64: Place target independent and dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the generic code, with pure virtual functions
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of the pure virtual functions
>> for the aarch64 target and the call to target independent code.
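Concretely, the intended split looks roughly like this (a minimal sketch rather than the patch itself; only a few of the hooks are shown, and the plural hook names suggested later in the review are used):

  // Sketch only: rtx, machine_mode, bb_info etc. are GCC-internal types.
  // Target-independent core; the pure virtual hooks form the interface.
  struct pair_fusion
  {
    // Return true if the pass should look for loads (resp. stores).
    virtual bool track_loads_p () = 0;
    virtual bool track_stores_p () = 0;
    // Return true if MODE is suitable for the operands of a paired access.
    virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
    virtual ~pair_fusion () {}
  };

  // Target-dependent side: aarch64 implements the hooks in terms of its
  // existing tuning knobs and predicates.
  struct aarch64_pair_fusion : public pair_fusion
  {
    bool track_loads_p () override
    {
      return aarch64_tune_params.ldp_policy_model
	     != AARCH64_LDP_STP_POLICY_NEVER;
    }
    bool track_stores_p () override
    {
      return aarch64_tune_params.stp_policy_model
	     != AARCH64_LDP_STP_POLICY_NEVER;
    }
    bool pair_operand_mode_ok_p (machine_mode mode) override
    {
      return ldp_operand_mode_ok_p (mode);
    }
  };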
>>
>> 2024-04-06  Ajit Kumar Agarwal
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64-ldp-fusion.cc: Place target
>> 	independent and dependent changed code.
>
> You're going to need a proper ChangeLog eventually, but I guess there's
> no need for that right now.
>
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++++++++++++++--------
>>  1 file changed, 249 insertions(+), 122 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 22ed95eb743..cb21b514ef7 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,8 +138,122 @@ struct alt_base
>>    poly_int64 offset;
>>  };
>>
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int &budget) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const  = 0;
>
> Heh, looking at this made me realise there is a whitespace bug here in
> the existing code (double space after const). Sorry about that! I'll
> push an obvious fix for that.
>
>> +  virtual void advance () = 0;
>> +};
>> +
>> +struct pair_fusion {
>> +
>> +  pair_fusion () {};
>
> This ctor looks pointless at the moment. Perhaps instead we could put
> the contents of ldp_fusion_init in here and then delete that function?

Addressed.

>> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> +			    bool load_p) = 0;
>
> Please can we have comments above each of these virtual functions
> describing any parameters, what the purpose of the hook is, and the
> interpretation of the return value? This will serve as the
> documentation for other targets that want to make use of the pass.
>
> It might make sense to have a default-false implementation for
> fpsimd_op_p, especially if you don't want to make use of this bit for
> rs6000.

Addressed.

>> +
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +  virtual bool pair_trailing_writeback_p () = 0;
>
> Sorry for the run-around, but: I think this and
> handle_writeback_opportunities () should be the same function, either
> returning an enum or taking an enum and returning a boolean.
>
> At a minimum they should have more similar sounding names.

Addressed.

>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +				      machine_mode mem_mode) = 0;
>> +  virtual int pair_mem_alias_check_limit () = 0;
>> +  virtual bool handle_writeback_opportunities () = 0 ;
>> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> +					machine_mode mode) = 0;
>> +  virtual rtx gen_mem_pair (rtx *pats,  rtx writeback,
>
> Nit: excess whitespace after pats,
>
>> +			    bool load_p) = 0;
>> +  virtual bool pair_mem_promote_writeback_p (rtx pat) = 0;
>> +  virtual bool track_load_p () = 0;
>> +  virtual bool track_store_p () = 0;
>
> I think it would probably make more sense for these two to have
> default-true implementations rather than being pure virtual functions.
>
> Also, sorry for the bikeshedding, but please can we keep the plural
> names (so track_loads_p and track_stores_p).
>
>> +  virtual bool cand_insns_empty_p (std::list &insns) = 0;
>
> Why does this need to be virtualised? I would expect it to
> just be insns.empty () on all targets.
>
>> +  virtual bool pair_mem_in_range_p (HOST_WIDE_INT off) = 0;
>>    void ldp_fusion_bb (bb_info *bb);
>
> Just to check, are you planning to rename this function in a separate
> patch?

Yes.
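On combining pair_trailing_writeback_p () and handle_writeback_opportunities (): one shape that would fit both comments is a single enum-returning hook, roughly as below (a sketch only; the enum and hook names here are placeholders, not part of the posted patch):

  // Placeholder sketch for a single writeback hook.
  enum class writeback_kind
  {
    NONE,      // Ignore writeback opportunities entirely.
    EXISTING,  // Only handle accesses that already use writeback.
    ALL        // Additionally try to fold in a trailing base update.
  };

  struct pair_fusion
  {
    virtual writeback_kind writeback_policy () = 0;
    // ...
  };

  // aarch64 would map its existing knob onto the enum
  // (0 -> NONE, 1 -> EXISTING, > 1 -> ALL):
  writeback_kind
  aarch64_pair_fusion::writeback_policy ()
  {
    if (aarch64_ldp_writeback > 1)
      return writeback_kind::ALL;
    return aarch64_ldp_writeback ? writeback_kind::EXISTING
				 : writeback_kind::NONE;
  }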
>> +
>> +  ~pair_fusion() { }
>
> As with the ctor, perhaps we could put the contents of ldp_fusion_destroy
> in here and then delete that function?

Addressed.

>> +};
>> +
>> +struct aarch64_pair_fusion : public pair_fusion
>> +{
>> +public:

Addressed.

> Nit: there shouldn't be a need for this as public is already the default
> for structs.
>
>> +  aarch64_pair_fusion () : pair_fusion () {};
>
> This ctor looks pointless, the default one should do, so can we remove
> this?

Addressed.

>> +  bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> +		    bool load_p) override final
>> +  {
>> +    const bool fpsimd_op_p
>> +      = reload_completed
>> +      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> +      : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> +	 && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> +    return fpsimd_op_p;
>
> Can this just directly return the boolean expression instead of storing
> it into a variable first?

Addressed.

>> +  }
>> +
>> +  bool pair_mem_promote_writeback_p (rtx pat)
>> +  {
>> +    if (reload_completed
>> +	&& aarch64_ldp_writeback > 1
>> +	&& GET_CODE (pat) == PARALLEL
>> +	&& XVECLEN (pat, 0) == 2)
>> +      return true;
>> +
>> +    return false;
>> +  }
>> +
>> +  bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> +				machine_mode mode)
>> +  {
>> +    return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> +						    load_p,
>> +						    mode);
>> +  }
>> +  bool pair_operand_mode_ok_p (machine_mode mode);
>
> Why have you implemented this (one-line function) out-of-line but the
> others are implemented inline? Please be consistent.

Addressed.

>> +
>> +  rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
>> +
>> +  bool pair_trailing_writeback_p ()
>> +  {
>> +    return aarch64_ldp_writeback > 1;
>> +  }
>> +  bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +			      machine_mode mem_mode)
>> +  {
>> +    return (load_p
>> +	    ? aarch64_ldp_reg_operand (reg_op, mem_mode)
>> +	    : aarch64_stp_reg_operand (reg_op, mem_mode));
>> +  }
>> +  int pair_mem_alias_check_limit ()
>> +  {
>> +    return aarch64_ldp_alias_check_limit;
>> +  }
>> +  bool handle_writeback_opportunities ()
>> +  {
>> +    return aarch64_ldp_writeback;
>> +  }
>> +  bool track_load_p ()
>> +  {
>> +    const bool track_loads
>> +      = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> +    return track_loads;
>> +  }

Addressed.

> As above, can this just return the boolean expression directly without
> the temporary?
>
>> +  bool track_store_p ()
>> +  {
>> +    const bool track_stores
>> +      = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> +    return track_stores;
>> +  }

Addressed.

> Likewise.
>
>> +  bool cand_insns_empty_p (std::list &insns)
>> +  {
>> +    return insns.empty();
>> +  }
>
> As mentioned on the declaration, I don't see why this needs to be
> virtualised.
>
>> +  bool pair_mem_in_range_p (HOST_WIDE_INT off)
>> +  {
>> +    return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
>> +  }
>
> Hmm, this looks to test exactly the opposite condition from what the
> name suggests, i.e. the implementation looks like what I would expect
> for a function called pair_mem_out_of_range_p.
>
> Please can you invert the condition and fix up callers appropriately?
>
> This suggests you're not really thinking about the code you're writing,
> please try to think a bit more carefully before posting patches.

Sorry, my mistake; I had overlooked it.

>> +};
>> +
>> +
>>  // State used by the pass for a given basic block.
>> -struct ldp_bb_info
>> +struct pair_fusion_bb_info
>>  {
>>    using def_hash = nofree_ptr_hash;
>>    using expr_key_t = pair_hash>;
>> @@ -160,14 +274,17 @@ struct ldp_bb_info
>>
>>    static const size_t obstack_alignment = sizeof (void *);
>>    bb_info *m_bb;
>> +  pair_fusion *bb_state;

Addressed.

> Can we call this m_pass instead of bb_state? The pair_fusion class is
> explicitly at the pass level rather than the bb level.
>
> Also I think this (and indeed m_bb, although that's pre-existing) could
> be private members. Not sure there's a good reason for making them
> public.

Addressed.

> It might be slightly nicer to use a reference for m_pass unless there's
> a good reason to use a pointer, although I don't feel strongly about
> this.
>
>>
>> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> +  pair_fusion_bb_info (bb_info *bb,
>> +		       aarch64_pair_fusion *d) : m_bb (bb),
>
> This generic class needs to take the generic pass type, i.e. just a
> pair_fusion * rather than an aarch64_pair_fusion *.
>
>> +		       bb_state (d), m_emitted_tombstone (false)
>>    {
>>      obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>>				  obstack_alignment, obstack_chunk_alloc,
>>				  obstack_chunk_free);
>>    }
>> -  ~ldp_bb_info ()
>> +  ~pair_fusion_bb_info ()
>>    {
>>      obstack_free (&m_obstack, nullptr);
>>
>> @@ -177,10 +294,32 @@ struct ldp_bb_info
>>	  bitmap_obstack_release (&m_bitmap_obstack);
>>	}
>>    }
>> +  void track_access (insn_info *, bool load, rtx mem);
>> +  void transform ();
>> +  void cleanup_tombstones ();
>
> I think most (or indeed all?) of the changes here shouldn't be needed.
> AFAICS, there shouldn't be a need to:
>  - drop inline from these functions.
>  - change the accessibility of any members.

Addressed.

> Please can you drop these changes or explain why they're needed (from
> here until the end of the class).
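For reference, taking the generic pass type here, with the reference member and the m_pass name suggested above, would look roughly like this (sketch; the obstack setup is elided):

  struct pair_fusion_bb_info
  {
    pair_fusion_bb_info (bb_info *bb, pair_fusion &pass)
      : m_bb (bb), m_pass (pass), m_emitted_tombstone (false)
    {
      // Obstack initialisation as before.
    }

  private:
    bb_info *m_bb;
    pair_fusion &m_pass;  // Pass-level state and virtual hooks.
    bool m_emitted_tombstone;
  };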
>
>> +  void merge_pairs (insn_list_t &, insn_list_t &,
>> +		    bool load_p, unsigned access_size);
>> +  void transform_for_base (int load_size, access_group &group);
>> +
>> +  bool try_fuse_pair (bool load_p, unsigned access_size,
>> +		      insn_info *i1, insn_info *i2);
>> +
>> +  bool fuse_pair (bool load_p, unsigned access_size,
>> +		  int writeback,
>> +		  insn_info *i1, insn_info *i2,
>> +		  base_cand &base,
>> +		  const insn_range_info &move_range);
>>
>> -  inline void track_access (insn_info *, bool load, rtx mem);
>> -  inline void transform ();
>> -  inline void cleanup_tombstones ();
>> +  void do_alias_analysis (insn_info *alias_hazards[4],
>> +			  alias_walker *walkers[4],
>> +			  bool load_p);
>> +
>> +  void track_tombstone (int uid);
>> +
>> +  bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> +
>> +  template
>> +  void traverse_base_map (Map &map);
>>
>>  private:
>>    obstack m_obstack;
>> @@ -191,30 +330,32 @@ private:
>>    bool m_emitted_tombstone;
>>
>>    inline splay_tree_node *node_alloc (access_record *);
>> +};
>>
>> -  template
>> -  inline void traverse_base_map (Map &map);
>> -  inline void transform_for_base (int load_size, access_group &group);
>> -
>> -  inline void merge_pairs (insn_list_t &, insn_list_t &,
>> -			   bool load_p, unsigned access_size);
>> -
>> -  inline bool try_fuse_pair (bool load_p, unsigned access_size,
>> -			     insn_info *i1, insn_info *i2);
>> -
>> -  inline bool fuse_pair (bool load_p, unsigned access_size,
>> -			 int writeback,
>> -			 insn_info *i1, insn_info *i2,
>> -			 base_cand &base,
>> -			 const insn_range_info &move_range);
>> -
>> -  inline void track_tombstone (int uid);
>> +rtx aarch64_pair_fusion::gen_mem_pair (rtx *pats,
>
> Style nit: newline after rtx.

Addressed.

>> +				       rtx writeback,
>> +				       bool load_p)
>> +  {
>> +    rtx pair_pat;
>>
>> -  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> -};
>> +    if (writeback)
>> +      {
>> +	auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
>> +	pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> +      }
>> +    else if (load_p)
>> +      pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> +					XEXP (pats[1], 0),
>> +					XEXP (pats[0], 1));
>> +    else
>> +      pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> +					 XEXP (pats[0], 1),
>> +					 XEXP (pats[1], 1));
>> +    return pair_pat;
>
> Can the assignments just be direct returns here, please? Then
> get rid of the temporary.

Addressed.

>> +  }
>>
>>  splay_tree_node *
>> -ldp_bb_info::node_alloc (access_record *access)
>> +pair_fusion_bb_info::node_alloc (access_record *access)
>>  {
>>    using T = splay_tree_node;
>>    void *addr = obstack_alloc (&m_obstack, sizeof (T));
>> @@ -262,7 +403,7 @@ drop_writeback (rtx mem)
>>  // RTX_AUTOINC addresses. The interface is like strip_offset except we take a
>>  // MEM so that we know the mode of the access.
>>  static rtx
>> -ldp_strip_offset (rtx mem, poly_int64 *offset)
>> +pair_mem_strip_offset (rtx mem, poly_int64 *offset)
>
> I thought we discussed splitting the renaming into a separate patch?
>
>>  {
>>    rtx addr = XEXP (mem, 0);
>>
>> @@ -332,6 +473,12 @@ ldp_operand_mode_ok_p (machine_mode mode)
>>    return reload_completed || mode != TImode;
>>  }
>>
>> +bool
>> +aarch64_pair_fusion::pair_operand_mode_ok_p (machine_mode mode)
>> +{
>> +  return ldp_operand_mode_ok_p (mode);
>> +}
>> +
>>  // Given LFS (load_p, fpsimd_p, size) fields in FIELDS, encode these
>>  // into an integer for use as a hash table key.
>>  static int
>> @@ -396,7 +543,7 @@ access_group::track (Alloc alloc_node, poly_int64 offset, insn_info *insn)
>>  // MEM_EXPR base (i.e. a tree decl) relative to which we can track the access.
>>  // LFS is used as part of the key to the hash table, see track_access.
>>  bool
>> -ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>> +pair_fusion_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>
> Again, please do any renaming in a separate patch.

Addressed.

>>  {
>>    if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem))
>>      return false;
>> @@ -412,9 +559,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>>    const machine_mode mem_mode = GET_MODE (mem);
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>
>> -  // Punt on misaligned offsets. LDP/STP instructions require offsets to be a
>> -  // multiple of the access size, and we believe that misaligned offsets on
>> -  // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
>> +  // Punt on misaligned offsets. Paired memory access instructions require
>> +  // offsets to be a multiple of the access size, and we believe that
>> +  // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
>> +  // offsets w.r.t. RTL bases.
>>    if (!multiple_p (offset, mem_size))
>>      return false;
>>
>> @@ -438,46 +586,38 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>>  }
>>
>>  // Main function to begin pair discovery. Given a memory access INSN,
>> -// determine whether it could be a candidate for fusing into an ldp/stp,
>> +// determine whether it could be a candidate for fusing into an pair mem,
>
> s/an pair mem/a paired access/
>
>>  // and if so, track it in the appropriate data structure for this basic
>>  // block. LOAD_P is true if the access is a load, and MEM is the mem
>>  // rtx that occurs in INSN.
>>  void
>> -ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> +pair_fusion_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>  {
>>    // We can't combine volatile MEMs, so punt on these.
>>    if (MEM_VOLATILE_P (mem))
>>      return;
>>
>> -  // Ignore writeback accesses if the param says to do so.
>> -  if (!aarch64_ldp_writeback

Addressed.

> This comment needs updating, I guess something like "if the hook says to
> do so". But also please don't delete the full stop.
>
>> +  // Ignore writeback accesses if the param says to do so
>> +  if (!bb_state->handle_writeback_opportunities ()
>>        && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
>>      return;
>>
>>    const machine_mode mem_mode = GET_MODE (mem);
>> -  if (!ldp_operand_mode_ok_p (mem_mode))
>> +
>
> I don't think we need the extra whitespace here.
>
>> +  if (!bb_state->pair_operand_mode_ok_p (mem_mode))
>>      return;
>>
>>    rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
>>
>> -  // Ignore the access if the register operand isn't suitable for ldp/stp.
>> -  if (load_p
>> -      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
>> -      : !aarch64_stp_reg_operand (reg_op, mem_mode))
>> +  if (!bb_state->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
>>      return;
>> -
>>    // We want to segregate FP/SIMD accesses from GPR accesses.
>>    //
>>    // Before RA, we use the modes, noting that stores of constant zero
>>    // operands use GPRs (even in non-integer modes). After RA, we use
>>    // the hard register numbers.
>> -  const bool fpsimd_op_p
>> -    = reload_completed
>> -    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> -    : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> -       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> -
>> -  // Note ldp_operand_mode_ok_p already rejected VL modes.
>> +  const bool fpsimd_op_p = bb_state->fpsimd_op_p (reg_op, mem_mode, load_p);
>> +  // Note pair_operand_mode_ok_p already rejected VL modes.
>>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>    const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
>>
>> @@ -487,7 +627,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    poly_int64 mem_off;
>>    rtx addr = XEXP (mem, 0);
>>    const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
>> -  rtx base = ldp_strip_offset (mem, &mem_off);
>> +  rtx base = pair_mem_strip_offset (mem, &mem_off);
>
> Again, let's split the renaming off into a separate patch.

Addressed.

>>    if (!REG_P (base))
>>      return;
>>
>> @@ -506,8 +646,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>    // elimination offset pre-RA, we should postpone forming pairs on such
>>    // accesses until after RA.
>>    //
>> -  // As it stands, addresses with offsets in range for LDR but not
>> -  // in range for LDP/STP are currently reloaded inefficiently,
>> +  // As it stands, addresses in range for an individual load/store but not
>> +  // for a paired access are currently reloaded inefficiently,
>>    // ending up with a separate base register for each pair.
>>    //
>>    // In theory LRA should make use of
>>    // that calls targetm.legitimize_address_displacement.
>>    //
>>    // So for now, it's better to punt when we can't be sure that the
>> -  // offset is in range for LDP/STP. Out-of-range cases can then be
>> -  // handled after RA by the out-of-range LDP/STP peepholes. Eventually, it
>> +  // offset is in range for paired access. Out-of-range cases can then be
>> +  // handled after RA by the out-of-range PAIR MEM peepholes. Eventually, it
>>    // would be nice to handle known out-of-range opportunities in the
>>    // pass itself (for stack accesses, this would be in the post-RA pass).
>>    if (!reload_completed
>> @@ -573,8 +713,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>>        gcc_unreachable (); // Base defs should be unique.
>>    }
>>
>> -  // Punt on misaligned offsets. LDP/STP require offsets to be a multiple of
>> -  // the access size.
>> +  // Punt on misaligned offsets. Paired memory accesses require offsets
>> +  // to be a multiple of the access size.
>>    if (!multiple_p (mem_off, mem_size))
>>      return;
>>
>> @@ -614,7 +754,7 @@ static bool no_ignore (insn_info *) { return false; }
>>  // making use of alias disambiguation.
>>  static insn_info *
>>  latest_hazard_before (insn_info *insn, rtx *ignore,
>> -		      insn_info *ignore_insn = nullptr)
>> +		      insn_info *ignore_insn = 0)
>
> Is this change needed?

Addressed.
>>  {
>>    insn_info *result = nullptr;
>>
>> @@ -1150,7 +1290,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
>>        const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
>>
>>        poly_int64 offset;
>> -      rtx this_base = ldp_strip_offset (mem, &offset);
>> +      rtx this_base = pair_mem_strip_offset (mem, &offset);
>>        gcc_assert (REG_P (this_base));
>>        if (base_reg)
>>	  gcc_assert (rtx_equal_p (base_reg, this_base));
>> @@ -1286,7 +1426,11 @@ find_trailing_add (insn_info *insns[2],
>>
>>    off_hwi /= access_size;
>>
>> -  if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
>> +  pair_fusion *pfuse;
>> +  aarch64_pair_fusion derived;
>> +  pfuse = &derived;
>
> Huh? We'll want to use the same instance of the pass structure (pair_fusion)
> that gets created at the top level, so that should get passed around everywhere.
>
> In the case of find_trailing_add we probably don't want to add any more
> parameters (as Segher noted there are already too many), so perhaps the
> easiest thing would be to make find_trailing_add itself a (non-virtual)
> member function of pair_fusion.
>
> Then here we can just query pair_mem_in_range_p directly.
>
> For try_promote_writeback itself you can either make it a non-virtual
> member function of pair_fusion or just pass a reference to the pass
> object in when you call it.
>
> Note also that you don't have to create a pointer of the parent type in
> order to call a virtual function on a derived object, you can just call
> it directly.

Addressed.

>> +
>> +  if (pfuse->pair_mem_in_range_p (off_hwi))
>>      return nullptr;
>
> Note of course this condition needs inverting when you invert the
> implementation.

Addressed.
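With the condition inverted and find_trailing_add made a (non-virtual) member of pair_fusion, this spot would read roughly as follows (sketch):

  // The hook now answers "is OFF in range for a paired access?",
  // so callers punt when it returns false.
  bool
  aarch64_pair_fusion::pair_mem_in_range_p (HOST_WIDE_INT off)
  {
    return off >= LDP_MIN_IMM && off <= LDP_MAX_IMM;
  }

  // ... and inside pair_fusion::find_trailing_add:
  //   if (!pair_mem_in_range_p (off_hwi))
  //     return nullptr;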
>>
>>    auto dump_prefix = [&]()
>> @@ -1328,7 +1472,7 @@ find_trailing_add (insn_info *insns[2],
>>  // We just emitted a tombstone with uid UID, track it in a bitmap for
>>  // this BB so we can easily identify it later when cleaning up tombstones.
>>  void
>> -ldp_bb_info::track_tombstone (int uid)
>> +pair_fusion_bb_info::track_tombstone (int uid)
>
> Again, please split the renaming off. Same with other instances below.
>
>>  {
>>    if (!m_emitted_tombstone)
>>      {
>> @@ -1528,7 +1672,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>>	  gcc_checking_assert (GET_RTX_CLASS (GET_CODE (XEXP (mem, 0)))
>>			       == RTX_AUTOINC);
>>
>> -	  base = ldp_strip_offset (mem, &offset);
>> +	  base = pair_mem_strip_offset (mem, &offset);
>>	  gcc_checking_assert (REG_P (base) && REGNO (base) == base_regno);
>>	}
>>        fixup_debug_use (attempt, use, def, base, offset);
>> @@ -1664,7 +1808,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>>  // BASE gives the chosen base candidate for the pair and MOVE_RANGE is
>>  // a singleton range which says where to place the pair.
>>  bool
>> -ldp_bb_info::fuse_pair (bool load_p,
>> +pair_fusion_bb_info::fuse_pair (bool load_p,
>>				unsigned access_size,
>>				int writeback,
>>				insn_info *i1, insn_info *i2,
>> @@ -1800,7 +1944,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      {
>>        if (dump_file)
>>	  fprintf (dump_file,
>> -		   "  ldp: i%d has wb but subsequent i%d has non-wb "
>> +		   "  load pair: i%d has wb but subsequent i%d has non-wb "
>>		   "update of base (r%d), dropping wb\n",
>>		   insns[0]->uid (), insns[1]->uid (), base_regno);
>>        gcc_assert (writeback_effect);
>> @@ -1823,7 +1967,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>
>>    // If either of the original insns had writeback, but the resulting pair insn
>> -  // does not (can happen e.g. in the ldp edge case above, or if the writeback
>> +  // does not (can happen e.g. in the load pair edge case above, or if the writeback
>>    // effects cancel out), then drop the def(s) of the base register as
>>    // appropriate.
>>    //
>> @@ -1842,7 +1986,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    // update of the base register and try and fold it in to make this into a
>>    // writeback pair.
>>    insn_info *trailing_add = nullptr;
>> -  if (aarch64_ldp_writeback > 1
>> +  if (bb_state->pair_trailing_writeback_p ()
>>        && !writeback_effect
>>        && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
>>					   XEXP (pats[0], 0), nullptr)
>> @@ -1863,14 +2007,14 @@ ldp_bb_info::fuse_pair (bool load_p,
>>      }
>>
>>    // Now that we know what base mem we're going to use, check if it's OK
>> -  // with the ldp/stp policy.
>> +  // with the pair mem policy.
>>    rtx first_mem = XEXP (pats[0], load_p);
>> -  if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> -						load_p,
>> -						GET_MODE (first_mem)))
>> +  if (!bb_state->pair_mem_ok_with_policy (first_mem,
>> +					  load_p,
>> +					  GET_MODE (first_mem)))
>>      {
>>        if (dump_file)
>> -	fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
>> +	fprintf (dump_file, "punting on pair (%d,%d), pair mem policy says no\n",
>>		 i1->uid (), i2->uid ());
>>        return false;
>>      }
>> @@ -1878,21 +2022,10 @@ ldp_bb_info::fuse_pair (bool load_p,
>>    rtx reg_notes = combine_reg_notes (first, second, load_p);
>>
>>    rtx pair_pat;
>> -  if (writeback_effect)
>> -    {
>> -      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
>> -      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> -    }
>> -  else if (load_p)
>> -    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> -				      XEXP (pats[1], 0),
>> -				      XEXP (pats[0], 1));
>> -  else
>> -    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> -				       XEXP (pats[0], 1),
>> -				       XEXP (pats[1], 1));
>>
>> +  pair_pat = bb_state->gen_mem_pair (pats, writeback_effect, load_p);
>>    insn_change *pair_change = nullptr;
>> +
>>    auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
>>      rtx_insn *rti = change->insn ()->rtl ();
>>      validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> @@ -2133,15 +2266,6 @@ load_modified_by_store_p (insn_info *load,
>>    return false;
>>  }
>>
>> -// Virtual base class for load/store walkers used in alias analysis.
>> -struct alias_walker
>> -{
>> -  virtual bool conflict_p (int &budget) const = 0;
>> -  virtual insn_info *insn () const = 0;
>> -  virtual bool valid () const  = 0;
>> -  virtual void advance () = 0;
>> -};
>> -
>>  // Implement some common functionality used by both store_walker
>>  // and load_walker.
>>  template
>> @@ -2259,13 +2383,13 @@ public:
>>  //
>>  // We try to maintain the invariant that if a walker becomes invalid, we
>>  // set its pointer to null.
>> -static void
>> -do_alias_analysis (insn_info *alias_hazards[4],
>> +void
>> +pair_fusion_bb_info::do_alias_analysis (insn_info *alias_hazards[4],
>
> It would probably make more sense for this to be a (non-virtual) member
> function of the pair_fusion class, since it doesn't need to access the
> BB-specific state.

Addressed.
>>			  alias_walker *walkers[4],
>>			  bool load_p)
>>  {
>>    const int n_walkers = 2 + (2 * !load_p);
>> -  int budget = aarch64_ldp_alias_check_limit;
>> +  int budget = bb_state->pair_mem_alias_check_limit ();
>>
>>    auto next_walker = [walkers,n_walkers](int current) -> int {
>>      for (int j = 1; j <= n_walkers; j++)
>> @@ -2365,7 +2489,7 @@ get_viable_bases (insn_info *insns[2],
>>      {
>>        const bool is_lower = (i == reversed);
>>        poly_int64 poly_off;
>> -      rtx base = ldp_strip_offset (cand_mems[i], &poly_off);
>> +      rtx base = pair_mem_strip_offset (cand_mems[i], &poly_off);
>>        if (GET_RTX_CLASS (GET_CODE (XEXP (cand_mems[i], 0))) == RTX_AUTOINC)
>>	  writeback |= (1 << i);
>>
>> @@ -2373,7 +2497,7 @@ get_viable_bases (insn_info *insns[2],
>>	  continue;
>>
>>        // Punt on accesses relative to eliminable regs. See the comment in
>> -      // ldp_bb_info::track_access for a detailed explanation of this.
>> +      // pair_fusion_bb_info::track_access for a detailed explanation of this.
>>        if (!reload_completed
>>	  && (REGNO (base) == FRAME_POINTER_REGNUM
>>	      || REGNO (base) == ARG_POINTER_REGNUM))
>> @@ -2397,7 +2521,7 @@ get_viable_bases (insn_info *insns[2],
>>        if (!is_lower)
>>	  base_off--;
>>
>> -      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
>> +      pair_fusion *pfuse;
>> +      aarch64_pair_fusion derived;
>> +      pfuse = &derived;
>
> Again, please don't do this but instead use the same instance
> everywhere.

Addressed.

>> +
>> +      if (pfuse->pair_mem_in_range_p (base_off))
>>	  continue;
>>
>>        use_info *use = find_access (insns[i]->uses (), REGNO (base));
>> @@ -2454,12 +2582,12 @@ get_viable_bases (insn_info *insns[2],
>>  }
>>
>>  // Given two adjacent memory accesses of the same size, I1 and I2, try
>> -// and see if we can merge them into a ldp or stp.
>> +// and see if we can merge them into a paired accesses load and store.
>
> "into a load pair or store pair" or "into a paired access" if you
> prefer.

Addressed.

>>  //
>>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>>  // if the accesses are both loads, otherwise they are both stores.
>>  bool
>> -ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>> +pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>			    insn_info *i1, insn_info *i2)
>>  {
>>    if (dump_file)
>> @@ -2494,7 +2622,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>>      {
>>        if (dump_file)
>>	  fprintf (dump_file,
>> -		   "punting on ldp due to reg conflcits (%d,%d)\n",
>> +		   "punting on pair mem load due to reg conflcits (%d,%d)\n",
>>		   insns[0]->uid (), insns[1]->uid ());
>>        return false;
>>      }
>> @@ -2843,7 +2971,7 @@ debug (const insn_list_t &l)
>>  // we can't re-order them anyway, so provided earlier passes have cleaned up
>>  // redundant loads, we shouldn't miss opportunities by doing this.
>>  void
>> -ldp_bb_info::merge_pairs (insn_list_t &left_list,
>> +pair_fusion_bb_info::merge_pairs (insn_list_t &left_list,
>>			  insn_list_t &right_list,
>>			  bool load_p,
>>			  unsigned access_size)
>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>>  // of accesses. If we find two sets of adjacent accesses, call
>>  // merge_pairs.
>>  void
>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>> -				 access_group &group)
>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>> +					 access_group &group)
>>  {
>>    const auto lfs = decode_lfs (encoded_lfs);
>>    const unsigned access_size = lfs.size;
>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
>>				 access.cand_insns,
>>				 lfs.load_p,
>>				 access_size);
>> -	  skip_next = access.cand_insns.empty ();
>> +	  skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
>
> As above, why is this needed?

For rs6000 we want this hook to always return true: candidate load/store pairs of sizes 8/16, 16/32, and 32/64 all occur on rs6000, but we only want to form the 8/16 and 32/64 pairs. Always returning true for rs6000 lets us skip the other pairs, as above.

>
>>	}
>>        prev_access = &access;
>>      }
>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
>>  // and remove all the tombstone insns, being sure to reparent any uses
>>  // of mem to previous defs when we do this.
>>  void
>> -ldp_bb_info::cleanup_tombstones ()
>> +pair_fusion_bb_info::cleanup_tombstones ()
>>  {
>>    // No need to do anything if we didn't emit a tombstone insn for this BB.
>>    if (!m_emitted_tombstone)
>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
>>
>>  template
>>  void
>> -ldp_bb_info::traverse_base_map (Map &map)
>> +pair_fusion_bb_info::traverse_base_map (Map &map)
>>  {
>>    for (auto kv : map)
>>      {
>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
>>  }
>>
>>  void
>> -ldp_bb_info::transform ()
>> +pair_fusion_bb_info::transform ()
>>  {
>>    traverse_base_map (expr_map);
>>    traverse_base_map (def_map);
>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
>>  // for load/store candidates. If running after RA, also try and promote
>>  // non-writeback pairs to use writeback addressing. Then try to fuse
>>  // candidates into pairs.
>> -void ldp_fusion_bb (bb_info *bb)
>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
>>  {
>> -  const bool track_loads
>> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> -  const bool track_stores
>> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> +  const bool track_loads = track_load_p ();
>> +  const bool track_stores = track_store_p ();
>>
>> -  ldp_bb_info bb_state (bb);
>
> This:
>
>> +  aarch64_pair_fusion derived;
>
> can be deleted and then:
>
>> +  pair_fusion_bb_info bb_info (bb, &derived);
>
> can just be:
>
>   pair_fusion_bb_info bb_info (bb, this);
>
> (or you can pass *this if you make bb_info take a reference).
>
> I don't think there's a particular need to change the variable name
> (bb_state -> bb_info). I chose the former because it doesn't clash
> with the RTL-SSA structure of the same name as the latter.

Addressed.

>>
>>    for (auto insn : bb->nondebug_insns ())
>>      {
>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
>>	  continue;
>>
>>        rtx pat = PATTERN (rti);
>> -      if (reload_completed
>> -	  && aarch64_ldp_writeback > 1
>> -	  && GET_CODE (pat) == PARALLEL
>> -	  && XVECLEN (pat, 0) == 2)
>> +      if (pair_mem_promote_writeback_p (pat))
>>	  try_promote_writeback (insn);
>
> It looks like try_promote_writeback itself will need some further work
> to make it target-independent.
> I suppose this check:
>
>   auto rti = insn->rtl ();
>   const auto attr = get_attr_ldpstp (rti);
>   if (attr == LDPSTP_NONE)
>     return;
>
>   bool load_p = (attr == LDPSTP_LDP);
>   gcc_checking_assert (load_p || attr == LDPSTP_STP);
>
> will need to become part of the pair_mem_promote_writeback_p hook that you
> added, potentially changing it to return a boolean for load_p.
>
> Then I guess we will need hooks for destructuring the pair insn and
> another hook to wrap aarch64_gen_writeback_pair.

Addressed.

>>
>>        if (GET_CODE (pat) != SET)
>>	  continue;
>>
>>        if (track_stores && MEM_P (XEXP (pat, 0)))
>> -	bb_state.track_access (insn, false, XEXP (pat, 0));
>> +	bb_info.track_access (insn, false, XEXP (pat, 0));
>>        else if (track_loads && MEM_P (XEXP (pat, 1)))
>> -	bb_state.track_access (insn, true, XEXP (pat, 1));
>> +	bb_info.track_access (insn, true, XEXP (pat, 1));
>>      }
>>
>> -  bb_state.transform ();
>> -  bb_state.cleanup_tombstones ();
>> +  bb_info.transform ();
>> +  bb_info.cleanup_tombstones ();
>>  }
>>
>>  void ldp_fusion ()
>>  {
>>    ldp_fusion_init ();
>> +  pair_fusion *pfuse;
>> +  aarch64_pair_fusion derived;
>> +  pfuse = &derived;
>
> This is indeed the one place where I think it is acceptable to
> instantiate aarch64_pair_fusion. But again there's no need to create a
> pointer to the parent class, just call any function you like directly.

Addressed.

>>
>>    for (auto bb : crtl->ssa->bbs ())
>> -    ldp_fusion_bb (bb);
>> +    pfuse->ldp_fusion_bb (bb);
>
> I think even the code to iterate over bbs should itself be a member
> function of pair_fusion (say "run") and then that becomes part of the
> generic code.
>
> So this function would just become:
>
>   aarch64_pair_fusion pass;
>   pass.run ();
>
> and could be inlined into the caller.

Addressed.

> Perhaps you could also add an early return like:
>
>   if (!track_loads_p () && !track_stores_p ())
>     return;
>
> in pair_fusion::run () and then remove the corresponding code from
> pass_ldp_fusion::gate?

Addressed.

>>
>>    ldp_fusion_destroy ();
>>  }
>> --
>> 2.39.3
>>
>
> Thanks,
> Alex

Thanks & Regards
Ajit