From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 2AB2E3858C50 for ; Thu, 30 May 2024 12:36:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2AB2E3858C50 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 2AB2E3858C50 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717072570; cv=none; b=UHLMnVGVOt/8qB7lqz/+O0az8bLRSy/lM1ptnMiezRZPU04finxqeGX/DOcdkcxCCsoHAMl8TL+hvTbvqemu2VIolF5o44Zv5AiHr1zKz3zw1k6EaqX1BLb25L0or83Fl2dA7sX+OPMErHXsgXNTMKgiP6Q35p/aAsFKFff8UNo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717072570; c=relaxed/simple; bh=K7pzGpVB1QmgWy+do5+pTNXfeq+dwrgxtEyuDxYcSBo=; h=DKIM-Signature:Message-ID:Date:Subject:To:From:MIME-Version; b=QaRYueZvVwt6xEJ4yD6pqqkgRwAIPYYb4R7wy+k6AkCgAGUm9gZ7ydUPiZg6fr0kjQY0hf0RhEoM1JkOmKb7j900JMntKvx2mX9zE5fp/C3mJEygCBXFzHYI35Arx2kTKAv+vQv6fR2SMTKkKCRirWqV6U2D0wrN9HiX5poh+6w= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 44UCRg2B020238; Thu, 30 May 2024 12:36:05 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=content-transfer-encoding : content-type : date : from : in-reply-to : message-id : mime-version : references : subject : to; s=pp1; bh=34oiGnHIOBTZmIefw4JrS/AMgOZ1SFJ8QrnQr4U9mqE=; b=icMU2zrc5m5OgaZX4Hd9b1BsdagRuWeNOXVk+G0r69RAOd1mjpqifufsg1O18WhTuL90 ZFCt4LBYsUC/cAd8x46C5G1jrZSJa/ubAgewp9inxJeCZwyH8IVwwSna4UUhoB3Q52Mm 7k6cDp3V7Fg+u04CIfYyPwouxpKquBAsSntRb64Wbv2AreA/86D6klx3tMCApBNDLsKy RY1Hp25s1mJj9YkPBj+q9xEMPKjee1sMsnVcybys2pksJhWWNjnMmFJwrZ+JUcYBrQTD d9S527c52LBZw/27IJ4iQTYfvG79GxlQY9ydB5TMOBSw1a28PTPhyloQg7dqMgKOaPEF EQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yesfsr16d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 30 May 2024 12:36:05 +0000 Received: from m0360072.ppops.net (m0360072.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 44UCa4qn006890; Thu, 30 May 2024 12:36:04 GMT Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3yesfsr16a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 30 May 2024 12:36:04 +0000 Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 44UBmwwn029003; Thu, 30 May 2024 12:36:04 GMT Received: from smtprelay07.dal12v.mail.ibm.com ([172.16.1.9]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3ydpaysves-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 30 May 2024 12:36:04 +0000 Received: from smtpav05.dal12v.mail.ibm.com (smtpav05.dal12v.mail.ibm.com [10.241.53.104]) by smtprelay07.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 44UCa1h040501710 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 30 May 2024 12:36:03 GMT Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1CCC258052; Thu, 30 May 2024 12:36:01 +0000 (GMT) Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B32D758056; Thu, 30 May 2024 12:35:57 +0000 (GMT) Received: from [9.43.87.236] (unknown [9.43.87.236]) by smtpav05.dal12v.mail.ibm.com (Postfix) with ESMTP; Thu, 30 May 2024 12:35:57 +0000 (GMT) Message-ID: Date: Thu, 30 May 2024 18:05:56 +0530 User-Agent: Mozilla Thunderbird Subject: Re: [Patch, aarch64, middle-end\ v4: Move pair_fusion pass from aarch64 to middle-end To: Alex Coplan , "Kewen.Lin" , Segher Boessenkool , Michael Meissner , Peter Bergner , David Edelsohn , gcc-patches , richard.sandiford@arm.com References: <76142f4e-993e-4aae-b9c6-793dca2643a5@linux.ibm.com> Content-Language: en-US From: Ajit Agarwal In-Reply-To: Content-Type: text/plain; charset=UTF-8 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: wjBGKnB3zc-0Oms-sZroTX_0aeheK1Df X-Proofpoint-ORIG-GUID: 6DoOjlAZiC3lqKs7Ej9k55qY8kB-BLNr Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.650,FMLib:17.12.28.16 definitions=2024-05-30_09,2024-05-28_01,2024-05-17_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 phishscore=0 adultscore=0 mlxlogscore=999 suspectscore=0 spamscore=0 lowpriorityscore=0 malwarescore=0 impostorscore=0 bulkscore=0 priorityscore=1501 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2405010000 definitions=main-2405300095 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_MANYTO,KAM_SHORT,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello Richard: On 30/05/24 4:44 pm, Richard Sandiford wrote: > Thanks for the update. Some comments below, but looks very close > to ready. > Thanks a lot. > Ajit Agarwal writes: >> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc >> new file mode 100644 >> index 00000000000..060fdcccc95 >> --- /dev/null >> +++ b/gcc/pair-fusion.cc >> @@ -0,0 +1,3012 @@ >> +// Pass to fuse adjacent loads/stores into paired memory accesses. >> +// Copyright (C) 2024 Free Software Foundation, Inc. > > This should probably be 2023-2024, since it's based on code > contributed in 2023. > Addressed in v5 of the patch. >> +// >> +// This file is part of GCC. >> +// >> +// GCC is free software; you can redistribute it and/or modify it >> +// under the terms of the GNU General Public License as published by >> +// the Free Software Foundation; either version 3, or (at your option) >> +// any later version. >> +// >> +// GCC is distributed in the hope that it will be useful, but >> +// WITHOUT ANY WARRANTY; without even the implied warranty of >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> +// General Public License for more details. >> +// >> +// You should have received a copy of the GNU General Public License >> +// along with GCC; see the file COPYING3. If not see >> +// . >> + >> +#define INCLUDE_ALGORITHM >> +#define INCLUDE_FUNCTIONAL >> +#define INCLUDE_LIST >> +#define INCLUDE_TYPE_TRAITS >> +#include "config.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "backend.h" >> +#include "rtl.h" >> +#include "df.h" >> +#include "rtl-iter.h" >> +#include "rtl-ssa.h" >> +#include "cfgcleanup.h" >> +#include "tree-pass.h" >> +#include "ordered-hash-map.h" >> +#include "tree-dfa.h" >> +#include "fold-const.h" >> +#include "tree-hash-traits.h" >> +#include "print-tree.h" >> +#include "pair-fusion.h" >> + >> +using namespace rtl_ssa; >> + >> +// We pack these fields (load_p, fpsimd_p, and size) into an integer >> +// (LFS) which we use as part of the key into the main hash tables. >> +// >> +// The idea is that we group candidates together only if they agree on >> +// the fields below. Candidates that disagree on any of these >> +// properties shouldn't be merged together. >> +struct lfs_fields >> +{ >> + bool load_p; >> + bool fpsimd_p; >> + unsigned size; >> +}; >> + >> +using insn_list_t = std::list; >> + >> +// Information about the accesses at a given offset from a particular >> +// base. Stored in an access_group, see below. >> +struct access_record >> +{ >> + poly_int64 offset; >> + std::list cand_insns; >> + std::list::iterator place; >> + >> + access_record (poly_int64 off) : offset (off) {} >> +}; >> + >> +// A group of accesses where adjacent accesses could be ldp/stp >> +// candidates. The splay tree supports efficient insertion, >> +// while the list supports efficient iteration. >> +struct access_group >> +{ >> + splay_tree tree; >> + std::list list; >> + >> + template >> + inline void track (Alloc node_alloc, poly_int64 offset, insn_info *insn); >> +}; >> + >> +// Test if this base candidate is viable according to HAZARDS. >> +bool base_cand::viable () const > > Formating nit, should be: > > bool > base_cand::viable () const > Addressed in v5 of the patch. >> +{ >> + return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]); >> +} >> [...] >> +void >> +pair_fusion_bb_info::transform () >> +{ >> + traverse_base_map (expr_map); >> + traverse_base_map (def_map); >> +} >> + >> +// the base register which we can fold in to make this pair use >> +// a writeback addressing mode. > > The first line of this comment is missing. It should be: > > // Given an existing pair insn INSN, look for a trailing update of > Addressed in v5 of the patch. >> [...] >> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h >> new file mode 100644 >> index 00000000000..f295fdbdb8f >> --- /dev/null >> +++ b/gcc/pair-fusion.h >> @@ -0,0 +1,195 @@ >> +// Pass to fuse adjacent loads/stores into paired memory accesses. >> +// >> +// This file contains the definition of the virtual base class which is >> +// overriden by targets that make use of the pass. >> +// >> +// Copyright (C) 2024 Free Software Foundation, Inc. > > 2023-2024 here too > Addressed in v5 of the patch. >> +// >> +// This file is part of GCC. >> +// >> +// GCC is free software; you can redistribute it and/or modify it >> +// under the terms of the GNU General Public License as published by >> +// the Free Software Foundation; either version 3, or (at your option) >> +// any later version. >> +// >> +// GCC is distributed in the hope that it will be useful, but >> +// WITHOUT ANY WARRANTY; without even the implied warranty of >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> +// General Public License for more details. >> +// >> +// You should have received a copy of the GNU General Public License >> +// along with GCC; see the file COPYING3. If not see >> +// . >> + >> +namespace rtl_ssa { >> + class def_info; >> + class insn_info; >> + class insn_range_info; >> + class bb_info; >> +} >> + >> +// Information about a potential base candidate, used in try_fuse_pair. >> +// There may be zero, one, or two viable RTL bases for a given pair. >> +struct base_cand >> +{ >> + // DEF is the def of the base register to be used by the pair. >> + rtl_ssa::def_info *def; >> + >> + // FROM_INSN is -1 if the base candidate is already shared by both >> + // candidate insns. Otherwise it holds the index of the insn from >> + // which the base originated. >> + // >> + // In the case that the base is shared, either DEF is already used >> + // by both candidate accesses, or both accesses see different versions >> + // of the same regno, in which case DEF is the def consumed by the >> + // first candidate access. >> + int from_insn; >> + >> + // To form a pair, we do so by moving the first access down and the second >> + // access up. To determine where to form the pair, and whether or not >> + // it is safe to form the pair, we track instructions which cannot be >> + // re-ordered past due to either dataflow or alias hazards. >> + // >> + // Since we allow changing the base used by an access, the choice of >> + // base can change which instructions act as re-ordering hazards for >> + // this pair (due to different dataflow). We store the initial >> + // dataflow hazards for this choice of base candidate in HAZARDS. >> + // >> + // These hazards act as re-ordering barriers to each candidate insn >> + // respectively, in program order. >> + // >> + // Later on, when we take alias analysis into account, we narrow >> + // HAZARDS accordingly. >> + rtl_ssa::insn_info *hazards[2]; >> + >> + base_cand (rtl_ssa::def_info *def, int insn) >> + : def (def), from_insn (insn), hazards {nullptr, nullptr} {} >> + >> + base_cand (rtl_ssa::def_info *def) : base_cand (def, -1) {} >> + >> + // Test if this base candidate is viable according to HAZARDS. >> + inline bool viable () const; > > This shouldn't be inline, now that it's defined in the .cc file. > Same for all other uses of "inline" in the file. > Addressed in v5 of the patch. > Richard > Thanks & Regards Ajit >> +}; >> + >> +struct alias_walker; >> + >> +// When querying should_handle_writeback, this enum is used to >> +// qualify which opportunities we are asking about. >> +enum class writeback { >> + // Only those writeback opportunities that arise from existing >> + // auto-increment accesses. >> + EXISTING, >> + >> + // All writeback opportunities, including those that involve folding >> + // base register updates into a non-writeback pair. >> + ALL >> +}; >> + >> +// This class can be overriden by targets to give a pass that fuses >> +// adjacent loads and stores into load/store pair instructions. >> +// >> +// The target can override the various virtual functions to customize >> +// the behaviour of the pass as appropriate for the target. >> +struct pair_fusion { >> + pair_fusion (); >> + >> + // Given: >> + // - an rtx REG_OP, the non-memory operand in a load/store insn, >> + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and >> + // - a boolean LOAD_P (true iff the insn is a load), then: >> + // return true if the access should be considered an FP/SIMD access. >> + // Such accesses are segregated from GPR accesses, since we only want >> + // to form pairs for accesses that use the same register file. >> + virtual bool fpsimd_op_p (rtx, machine_mode, bool) >> + { >> + return false; >> + } >> + >> + // Return true if we should consider forming pairs from memory >> + // accesses with operand mode MODE at this stage in compilation. >> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; >> + >> + // Return true iff REG_OP is a suitable register operand for a paired >> + // memory access, where LOAD_P is true if we're asking about loads and >> + // false for stores. MODE gives the mode of the operand. >> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, >> + machine_mode mode) = 0; >> + >> + // Return alias check limit. >> + // This is needed to avoid unbounded quadratic behaviour when >> + // performing alias analysis. >> + virtual int pair_mem_alias_check_limit () = 0; >> + >> + // Return true if we should try to handle writeback opportunities. >> + // WHICH determines the kinds of writeback opportunities the caller >> + // is asking about. >> + virtual bool should_handle_writeback (enum writeback which) = 0; >> + >> + // Given BASE_MEM, the mem from the lower candidate access for a pair, >> + // and LOAD_P (true if the access is a load), check if we should proceed >> + // to form the pair given the target's code generation policy on >> + // paired accesses. >> + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0; >> + >> + // Generate the pattern for a paired access. PATS gives the patterns >> + // for the individual memory accesses (which by this point must share a >> + // common base register). If WRITEBACK is non-NULL, then this rtx >> + // describes the update to the base register that should be performed by >> + // the resulting insn. LOAD_P is true iff the accesses are loads. >> + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0; >> + >> + // Return true if INSN is a paired memory access. If so, set LOAD_P to >> + // true iff INSN is a load pair. >> + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0; >> + >> + // Return true if we should track loads. >> + virtual bool track_loads_p () >> + { >> + return true; >> + } >> + >> + // Return true if we should track stores. >> + virtual bool track_stores_p () >> + { >> + return true; >> + } >> + >> + // Return true if OFFSET is in range for a paired memory access. >> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0; >> + >> + // Given a load/store pair insn in PATTERN, unpack the insn, storing >> + // the register operands in REGS, and returning the mem. LOAD_P is >> + // true for loads and false for stores. >> + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0; >> + >> + // Given a pair mem in MEM, register operands in REGS, and an rtx >> + // representing the effect of writeback on the base register in WB_EFFECT, >> + // return an insn representing a writeback variant of this pair. >> + // LOAD_P is true iff the pair is a load. >> + // This is used when promoting existing non-writeback pairs to writeback >> + // variants. >> + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem, >> + rtx regs[2], bool load_p) = 0; >> + >> + inline void process_block (rtl_ssa::bb_info *bb); >> + inline rtl_ssa::insn_info *find_trailing_add (rtl_ssa::insn_info *insns[2], >> + const rtl_ssa::insn_range_info >> + &pair_range, >> + int initial_writeback, >> + rtx *writeback_effect, >> + rtl_ssa::def_info **add_def, >> + rtl_ssa::def_info *base_def, >> + poly_int64 initial_offset, >> + unsigned access_size); >> + inline int get_viable_bases (rtl_ssa::insn_info *insns[2], >> + vec &base_cands, >> + rtx cand_mems[2], >> + unsigned access_size, >> + bool reversed); >> + inline void do_alias_analysis (rtl_ssa::insn_info *alias_hazards[4], >> + alias_walker *walkers[4], >> + bool load_p); >> + inline void try_promote_writeback (rtl_ssa::insn_info *insn, bool load_p); >> + void run (); >> + ~pair_fusion (); >> +};