From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 11FC4385841F for ; Thu, 24 Feb 2022 06:33:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 11FC4385841F Received: from pps.filterd (m0187473.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 21O6Pt87020928; Thu, 24 Feb 2022 06:33:19 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3edpjv2sxj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 24 Feb 2022 06:33:19 +0000 Received: from m0187473.ppops.net (m0187473.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 21O6WQV7016602; Thu, 24 Feb 2022 06:33:18 GMT Received: from ppma01wdc.us.ibm.com (fd.55.37a9.ip4.static.sl-reverse.com [169.55.85.253]) by mx0a-001b2d01.pphosted.com with ESMTP id 3edpjv2swm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 24 Feb 2022 06:33:18 +0000 Received: from pps.filterd (ppma01wdc.us.ibm.com [127.0.0.1]) by ppma01wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 21O6CELe015770; Thu, 24 Feb 2022 06:33:17 GMT Received: from b01cxnp22034.gho.pok.ibm.com (b01cxnp22034.gho.pok.ibm.com [9.57.198.24]) by ppma01wdc.us.ibm.com with ESMTP id 3ear6aq2je-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 24 Feb 2022 06:33:17 +0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp22034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 21O6XHso49480120 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 24 Feb 2022 06:33:17 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1DAFDAE05F; Thu, 24 Feb 2022 06:33:17 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9D9A3AE062; Thu, 24 Feb 2022 06:33:16 +0000 (GMT) Received: from genoa (unknown [9.40.192.157]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTPS; Thu, 24 Feb 2022 06:33:16 +0000 (GMT) From: Jiufu Guo To: Jiufu Guo via Gcc-patches Cc: Segher Boessenkool , wschmidt@linux.ibm.com, jlaw@tachyum.com, Richard Biener , dje.gcc@gmail.com Subject: Re: [PATCH] Check if loading const from mem is faster References: <20220222065313.2040127-1-guojiufu@linux.ibm.com> <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> <1d471fba-a966-3e90-92ce-ae4707fe53b6@linux.ibm.com> <7ns67pr-q7q-rps-136-37pn481515q5@fhfr.qr> <20220223211406.GH614@gate.crashing.org> Date: Thu, 24 Feb 2022 14:33:09 +0800 In-Reply-To: (Jiufu Guo via Gcc-patches's message of "Thu, 24 Feb 2022 13:56:49 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: Jzm36W-Q-7kLxRr5HiyD3577flqk5viU X-Proofpoint-GUID: uNNcoiCW6X4eLntBs129Ko1xx6Mn5lzx X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.64.514 definitions=2022-02-23_09,2022-02-23_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 lowpriorityscore=0 suspectscore=0 adultscore=0 clxscore=1015 mlxscore=0 malwarescore=0 bulkscore=0 spamscore=0 phishscore=0 mlxlogscore=999 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2202240037 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Feb 2022 06:33:23 -0000 Jiufu Guo via Gcc-patches writes: > Segher Boessenkool writes: > >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote: >>> I'm assuming we're always dealing with >>> >>> (set (reg:MODE ..) ) >>> >>> here and CSE is not substituting into random places of an >>> instruction(?). I don't know what 'rtx_cost' should evaluate >>> to for a constant, if it should implicitely evaluate the cost >>> of putting the result into a register for example. >> >> rtx_cost is no good here (and in most places). rtx_cost should be 0 >> for anything that is used as input in a machine instruction -- but you >> need much more context to determine that. insn_cost is much simpler and >> much easier to use. >> >>> Using RTX_COST with SET and 1 at least looks no worse than using >>> your proposed new target hook and comparing it with the original >>> unfolded src (again with SET and 1). >> >> It is required to generate valid instructions no matter what, before >> the pass has finished that is. On all more modern architectures it is >> futile to think you can usefully consider the cost of an RTL expression >> and derive a real-world cost of the generated code from that. > > Thanks Segher for pointing out these! Here is another reason that I > did not use rtx_cost: in a few passes, there are codes to check the > constants and store them in constant pool. I'm thinking to integerate > those codes in a consistent way. Hi Segher, Richard! I'm thinking the way like: For a constant, 1. if the constant could be used as an immediate for the instruction, then retreated as an operand; 2. otherwise if the constant can not be stored into a constant pool, then handle through instructions; 3. if it is faster to access constant from pool, then emit constant as data(.rodata); 4. otherwise, handle the constant by instructions. And to store the constant into a pool, besides force_const_mem, create reference (toc) may be needed on some platforms. For this particular issue in CSE, there is already code that tries to put constant into a pool (invoke force_const_mem). While the code is too late. So, we may check the constant earlier and store it into constant pool if profitable. And another thing as Segher pointed out, CSE is doing too much work. It may be ok to separate the constant handling logic from CSE. I update a new version patch as follow (did not seprate CSE): Thanks for the comments and suggestions again! BR, Jiufu --- gcc/config/rs6000/rs6000.cc | 39 ++++++++++++++----- gcc/cse.cc | 36 ++++++++--------- gcc/doc/tm.texi | 5 +++ gcc/doc/tm.texi.in | 2 + gcc/target.def | 8 ++++ gcc/targhooks.cc | 6 +++ gcc/targhooks.h | 1 + .../gcc.target/powerpc/medium_offset.c | 2 +- gcc/testsuite/gcc.target/powerpc/pr63281.c | 14 +++++++ gcc/testsuite/gcc.target/powerpc/pr93012.c | 2 +- 10 files changed, 84 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index d7a7cfe860f..0a8f487d516 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -1361,6 +1361,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_CANNOT_FORCE_CONST_MEM #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem +#undef TARGET_FASTER_LOADING_CONSTANT +#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const + #undef TARGET_DELEGITIMIZE_ADDRESS #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address @@ -9684,8 +9687,8 @@ rs6000_init_stack_protect_guard (void) static bool rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) { - if (GET_CODE (x) == HIGH - && GET_CODE (XEXP (x, 0)) == UNSPEC) + /* Exclude CONSTANT HIGH part. */ + if (GET_CODE (x) == HIGH) return true; /* A TLS symbol in the TOC cannot contain a sum. */ @@ -10483,6 +10486,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode) return false; } + +/* Implement TARGET_FASTER_LOADING_CONSTANT. */ + +static bool +rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src) +{ + gcc_assert (CONSTANT_P (src)); + + if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode) + return false; + if (GET_CODE (src) == HIGH) + return false; + if (toc_relative_expr_p (src, false, NULL, NULL)) + return false; + if (rs6000_cannot_force_const_mem (mode, src)) + return false; + + if (REG_P (dest) && FP_REGNO_P (REGNO (dest))) + return true; + if (!CONST_INT_P (src)) + return true; + return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2); +} + /* Emit a move from SOURCE to DEST in mode MODE. */ void rs6000_emit_move (rtx dest, rtx source, machine_mode mode) @@ -10860,13 +10887,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode) operands[1] = create_TOC_reference (operands[1], operands[0]); else if (mode == Pmode && CONSTANT_P (operands[1]) - && GET_CODE (operands[1]) != HIGH - && ((REG_P (operands[0]) - && FP_REGNO_P (REGNO (operands[0]))) - || !CONST_INT_P (operands[1]) - || (num_insns_constant (operands[1], mode) - > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))) - && !toc_relative_expr_p (operands[1], false, NULL, NULL) + && rs6000_faster_loading_const (mode, operands[0], operands[1]) && (TARGET_CMODEL == CMODEL_SMALL || can_create_pseudo_p () || (REG_P (operands[0]) diff --git a/gcc/cse.cc b/gcc/cse.cc index a18b599d324..53a7b3556b3 100644 --- a/gcc/cse.cc +++ b/gcc/cse.cc @@ -5192,6 +5192,22 @@ cse_insn (rtx_insn *insn) src_elt_regcost = elt->regcost; } + /* If the current function uses a constant pool and this is a + constant, try making a pool entry. Put it in src_folded + unless we already have done this since that is where it + likely came from. */ + + if (crtl->uses_const_pool && src_folded && CONSTANT_P (src_folded) + && targetm.faster_loading_constant (mode, dest, src_folded)) + { + src_folded = force_const_mem (mode, src_folded); + if (src_folded) + { + src_folded_cost = COST (src_folded, mode); + src_folded_regcost = approx_reg_cost (src_folded); + } + } + /* Find cheapest and skip it for the next time. For items of equal cost, use this order: src_folded, src, src_eqv, src_related and hash table entry. */ @@ -5390,26 +5406,6 @@ cse_insn (rtx_insn *insn) break; } - - /* If the current function uses a constant pool and this is a - constant, try making a pool entry. Put it in src_folded - unless we already have done this since that is where it - likely came from. */ - - else if (crtl->uses_const_pool - && CONSTANT_P (trial) - && !CONST_INT_P (trial) - && (src_folded == 0 || !MEM_P (src_folded)) - && GET_MODE_CLASS (mode) != MODE_CC - && mode != VOIDmode) - { - src_folded = force_const_mem (mode, trial); - if (src_folded) - { - src_folded_cost = COST (src_folded, mode); - src_folded_regcost = approx_reg_cost (src_folded); - } - } } /* If we changed the insn too much, handle this set from scratch. */ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 962bbb8caaf..65685311249 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6014,6 +6014,11 @@ holding the constant. This restriction is often true of addresses of TLS symbols for various targets. @end deftypefn +@deftypefn {Target Hook} bool TARGET_FASTER_LOADING_CONSTANT (machine_mode @var{mode}, rtx @var{y}, rtx @var{x}) +It returns true if loading contant @var{x} is faster than building +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (machine_mode @var{mode}, const_rtx @var{x}) This hook should return true if pool entries for constant @var{x} can be placed in an @code{object_block} structure. @var{mode} is the mode diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 394b59e70a6..918914f0e30 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4148,6 +4148,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_CANNOT_FORCE_CONST_MEM +@hook TARGET_FASTER_LOADING_CONSTANT + @hook TARGET_USE_BLOCKS_FOR_CONSTANT_P @hook TARGET_USE_BLOCKS_FOR_DECL_P diff --git a/gcc/target.def b/gcc/target.def index 57e64b20eef..8b007aca9dc 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2987,6 +2987,14 @@ locally. The default is to return false.", bool, (void), hook_bool_void_false) +/* Check if it is profitable to load the constant from constant pool. */ +DEFHOOK +(faster_loading_constant, + "It returns true if loading contant @var{x} is faster than building\n\ +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}.", + bool, (machine_mode mode, rtx y, rtx x), + default_faster_loading_constant) + /* True if it is OK to do sibling call optimization for the specified call expression EXP. DECL will be the called function, or NULL if this is an indirect call. */ diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc index fc49235eb38..12de61a7ed8 100644 --- a/gcc/targhooks.cc +++ b/gcc/targhooks.cc @@ -173,6 +173,12 @@ default_return_in_memory (const_tree type, return (TYPE_MODE (type) == BLKmode); } +bool +default_faster_loading_constant (machine_mode, rtx, rtx) +{ + return false; +} + rtx default_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED, machine_mode mode ATTRIBUTE_UNUSED) diff --git a/gcc/targhooks.h b/gcc/targhooks.h index ecfa11287ef..4db21cd59f6 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -295,5 +295,6 @@ extern rtx default_memtag_extract_tag (rtx, rtx); extern rtx default_memtag_untagged_pointer (rtx, rtx); extern HOST_WIDE_INT default_gcov_type_size (void); +extern bool default_faster_loading_constant (machine_mode, rtx, rtx); #endif /* GCC_TARGHOOKS_H */ diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c index f29eba08c38..4889e8fa8ec 100644 --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c @@ -1,7 +1,7 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target lp64 } */ /* { dg-options "-O" } */ -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ static int x; diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c new file mode 100644 index 00000000000..58ba074d427 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#define CONST1 0x100803004101001 +#define CONST2 0x000406002105007 + +void __attribute__ ((noinline)) +foo (unsigned long long *a, unsigned long long *b) +{ + *a = CONST1; + *b = CONST2; +} + +/* { dg-final { scan-assembler-times {\mld|pld\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c index 4f764d0576f..5afb4f79c45 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; } unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; } unsigned long long mskse() { return 0xffff1234ffff1234ULL; } -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */ +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */ -- 2.25.1 > > > BR, > Jiufu > >> >> But there is so much more wrong with cse.c :-( >> >> >> Segher