From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id EA761385BF81 for ; Wed, 23 Feb 2022 11:33:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EA761385BF81 Received: from pps.filterd (m0098413.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 21N8lYmK011733; Wed, 23 Feb 2022 11:33:05 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3edhqsayxv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 23 Feb 2022 11:33:04 +0000 Received: from m0098413.ppops.net (m0098413.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 21NBO1Df000372; Wed, 23 Feb 2022 11:33:04 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 3edhqsayxe-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 23 Feb 2022 11:33:04 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 21NBPqb7013213; Wed, 23 Feb 2022 11:33:02 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma04ams.nl.ibm.com with ESMTP id 3ear699ekk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 23 Feb 2022 11:33:02 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 21NBWxeY48365992 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 23 Feb 2022 11:32:59 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 823ED11C066; Wed, 23 Feb 2022 11:32:59 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BFBFF11C058; Wed, 23 Feb 2022 11:32:56 +0000 (GMT) Received: from [9.197.234.245] (unknown [9.197.234.245]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 23 Feb 2022 11:32:56 +0000 (GMT) Message-ID: <1d471fba-a966-3e90-92ce-ae4707fe53b6@linux.ibm.com> Date: Wed, 23 Feb 2022 19:32:55 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.0 Subject: Re: [PATCH] Check if loading const from mem is faster To: Richard Biener Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, jlaw@tachyum.com, wschmidt@linux.ibm.com, guojiufu@linux.ibm.com References: <20220222065313.2040127-1-guojiufu@linux.ibm.com> <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> From: guojiufu In-Reply-To: <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: WJujvgDS83cX69lTZWfUrCNEg4O3KxRr X-Proofpoint-ORIG-GUID: To2keUz8FnC_buDyozdOCIWWSjfyxTlO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.64.514 definitions=2022-02-23_03,2022-02-23_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 malwarescore=0 bulkscore=0 priorityscore=1501 lowpriorityscore=0 suspectscore=0 adultscore=0 mlxlogscore=999 phishscore=0 impostorscore=0 spamscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2202230064 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Feb 2022 11:33:08 -0000 On 2/22/22 PM3:26, Richard Biener wrote: > On Tue, 22 Feb 2022, Jiufu Guo wrote: > >> Hi, >> >> For constants, there are some codes to check: if it is able to put >> to instruction as an immediate operand or it is profitable to load from >> mem. There are still some places that could be improved for platforms. >> >> This patch could handle PR63281/57836. This patch does not change >> too much on the code like force_const_mem and legitimate_constant_p. >> We may integrate these APIs for passes like expand/cse/combine >> as a whole solution in the future (maybe better for stage1?). >> >> Bootstrap and regtest pass on ppc64le and x86_64. Is this ok for trunk? >> Thanks for comments! > > I'm not sure whether we need a new hook here, but iff, then I think > whether loading a constant (from memory?) is faster or not depends > on the context. So what's the exact situation and the two variants > you are costing against each other? I assume (since you are > touching CSE) you are costing Hi Richard, Thanks for your review! In some contexts, it may be faster to load from memory for some constant value, and for some constant value, it would be faster to put into immediate of few (1 or 2) instructions. For example 0x1234567812345678, on ppc64, we may need 3 instructions to build it, and then it would be better to put it in .rodata, and then load it from memory. Currently, we already have hooks TARGET_CANNOT_FORCE_CONST_MEM and TARGET_LEGITIMATE_CONSTANT_P. TARGET_CANNOT_FORCE_CONST_MEM is used to check if one 'rtx' can be store into the constant pool. On some targets (e.g. alpha), TARGET_LEGITIMATE_CONSTANT_P does the behavior like what we expect: I once thought to use TARGET_LEGITIMATE_CONSTANT_P too. But in general, it seems this hook is designed to check if one 'rtx' could be used as an immediate instruction. This hook is used in RTL passes: ira/reload. It is also used in recog.cc and expr.cc. In other words, I feel, whether putting a constant in the constant pool, we could check: - If TARGET_CANNOT_FORCE_CONST_MEM returns true, we should not put the 'constant' in the constant pool. - If TARGET_LEGITIMATE_CONSTANT_P returns true, then the 'constant' would be immediate of **one** instruction, and not put to constant pool. - If the new hook TARGET_FASTER_LOADING_CONSTANT returns true, then the 'constant' would be stored in the constant pool. Otherwise, it would be better to use an instructions-seq to build the 'constant'. This is why I introduce a new hook. We may also use the new hook at other places, e.g. expand/combining... where is calling force_const_mem. Any suggestions? > > (set (...) (mem)) (before CSE) > > against > > (set (...) (immediate)) (what CSE does now) > > vs. > > (set (...) (mem)) (original, no CSE) > > ? With the new hook you are skipping _all_ of the following loops > logic which does look like a quite bad design and hack (not that > I am very familiar with the candidate / costing logic in there). At cse_insn, in the following loop of the code, it is also testing the constant and try to put into memory: else if (crtl->uses_const_pool && CONSTANT_P (trial) && !CONST_INT_P (trial) && (src_folded == 0 || !MEM_P (src_folded)) && GET_MODE_CLASS (mode) != MODE_CC && mode != VOIDmode) { src_folded = force_const_mem (mode, trial); if (src_folded) { src_folded_cost = COST (src_folded, mode); src_folded_regcost = approx_reg_cost (src_folded); } } This code is at the end of the loop, it would only be tested for the next iteration. It may be better to test "if need to put the constant into memory" for all iterations. The current patch is adding an additional test before the loop. I will update the patch to integrate these two places! > > We already have TARGET_INSN_COST which you could ask for a cost. > Like if we'd have a single_set then just temporarily substitute > the RHS with the candidate and cost the insns and compare against > the original insn cost. So why exactly do you need a new hook > for this particular situation? Thanks for pointing out this! Segher also mentioned this before. Currently, CSE is using rtx_cost. Using insn_cost to replace rtx_cost would be a good idea for all necessary places including CSE. For this particular case: check the cost for constants. I did not use insn_cost. Because to use insn_cost, we may need to create a recognizable insn temporarily, and for some kind of constants we may need to create a sequence instructions on some platform, e.g. "li xx; ori ; sldi .." on ppc64, and check the sum cost of those instructions. If only create one fake instruction, the insn_cost may not return the accurate cost either. BR, Jiufu > > Thanks, > Richard. > > >> >> BR, >> Jiufu >> >> gcc/ChangeLog: >> >> PR target/94393 >> PR rtl-optimization/63281 >> * config/rs6000/rs6000.cc (TARGET_FASTER_LOADING_CONSTANT): >> New hook. >> (rs6000_faster_loading_const): New hook. >> (rs6000_cannot_force_const_mem): Update. >> (rs6000_emit_move): Use rs6000_faster_loading_const. >> * cse.cc (cse_insn): Use new hook. >> * doc/tm.texi.in: New hook TARGET_FASTER_LOADING_CONSTANT. >> * doc/tm.texi: Regenerate. >> * target.def: New hook faster_loading_constant. >> * targhooks.cc (default_faster_loading_constant): New. >> * targhooks.h (default_faster_loading_constant): Declare it. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/medium_offset.c: Update. >> * gcc.target/powerpc/pr93012.c: Update. >> * gcc.target/powerpc/pr63281.c: New test. >> >> --- >> gcc/config/rs6000/rs6000.cc | 38 ++++++++++++++----- >> gcc/cse.cc | 5 +++ >> gcc/doc/tm.texi | 5 +++ >> gcc/doc/tm.texi.in | 2 + >> gcc/target.def | 8 ++++ >> gcc/targhooks.cc | 6 +++ >> .../gcc.target/powerpc/medium_offset.c | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr63281.c | 14 +++++++ >> gcc/testsuite/gcc.target/powerpc/pr93012.c | 2 +- >> 9 files changed, 71 insertions(+), 11 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index d7a7cfe860f..beb21a0bb2b 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -1361,6 +1361,9 @@ static const struct attribute_spec rs6000_attribute_table[] = >> #undef TARGET_CANNOT_FORCE_CONST_MEM >> #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem >> >> +#undef TARGET_FASTER_LOADING_CONSTANT >> +#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const >> + >> #undef TARGET_DELEGITIMIZE_ADDRESS >> #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address >> >> @@ -9684,8 +9687,7 @@ rs6000_init_stack_protect_guard (void) >> static bool >> rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) >> { >> - if (GET_CODE (x) == HIGH >> - && GET_CODE (XEXP (x, 0)) == UNSPEC) >> + if (GET_CODE (x) == HIGH) >> return true; >> >> /* A TLS symbol in the TOC cannot contain a sum. */ >> @@ -10483,6 +10485,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode) >> return false; >> } >> >> + >> +/* Implement TARGET_FASTER_LOADING_CONSTANT. */ >> + >> +static bool >> +rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src) >> +{ >> + gcc_assert (CONSTANT_P (src)); >> + >> + if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode) >> + return false; >> + if (GET_CODE (src) == HIGH) >> + return false; >> + if (toc_relative_expr_p (src, false, NULL, NULL)) >> + return false; >> + if (rs6000_cannot_force_const_mem (mode, src)) >> + return false; >> + >> + if (REG_P (dest) && FP_REGNO_P (REGNO (dest))) >> + return true; >> + if (!CONST_INT_P (src)) >> + return true; >> + return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2); >> +} >> + >> /* Emit a move from SOURCE to DEST in mode MODE. */ >> void >> rs6000_emit_move (rtx dest, rtx source, machine_mode mode) >> @@ -10860,13 +10886,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode) >> operands[1] = create_TOC_reference (operands[1], operands[0]); >> else if (mode == Pmode >> && CONSTANT_P (operands[1]) >> - && GET_CODE (operands[1]) != HIGH >> - && ((REG_P (operands[0]) >> - && FP_REGNO_P (REGNO (operands[0]))) >> - || !CONST_INT_P (operands[1]) >> - || (num_insns_constant (operands[1], mode) >> - > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))) >> - && !toc_relative_expr_p (operands[1], false, NULL, NULL) >> + && rs6000_faster_loading_const (mode, operands[0], operands[1]) >> && (TARGET_CMODEL == CMODEL_SMALL >> || can_create_pseudo_p () >> || (REG_P (operands[0]) >> diff --git a/gcc/cse.cc b/gcc/cse.cc >> index a18b599d324..c33d3c7e911 100644 >> --- a/gcc/cse.cc >> +++ b/gcc/cse.cc >> @@ -5158,6 +5158,11 @@ cse_insn (rtx_insn *insn) >> if (dest == pc_rtx && src_const && GET_CODE (src_const) == LABEL_REF) >> src_folded = src_const, src_folded_cost = src_folded_regcost = -1; >> >> + /* Check if src_foled is contant which is fastster to load pool. */ >> + if (src_folded && CONSTANT_P (src_folded) && src && MEM_P (src) >> + && targetm.faster_loading_constant (mode, dest, src_folded)) >> + src_folded = NULL_RTX, src_folded_cost = src_folded_regcost = -1; >> + >> /* Terminate loop when replacement made. This must terminate since >> the current contents will be tested and will always be valid. */ >> while (1) >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> index 962bbb8caaf..65685311249 100644 >> --- a/gcc/doc/tm.texi >> +++ b/gcc/doc/tm.texi >> @@ -6014,6 +6014,11 @@ holding the constant. This restriction is often true of addresses >> of TLS symbols for various targets. >> @end deftypefn >> >> +@deftypefn {Target Hook} bool TARGET_FASTER_LOADING_CONSTANT (machine_mode @var{mode}, rtx @var{y}, rtx @var{x}) >> +It returns true if loading contant @var{x} is faster than building >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}. >> +@end deftypefn >> + >> @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (machine_mode @var{mode}, const_rtx @var{x}) >> This hook should return true if pool entries for constant @var{x} can >> be placed in an @code{object_block} structure. @var{mode} is the mode >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> index 394b59e70a6..918914f0e30 100644 >> --- a/gcc/doc/tm.texi.in >> +++ b/gcc/doc/tm.texi.in >> @@ -4148,6 +4148,8 @@ address; but often a machine-dependent strategy can generate better code. >> >> @hook TARGET_CANNOT_FORCE_CONST_MEM >> >> +@hook TARGET_FASTER_LOADING_CONSTANT >> + >> @hook TARGET_USE_BLOCKS_FOR_CONSTANT_P >> >> @hook TARGET_USE_BLOCKS_FOR_DECL_P >> diff --git a/gcc/target.def b/gcc/target.def >> index 57e64b20eef..8b007aca9dc 100644 >> --- a/gcc/target.def >> +++ b/gcc/target.def >> @@ -2987,6 +2987,14 @@ locally. The default is to return false.", >> bool, (void), >> hook_bool_void_false) >> >> +/* Check if it is profitable to load the constant from constant pool. */ >> +DEFHOOK >> +(faster_loading_constant, >> + "It returns true if loading contant @var{x} is faster than building\n\ >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}.", >> + bool, (machine_mode mode, rtx y, rtx x), >> + default_faster_loading_constant) >> + >> /* True if it is OK to do sibling call optimization for the specified >> call expression EXP. DECL will be the called function, or NULL if >> this is an indirect call. */ >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc >> index fc49235eb38..1d9340d1c14 100644 >> --- a/gcc/targhooks.cc >> +++ b/gcc/targhooks.cc >> @@ -173,6 +173,12 @@ default_return_in_memory (const_tree type, >> return (TYPE_MODE (type) == BLKmode); >> } >> >> +bool >> +default_faster_loading_constant (machine_mode mode, rtx dest, rtx src) >> +{ >> + return false; >> +} >> + >> rtx >> default_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED, >> machine_mode mode ATTRIBUTE_UNUSED) >> diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> index f29eba08c38..4889e8fa8ec 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> @@ -1,7 +1,7 @@ >> /* { dg-do compile { target { powerpc*-*-* } } } */ >> /* { dg-require-effective-target lp64 } */ >> /* { dg-options "-O" } */ >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ >> >> static int x; >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c >> new file mode 100644 >> index 00000000000..58ba074d427 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c >> @@ -0,0 +1,14 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#define CONST1 0x100803004101001 >> +#define CONST2 0x000406002105007 >> + >> +void __attribute__ ((noinline)) >> +foo (unsigned long long *a, unsigned long long *b) >> +{ >> + *a = CONST1; >> + *b = CONST2; >> +} >> + >> +/* { dg-final { scan-assembler-times {\mld|pld\M} 2 } } */ >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c >> index 4f764d0576f..5afb4f79c45 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c >> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c >> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; } >> unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; } >> unsigned long long mskse() { return 0xffff1234ffff1234ULL; } >> >> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */ >> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */ >> >