From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 203EF3858402 for ; Fri, 25 Feb 2022 04:35:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 203EF3858402 Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 21P3xsqi001672; Fri, 25 Feb 2022 04:35:26 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3edv9qbqmv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Feb 2022 04:35:26 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 21P4Mr1n034306; Fri, 25 Feb 2022 04:35:25 GMT Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0a-001b2d01.pphosted.com with ESMTP id 3edv9qbqmp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Feb 2022 04:35:25 +0000 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 21P4Wa4H009133; Fri, 25 Feb 2022 04:35:24 GMT Received: from b01cxnp22033.gho.pok.ibm.com (b01cxnp22033.gho.pok.ibm.com [9.57.198.23]) by ppma03dal.us.ibm.com with ESMTP id 3ear6cw5ur-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Feb 2022 04:35:24 +0000 Received: from b01ledav004.gho.pok.ibm.com (b01ledav004.gho.pok.ibm.com [9.57.199.109]) by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 21P4ZOD431064558 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 25 Feb 2022 04:35:24 GMT Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D8BEE11206B; Fri, 25 Feb 2022 04:35:23 +0000 (GMT) Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 64238112063; Fri, 25 Feb 2022 04:35:23 +0000 (GMT) Received: from genoa (unknown [9.40.192.157]) by b01ledav004.gho.pok.ibm.com (Postfix) with ESMTPS; Fri, 25 Feb 2022 04:35:23 +0000 (GMT) From: Jiufu Guo To: Richard Biener Cc: Jiufu Guo via Gcc-patches , Segher Boessenkool , wschmidt@linux.ibm.com, jlaw@tachyum.com, dje.gcc@gmail.com Subject: Re: [PATCH] Check if loading const from mem is faster References: <20220222065313.2040127-1-guojiufu@linux.ibm.com> <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> <1d471fba-a966-3e90-92ce-ae4707fe53b6@linux.ibm.com> <7ns67pr-q7q-rps-136-37pn481515q5@fhfr.qr> <20220223211406.GH614@gate.crashing.org> <58519pr3-4363-4rso-2rq8-n07n86sqq3p@fhfr.qr> Date: Fri, 25 Feb 2022 12:35:20 +0800 In-Reply-To: <58519pr3-4363-4rso-2rq8-n07n86sqq3p@fhfr.qr> (Richard Biener's message of "Thu, 24 Feb 2022 09:50:28 +0100 (CET)") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: WPnlc-MnxrJ3FuO4yILF_EB3syC1Zm9- X-Proofpoint-GUID: 3MIcUlc8BUkSyntO14hMVgHtHffKYVT0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.64.514 definitions=2022-02-25_01,2022-02-24_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 malwarescore=0 adultscore=0 impostorscore=0 priorityscore=1501 mlxscore=0 lowpriorityscore=0 spamscore=0 suspectscore=0 bulkscore=0 mlxlogscore=999 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2202250022 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Feb 2022 04:35:30 -0000 Richard Biener writes: > On Thu, 24 Feb 2022, Jiufu Guo wrote: > >> Jiufu Guo via Gcc-patches writes: >> >> > Segher Boessenkool writes: >> > >> >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote: >> >>> I'm assuming we're always dealing with >> >>> >> >>> (set (reg:MODE ..) ) >> >>> >> >>> here and CSE is not substituting into random places of an >> >>> instruction(?). I don't know what 'rtx_cost' should evaluate >> >>> to for a constant, if it should implicitely evaluate the cost >> >>> of putting the result into a register for example. >> >> >> >> rtx_cost is no good here (and in most places). rtx_cost should be 0 >> >> for anything that is used as input in a machine instruction -- but you >> >> need much more context to determine that. insn_cost is much simpler and >> >> much easier to use. >> >> >> >>> Using RTX_COST with SET and 1 at least looks no worse than using >> >>> your proposed new target hook and comparing it with the original >> >>> unfolded src (again with SET and 1). >> >> >> >> It is required to generate valid instructions no matter what, before >> >> the pass has finished that is. On all more modern architectures it is >> >> futile to think you can usefully consider the cost of an RTL expression >> >> and derive a real-world cost of the generated code from that. >> > >> > Thanks Segher for pointing out these! Here is another reason that I >> > did not use rtx_cost: in a few passes, there are codes to check the >> > constants and store them in constant pool. I'm thinking to integerate >> > those codes in a consistent way. >> >> Hi Segher, Richard! >> >> I'm thinking the way like: For a constant, >> 1. if the constant could be used as an immediate for the >> instruction, then retreated as an operand; >> 2. otherwise if the constant can not be stored into a >> constant pool, then handle through instructions; >> 3. if it is faster to access constant from pool, then emit >> constant as data(.rodata); >> 4. otherwise, handle the constant by instructions. >> >> And to store the constant into a pool, besides force_const_mem, >> create reference (toc) may be needed on some platforms. >> >> For this particular issue in CSE, there is already code that >> tries to put constant into a pool (invoke force_const_mem). >> While the code is too late. So, we may check the constant >> earlier and store it into constant pool if profitable. >> >> And another thing as Segher pointed out, CSE is doing too >> much work. It may be ok to separate the constant handling >> logic from CSE. > > Not sure - CSE just is value numbering, I don't see that it does > more than that. Yes, it might have developed "heuristics" over > the years what to CSE and to what and where to substitute and > where not. But in the end it does just value numbering. > >> >> I update a new version patch as follow (did not seprate CSE): > > How is the new target hook better in any way compared to rtx_cost > or insn_cost? It looks like a total hack. > > I suppose the actual way of materializing a constant is done > behind GCCs back and not exposed anywhere? But instead you > claim the constants are valid when they actually are not? > Isn't the problem then that the rs6000 backend lies? Hi Richard, Thanks for your comments and sugguestions! Materializing a constant should be done behind GCC. On rs6000, in expand pass, during emit_move, the constant is checked and store into constant pool if necessary. Some other platforms are doing a similar thing, e.g. ix86_expand_vector_move, alpha_expand_mov,... mips_legitimize_const_move. But, it does not as we expect, force_const_mem is also exposed other places (not only ira/reload for stack reference). CSE is one place, for example, CSE first retrieve the constant from insn's equal, but it also tries to put constant into pool for some condition (the condition was introduced at early age of cse.c, and it is rare to run into in trunk). In some aspects, IMHO, this seems not a great work of CSE. And this is how the 'invalid(or say slow)' constant occurs. e.g. before cse: 7: r119:DI=[unspec[`*.LC0',%r2:DI] 47] REG_EQUAL 0x100803004101001 after cse: 7: r119:DI=0x100803004101001 REG_EQUAL 0x100803004101001 As you pointed out: we can also avoid this transformation through rtx_cost/insn_cost by estimating the COST more accurately for "r119:DI=0x100803004101001". (r119:DI=0x100803004101001 will not be a real final instruction.) Is it necessary to refine this constant handling for CSE, or even to eliminate the logic on constant extracting for an insn, beside updating rtx_cost/insn_cost? Any sugguestions? > > Btw, all of this is of course not appropriate for stage4 and changes > to CSE need testing on more than one target. I would do more evaluation, thanks! Jiufu > > Richard. > >> Thanks for the comments and suggestions again! >> >> >> BR, >> Jiufu >> >> --- >> gcc/config/rs6000/rs6000.cc | 39 ++++++++++++++----- >> gcc/cse.cc | 36 ++++++++--------- >> gcc/doc/tm.texi | 5 +++ >> gcc/doc/tm.texi.in | 2 + >> gcc/target.def | 8 ++++ >> gcc/targhooks.cc | 6 +++ >> gcc/targhooks.h | 1 + >> .../gcc.target/powerpc/medium_offset.c | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr63281.c | 14 +++++++ >> gcc/testsuite/gcc.target/powerpc/pr93012.c | 2 +- >> 10 files changed, 84 insertions(+), 31 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index d7a7cfe860f..0a8f487d516 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -1361,6 +1361,9 @@ static const struct attribute_spec rs6000_attribute_table[] = >> #undef TARGET_CANNOT_FORCE_CONST_MEM >> #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem >> >> +#undef TARGET_FASTER_LOADING_CONSTANT >> +#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const >> + >> #undef TARGET_DELEGITIMIZE_ADDRESS >> #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address >> >> @@ -9684,8 +9687,8 @@ rs6000_init_stack_protect_guard (void) >> static bool >> rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) >> { >> - if (GET_CODE (x) == HIGH >> - && GET_CODE (XEXP (x, 0)) == UNSPEC) >> + /* Exclude CONSTANT HIGH part. */ >> + if (GET_CODE (x) == HIGH) >> return true; >> >> /* A TLS symbol in the TOC cannot contain a sum. */ >> @@ -10483,6 +10486,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode) >> return false; >> } >> >> + >> +/* Implement TARGET_FASTER_LOADING_CONSTANT. */ >> + >> +static bool >> +rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src) >> +{ >> + gcc_assert (CONSTANT_P (src)); >> + >> + if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode) >> + return false; >> + if (GET_CODE (src) == HIGH) >> + return false; >> + if (toc_relative_expr_p (src, false, NULL, NULL)) >> + return false; >> + if (rs6000_cannot_force_const_mem (mode, src)) >> + return false; >> + >> + if (REG_P (dest) && FP_REGNO_P (REGNO (dest))) >> + return true; >> + if (!CONST_INT_P (src)) >> + return true; >> + return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2); >> +} >> + >> /* Emit a move from SOURCE to DEST in mode MODE. */ >> void >> rs6000_emit_move (rtx dest, rtx source, machine_mode mode) >> @@ -10860,13 +10887,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode) >> operands[1] = create_TOC_reference (operands[1], operands[0]); >> else if (mode == Pmode >> && CONSTANT_P (operands[1]) >> - && GET_CODE (operands[1]) != HIGH >> - && ((REG_P (operands[0]) >> - && FP_REGNO_P (REGNO (operands[0]))) >> - || !CONST_INT_P (operands[1]) >> - || (num_insns_constant (operands[1], mode) >> - > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))) >> - && !toc_relative_expr_p (operands[1], false, NULL, NULL) >> + && rs6000_faster_loading_const (mode, operands[0], operands[1]) >> && (TARGET_CMODEL == CMODEL_SMALL >> || can_create_pseudo_p () >> || (REG_P (operands[0]) >> diff --git a/gcc/cse.cc b/gcc/cse.cc >> index a18b599d324..53a7b3556b3 100644 >> --- a/gcc/cse.cc >> +++ b/gcc/cse.cc >> @@ -5192,6 +5192,22 @@ cse_insn (rtx_insn *insn) >> src_elt_regcost = elt->regcost; >> } >> >> + /* If the current function uses a constant pool and this is a >> + constant, try making a pool entry. Put it in src_folded >> + unless we already have done this since that is where it >> + likely came from. */ >> + >> + if (crtl->uses_const_pool && src_folded && CONSTANT_P (src_folded) >> + && targetm.faster_loading_constant (mode, dest, src_folded)) >> + { >> + src_folded = force_const_mem (mode, src_folded); >> + if (src_folded) >> + { >> + src_folded_cost = COST (src_folded, mode); >> + src_folded_regcost = approx_reg_cost (src_folded); >> + } >> + } >> + >> /* Find cheapest and skip it for the next time. For items >> of equal cost, use this order: >> src_folded, src, src_eqv, src_related and hash table entry. */ >> @@ -5390,26 +5406,6 @@ cse_insn (rtx_insn *insn) >> >> break; >> } >> - >> - /* If the current function uses a constant pool and this is a >> - constant, try making a pool entry. Put it in src_folded >> - unless we already have done this since that is where it >> - likely came from. */ >> - >> - else if (crtl->uses_const_pool >> - && CONSTANT_P (trial) >> - && !CONST_INT_P (trial) >> - && (src_folded == 0 || !MEM_P (src_folded)) >> - && GET_MODE_CLASS (mode) != MODE_CC >> - && mode != VOIDmode) >> - { >> - src_folded = force_const_mem (mode, trial); >> - if (src_folded) >> - { >> - src_folded_cost = COST (src_folded, mode); >> - src_folded_regcost = approx_reg_cost (src_folded); >> - } >> - } >> } >> >> /* If we changed the insn too much, handle this set from scratch. */ >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> index 962bbb8caaf..65685311249 100644 >> --- a/gcc/doc/tm.texi >> +++ b/gcc/doc/tm.texi >> @@ -6014,6 +6014,11 @@ holding the constant. This restriction is often true of addresses >> of TLS symbols for various targets. >> @end deftypefn >> >> +@deftypefn {Target Hook} bool TARGET_FASTER_LOADING_CONSTANT (machine_mode @var{mode}, rtx @var{y}, rtx @var{x}) >> +It returns true if loading contant @var{x} is faster than building >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}. >> +@end deftypefn >> + >> @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (machine_mode @var{mode}, const_rtx @var{x}) >> This hook should return true if pool entries for constant @var{x} can >> be placed in an @code{object_block} structure. @var{mode} is the mode >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> index 394b59e70a6..918914f0e30 100644 >> --- a/gcc/doc/tm.texi.in >> +++ b/gcc/doc/tm.texi.in >> @@ -4148,6 +4148,8 @@ address; but often a machine-dependent strategy can generate better code. >> >> @hook TARGET_CANNOT_FORCE_CONST_MEM >> >> +@hook TARGET_FASTER_LOADING_CONSTANT >> + >> @hook TARGET_USE_BLOCKS_FOR_CONSTANT_P >> >> @hook TARGET_USE_BLOCKS_FOR_DECL_P >> diff --git a/gcc/target.def b/gcc/target.def >> index 57e64b20eef..8b007aca9dc 100644 >> --- a/gcc/target.def >> +++ b/gcc/target.def >> @@ -2987,6 +2987,14 @@ locally. The default is to return false.", >> bool, (void), >> hook_bool_void_false) >> >> +/* Check if it is profitable to load the constant from constant pool. */ >> +DEFHOOK >> +(faster_loading_constant, >> + "It returns true if loading contant @var{x} is faster than building\n\ >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}.", >> + bool, (machine_mode mode, rtx y, rtx x), >> + default_faster_loading_constant) >> + >> /* True if it is OK to do sibling call optimization for the specified >> call expression EXP. DECL will be the called function, or NULL if >> this is an indirect call. */ >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc >> index fc49235eb38..12de61a7ed8 100644 >> --- a/gcc/targhooks.cc >> +++ b/gcc/targhooks.cc >> @@ -173,6 +173,12 @@ default_return_in_memory (const_tree type, >> return (TYPE_MODE (type) == BLKmode); >> } >> >> +bool >> +default_faster_loading_constant (machine_mode, rtx, rtx) >> +{ >> + return false; >> +} >> + >> rtx >> default_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED, >> machine_mode mode ATTRIBUTE_UNUSED) >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h >> index ecfa11287ef..4db21cd59f6 100644 >> --- a/gcc/targhooks.h >> +++ b/gcc/targhooks.h >> @@ -295,5 +295,6 @@ extern rtx default_memtag_extract_tag (rtx, rtx); >> extern rtx default_memtag_untagged_pointer (rtx, rtx); >> >> extern HOST_WIDE_INT default_gcov_type_size (void); >> +extern bool default_faster_loading_constant (machine_mode, rtx, rtx); >> >> #endif /* GCC_TARGHOOKS_H */ >> diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> index f29eba08c38..4889e8fa8ec 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c >> @@ -1,7 +1,7 @@ >> /* { dg-do compile { target { powerpc*-*-* } } } */ >> /* { dg-require-effective-target lp64 } */ >> /* { dg-options "-O" } */ >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ >> >> static int x; >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c >> new file mode 100644 >> index 00000000000..58ba074d427 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c >> @@ -0,0 +1,14 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#define CONST1 0x100803004101001 >> +#define CONST2 0x000406002105007 >> + >> +void __attribute__ ((noinline)) >> +foo (unsigned long long *a, unsigned long long *b) >> +{ >> + *a = CONST1; >> + *b = CONST2; >> +} >> + >> +/* { dg-final { scan-assembler-times {\mld|pld\M} 2 } } */ >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c >> index 4f764d0576f..5afb4f79c45 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c >> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c >> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; } >> unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; } >> unsigned long long mskse() { return 0xffff1234ffff1234ULL; } >> >> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */ >> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */ >>