From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 01B973858402 for ; Fri, 25 Feb 2022 08:45:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 01B973858402 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id B05322110A; Fri, 25 Feb 2022 08:45:45 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 6717FA3B8B; Fri, 25 Feb 2022 08:45:45 +0000 (UTC) Date: Fri, 25 Feb 2022 09:45:45 +0100 (CET) From: Richard Biener To: Jiufu Guo cc: Jiufu Guo via Gcc-patches , Segher Boessenkool , wschmidt@linux.ibm.com, jlaw@tachyum.com, dje.gcc@gmail.com Subject: Re: [PATCH] Check if loading const from mem is faster In-Reply-To: Message-ID: <1q51o2n4-r721-qqo6-o186-7r04o4463n@fhfr.qr> References: <20220222065313.2040127-1-guojiufu@linux.ibm.com> <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> <1d471fba-a966-3e90-92ce-ae4707fe53b6@linux.ibm.com> <7ns67pr-q7q-rps-136-37pn481515q5@fhfr.qr> <20220223211406.GH614@gate.crashing.org> <58519pr3-4363-4rso-2rq8-n07n86sqq3p@fhfr.qr> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Feb 2022 08:45:49 -0000 On Fri, 25 Feb 2022, Jiufu Guo wrote: > Richard Biener writes: > > > On Thu, 24 Feb 2022, Jiufu Guo wrote: > > > >> Jiufu Guo via Gcc-patches writes: > >> > >> > Segher Boessenkool writes: > >> > > >> >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote: > >> >>> I'm assuming we're always dealing with > >> >>> > >> >>> (set (reg:MODE ..) ) > >> >>> > >> >>> here and CSE is not substituting into random places of an > >> >>> instruction(?). I don't know what 'rtx_cost' should evaluate > >> >>> to for a constant, if it should implicitely evaluate the cost > >> >>> of putting the result into a register for example. > >> >> > >> >> rtx_cost is no good here (and in most places). rtx_cost should be 0 > >> >> for anything that is used as input in a machine instruction -- but you > >> >> need much more context to determine that. insn_cost is much simpler and > >> >> much easier to use. > >> >> > >> >>> Using RTX_COST with SET and 1 at least looks no worse than using > >> >>> your proposed new target hook and comparing it with the original > >> >>> unfolded src (again with SET and 1). > >> >> > >> >> It is required to generate valid instructions no matter what, before > >> >> the pass has finished that is. On all more modern architectures it is > >> >> futile to think you can usefully consider the cost of an RTL expression > >> >> and derive a real-world cost of the generated code from that. > >> > > >> > Thanks Segher for pointing out these! Here is another reason that I > >> > did not use rtx_cost: in a few passes, there are codes to check the > >> > constants and store them in constant pool. I'm thinking to integerate > >> > those codes in a consistent way. > >> > >> Hi Segher, Richard! > >> > >> I'm thinking the way like: For a constant, > >> 1. if the constant could be used as an immediate for the > >> instruction, then retreated as an operand; > >> 2. otherwise if the constant can not be stored into a > >> constant pool, then handle through instructions; > >> 3. if it is faster to access constant from pool, then emit > >> constant as data(.rodata); > >> 4. otherwise, handle the constant by instructions. > >> > >> And to store the constant into a pool, besides force_const_mem, > >> create reference (toc) may be needed on some platforms. > >> > >> For this particular issue in CSE, there is already code that > >> tries to put constant into a pool (invoke force_const_mem). > >> While the code is too late. So, we may check the constant > >> earlier and store it into constant pool if profitable. > >> > >> And another thing as Segher pointed out, CSE is doing too > >> much work. It may be ok to separate the constant handling > >> logic from CSE. > > > > Not sure - CSE just is value numbering, I don't see that it does > > more than that. Yes, it might have developed "heuristics" over > > the years what to CSE and to what and where to substitute and > > where not. But in the end it does just value numbering. > > > >> > >> I update a new version patch as follow (did not seprate CSE): > > > > How is the new target hook better in any way compared to rtx_cost > > or insn_cost? It looks like a total hack. > > > > I suppose the actual way of materializing a constant is done > > behind GCCs back and not exposed anywhere? But instead you > > claim the constants are valid when they actually are not? > > Isn't the problem then that the rs6000 backend lies? > > Hi Richard, > > Thanks for your comments and sugguestions! > > Materializing a constant should be done behind GCC. > On rs6000, in expand pass, during emit_move, the constant is > checked and store into constant pool if necessary. > Some other platforms are doing a similar thing, e.g. > ix86_expand_vector_move, alpha_expand_mov,... > mips_legitimize_const_move. > > But, it does not as we expect, force_const_mem is also > exposed other places (not only ira/reload for stack reference). > > CSE is one place, for example, CSE first retrieve the constant > from insn's equal, but it also tries to put constant into > pool for some condition (the condition was introduced at > early age of cse.c, and it is rare to run into in trunk). > In some aspects, IMHO, this seems not a great work of CSE. > > And this is how the 'invalid(or say slow)' constant occurs. > e.g. before cse: > 7: r119:DI=[unspec[`*.LC0',%r2:DI] 47] > REG_EQUAL 0x100803004101001 > after cse: > 7: r119:DI=0x100803004101001 > REG_EQUAL 0x100803004101001 > > As you pointed out: we can also avoid this transformation through > rtx_cost/insn_cost by estimating the COST more accurately for > "r119:DI=0x100803004101001". (r119:DI=0x100803004101001 will not > be a real final instruction.) At which point does this become the final instruction? I suppose CSE belives this constant is legitimate and the insn is recognized just fine in CSE? (that's what I mean with "behind GCCs back") > Is it necessary to refine this constant handling for CSE, or even > to eliminate the logic on constant extracting for an insn, beside > updating rtx_cost/insn_cost? So if CSE really just transforms > 7: r119:DI=[unspec[`*.LC0',%r2:DI] 47] > REG_EQUAL 0x100803004101001 to > 7: r119:DI=0x100803004101001 > REG_EQUAL 0x100803004101001 then why can't rtx_cost (SET, 1) or insn_cost () be used to accurately tell it that the immediate is going to be a lot more expensive? That is, rtx_cost (CONST_INT-rtx, DI, SET, 1, ...) is accurate enough to be treated as an actual insn (it _might_ be part of a parallel I guess, but that's unlikely) and thus the backend should have a very good idea of the many instruction it needs for this. Likewise rtx_cost ([unspec[`*.LC0',%r2:DI] 47], DI, SET, 1, ...) should receive accurate cost that can be compared to other rtx_cost (..., DI, SET, 1, ...) And CSE doesn't even need to create fake insns here since IMHO rtx_cost is good enough to capture the full insn. Targets can choose to split out a set_cost from their rtx_cost/insn_cost hooks for this case for example. Richard. > Any sugguestions? > > > > > Btw, all of this is of course not appropriate for stage4 and changes > > to CSE need testing on more than one target. > I would do more evaluation, thanks! > > Jiufu > > > > > Richard. > > > >> Thanks for the comments and suggestions again! > >> > >> > >> BR, > >> Jiufu > >> > >> --- > >> gcc/config/rs6000/rs6000.cc | 39 ++++++++++++++----- > >> gcc/cse.cc | 36 ++++++++--------- > >> gcc/doc/tm.texi | 5 +++ > >> gcc/doc/tm.texi.in | 2 + > >> gcc/target.def | 8 ++++ > >> gcc/targhooks.cc | 6 +++ > >> gcc/targhooks.h | 1 + > >> .../gcc.target/powerpc/medium_offset.c | 2 +- > >> gcc/testsuite/gcc.target/powerpc/pr63281.c | 14 +++++++ > >> gcc/testsuite/gcc.target/powerpc/pr93012.c | 2 +- > >> 10 files changed, 84 insertions(+), 31 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c > >> > >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > >> index d7a7cfe860f..0a8f487d516 100644 > >> --- a/gcc/config/rs6000/rs6000.cc > >> +++ b/gcc/config/rs6000/rs6000.cc > >> @@ -1361,6 +1361,9 @@ static const struct attribute_spec rs6000_attribute_table[] = > >> #undef TARGET_CANNOT_FORCE_CONST_MEM > >> #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem > >> > >> +#undef TARGET_FASTER_LOADING_CONSTANT > >> +#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const > >> + > >> #undef TARGET_DELEGITIMIZE_ADDRESS > >> #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address > >> > >> @@ -9684,8 +9687,8 @@ rs6000_init_stack_protect_guard (void) > >> static bool > >> rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) > >> { > >> - if (GET_CODE (x) == HIGH > >> - && GET_CODE (XEXP (x, 0)) == UNSPEC) > >> + /* Exclude CONSTANT HIGH part. */ > >> + if (GET_CODE (x) == HIGH) > >> return true; > >> > >> /* A TLS symbol in the TOC cannot contain a sum. */ > >> @@ -10483,6 +10486,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode) > >> return false; > >> } > >> > >> + > >> +/* Implement TARGET_FASTER_LOADING_CONSTANT. */ > >> + > >> +static bool > >> +rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src) > >> +{ > >> + gcc_assert (CONSTANT_P (src)); > >> + > >> + if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode) > >> + return false; > >> + if (GET_CODE (src) == HIGH) > >> + return false; > >> + if (toc_relative_expr_p (src, false, NULL, NULL)) > >> + return false; > >> + if (rs6000_cannot_force_const_mem (mode, src)) > >> + return false; > >> + > >> + if (REG_P (dest) && FP_REGNO_P (REGNO (dest))) > >> + return true; > >> + if (!CONST_INT_P (src)) > >> + return true; > >> + return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2); > >> +} > >> + > >> /* Emit a move from SOURCE to DEST in mode MODE. */ > >> void > >> rs6000_emit_move (rtx dest, rtx source, machine_mode mode) > >> @@ -10860,13 +10887,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode) > >> operands[1] = create_TOC_reference (operands[1], operands[0]); > >> else if (mode == Pmode > >> && CONSTANT_P (operands[1]) > >> - && GET_CODE (operands[1]) != HIGH > >> - && ((REG_P (operands[0]) > >> - && FP_REGNO_P (REGNO (operands[0]))) > >> - || !CONST_INT_P (operands[1]) > >> - || (num_insns_constant (operands[1], mode) > >> - > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))) > >> - && !toc_relative_expr_p (operands[1], false, NULL, NULL) > >> + && rs6000_faster_loading_const (mode, operands[0], operands[1]) > >> && (TARGET_CMODEL == CMODEL_SMALL > >> || can_create_pseudo_p () > >> || (REG_P (operands[0]) > >> diff --git a/gcc/cse.cc b/gcc/cse.cc > >> index a18b599d324..53a7b3556b3 100644 > >> --- a/gcc/cse.cc > >> +++ b/gcc/cse.cc > >> @@ -5192,6 +5192,22 @@ cse_insn (rtx_insn *insn) > >> src_elt_regcost = elt->regcost; > >> } > >> > >> + /* If the current function uses a constant pool and this is a > >> + constant, try making a pool entry. Put it in src_folded > >> + unless we already have done this since that is where it > >> + likely came from. */ > >> + > >> + if (crtl->uses_const_pool && src_folded && CONSTANT_P (src_folded) > >> + && targetm.faster_loading_constant (mode, dest, src_folded)) > >> + { > >> + src_folded = force_const_mem (mode, src_folded); > >> + if (src_folded) > >> + { > >> + src_folded_cost = COST (src_folded, mode); > >> + src_folded_regcost = approx_reg_cost (src_folded); > >> + } > >> + } > >> + > >> /* Find cheapest and skip it for the next time. For items > >> of equal cost, use this order: > >> src_folded, src, src_eqv, src_related and hash table entry. */ > >> @@ -5390,26 +5406,6 @@ cse_insn (rtx_insn *insn) > >> > >> break; > >> } > >> - > >> - /* If the current function uses a constant pool and this is a > >> - constant, try making a pool entry. Put it in src_folded > >> - unless we already have done this since that is where it > >> - likely came from. */ > >> - > >> - else if (crtl->uses_const_pool > >> - && CONSTANT_P (trial) > >> - && !CONST_INT_P (trial) > >> - && (src_folded == 0 || !MEM_P (src_folded)) > >> - && GET_MODE_CLASS (mode) != MODE_CC > >> - && mode != VOIDmode) > >> - { > >> - src_folded = force_const_mem (mode, trial); > >> - if (src_folded) > >> - { > >> - src_folded_cost = COST (src_folded, mode); > >> - src_folded_regcost = approx_reg_cost (src_folded); > >> - } > >> - } > >> } > >> > >> /* If we changed the insn too much, handle this set from scratch. */ > >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > >> index 962bbb8caaf..65685311249 100644 > >> --- a/gcc/doc/tm.texi > >> +++ b/gcc/doc/tm.texi > >> @@ -6014,6 +6014,11 @@ holding the constant. This restriction is often true of addresses > >> of TLS symbols for various targets. > >> @end deftypefn > >> > >> +@deftypefn {Target Hook} bool TARGET_FASTER_LOADING_CONSTANT (machine_mode @var{mode}, rtx @var{y}, rtx @var{x}) > >> +It returns true if loading contant @var{x} is faster than building > >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}. > >> +@end deftypefn > >> + > >> @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (machine_mode @var{mode}, const_rtx @var{x}) > >> This hook should return true if pool entries for constant @var{x} can > >> be placed in an @code{object_block} structure. @var{mode} is the mode > >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > >> index 394b59e70a6..918914f0e30 100644 > >> --- a/gcc/doc/tm.texi.in > >> +++ b/gcc/doc/tm.texi.in > >> @@ -4148,6 +4148,8 @@ address; but often a machine-dependent strategy can generate better code. > >> > >> @hook TARGET_CANNOT_FORCE_CONST_MEM > >> > >> +@hook TARGET_FASTER_LOADING_CONSTANT > >> + > >> @hook TARGET_USE_BLOCKS_FOR_CONSTANT_P > >> > >> @hook TARGET_USE_BLOCKS_FOR_DECL_P > >> diff --git a/gcc/target.def b/gcc/target.def > >> index 57e64b20eef..8b007aca9dc 100644 > >> --- a/gcc/target.def > >> +++ b/gcc/target.def > >> @@ -2987,6 +2987,14 @@ locally. The default is to return false.", > >> bool, (void), > >> hook_bool_void_false) > >> > >> +/* Check if it is profitable to load the constant from constant pool. */ > >> +DEFHOOK > >> +(faster_loading_constant, > >> + "It returns true if loading contant @var{x} is faster than building\n\ > >> +from scratch, and set to @var{y}. @var{mode} is the mode of @var{x}.", > >> + bool, (machine_mode mode, rtx y, rtx x), > >> + default_faster_loading_constant) > >> + > >> /* True if it is OK to do sibling call optimization for the specified > >> call expression EXP. DECL will be the called function, or NULL if > >> this is an indirect call. */ > >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc > >> index fc49235eb38..12de61a7ed8 100644 > >> --- a/gcc/targhooks.cc > >> +++ b/gcc/targhooks.cc > >> @@ -173,6 +173,12 @@ default_return_in_memory (const_tree type, > >> return (TYPE_MODE (type) == BLKmode); > >> } > >> > >> +bool > >> +default_faster_loading_constant (machine_mode, rtx, rtx) > >> +{ > >> + return false; > >> +} > >> + > >> rtx > >> default_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED, > >> machine_mode mode ATTRIBUTE_UNUSED) > >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h > >> index ecfa11287ef..4db21cd59f6 100644 > >> --- a/gcc/targhooks.h > >> +++ b/gcc/targhooks.h > >> @@ -295,5 +295,6 @@ extern rtx default_memtag_extract_tag (rtx, rtx); > >> extern rtx default_memtag_untagged_pointer (rtx, rtx); > >> > >> extern HOST_WIDE_INT default_gcov_type_size (void); > >> +extern bool default_faster_loading_constant (machine_mode, rtx, rtx); > >> > >> #endif /* GCC_TARGHOOKS_H */ > >> diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> index f29eba08c38..4889e8fa8ec 100644 > >> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c > >> @@ -1,7 +1,7 @@ > >> /* { dg-do compile { target { powerpc*-*-* } } } */ > >> /* { dg-require-effective-target lp64 } */ > >> /* { dg-options "-O" } */ > >> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ > >> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ > >> > >> static int x; > >> > >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c > >> new file mode 100644 > >> index 00000000000..58ba074d427 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c > >> @@ -0,0 +1,14 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2" } */ > >> + > >> +#define CONST1 0x100803004101001 > >> +#define CONST2 0x000406002105007 > >> + > >> +void __attribute__ ((noinline)) > >> +foo (unsigned long long *a, unsigned long long *b) > >> +{ > >> + *a = CONST1; > >> + *b = CONST2; > >> +} > >> + > >> +/* { dg-final { scan-assembler-times {\mld|pld\M} 2 } } */ > >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c > >> index 4f764d0576f..5afb4f79c45 100644 > >> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c > >> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c > >> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0xffff9234ffff9234ULL; } > >> unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; } > >> unsigned long long mskse() { return 0xffff1234ffff1234ULL; } > >> > >> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */ > >> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */ > >> > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)