Re: [PATCH] Check if loading const from mem is faster

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	wschmidt@linux.ibm.com, jlaw@tachyum.com, dje.gcc@gmail.com
Subject: Re: [PATCH] Check if loading const from mem is faster
Date: Fri, 25 Feb 2022 21:32:03 +0800	[thread overview]
Message-ID: <h48a6efqdgs.fsf@genoa.aus.stglabs.ibm.com> (raw)
In-Reply-To: <1q51o2n4-r721-qqo6-o186-7r04o4463n@fhfr.qr> (Richard Biener's message of "Fri, 25 Feb 2022 09:45:45 +0100 (CET)")

Richard Biener <rguenther@suse.de> writes:

> On Fri, 25 Feb 2022, Jiufu Guo wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> 
>> > On Thu, 24 Feb 2022, Jiufu Guo wrote:
>> >
>> >> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> 
>> >> > Segher Boessenkool <segher@kernel.crashing.org> writes:
>> >> >
>> >> >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote:
>> >> >>> I'm assuming we're always dealing with
>> >> >>> 
>> >> >>>   (set (reg:MODE ..) <src_folded>)
>> >> >>> 
>> >> >>> here and CSE is not substituting into random places of an
>> >> >>> instruction(?).  I don't know what 'rtx_cost' should evaluate
>> >> >>> to for a constant, if it should implicitely evaluate the cost
>> >> >>> of putting the result into a register for example.
>> >> >>
>> >> >> rtx_cost is no good here (and in most places).  rtx_cost should be 0
>> >> >> for anything that is used as input in a machine instruction -- but you
>> >> >> need much more context to determine that.  insn_cost is much simpler and
>> >> >> much easier to use.
>> >> >>
>> >> >>> Using RTX_COST with SET and 1 at least looks no worse than using
>> >> >>> your proposed new target hook and comparing it with the original
>> >> >>> unfolded src (again with SET and 1).
>> >> >>
>> >> >> It is required to generate valid instructions no matter what, before
>> >> >> the pass has finished that is.  On all more modern architectures it is
>> >> >> futile to think you can usefully consider the cost of an RTL expression
>> >> >> and derive a real-world cost of the generated code from that.
>> >> >
>> >> > Thanks Segher for pointing out these!  Here is  another reason that I
>> >> > did not use rtx_cost: in a few passes, there are codes to check the
>> >> > constants and store them in constant pool.  I'm thinking to integerate
>> >> > those codes in a consistent way.
>> >> 
>> >> Hi Segher, Richard!
>> >> 
>> >> I'm thinking the way like: For a constant,
>> >> 1. if the constant could be used as an immediate for the
>> >> instruction, then retreated as an operand;
>> >> 2. otherwise if the constant can not be stored into a
>> >> constant pool, then handle through instructions;
>> >> 3. if it is faster to access constant from pool, then emit
>> >> constant as data(.rodata);
>> >> 4. otherwise, handle the constant by instructions.
>> >> 
>> >> And to store the constant into a pool, besides force_const_mem,
>> >> create reference (toc) may be needed on some platforms.
>> >> 
>> >> For this particular issue in CSE, there is already code that
>> >> tries to put constant into a pool (invoke force_const_mem).
>> >> While the code is too late.  So, we may check the constant
>> >> earlier and store it into constant pool if profitable.
>> >> 
>> >> And another thing as Segher pointed out, CSE is doing too
>> >> much work.  It may be ok to separate the constant handling
>> >> logic from CSE.
>> >
>> > Not sure - CSE just is value numbering, I don't see that it does
>> > more than that.  Yes, it might have developed "heuristics" over
>> > the years what to CSE and to what and where to substitute and
>> > where not.  But in the end it does just value numbering.
>> >
>> >> 
>> >> I update a new version patch as follow (did not seprate CSE):
>> >
>> > How is the new target hook better in any way compared to rtx_cost
>> > or insn_cost?  It looks like a total hack.
>> >
>> > I suppose the actual way of materializing a constant is done
>> > behind GCCs back and not exposed anywhere?  But instead you
>> > claim the constants are valid when they actually are not?
>> > Isn't the problem then that the rs6000 backend lies?
>> 
>> Hi Richard,
>> 
>> Thanks for your comments and sugguestions!
>> 
>> Materializing a constant should be done behind GCC.
>> On rs6000, in expand pass, during emit_move, the constant is
>> checked and store into constant pool if necessary.
>> Some other platforms are doing a similar thing, e.g.
>> ix86_expand_vector_move, alpha_expand_mov,...
>> mips_legitimize_const_move.
>> 
>> But, it does not as we expect, force_const_mem is also 
>> exposed other places (not only ira/reload for stack reference).
>> 
>> CSE is one place, for example, CSE first retrieve the constant
>> from insn's equal, but it also tries to put constant into
>> pool for some condition (the condition was introduced at
>> early age of cse.c, and it is rare to run into in trunk).
>> In some aspects, IMHO, this seems not a great work of CSE.
>> 
>> And this is how the 'invalid(or say slow)' constant occurs.
>> e.g.  before cse:
>>     7: r119:DI=[unspec[`*.LC0',%r2:DI] 47]
>>       REG_EQUAL 0x100803004101001
>> after cse: 
>>     7: r119:DI=0x100803004101001
>>       REG_EQUAL 0x100803004101001
>> 
>> As you pointed out: we can also avoid this transformation through
>> rtx_cost/insn_cost by estimating the COST more accurately for
>>  "r119:DI=0x100803004101001". (r119:DI=0x100803004101001 will not
>> be a real final instruction.)
>
> At which point does this become the final instruction?  I suppose
> CSE belives this constant is legitimate and the insn is recognized
> just fine in CSE?  (that's what I mean with "behind GCCs back")
>
It would become final instructions during split pass on rs6000,
other target, e.g. alpha, seem also do split it.
>> Is it necessary to refine this constant handling for CSE, or even
>> to eliminate the logic on constant extracting for an insn, beside
>> updating rtx_cost/insn_cost?
>
> So if CSE really just transforms
>
>>     7: r119:DI=[unspec[`*.LC0',%r2:DI] 47]
>>       REG_EQUAL 0x100803004101001
>
> to
>
>>     7: r119:DI=0x100803004101001
>>       REG_EQUAL 0x100803004101001
>
> then why can't rtx_cost (SET, 1) or insn_cost () be used to
> accurately tell it that the immediate is going to be a lot
> more expensive?
>
> That is, rtx_cost (CONST_INT-rtx, DI, SET, 1, ...) is accurate
> enough to be treated as an actual insn (it _might_ be part of
> a parallel I guess, but that's unlikely) and thus the backend
> should have a very good idea of the many instruction it needs
> for this.  Likewise rtx_cost ([unspec[`*.LC0',%r2:DI] 47], DI, SET, 1, 
> ...)
> should receive accurate cost that can be compared to other
> rtx_cost (..., DI, SET, 1, ...)
>
> And CSE doesn't even need to create fake insns here since IMHO
> rtx_cost is good enough to capture the full insn.  Targets can
> choose to split out a set_cost from their rtx_cost/insn_cost
> hooks for this case for example.

Hi Richard,

Right, we can fix this issue by updating rtx_cost/insn_cost to
tell that this kind of constants is a lot of expansive.

To update rtx_cost, we can use a trivial patch (shown as end of
this mail) to fix this particular issue.
To use insn_cost, additional work is replacing rtx_cost with
insn_cost for CSE, maybe more suitable for stage1.

So, it would be too far to fix this by refactoring the logic of
constant handling. :-)

Thanks for your comments!

BR,
Jiufu

>
> Richard.
>
>> Any sugguestions?
>> 
>> >
>> > Btw, all of this is of course not appropriate for stage4 and changes
>> > to CSE need testing on more than one target.
>> I would do more evaluation, thanks!
>> 
>> Jiufu
>> 
>> >
>> > Richard.
>> >
>> >> Thanks for the comments and suggestions again!
>> >> 
>> >> 
>> >> BR,
>> >> Jiufu
>> >> 

Part of the experimental patch for rs6000, which could help
to mitigate inaccurate rtx cost on constant.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index e82a47f4c0e..e429ae2bcf0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21884,6 +21884,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
     case CONST_DOUBLE:
     case CONST_WIDE_INT:
+      /* Set const to a reg, it may needs a few insns.  */
+      if (outer_code == SET)
+	{
+	  *total = COSTS_N_INSNS (num_insns_constant (x, mode));
+	  return true;
+	}
+      /* FALLTHRU */
+
     case CONST:
     case HIGH:
     case SYMBOL_REF:

next prev parent reply	other threads:[~2022-02-25 13:32 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-22  6:53 Jiufu Guo
2022-02-22  7:26 ` Richard Biener
2022-02-23 11:32   ` guojiufu
2022-02-23 13:02     ` Richard Biener
2022-02-23 21:14       ` Segher Boessenkool
2022-02-24  5:56         ` Jiufu Guo
2022-02-24  6:33           ` Jiufu Guo
2022-02-24  8:50             ` Richard Biener
2022-02-25  4:35               ` Jiufu Guo
2022-02-25  8:45                 ` Richard Biener
2022-02-25 13:32                   ` Jiufu Guo [this message]
2022-02-25 13:57                     ` Richard Biener
2022-02-28  9:15                       ` Jiufu Guo
2022-02-28 17:03               ` Segher Boessenkool
2022-03-01  2:59                 ` Jiufu Guo
2022-03-01  7:47                   ` Richard Biener
2022-03-01 13:47                     ` Jiufu Guo
2022-03-02 19:15                     ` Jeff Law
2022-03-03 10:08                       ` Jiufu Guo
2022-02-23 21:27     ` Segher Boessenkool
2022-02-24  7:48       ` Jiufu Guo
2022-02-28 16:45         ` Segher Boessenkool
2022-03-01 14:28           ` Jiufu Guo
2022-03-02 20:24             ` Segher Boessenkool
2022-03-03 10:09               ` Jiufu Guo
2022-03-08 11:25                 ` Jiufu Guo
2022-03-08 11:50                   ` Richard Biener
2022-03-09  4:37                     ` Jiufu Guo
2022-03-09  7:41                       ` Richard Biener
2022-03-10  2:09                         ` Jiufu Guo
2022-03-10  7:09                           ` Richard Biener
2022-03-11  6:33                             ` Jiufu Guo
2022-02-22 17:30 ` Segher Boessenkool
2022-02-23  3:31   ` guojiufu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=h48a6efqdgs.fsf@genoa.aus.stglabs.ibm.com \
    --to=guojiufu@linux.ibm.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jlaw@tachyum.com \
    --cc=rguenther@suse.de \
    --cc=segher@kernel.crashing.org \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).