public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Vineet Gupta <vineetg@rivosinc.com>
To: Jeff Law <jeffreyalaw@gmail.com>, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, Palmer Dabbelt <palmer@rivosinc.com>,
	gnu-toolchain@rivosinc.com, Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]
Date: Mon, 18 Mar 2024 17:07:06 -0700	[thread overview]
Message-ID: <63222471-d4a7-4c37-bc7a-75d83f13ddde@rivosinc.com> (raw)
In-Reply-To: <03d87203-d4a1-47e1-afea-07b2f857492f@gmail.com>



On 3/16/24 13:28, Jeff Law wrote:
>> Implementation Details (for posterity)
>> --------------------------------------
>>   - basic idea is to have a splitter selected via a new predicate for constant
>>     being possible sum of two S12 and provide the transform.
>>     This is however a 2 -> 2 transform which combine can't handle.
>>     So we specify it using a define_insn_and_split.
> Right.  For the record it looks like a 2->2 case because of the 
> mvconst_internal define_insn_and_split.
>
> What I was talking about in the Tuesday meeting last week was the desire 
> to rethink that pattern precisely because it drives us to need to 
> implement patterns like yours rather than the more natural 3->2 or 4->3/2.

A few weeks, I did an experimental run removing that splitter isn't
catastrophic for SPEC, granted it is a pretty narrow view of the world :-)
Below is upstream vs. revert of mvconst_internal (for apples:apples just
the revert, none of my new splitter stuff)

     Baseline        Revert mvconst_int
1,214,172,747,858 |  1,225,245,908,131 |  -0.91% <--
  740,618,658,160 |    743,873,857,461 |  -0.44% <-
  692,128,469,436 |    695,695,894,602 |  -0.52% <--
  190,811,495,310 |    190,934,386,665 |  -0.06%
  225,751,808,189 |    225,909,747,497 |  -0.07%
  220,364,237,915 |    220,466,640,640 |  -0.05%
  179,157,361,261 |    179,357,613,835 |  -0.11%
  219,313,921,837 |    219,436,712,114 |  -0.06%
  281,344,210,410 |    281,344,197,305 |   0.00%
  446,517,079,816 |    446,517,058,807 |   0.00%
  347,300,137,757 |    347,300,119,942 |   0.00%
  421,496,082,529 |    421,496,063,144 |   0.00%
  669,319,255,911 |    673,977,112,732 |  -0.70% <--
2,852,810,623,629 |  2,852,809,986,901 |   0.00%
1,855,884,349,001 |  1,855,837,810,109 |   0.00%
1,653,853,403,482 |  1,653,714,171,270 |   0.01%
2,974,347,287,057 |  2,970,520,074,342 |   0.13%
1,158,337,609,995 |  1,158,337,607,076 |   0.00%
1,019,181,486,091 |  1,020,576,716,760 |  -0.14%
1,738,961,517,942 |  1,736,024,569,247 |   0.17%
  849,330,280,930 |    848,955,738,855 |   0.04%
  276,584,892,794 |    276,457,202,331 |   0.05%
  913,410,589,335 |    913,407,101,214 |   0.00%
  903,864,384,529 |    903,800,709,452 |   0.01%
1,654,905,086,567 |  1,656,684,048,052 |  -0.11%
1,513,514,546,262 |  1,510,815,435,668 |   0.18%
1,641,980,078,831 |  1,602,410,552,874 |   2.41% <--
2,546,170,307,336 |  2,546,206,790,578 |   0.00%
1,983,551,321,388 |  1,979,562,936,994 |   0.20%
1,516,077,036,742 |  1,515,038,668,667 |   0.07%
2,056,386,745,670 |  2,055,592,903,700 |   0.04%
1,008,224,427,448 |  1,008,027,321,669 |   0.02%
1,169,305,141,789 |  1,169,028,937,430 |   0.02%
  363,827,037,541 |    361,982,880,800 |   0.51% <--
  906,649,110,459 |    909,651,995,900 |  -0.33% <-
  509,023,896,044 |    508,578,027,604 |   0.09%
      403,238,430 |        398,492,922 |   1.18%
      403,238,430 |        398,492,922 |   1.18%
38,917,902,479,830  38,886,374,486,212


Do note however that this takes us back to constant pool way of loading
complex constants (vs. bit fiddling and stitching). The 2.4% gain in
deepsjeng is exactly that and we need to decide what to do about it:
keep it as such or tweak it with a cost model change and/or make this
for everyone or have this per cpu tune - hard to tell whats the right
thing to do here.

P.S. Perhaps obvious but the prerequisite to revert is to tend to the
original linked PRs which will now start failing so that will hopefully
improve some of the above.

> Furthermore, I have a suspicion that there's logicals where we're going 
> to want to do something similar and if we keep the mvconst_internal 
> pattern they're all going to have to be implemented as 2->2s with a 
> define_insn_and_split rather than the more natural 3->2.

Naive question: why is define_split more natural vs. define_insn_and_split.
Is it because of the syntax/implementation or that one is more Combine
"centric" and other is more of a Split* Pass thing (roughly speaking of
course) or something else altogether ?

Would we have to revisit the new splitter (and perhaps audit the
existing define_insn_and_split patterns) if we were to go ahead with
this revert ?
>> +(define_insn_and_split "*add<mode>3_const_sum_of_two_s12"
>> +  [(set (match_operand:P	 0 "register_operand" "=r,r")
>> +	(plus:P (match_operand:P 1 "register_operand" " r,r")
>> +		(match_operand:P 2 "const_two_s12"    " MiG,r")))]
>> +  ""
>> +  "add %0,%1,%2"
>> +  ""
>> +  [(set (match_dup 0)
>> +	(plus:P (match_dup 1) (match_dup 3)))
>> +   (set (match_dup 0)
>> +	(plus:P (match_dup 0) (match_dup 4)))]
>> +{
>> +  int val = INTVAL (operands[2]);
>> +  if (SUM_OF_TWO_S12_P (val))
>> +    {
>> +       operands[3] = GEN_INT (2047);
>> +       operands[4] = GEN_INT (val - 2047);
>> +    }
>> +  else if (SUM_OF_TWO_S12_N (val))
>> +    {
>> +       operands[3] = GEN_INT (-2048);
>> +       operands[4] = GEN_INT (val + 2048);
>> +    }
>> +  else
>> +      gcc_unreachable ();
>> +}
>> +  [(set_attr "type" "arith")
>> +   (set_attr "mode" "<P:MODE>")])
> So why use "P" for your mode iterator?  I would have expected GPR since 
> I think this works for both SI and DI mode as-is.

My intent at least was to have this work for either of rv64/rv32 and in
either of those environments, work for both SI or DI - certainly
rv64+{SI,DI}.
To that end I debated between GPR, X and P.
It seems GPR only supports DI if TARGET_64BIT.
But I could well be wrong about above requirements or the subtle
differences in these.

> For the output template "#" for alternative 0 and the add instruction 
> for alternative 1 is probably better.  That way it's clear to everyone 
> that alternative 0 is always meant to be split.

OK.

> Don't you need some kind of condition on the split to avoid splitting 
> when you've got a register for operands[2]? 

Won't the predicate "match_code" guarantee that ?

  (define_predicate "const_two_s12"
     (match_code "const_int")
   {
     return SUM_OF_TWO_S12 (INTVAL (op), false);
   })

> It would seem to me that as 
> written, it would still try to spit and trigger an RTL checking error 
> when you try to extract INTVAL (operands[2]).

Do I need to build gcc in a certain way to expose such errors - I wasn't
able to trigger such an error for the entire testsuite or SPEC build.

Thx for the detailed quick review.
-Vineet

  reply	other threads:[~2024-03-19  0:07 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-16 17:35 [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak Vineet Gupta
2024-03-16 17:35 ` [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265] Vineet Gupta
2024-03-16 20:28   ` Jeff Law
2024-03-19  0:07     ` Vineet Gupta [this message]
2024-03-23  5:59       ` Jeff Law
2024-03-16 17:35 ` [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned Vineet Gupta
2024-03-16 20:21   ` Jeff Law
2024-03-19  0:27     ` Vineet Gupta
2024-03-19  6:48       ` Andrew Waterman
2024-03-19 13:10         ` Jeff Law
2024-03-19 20:05           ` Vineet Gupta
2024-03-19 20:58             ` Andrew Waterman
2024-03-19 21:17             ` Palmer Dabbelt
2024-03-20 18:57             ` Jeff Law
2024-03-23  6:05             ` Jeff Law
2024-03-16 17:35 ` [gcc-15 3/3] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733] Vineet Gupta
2024-03-16 20:27   ` Jeff Law
2024-03-19  4:41 ` [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak Jeff Law
2024-03-21  0:45   ` Vineet Gupta
2024-03-21 14:36   ` scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak) Vineet Gupta
2024-03-21 14:45     ` Jeff Law
2024-03-21 17:19       ` Vineet Gupta
2024-03-21 19:56         ` Jeff Law
2024-03-22  0:34           ` scheduler queue flush Vineet Gupta
2024-03-22  8:47           ` scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak) Richard Biener
2024-03-22 12:29             ` Jeff Law
2024-03-22 16:56               ` Vineet Gupta
2024-03-25  3:05         ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=63222471-d4a7-4c37-bc7a-75d83f13ddde@rivosinc.com \
    --to=vineetg@rivosinc.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=gnu-toolchain@rivosinc.com \
    --cc=jeffreyalaw@gmail.com \
    --cc=kito.cheng@gmail.com \
    --cc=palmer@rivosinc.com \
    --cc=rdapp.gcc@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).