From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=ex+G=KZ=rivosinc.com=vineetg@sourceware.org>
Received: from mail-oa1-x2e.google.com (mail-oa1-x2e.google.com [IPv6:2001:4860:4864:20::2e])
	by sourceware.org (Postfix) with ESMTPS id 8B71B3858CDB
	for <gcc-patches@gcc.gnu.org>; Tue, 19 Mar 2024 00:07:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8B71B3858CDB
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8B71B3858CDB
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:4860:4864:20::2e
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710806834; cv=none;
	b=UsEA4em8OXfRwEFc6DnZA04R0Oug/b81NZzpW77pqVneiaFTjLkrV26ggw5fC11X7dqmj3LQfpeEcubScykU8loQzd1rq8p0ywisxkUxVlSlmYcj3fVOfMIUDGtWJ4aJ1aKTzzlya0Ywot9IQPnFPIZ+YTgFo2B7rkC7Rbjunak=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1710806834; c=relaxed/simple;
	bh=kwKmBN0cQgQgj3Dt5a4sRiob4LG3GAch8H7VSuvGE0s=;
	h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=f9EfecKgvCWhEu7qtpVXoWbejXYOp5JM5YMoy/XUm5GgxMKJPdDWiwRAInOerFjKjmLfoEcnLfDyONvGXuLQeVFhscx6/p38Fs4lNSSkBSG9Tw2Nr/v3WtTWGrXdNnFLVHEIeniblknRo+MDyKcVb80ifjyKFmGCKKzIq1MscpU=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-oa1-x2e.google.com with SMTP id 586e51a60fabf-2282597eff5so1288718fac.3
        for <gcc-patches@gcc.gnu.org>; Mon, 18 Mar 2024 17:07:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1710806827; x=1711411627; darn=gcc.gnu.org;
        h=content-transfer-encoding:in-reply-to:from:content-language
         :references:cc:to:subject:user-agent:mime-version:date:message-id
         :from:to:cc:subject:date:message-id:reply-to;
        bh=w79pTb/KH4i5jjnE+vd2GDMiTEXYBCkgVl2kqN9tnRE=;
        b=yS3xclAmr/E0zntXujL7sTGsJYCe/lPTiHOTZSApg8AtCNa3v+7fN7bhGhRCH429Zu
         rz5TsjbwlftpXWUbojzDMSPrY40c2vfFBsdZguJw7B3cWsdCyS0Sdn1F+BJScsssb5C4
         fVdHsCeLf5yAn83Z8bEbEeUJgHwZvXfytr/ZTi436Zf5gXHYAQ/q6u+SAvQDjhkwg6Ry
         7CY8XWFNs6yYkdBttJ3lRWbPvECZHvnb4Yi9GjN90pvx0g7OlliCzEZPGPBcgj/ZtsFZ
         3MJ0AxFzOuFf8tilFLmn2RE1ddav524xzprV3lcNaQMvgtm6kWiIY9KCoggSzK2/jnyo
         TZ0A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1710806827; x=1711411627;
        h=content-transfer-encoding:in-reply-to:from:content-language
         :references:cc:to:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=w79pTb/KH4i5jjnE+vd2GDMiTEXYBCkgVl2kqN9tnRE=;
        b=ElctLdEzzxbDajsKPg2sHBRlPfYQ3Wk+jXPCDvoiAxa7unIpSFlHkwJBTUPK4/ZWva
         4prnO+ksdMgF8xpncBwZE1Bsx+MxKYwy29dJgggqdNsXcnPeOgAyC1kBlYYAD+q0U0UI
         NJSPp8Kh6QxK5tSaJhbidGxJquN9tP3Sym0R+CY5scAe/4Tvh00zJ7K+3e3eZGxlcYAl
         UtPvFnZlsk54g1+FI9qtyki8HPI7d3ypnnfvURqEhl7SzwJ551gZydDQv6AzH2CNbfNe
         pkhioMvvVvvQgS+ETVBZ8CNia7k4cVcS1dg5apbZ0pXvhGz3zwR6M19P6igFZmKV/E5+
         yShQ==
X-Forwarded-Encrypted: i=1; AJvYcCXOO3FeToRQxgsqLScf4jlUc8ndIMkcmJwLQUtI2FAwLkDJ3IvnjBWg5ABg74e76iPXhPv/sUOtGHKee7vfrbObSxxGZ4QHyA==
X-Gm-Message-State: AOJu0YzODrR8tRopWA3pXmNtGe8vXQh3WPz+LIlxGp0Mqi/8rAPhljL0
	pes+1Kv4thhR5tRDj7qMrK6yRoT8gwAkpEq3H4qWmc3HIZ5GM69N+BWnwlCYXUU=
X-Google-Smtp-Source: AGHT+IEGSSXb8+93FsTTTVbDyM/BNMIeSsxv1mmktiGbC6NlodfKhFLQi8Nl+W7lNeZ72an2jvzaDw==
X-Received: by 2002:a05:6870:4344:b0:221:793a:3b9a with SMTP id x4-20020a056870434400b00221793a3b9amr13389351oah.40.1710806827510;
        Mon, 18 Mar 2024 17:07:07 -0700 (PDT)
Received: from [10.0.16.165] ([12.44.203.122])
        by smtp.gmail.com with ESMTPSA id l62-20020a632541000000b005e838955bc4sm4747090pgl.58.2024.03.18.17.07.06
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Mon, 18 Mar 2024 17:07:07 -0700 (PDT)
Message-ID: <63222471-d4a7-4c37-bc7a-75d83f13ddde@rivosinc.com>
Date: Mon, 18 Mar 2024 17:07:06 -0700
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ...
 [part of PR/106265]
To: Jeff Law <jeffreyalaw@gmail.com>, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, Palmer Dabbelt <palmer@rivosinc.com>,
 gnu-toolchain@rivosinc.com, Robin Dapp <rdapp.gcc@gmail.com>
References: <20240316173524.1147760-1-vineetg@rivosinc.com>
 <20240316173524.1147760-2-vineetg@rivosinc.com>
 <03d87203-d4a1-47e1-afea-07b2f857492f@gmail.com>
Content-Language: en-US
From: Vineet Gupta <vineetg@rivosinc.com>
In-Reply-To: <03d87203-d4a1-47e1-afea-07b2f857492f@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 3/16/24 13:28, Jeff Law wrote:
>> Implementation Details (for posterity)
>> --------------------------------------
>>   - basic idea is to have a splitter selected via a new predicate for constant
>>     being possible sum of two S12 and provide the transform.
>>     This is however a 2 -> 2 transform which combine can't handle.
>>     So we specify it using a define_insn_and_split.
> Right.  For the record it looks like a 2->2 case because of the 
> mvconst_internal define_insn_and_split.
>
> What I was talking about in the Tuesday meeting last week was the desire 
> to rethink that pattern precisely because it drives us to need to 
> implement patterns like yours rather than the more natural 3->2 or 4->3/2.

A few weeks, I did an experimental run removing that splitter isn't
catastrophic for SPEC, granted it is a pretty narrow view of the world :-)
Below is upstream vs. revert of mvconst_internal (for apples:apples just
the revert, none of my new splitter stuff)

     Baseline        Revert mvconst_int
1,214,172,747,858 |  1,225,245,908,131 |  -0.91% <--
  740,618,658,160 |    743,873,857,461 |  -0.44% <-
  692,128,469,436 |    695,695,894,602 |  -0.52% <--
  190,811,495,310 |    190,934,386,665 |  -0.06%
  225,751,808,189 |    225,909,747,497 |  -0.07%
  220,364,237,915 |    220,466,640,640 |  -0.05%
  179,157,361,261 |    179,357,613,835 |  -0.11%
  219,313,921,837 |    219,436,712,114 |  -0.06%
  281,344,210,410 |    281,344,197,305 |   0.00%
  446,517,079,816 |    446,517,058,807 |   0.00%
  347,300,137,757 |    347,300,119,942 |   0.00%
  421,496,082,529 |    421,496,063,144 |   0.00%
  669,319,255,911 |    673,977,112,732 |  -0.70% <--
2,852,810,623,629 |  2,852,809,986,901 |   0.00%
1,855,884,349,001 |  1,855,837,810,109 |   0.00%
1,653,853,403,482 |  1,653,714,171,270 |   0.01%
2,974,347,287,057 |  2,970,520,074,342 |   0.13%
1,158,337,609,995 |  1,158,337,607,076 |   0.00%
1,019,181,486,091 |  1,020,576,716,760 |  -0.14%
1,738,961,517,942 |  1,736,024,569,247 |   0.17%
  849,330,280,930 |    848,955,738,855 |   0.04%
  276,584,892,794 |    276,457,202,331 |   0.05%
  913,410,589,335 |    913,407,101,214 |   0.00%
  903,864,384,529 |    903,800,709,452 |   0.01%
1,654,905,086,567 |  1,656,684,048,052 |  -0.11%
1,513,514,546,262 |  1,510,815,435,668 |   0.18%
1,641,980,078,831 |  1,602,410,552,874 |   2.41% <--
2,546,170,307,336 |  2,546,206,790,578 |   0.00%
1,983,551,321,388 |  1,979,562,936,994 |   0.20%
1,516,077,036,742 |  1,515,038,668,667 |   0.07%
2,056,386,745,670 |  2,055,592,903,700 |   0.04%
1,008,224,427,448 |  1,008,027,321,669 |   0.02%
1,169,305,141,789 |  1,169,028,937,430 |   0.02%
  363,827,037,541 |    361,982,880,800 |   0.51% <--
  906,649,110,459 |    909,651,995,900 |  -0.33% <-
  509,023,896,044 |    508,578,027,604 |   0.09%
      403,238,430 |        398,492,922 |   1.18%
      403,238,430 |        398,492,922 |   1.18%
38,917,902,479,830  38,886,374,486,212


Do note however that this takes us back to constant pool way of loading
complex constants (vs. bit fiddling and stitching). The 2.4% gain in
deepsjeng is exactly that and we need to decide what to do about it:
keep it as such or tweak it with a cost model change and/or make this
for everyone or have this per cpu tune - hard to tell whats the right
thing to do here.

P.S. Perhaps obvious but the prerequisite to revert is to tend to the
original linked PRs which will now start failing so that will hopefully
improve some of the above.

> Furthermore, I have a suspicion that there's logicals where we're going 
> to want to do something similar and if we keep the mvconst_internal 
> pattern they're all going to have to be implemented as 2->2s with a 
> define_insn_and_split rather than the more natural 3->2.

Naive question: why is define_split more natural vs. define_insn_and_split.
Is it because of the syntax/implementation or that one is more Combine
"centric" and other is more of a Split* Pass thing (roughly speaking of
course) or something else altogether ?

Would we have to revisit the new splitter (and perhaps audit the
existing define_insn_and_split patterns) if we were to go ahead with
this revert ?
>> +(define_insn_and_split "*add<mode>3_const_sum_of_two_s12"
>> +  [(set (match_operand:P	 0 "register_operand" "=r,r")
>> +	(plus:P (match_operand:P 1 "register_operand" " r,r")
>> +		(match_operand:P 2 "const_two_s12"    " MiG,r")))]
>> +  ""
>> +  "add %0,%1,%2"
>> +  ""
>> +  [(set (match_dup 0)
>> +	(plus:P (match_dup 1) (match_dup 3)))
>> +   (set (match_dup 0)
>> +	(plus:P (match_dup 0) (match_dup 4)))]
>> +{
>> +  int val = INTVAL (operands[2]);
>> +  if (SUM_OF_TWO_S12_P (val))
>> +    {
>> +       operands[3] = GEN_INT (2047);
>> +       operands[4] = GEN_INT (val - 2047);
>> +    }
>> +  else if (SUM_OF_TWO_S12_N (val))
>> +    {
>> +       operands[3] = GEN_INT (-2048);
>> +       operands[4] = GEN_INT (val + 2048);
>> +    }
>> +  else
>> +      gcc_unreachable ();
>> +}
>> +  [(set_attr "type" "arith")
>> +   (set_attr "mode" "<P:MODE>")])
> So why use "P" for your mode iterator?  I would have expected GPR since 
> I think this works for both SI and DI mode as-is.

My intent at least was to have this work for either of rv64/rv32 and in
either of those environments, work for both SI or DI - certainly
rv64+{SI,DI}.
To that end I debated between GPR, X and P.
It seems GPR only supports DI if TARGET_64BIT.
But I could well be wrong about above requirements or the subtle
differences in these.

> For the output template "#" for alternative 0 and the add instruction 
> for alternative 1 is probably better.  That way it's clear to everyone 
> that alternative 0 is always meant to be split.

OK.

> Don't you need some kind of condition on the split to avoid splitting 
> when you've got a register for operands[2]? 

Won't the predicate "match_code" guarantee that ?

  (define_predicate "const_two_s12"
     (match_code "const_int")
   {
     return SUM_OF_TWO_S12 (INTVAL (op), false);
   })

> It would seem to me that as 
> written, it would still try to spit and trigger an RTL checking error 
> when you try to extract INTVAL (operands[2]).

Do I need to build gcc in a certain way to expose such errors - I wasn't
able to trigger such an error for the entire testsuite or SPEC build.

Thx for the detailed quick review.
-Vineet