From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x2e.google.com (mail-oa1-x2e.google.com [IPv6:2001:4860:4864:20::2e]) by sourceware.org (Postfix) with ESMTPS id 8B71B3858CDB for ; Tue, 19 Mar 2024 00:07:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8B71B3858CDB Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8B71B3858CDB Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:4860:4864:20::2e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710806834; cv=none; b=UsEA4em8OXfRwEFc6DnZA04R0Oug/b81NZzpW77pqVneiaFTjLkrV26ggw5fC11X7dqmj3LQfpeEcubScykU8loQzd1rq8p0ywisxkUxVlSlmYcj3fVOfMIUDGtWJ4aJ1aKTzzlya0Ywot9IQPnFPIZ+YTgFo2B7rkC7Rbjunak= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710806834; c=relaxed/simple; bh=kwKmBN0cQgQgj3Dt5a4sRiob4LG3GAch8H7VSuvGE0s=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=f9EfecKgvCWhEu7qtpVXoWbejXYOp5JM5YMoy/XUm5GgxMKJPdDWiwRAInOerFjKjmLfoEcnLfDyONvGXuLQeVFhscx6/p38Fs4lNSSkBSG9Tw2Nr/v3WtTWGrXdNnFLVHEIeniblknRo+MDyKcVb80ifjyKFmGCKKzIq1MscpU= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oa1-x2e.google.com with SMTP id 586e51a60fabf-2282597eff5so1288718fac.3 for ; Mon, 18 Mar 2024 17:07:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1710806827; x=1711411627; darn=gcc.gnu.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=w79pTb/KH4i5jjnE+vd2GDMiTEXYBCkgVl2kqN9tnRE=; b=yS3xclAmr/E0zntXujL7sTGsJYCe/lPTiHOTZSApg8AtCNa3v+7fN7bhGhRCH429Zu rz5TsjbwlftpXWUbojzDMSPrY40c2vfFBsdZguJw7B3cWsdCyS0Sdn1F+BJScsssb5C4 fVdHsCeLf5yAn83Z8bEbEeUJgHwZvXfytr/ZTi436Zf5gXHYAQ/q6u+SAvQDjhkwg6Ry 7CY8XWFNs6yYkdBttJ3lRWbPvECZHvnb4Yi9GjN90pvx0g7OlliCzEZPGPBcgj/ZtsFZ 3MJ0AxFzOuFf8tilFLmn2RE1ddav524xzprV3lcNaQMvgtm6kWiIY9KCoggSzK2/jnyo TZ0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710806827; x=1711411627; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=w79pTb/KH4i5jjnE+vd2GDMiTEXYBCkgVl2kqN9tnRE=; b=ElctLdEzzxbDajsKPg2sHBRlPfYQ3Wk+jXPCDvoiAxa7unIpSFlHkwJBTUPK4/ZWva 4prnO+ksdMgF8xpncBwZE1Bsx+MxKYwy29dJgggqdNsXcnPeOgAyC1kBlYYAD+q0U0UI NJSPp8Kh6QxK5tSaJhbidGxJquN9tP3Sym0R+CY5scAe/4Tvh00zJ7K+3e3eZGxlcYAl UtPvFnZlsk54g1+FI9qtyki8HPI7d3ypnnfvURqEhl7SzwJ551gZydDQv6AzH2CNbfNe pkhioMvvVvvQgS+ETVBZ8CNia7k4cVcS1dg5apbZ0pXvhGz3zwR6M19P6igFZmKV/E5+ yShQ== X-Forwarded-Encrypted: i=1; AJvYcCXOO3FeToRQxgsqLScf4jlUc8ndIMkcmJwLQUtI2FAwLkDJ3IvnjBWg5ABg74e76iPXhPv/sUOtGHKee7vfrbObSxxGZ4QHyA== X-Gm-Message-State: AOJu0YzODrR8tRopWA3pXmNtGe8vXQh3WPz+LIlxGp0Mqi/8rAPhljL0 pes+1Kv4thhR5tRDj7qMrK6yRoT8gwAkpEq3H4qWmc3HIZ5GM69N+BWnwlCYXUU= X-Google-Smtp-Source: AGHT+IEGSSXb8+93FsTTTVbDyM/BNMIeSsxv1mmktiGbC6NlodfKhFLQi8Nl+W7lNeZ72an2jvzaDw== X-Received: by 2002:a05:6870:4344:b0:221:793a:3b9a with SMTP id x4-20020a056870434400b00221793a3b9amr13389351oah.40.1710806827510; Mon, 18 Mar 2024 17:07:07 -0700 (PDT) Received: from [10.0.16.165] ([12.44.203.122]) by smtp.gmail.com with ESMTPSA id l62-20020a632541000000b005e838955bc4sm4747090pgl.58.2024.03.18.17.07.06 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 18 Mar 2024 17:07:07 -0700 (PDT) Message-ID: <63222471-d4a7-4c37-bc7a-75d83f13ddde@rivosinc.com> Date: Mon, 18 Mar 2024 17:07:06 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265] To: Jeff Law , gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, Palmer Dabbelt , gnu-toolchain@rivosinc.com, Robin Dapp References: <20240316173524.1147760-1-vineetg@rivosinc.com> <20240316173524.1147760-2-vineetg@rivosinc.com> <03d87203-d4a1-47e1-afea-07b2f857492f@gmail.com> Content-Language: en-US From: Vineet Gupta In-Reply-To: <03d87203-d4a1-47e1-afea-07b2f857492f@gmail.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 3/16/24 13:28, Jeff Law wrote: >> Implementation Details (for posterity) >> -------------------------------------- >> - basic idea is to have a splitter selected via a new predicate for constant >> being possible sum of two S12 and provide the transform. >> This is however a 2 -> 2 transform which combine can't handle. >> So we specify it using a define_insn_and_split. > Right. For the record it looks like a 2->2 case because of the > mvconst_internal define_insn_and_split. > > What I was talking about in the Tuesday meeting last week was the desire > to rethink that pattern precisely because it drives us to need to > implement patterns like yours rather than the more natural 3->2 or 4->3/2. A few weeks, I did an experimental run removing that splitter isn't catastrophic for SPEC, granted it is a pretty narrow view of the world :-) Below is upstream vs. revert of mvconst_internal (for apples:apples just the revert, none of my new splitter stuff) Baseline Revert mvconst_int 1,214,172,747,858 |  1,225,245,908,131 |  -0.91% <--   740,618,658,160 |    743,873,857,461 |  -0.44% <-   692,128,469,436 |    695,695,894,602 |  -0.52% <--   190,811,495,310 |    190,934,386,665 |  -0.06%   225,751,808,189 |    225,909,747,497 |  -0.07%   220,364,237,915 |    220,466,640,640 |  -0.05%   179,157,361,261 |    179,357,613,835 |  -0.11%   219,313,921,837 |    219,436,712,114 |  -0.06%   281,344,210,410 |    281,344,197,305 |   0.00%   446,517,079,816 |    446,517,058,807 |   0.00%   347,300,137,757 |    347,300,119,942 |   0.00%   421,496,082,529 |    421,496,063,144 |   0.00%   669,319,255,911 |    673,977,112,732 |  -0.70% <-- 2,852,810,623,629 |  2,852,809,986,901 |   0.00% 1,855,884,349,001 |  1,855,837,810,109 |   0.00% 1,653,853,403,482 |  1,653,714,171,270 |   0.01% 2,974,347,287,057 |  2,970,520,074,342 |   0.13% 1,158,337,609,995 |  1,158,337,607,076 |   0.00% 1,019,181,486,091 |  1,020,576,716,760 |  -0.14% 1,738,961,517,942 |  1,736,024,569,247 |   0.17%   849,330,280,930 |    848,955,738,855 |   0.04%   276,584,892,794 |    276,457,202,331 |   0.05%   913,410,589,335 |    913,407,101,214 |   0.00%   903,864,384,529 |    903,800,709,452 |   0.01% 1,654,905,086,567 |  1,656,684,048,052 |  -0.11% 1,513,514,546,262 |  1,510,815,435,668 |   0.18% 1,641,980,078,831 |  1,602,410,552,874 |   2.41% <-- 2,546,170,307,336 |  2,546,206,790,578 |   0.00% 1,983,551,321,388 |  1,979,562,936,994 |   0.20% 1,516,077,036,742 |  1,515,038,668,667 |   0.07% 2,056,386,745,670 |  2,055,592,903,700 |   0.04% 1,008,224,427,448 |  1,008,027,321,669 |   0.02% 1,169,305,141,789 |  1,169,028,937,430 |   0.02%   363,827,037,541 |    361,982,880,800 |   0.51% <--   906,649,110,459 |    909,651,995,900 |  -0.33% <-   509,023,896,044 |    508,578,027,604 |   0.09%       403,238,430 |        398,492,922 |   1.18%       403,238,430 |        398,492,922 |   1.18% 38,917,902,479,830  38,886,374,486,212 Do note however that this takes us back to constant pool way of loading complex constants (vs. bit fiddling and stitching). The 2.4% gain in deepsjeng is exactly that and we need to decide what to do about it: keep it as such or tweak it with a cost model change and/or make this for everyone or have this per cpu tune - hard to tell whats the right thing to do here. P.S. Perhaps obvious but the prerequisite to revert is to tend to the original linked PRs which will now start failing so that will hopefully improve some of the above. > Furthermore, I have a suspicion that there's logicals where we're going > to want to do something similar and if we keep the mvconst_internal > pattern they're all going to have to be implemented as 2->2s with a > define_insn_and_split rather than the more natural 3->2. Naive question: why is define_split more natural vs. define_insn_and_split. Is it because of the syntax/implementation or that one is more Combine "centric" and other is more of a Split* Pass thing (roughly speaking of course) or something else altogether ? Would we have to revisit the new splitter (and perhaps audit the existing define_insn_and_split patterns) if we were to go ahead with this revert ? >> +(define_insn_and_split "*add3_const_sum_of_two_s12" >> + [(set (match_operand:P 0 "register_operand" "=r,r") >> + (plus:P (match_operand:P 1 "register_operand" " r,r") >> + (match_operand:P 2 "const_two_s12" " MiG,r")))] >> + "" >> + "add %0,%1,%2" >> + "" >> + [(set (match_dup 0) >> + (plus:P (match_dup 1) (match_dup 3))) >> + (set (match_dup 0) >> + (plus:P (match_dup 0) (match_dup 4)))] >> +{ >> + int val = INTVAL (operands[2]); >> + if (SUM_OF_TWO_S12_P (val)) >> + { >> + operands[3] = GEN_INT (2047); >> + operands[4] = GEN_INT (val - 2047); >> + } >> + else if (SUM_OF_TWO_S12_N (val)) >> + { >> + operands[3] = GEN_INT (-2048); >> + operands[4] = GEN_INT (val + 2048); >> + } >> + else >> + gcc_unreachable (); >> +} >> + [(set_attr "type" "arith") >> + (set_attr "mode" "")]) > So why use "P" for your mode iterator? I would have expected GPR since > I think this works for both SI and DI mode as-is. My intent at least was to have this work for either of rv64/rv32 and in either of those environments, work for both SI or DI - certainly rv64+{SI,DI}. To that end I debated between GPR, X and P. It seems GPR only supports DI if TARGET_64BIT. But I could well be wrong about above requirements or the subtle differences in these. > For the output template "#" for alternative 0 and the add instruction > for alternative 1 is probably better. That way it's clear to everyone > that alternative 0 is always meant to be split. OK. > Don't you need some kind of condition on the split to avoid splitting > when you've got a register for operands[2]? Won't the predicate "match_code" guarantee that ?   (define_predicate "const_two_s12"      (match_code "const_int")    {      return SUM_OF_TWO_S12 (INTVAL (op), false);    }) > It would seem to me that as > written, it would still try to spit and trigger an RTL checking error > when you try to extract INTVAL (operands[2]). Do I need to build gcc in a certain way to expose such errors - I wasn't able to trigger such an error for the entire testsuite or SPEC build. Thx for the detailed quick review. -Vineet