From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=LSHU=B3=gmail.com=jeffreyalaw@sourceware.org>
Received: from mail-pj1-x102e.google.com (mail-pj1-x102e.google.com [IPv6:2607:f8b0:4864:20::102e])
	by sourceware.org (Postfix) with ESMTPS id 964053858C54;
	Wed,  7 Jun 2023 23:44:04 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 964053858C54
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-pj1-x102e.google.com with SMTP id 98e67ed59e1d1-2567b589d3bso17028a91.0;
        Wed, 07 Jun 2023 16:44:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1686181443; x=1688773443;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :from:to:cc:subject:date:message-id:reply-to;
        bh=jBNrRm7eiTigcVadI/DTacoj5jmpviGbCvcaOT5qT08=;
        b=Kh/EdMJ+OPJtiCJySpV4yuCn4Eat13+pFeWc7a0gxjRVW4qBhqkiLroIWRVSpTYc8h
         o/i8m3ehq3ErvdsepVkpkXMGNW+E7ZB9O2pJrC7JF8PMjYh2ya0tBiE6l3yYAgh/QIfB
         ui5KJoqg/hzVp7XSlk2xNNY58rCufkTEryRpiDCHtFZwLEPpV1vj9TVn9EyQteAcwP2N
         IudfGxIjSGQ+YC/10nXB4ULyXI8Ut0QxjyYOGn93d/+ncmP5n4aMs7j1Eli3WXBJ4iou
         4+VXHCFWjkQ+MnbeIFyRHAtMcx4j7FD5ZXpWQQWj2nJzN7ccgmfvb9M+zja7hXlO2k7x
         8ouQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1686181443; x=1688773443;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=jBNrRm7eiTigcVadI/DTacoj5jmpviGbCvcaOT5qT08=;
        b=bRcMA1v7I1yj0ip3oGbFF5eCE2jFYK29001LwiUv4fpgsb2ps08BOQVEudbAbsod5U
         MQiFjfIzyX8cn9LY00TdT4GuDH5rVPRP9dRjBj3BT5WN6eMyaVRQ3bxnn/HkTWDxwTs7
         U2t2whWqhikMOROVjQnmqwDF3HRaDy+lDfxy8cVL3RMHxGq2Q01554CYUpHsqOEu/VyH
         zTHTy6REHHZ1Agv0uYppMSQ2AdXoP6roRFyUSfdCydMw2YQ5UXJlS7OhS1r4+XDkuaqH
         2p8USdk9QAXElvgDZ63NpFlS5Ej62tKefJL5+J6dGJVa+lGO9UN+6xOdOUbNOrYjDxiJ
         4JVA==
X-Gm-Message-State: AC+VfDyT8Xxiw4nSZbfnhq1HEL06eu2P7E4b9xV9HG8Dv/qMOnzA7smq
	xiY66pQ1ZovdvE1Y2VZHeKE=
X-Google-Smtp-Source: ACHHUZ5aMps294/Dtun81WduW070Ok4HcoTUhrhRZX5pPW4kPv69It20j9SlB/6RotUUBaqmK/ukVg==
X-Received: by 2002:a17:90a:c58e:b0:256:5abf:105c with SMTP id l14-20020a17090ac58e00b002565abf105cmr2800526pjt.46.1686181443292;
        Wed, 07 Jun 2023 16:44:03 -0700 (PDT)
Received: from [172.31.0.109] ([136.36.130.248])
        by smtp.gmail.com with ESMTPSA id bk24-20020a17090b081800b0023a84911df2sm74035pjb.7.2023.06.07.16.44.01
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Wed, 07 Jun 2023 16:44:02 -0700 (PDT)
Message-ID: <80218276-fc3c-032b-eaa7-0a4b0e8a859f@gmail.com>
Date: Wed, 7 Jun 2023 17:44:00 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.1
Subject: Re: Followup on PR/109279: large constants on RISCV
Content-Language: en-US
To: Vineet Gupta <vineetg@rivosinc.com>
Cc: gcc@gcc.gnu.org, Kito Cheng <kito.cheng@gmail.com>,
 Palmer Dabbelt <palmer@rivosinc.com>,
 gnu-toolchain <gnu-toolchain@rivosinc.com>, pinskia@gmail.com,
 GCC Patches <gcc-patches@gcc.gnu.org>
References: <09f944a2-1123-75e3-ce2f-2080df94c964@rivosinc.com>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <09f944a2-1123-75e3-ce2f-2080df94c964@rivosinc.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-0.9 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 6/1/23 20:38, Vineet Gupta wrote:
> Hi Jeff,
> 
> I finally got around to collecting various observations on PR/109279 - 
> more importantly the state of large constants in RV backend, apologies 
> in advance for the long email.
> 
> It seems the various commits in area have improved the original test 
> case of 0x1010101_01010101
> 
>    Before 2e886eef7f2b      |   With 2e886eef7f2b   | With 
> 0530254413f8     | With c104ef4b5eb1
Right.  The handling of that constant shows a nice progression.  On our 
architecture the latter two versions are probably equivalent from a 
latency standpoint, but the last is obviously best as it's smaller and 
probably better on in-order architectures as well.


> 
> But same commits seem to have regressed Andrew's test from same PR 
> (which is the theme of this email).
> The seemingly contrived test turned out to be much more than I'd hoped for.
> 
>     long long f(void)
>     {
>       unsigned t = 0x101_0101;
>       long long t1 = t;
>       long long t2 = ((unsigned long long )t) << 32;
>       asm("":"+r"(t1));
>       return t1 | t2;
>     }
[ ... ]
It may be more instructions, but I suspect they end up being the same 
performance for us across all three varaints.  Fusion and out-of-order 
execution save the day.  But I realize there may be targets where the 
first is going to be preferred.


> 
>    Before 2e886eef7f2b  |   With 2e886eef7f2b    | With 0530254413f8
>      (ideal code)       | define_insn_and_split  | "splitter relaxed new
>                         |                        |  pseudos"
>     li   a0,0x1010000   |    li   a5,0x1010000   |    li a0,0x101_0000
>     addi a0,a0,0x101    |    addi a5,a5,0x101    |    addi a0,a0,0x101
>     slli a5,a0,32       |    mv   a0,a5          |    li a5,0x101_0000
>     or   a0,a0,a5       |    slli a5,a5,32       |    slli a0,a0,32
>     ret                 |    or   a0,a0,a5       |    addi a5,a5,0x101
>                         |    ret                 |    or   a0,a5,a0
>                                                  |    ret
> 
> As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:
[ ... ]
Right. Standard looking synthesis.


> 
> 
> Prior to 2e886eef7f2b, cse1 could do its job: finding oldest equivalent 
> registers for the fragments of const and reusing the reg.
Right.  That's what I would expect.

[ ... ]

> 
> 
> With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog() 
> kicks in during cse1, eliding insns for a const_int.
> 
>     (insn 7 6 8 2 (set (reg:DI 137)
>          (const_int [0x1010101])) {*mvconst_internal}
>          (expr_list:REG_EQUAL (const_int [0x1010101])))
>     [...]
> 
>     (insn 11 10 12 2 (set (reg:DI 140)
>          (const_int [0x1010101_00000000])) {*mvconst_internal}
>          (expr_list:REG_EQUAL (const_int  [0x1010101_00000000]) ))
Understood.  Not ideal, but we generally don't have good ways to limit 
patterns to being available at different times during the optimization 
phase.  One thing you might want to try (which I thought we used at one 
point) was make the pattern conditional on cse_not_expected.  The goal 
would be to avoid exposing the pattern until a later point in the 
optimizer pipeline.  It may have been the case that we dropped that over 
time during development.  It's all getting fuzzy at this point.

> 
> Eventually split1 breaks it up using same mvconst_internal splitter, but 
> the cse opportunity has been lost.
Right.  I'd have to look at the pass definitions, but I suspect the 
splitting pass where this happens is after the last standard CSE pass. 
So we don't get a chance to CSE the constant synthesis.


> *This is a now a baseline for large consts handling for RV backend which 
> we all need to be aware of*.
Understood.  Though it's not as bad as you might think :-)  You can 
spend an inordinate amount of time improving constant synthesis, 
generate code that looks really good, but in the end it may not make a 
bit of different in real performance.  Been there, done that.  I'm not 
saying we give up, but we need to keep in mind that we're often better 
off trading a bit on the constant synthesis if doing so helps code where 
those constants get used.


> 
> 
> (2) Now on to the nuances as to why things get progressively worse after 
> commit 0530254413f8.
> 
> It all seems to get down to register allocation passes:
> 
> sched1 before 0530254413f8
> 
>     ;;     0--> b  0: i  22 r140=0x1010000    :alu
>     ;;     1--> b  0: i  20 r137=0x1010000    :alu
>     ;;     2--> b  0: i  23 r140=r140+0x101   :alu
>     ;;     3--> b  0: i  21 r137=r137+0x101   :alu
>     ;;     4--> b  0: i  24 r140=r140<<0x20   :alu
>     ;;     5--> b  0: i  25 r136=r137         :alu
>     ;;     6--> b  0: i   8 r136=asm_operands :nothing
>     ;;     7--> b  0: i  17 a0=r136|r140      :alu
>     ;;     8--> b  0: i  18 use a0            :nothing
> 
> sched1 with 0530254413f8
> 
>     ;;     0--> b  0: i  22 r144=0x1010000    :alu
>     ;;     1--> b  0: i  20 r143=0x1010000    :alu
>     ;;     2--> b  0: i  23 r145=r144+0x101   :alu
>     ;;     3--> b  0: i  21 r137=r143+0x101   :alu
>     ;;     4--> b  0: i  24 r140=r145<<0x20   :alu
>     ;;     5--> b  0: i  25 r136=r137         :alu
>     ;;     6--> b  0: i   8 r136=asm_operands :nothing
>     ;;     7--> b  0: i  17 a0=r136|r140      :alu
>     ;;     8--> b  0: i  18 use a0            :nothing
> 
> The insn stream is same, only differences being registers reuse (due to 
> splitter restriction) vs. not.
> 
> Next IRA, for reasons I don't understand (and not brave enough yet to 
> dive into) decides to regenerate const_int.
Sure.  It's pretty standard practice.  When it finds a register that has 
a known re-synthesizable value it will often replace the register with 
the value.  It can help in cases where register pressure it excessively 
high by reducing the range of the register holding the value.


> And my guess is, this being so late in the game that it gets 
> rematerialized as is in the end, causing the duplicity.
Yup.  Though there is a post-reload CSE pass.  It's pretty limited in 
what it can do and register assignments often make it impossible to do 
anything, but it's worth looking at to see if why it's not helping here.

> FWIW, IRA for latter case only, emits additional REG_EQUIV notes which 
> could also be playing a role.
REG_EQUAL notes get promoted to REG_EQUIV notes in some cases.  And when 
other equivalences are discovered it may create a REG_EQUIV note out of 
thin air.

The REG_EQUIV note essentially means that everywhere the register occurs 
you can validly (from a program semantics standpoint) replace the 
register with the value.  It might require reloading, but it's a valid 
semantic transformation which may reduce register pressure -- especially 
for constants that were subject to LICM.

Contrast to REG_EQUAL which creates an equivalence at a particular point 
in the IL, but the equivalence may not hold elsewhere in the IL.


> 
> I naively tried to gate mvconst_internal to !reload_completed, but that 
> triggered some ICE.
I wouldn't expect that to help here.

I would first start to see if using cse_not_expected in the splitter 
pattern.   I would also look at reload_cse_regs which should give us 
some chance at seeing the value reuse if/when IRA/LRA muck things up.

jeff