From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x233.google.com (mail-oi1-x233.google.com [IPv6:2607:f8b0:4864:20::233]) by sourceware.org (Postfix) with ESMTPS id 209EA3858C36 for ; Fri, 2 Jun 2023 02:38:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 209EA3858C36 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-oi1-x233.google.com with SMTP id 5614622812f47-39831cb47fbso1334762b6e.1 for ; Thu, 01 Jun 2023 19:38:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20221208.gappssmtp.com; s=20221208; t=1685673537; x=1688265537; h=content-transfer-encoding:subject:from:cc:to:content-language :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=st7ufIUpzDeo3BfLfKoEggBOe5rcnn1+0Mg2FE1VWcc=; b=oiXAmcmv5yvgjuVBPEdw1+Gifr8mfZoajcXJIAD7hm5tDBa+9Z59OYIWbJpIGsLqBn tpTOgZDtSZ0xh2GAgfr7EstZYhQQ4N/qMxZXG0lF9Nkp/OJDyeIz9XKe8qgvENScpprm 8iSEIFSqqYEEK7j88y/SYEOf9GnlMFJg0fLbmOebsrOehcWuk08q+IRFsEKjbWXJNnp4 4Gcv55mSnC0AyXzjZmM9cNSWCK94+nbOR+dyw7BZwwrxlMF0fXf4t6XdsZ/r9SbON42a Q+lsT64Ewb8gDVC7CeKmeK+hq7feWuFqOyk5Z5FVkzAGem8q079AWdVo75zCLuNs5UVp BxZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685673537; x=1688265537; h=content-transfer-encoding:subject:from:cc:to:content-language :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=st7ufIUpzDeo3BfLfKoEggBOe5rcnn1+0Mg2FE1VWcc=; b=NTMOyKUzLbEruK/DZ9J8P2LydziLyBlu743rYGmAPyMQM/6faK8Y0oFt+CFovdDthI OKgjaMNtY81/OoWOJ/3xAuuVlntF2+mZ7807NyFe9iwTFJZZSaYIRnp3v4KRXsuVwxj6 0OaeaRcaKyDsBYEY1TnOPn6QtVwXfwt0CmbwN23mLJodokRb1Q8g/dWQN/lzBYJDhWbo GJnv19ObCYopjZrNW/lmBe6aXnDxAQ323vG/TLJ9AsWkLNz2Ns3wjnw9cFWOcbwG0m77 sJ3N3sifnNxPrHuY3Qx5SFRfqMhW8h21QAYoGRRa6+R5B/Df1ZVWVe+GGWjOBUHTV1P0 ngbw== X-Gm-Message-State: AC+VfDy1OfvPTWpt02MAHbmrPRaGem429CBFpSlPar7LjpWDVd7M+80h 5HHASmy+u8Fpr8jaHlmtswJKww== X-Google-Smtp-Source: ACHHUZ4Ygd6rC4c5eU+Y9Y/PQojCuOhjgCANTfzcSjBhdAf4WXa1LsiOCKB3M9hCmxEgd0g6iDJZUw== X-Received: by 2002:a05:6808:9af:b0:38b:e6:3d95 with SMTP id e15-20020a05680809af00b0038b00e63d95mr1089061oig.40.1685673537369; Thu, 01 Jun 2023 19:38:57 -0700 (PDT) Received: from [192.168.50.116] ([71.202.114.183]) by smtp.gmail.com with ESMTPSA id m8-20020aa79008000000b00652a72b89d1sm449424pfo.170.2023.06.01.19.38.56 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 01 Jun 2023 19:38:57 -0700 (PDT) Message-ID: <09f944a2-1123-75e3-ce2f-2080df94c964@rivosinc.com> Date: Thu, 1 Jun 2023 19:38:55 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 Content-Language: en-US To: Jeff Law Cc: gcc@gcc.gnu.org, Kito Cheng , Palmer Dabbelt , gnu-toolchain , pinskia@gmail.com, GCC Patches From: Vineet Gupta Subject: Followup on PR/109279: large constants on RISCV Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Jeff, I finally got around to collecting various observations on PR/109279 - more importantly the state of large constants in RV backend, apologies in advance for the long email. It seems the various commits in area have improved the original test case of 0x1010101_01010101   Before 2e886eef7f2b      |   With 2e886eef7f2b   | With 0530254413f8     | With c104ef4b5eb1     (const pool)           | define_insn_and_split | "splitter relaxed new |                            | "*mvconst_internal"   | pseudos"             | lui  a5,%hi(.LANCHOR0)     | li     a0,0x01010000  | li a5,0x01010000    | li   a5,0x01010000 ld   a0,%lo(.LANCHOR0)(a5) | addi   a0,0x0101      | addi a5,a5,0x0101     | addi a5,a5,0x0101 ret                        | slli   a0,a0,16       | mv a0,a5            | slli a0,a5,32                            | addi   a0,a0,0x0101   | slli a5,a5,32         | add  a0,a0,a5                            | slli   a0,a0,16       | add a0,a5,a0         |                            | addi   a0,a0,0x0101   | ret                   |                            | ret                   | But same commits seem to have regressed Andrew's test from same PR (which is the theme of this email). The seemingly contrived test turned out to be much more than I'd hoped for.    long long f(void)    {      unsigned t = 0x101_0101;      long long t1 = t;      long long t2 = ((unsigned long long )t) << 32;      asm("":"+r"(t1));      return t1 | t2;    }   Before 2e886eef7f2b  |   With 2e886eef7f2b    | With 0530254413f8     (ideal code)       | define_insn_and_split  | "splitter relaxed new                        |                        |  pseudos"    li   a0,0x1010000   |    li   a5,0x1010000   |    li a0,0x101_0000    addi a0,a0,0x101    |    addi a5,a5,0x101    |    addi a0,a0,0x101    slli a5,a0,32       |    mv   a0,a5          |    li a5,0x101_0000    or   a0,a0,a5       |    slli a5,a5,32       |    slli a0,a0,32    ret                 |    or   a0,a0,a5       |    addi a5,a5,0x101                        |    ret                 |    or   a0,a5,a0                                                 |    ret As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:    # lower word    (insn 6 2 7 2 (set (reg:DI 138)         (const_int [0x1010000]))  {*movdi_64bit}    (insn 7 6 8 2 (set (reg:DI 137)         (plus:DI (reg:DI 138)             (const_int [0x101]))) {adddi3}      (expr_list:REG_EQUAL (const_int [0x1010101]) )    (insn 5 8 9 2 (set (reg/v:DI 134 [ t1 ])         (reg:DI 136 [ t1 ])) {*movdi_64bit}    # upper word (created independently)    (insn 9 5 10 2 (set (reg:DI 141)         (const_int [0x1010000]))  {*movdi_64bit}    (insn 10 9 11 2 (set (reg:DI 142)         (plus:DI (reg:DI 141)             (const_int [0x101]))) {adddi3}    (insn 11 10 12 2 (set (reg:DI 140)         (ashift:DI (reg:DI 142)             (const_int 32 [0x20]))) {ashldi3}        (expr_list:REG_EQUAL (const_int [0x101010100000000]))    # stitch them    (insn 12 11 13 2 (set (reg:DI 139)         (ior:DI (reg/v:DI 134 [ t1 ])             (reg:DI 140))) "const2.c":7:13 99 {iordi3} Prior to 2e886eef7f2b, cse1 could do its job: finding oldest equivalent registers for the fragments of const and reusing the reg.    (insn 7 6 8 2 (set (reg:DI 137)         (plus:DI (reg:DI 138)             (const_int [0x101]))) {adddi3}         (expr_list:REG_EQUAL (const_int [0x1010101])))    [...]    (insn 11 10 12 2 (set (reg:DI 140)         (ashift:DI (reg:DI 137)                          ^^^^^   OLD EQUIV REG             (const_int 32 [0x20]))) {ashldi3}         (expr_list:REG_EQUAL (const_int [0x1010101_00000000]))) With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog() kicks in during cse1, eliding insns for a const_int.    (insn 7 6 8 2 (set (reg:DI 137)         (const_int [0x1010101])) {*mvconst_internal}         (expr_list:REG_EQUAL (const_int [0x1010101])))    [...]    (insn 11 10 12 2 (set (reg:DI 140)         (const_int [0x1010101_00000000])) {*mvconst_internal}         (expr_list:REG_EQUAL (const_int  [0x1010101_00000000]) )) Eventually split1 breaks it up using same mvconst_internal splitter, but the cse opportunity has been lost. *This is a now a baseline for large consts handling for RV backend which we all need to be aware of*. (2) Now on to the nuances as to why things get progressively worse after commit 0530254413f8. It all seems to get down to register allocation passes: sched1 before 0530254413f8    ;;     0--> b  0: i  22 r140=0x1010000    :alu    ;;     1--> b  0: i  20 r137=0x1010000    :alu    ;;     2--> b  0: i  23 r140=r140+0x101   :alu    ;;     3--> b  0: i  21 r137=r137+0x101   :alu    ;;     4--> b  0: i  24 r140=r140<<0x20   :alu    ;;     5--> b  0: i  25 r136=r137         :alu    ;;     6--> b  0: i   8 r136=asm_operands :nothing    ;;     7--> b  0: i  17 a0=r136|r140      :alu    ;;     8--> b  0: i  18 use a0            :nothing sched1 with 0530254413f8    ;;     0--> b  0: i  22 r144=0x1010000    :alu    ;;     1--> b  0: i  20 r143=0x1010000    :alu    ;;     2--> b  0: i  23 r145=r144+0x101   :alu    ;;     3--> b  0: i  21 r137=r143+0x101   :alu    ;;     4--> b  0: i  24 r140=r145<<0x20   :alu    ;;     5--> b  0: i  25 r136=r137         :alu    ;;     6--> b  0: i   8 r136=asm_operands :nothing    ;;     7--> b  0: i  17 a0=r136|r140      :alu    ;;     8--> b  0: i  18 use a0            :nothing The insn stream is same, only differences being registers reuse (due to splitter restriction) vs. not. Next IRA, for reasons I don't understand (and not brave enough yet to dive into) decides to regenerate const_int. And my guess is, this being so late in the game that it gets rematerialized as is in the end, causing the duplicity. FWIW, IRA for latter case only, emits additional REG_EQUIV notes which could also be playing a role.    deleting insn with uid = 22.    deleting insn with uid = 21.    (insn 20 12 23 2 (set (reg:DI 143)         (const_int  [0x1010000])) {*movdi_64bit}      (expr_list:REG_EQUIV (const_int  [0x1010000])))    (insn 23 20 24 2 (set (reg:DI 145)         (const_int  [0x1010101])) *{*mvconst_internal})*                ^^^^^^^^^^^    (insn 24 23 25 2 (set (reg:DI 140)         (ashift:DI (reg:DI 145)             (const_int 32 [0x20]))) {ashldi3}      (expr_list:REG_DEAD (reg:DI 145)      (expr_list:REG_EQUIV (const_int [0x1010101_00000000]))))    (insn 25 24 8 2 (set (reg:DI 136 [ t1 ])         (plus:DI (reg:DI 143)             (const_int [0x101]))) {adddi3}      (expr_list:REG_DEAD (reg:DI 143)))    (insn 17 8 18 2 (set (reg/i:DI 10 a0)         (ior:DI (reg:DI 136 [ t1 ])             (reg:DI 140)))  {iordi3}      (expr_list:REG_DEAD (reg:DI 140)      (expr_list:REG_DEAD (reg:DI 136 [ t1 ]) ))) I naively tried to gate mvconst_internal to !reload_completed, but that triggered some ICE. It seems wrong anyways since post reload split2 also sometimes needs to invokes the mvconst_internal splitter (afaikr for the original 0x01010... test) Do you have any ideas/directions on how to tackle this ? Thx, -Vineet