From: Vineet Gupta <vineetg@rivosinc.com>
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: gcc@gcc.gnu.org, Kito Cheng <kito.cheng@gmail.com>,
Palmer Dabbelt <palmer@rivosinc.com>,
gnu-toolchain <gnu-toolchain@rivosinc.com>,
pinskia@gmail.com, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Followup on PR/109279: large constants on RISCV
Date: Thu, 1 Jun 2023 19:38:55 -0700 [thread overview]
Message-ID: <09f944a2-1123-75e3-ce2f-2080df94c964@rivosinc.com> (raw)
Hi Jeff,
I finally got around to collecting various observations on PR/109279 -
more importantly the state of large constants in RV backend, apologies
in advance for the long email.
It seems the various commits in area have improved the original test
case of 0x1010101_01010101
Before 2e886eef7f2b | With 2e886eef7f2b | With
0530254413f8 | With c104ef4b5eb1
(const pool) | define_insn_and_split | "splitter relaxed
new |
| "*mvconst_internal" | pseudos" |
lui a5,%hi(.LANCHOR0) | li a0,0x01010000 | li a5,0x01010000
| li a5,0x01010000
ld a0,%lo(.LANCHOR0)(a5) | addi a0,0x0101 | addi
a5,a5,0x0101 | addi a5,a5,0x0101
ret | slli a0,a0,16 | mv a0,a5
| slli a0,a5,32
| addi a0,a0,0x0101 | slli
a5,a5,32 | add a0,a0,a5
| slli a0,a0,16 | add a0,a5,a0 |
| addi a0,a0,0x0101 |
ret |
| ret |
But same commits seem to have regressed Andrew's test from same PR
(which is the theme of this email).
The seemingly contrived test turned out to be much more than I'd hoped for.
long long f(void)
{
unsigned t = 0x101_0101;
long long t1 = t;
long long t2 = ((unsigned long long )t) << 32;
asm("":"+r"(t1));
return t1 | t2;
}
Before 2e886eef7f2b | With 2e886eef7f2b | With 0530254413f8
(ideal code) | define_insn_and_split | "splitter relaxed new
| | pseudos"
li a0,0x1010000 | li a5,0x1010000 | li a0,0x101_0000
addi a0,a0,0x101 | addi a5,a5,0x101 | addi a0,a0,0x101
slli a5,a0,32 | mv a0,a5 | li a5,0x101_0000
or a0,a0,a5 | slli a5,a5,32 | slli a0,a0,32
ret | or a0,a0,a5 | addi a5,a5,0x101
| ret | or a0,a5,a0
| ret
As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:
# lower word
(insn 6 2 7 2 (set (reg:DI 138)
(const_int [0x1010000])) {*movdi_64bit}
(insn 7 6 8 2 (set (reg:DI 137)
(plus:DI (reg:DI 138)
(const_int [0x101]))) {adddi3}
(expr_list:REG_EQUAL (const_int [0x1010101]) )
(insn 5 8 9 2 (set (reg/v:DI 134 [ t1 ])
(reg:DI 136 [ t1 ])) {*movdi_64bit}
# upper word (created independently)
(insn 9 5 10 2 (set (reg:DI 141)
(const_int [0x1010000])) {*movdi_64bit}
(insn 10 9 11 2 (set (reg:DI 142)
(plus:DI (reg:DI 141)
(const_int [0x101]))) {adddi3}
(insn 11 10 12 2 (set (reg:DI 140)
(ashift:DI (reg:DI 142)
(const_int 32 [0x20]))) {ashldi3}
(expr_list:REG_EQUAL (const_int [0x101010100000000]))
# stitch them
(insn 12 11 13 2 (set (reg:DI 139)
(ior:DI (reg/v:DI 134 [ t1 ])
(reg:DI 140))) "const2.c":7:13 99 {iordi3}
Prior to 2e886eef7f2b, cse1 could do its job: finding oldest equivalent
registers for the fragments of const and reusing the reg.
(insn 7 6 8 2 (set (reg:DI 137)
(plus:DI (reg:DI 138)
(const_int [0x101]))) {adddi3}
(expr_list:REG_EQUAL (const_int [0x1010101])))
[...]
(insn 11 10 12 2 (set (reg:DI 140)
(ashift:DI (reg:DI 137)
^^^^^ OLD EQUIV REG
(const_int 32 [0x20]))) {ashldi3}
(expr_list:REG_EQUAL (const_int [0x1010101_00000000])))
With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog()
kicks in during cse1, eliding insns for a const_int.
(insn 7 6 8 2 (set (reg:DI 137)
(const_int [0x1010101])) {*mvconst_internal}
(expr_list:REG_EQUAL (const_int [0x1010101])))
[...]
(insn 11 10 12 2 (set (reg:DI 140)
(const_int [0x1010101_00000000])) {*mvconst_internal}
(expr_list:REG_EQUAL (const_int [0x1010101_00000000]) ))
Eventually split1 breaks it up using same mvconst_internal splitter, but
the cse opportunity has been lost.
*This is a now a baseline for large consts handling for RV backend which
we all need to be aware of*.
(2) Now on to the nuances as to why things get progressively worse after
commit 0530254413f8.
It all seems to get down to register allocation passes:
sched1 before 0530254413f8
;; 0--> b 0: i 22 r140=0x1010000 :alu
;; 1--> b 0: i 20 r137=0x1010000 :alu
;; 2--> b 0: i 23 r140=r140+0x101 :alu
;; 3--> b 0: i 21 r137=r137+0x101 :alu
;; 4--> b 0: i 24 r140=r140<<0x20 :alu
;; 5--> b 0: i 25 r136=r137 :alu
;; 6--> b 0: i 8 r136=asm_operands :nothing
;; 7--> b 0: i 17 a0=r136|r140 :alu
;; 8--> b 0: i 18 use a0 :nothing
sched1 with 0530254413f8
;; 0--> b 0: i 22 r144=0x1010000 :alu
;; 1--> b 0: i 20 r143=0x1010000 :alu
;; 2--> b 0: i 23 r145=r144+0x101 :alu
;; 3--> b 0: i 21 r137=r143+0x101 :alu
;; 4--> b 0: i 24 r140=r145<<0x20 :alu
;; 5--> b 0: i 25 r136=r137 :alu
;; 6--> b 0: i 8 r136=asm_operands :nothing
;; 7--> b 0: i 17 a0=r136|r140 :alu
;; 8--> b 0: i 18 use a0 :nothing
The insn stream is same, only differences being registers reuse (due to
splitter restriction) vs. not.
Next IRA, for reasons I don't understand (and not brave enough yet to
dive into) decides to regenerate const_int.
And my guess is, this being so late in the game that it gets
rematerialized as is in the end, causing the duplicity.
FWIW, IRA for latter case only, emits additional REG_EQUIV notes which
could also be playing a role.
deleting insn with uid = 22.
deleting insn with uid = 21.
(insn 20 12 23 2 (set (reg:DI 143)
(const_int [0x1010000])) {*movdi_64bit}
(expr_list:REG_EQUIV (const_int [0x1010000])))
(insn 23 20 24 2 (set (reg:DI 145)
(const_int [0x1010101])) *{*mvconst_internal})*
^^^^^^^^^^^
(insn 24 23 25 2 (set (reg:DI 140)
(ashift:DI (reg:DI 145)
(const_int 32 [0x20]))) {ashldi3}
(expr_list:REG_DEAD (reg:DI 145)
(expr_list:REG_EQUIV (const_int [0x1010101_00000000]))))
(insn 25 24 8 2 (set (reg:DI 136 [ t1 ])
(plus:DI (reg:DI 143)
(const_int [0x101]))) {adddi3}
(expr_list:REG_DEAD (reg:DI 143)))
(insn 17 8 18 2 (set (reg/i:DI 10 a0)
(ior:DI (reg:DI 136 [ t1 ])
(reg:DI 140))) {iordi3}
(expr_list:REG_DEAD (reg:DI 140)
(expr_list:REG_DEAD (reg:DI 136 [ t1 ]) )))
I naively tried to gate mvconst_internal to !reload_completed, but that
triggered some ICE.
It seems wrong anyways since post reload split2 also sometimes needs to
invokes the mvconst_internal splitter (afaikr for the original
0x01010... test)
Do you have any ideas/directions on how to tackle this ?
Thx,
-Vineet
next reply other threads:[~2023-06-02 2:38 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-02 2:38 Vineet Gupta [this message]
2023-06-07 23:44 ` Jeff Law
2023-06-12 19:32 ` Vineet Gupta
2023-06-17 22:13 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=09f944a2-1123-75e3-ce2f-2080df94c964@rivosinc.com \
--to=vineetg@rivosinc.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=gcc@gcc.gnu.org \
--cc=gnu-toolchain@rivosinc.com \
--cc=jeffreyalaw@gmail.com \
--cc=kito.cheng@gmail.com \
--cc=palmer@rivosinc.com \
--cc=pinskia@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).