On 2022/11/23 00:44, Xi Ruoyao wrote:
>> While I still can't fully understand the immediate load issue and how
>> this patch fix it, I've tested this patch (alongside the prefetch
>> instruction patch) with bootstrap-ubsan.  And the compiled result of
>> imm-load1.c seems OK.
> And it's doing correct thing for Glibc "improved generic string
> functions" patch, producing some really tight loop now.
>

While debugging, I found that hoisting the immediate-load instructions
out of the loop is done by the loop2_invariant pass. One of the
conditions for hoisting is that the destination register must not be set
more than once. Before the change, the sequence looked like this:

(insn 12 11 13 3 (set (reg:DI 90)
        (const_int 16842752 [0x1010000])) "test.c":13:12 discrim 1 131 {*movdi_64bit}
     (nil))
(insn 13 12 14 3 (set (reg:DI 91)
        (ior:DI (reg:DI 90)
            (const_int 257 [0x101]))) "test.c":13:12 discrim 1 88 {iordi3}
     (expr_list:REG_DEAD (reg:DI 90)
        (expr_list:REG_EQUAL (const_int 16843009 [0x1010101])
            (nil))))
(insn 14 13 15 3 (set (reg:DI 91)
        (ior:DI (zero_extend:DI (subreg:SI (reg:DI 91) 0))
            (const_int 282578783305728 [0x1010100000000]))) "test.c":13:12 discrim 1 150 {lu32i_d}
     (expr_list:REG_EQUAL (const_int 282578800148737 [0x1010101010101])
        (nil)))
(insn 15 14 17 3 (set (reg:DI 91)
        (ior:DI (and:DI (reg:DI 91)
                (const_int 4503599627370495 [0xfffffffffffff]))
            (const_int 72057594037927936 [0x100000000000000]))) "test.c":13:12 discrim 1 151 {lu52i_d}
     (expr_list:REG_EQUAL (const_int 72340172838076673 [0x101010101010101])
        (nil)))

Therefore, the last two instructions do not meet the hoisting
conditions. Because of how our instructions are implemented, I moved the
splitting of the immediate to after loop2_invariant, which avoids this
problem.
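
For anyone who wants to check the numbers, here is a small Python sketch that replays the four insns quoted above and counts how many times each pseudo register is set. It is only an illustration of the RTL arithmetic: the function names are made up for this mail, and nothing here is GCC code or a model of the real loop2_invariant checks.

```python
from collections import Counter


def replay_insns():
    """Replay insns 12-15 from the dump.

    Returns a list of (insn_uid, dest_reg, value) tuples, one per set.
    Hypothetical helper for illustration; not part of GCC.
    """
    steps = []

    # insn 12 {*movdi_64bit}: reg 90 = 0x1010000
    r90 = 0x1010000
    steps.append((12, 90, r90))

    # insn 13 {iordi3}: reg 91 = reg 90 | 0x101
    r91 = r90 | 0x101
    steps.append((13, 91, r91))

    # insn 14 {lu32i_d}: zero-extend the low 32 bits of reg 91,
    # then OR in the constant for the upper bits.
    r91 = (r91 & 0xFFFFFFFF) | 0x1010100000000
    steps.append((14, 91, r91))

    # insn 15 {lu52i_d}: keep the low 52 bits of reg 91,
    # then OR in the constant for the top 12 bits.
    r91 = (r91 & 0xFFFFFFFFFFFFF) | 0x100000000000000
    steps.append((15, 91, r91))

    return steps


def count_sets(steps):
    """Count how many insns write each destination register."""
    return Counter(reg for _, reg, _ in steps)


if __name__ == "__main__":
    steps = replay_insns()
    for uid, reg, value in steps:
        print(f"insn {uid}: reg {reg} = {value:#x}")
    print(dict(count_sets(steps)))
```

The intermediate values match the REG_EQUAL notes in the dump (0x1010101, 0x1010101010101, 0x101010101010101), and the count shows reg 90 set once while reg 91 is set three times, by insns 13, 14 and 15.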