Hi, Richard.

>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>> (But AArch64 might need to lower lane operations more than it does now if
>> we want gimple to handle it.)

We tried to address this issue at the GIMPLE level at the beginning.
Tracking subreg lanes of tuple types may be enough for aarch64, since aarch64 only has tuple types.
However, for RVV, that is not enough to address all the issues.
Consider the following situation:
https://godbolt.org/z/fhTvEjvr8 

You can see that, compared with LLVM, GCC emits many redundant "vmv1r.v" move instructions.
This is because GCC is not able to track subreg liveness, whereas LLVM can.

The reasons why tracking sub-lanes in GIMPLE cannot address these redundant moves for RVV:

1. RVV has tuple types like "vint8m1x2_t", which is essentially the same as aarch64 "svint8x2_t".
    They are used by segment load/store, which is similar to the "ld2r" instruction in ARM SVE (vec_load_lanes/vec_store_lanes).
    Supporting sub-lane tracking in GIMPLE can fix this situation for both RVV and ARM SVE.
    
2. However, we do not only have "vint8m1x2_t"; we also have "vint8m2_t" (LMUL = 2), which also occupies 2 registers
    but is not a tuple type. Instead, it is a simple vector type. Such types are used by all simple operations.
    For example, "vadd" with vint8m1_t does a PLUS operation on a single vector register, whereas the same
    instruction "vadd" with vint8m2_t does a PLUS operation on 2 vector registers.  We can't define such
    types as tuple types for the following reasons:
    1). We also have tuple types for LMUL > 1; for example, "vint8m2x2_t" is a tuple type.
         If we defined "vint8m2_t" as a tuple type, what would "vint8m2x2_t" be? A tuple of tuples, or
         an array of arrays? That makes the type very strange.
    2). The RVV intrinsic doc defines vint8m2x2_t as a tuple type, but vint8m2_t is not a tuple type. We are not able
         to change the documents.
    3). Clang has supported RVV intrinsics for 3 years. vint8m2_t has not been a tuple type for those 3 years and is
         widely used, so changing the type definition would break the ecosystem.  So for compatibility, we are not
         able to define LMUL > 1 types as tuple types.

For these reasons, we must be able to access the high part and the low part of vint8m2_t, so we provide
vget to generate a subreg access of the vector mode.

So, at the discussion stage, we decided to address subpart access of vector modes in a more generic way,
which is to support subreg liveness tracking at the RTL level. That way it can address not only the issues that
happen on ARM SVE, but also the issues for LMUL > 1.
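
To make the point concrete, here is a minimal, portable C sketch of why whole-register liveness forces
extra copies while per-subreg liveness does not. It is only a toy model under stated assumptions, not code from
the patch series or from IRA/LRA: the type "pseudo_live" and the functions "full_mask", "conflict_whole" and
"conflict_subreg" are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not GCC code): a pseudo register that spans
   NPARTS hard vector registers.  Bit i of LIVE_MASK is set when
   part i holds a value that is still needed at this program point.  */
typedef struct {
    unsigned nparts;    /* hard registers covered, e.g. 2 for LMUL = 2 */
    uint8_t live_mask;  /* per-part liveness */
} pseudo_live;

/* Mask covering every part of the pseudo.  */
uint8_t full_mask(const pseudo_live *p)
{
    assert(p->nparts >= 1 && p->nparts <= 8);
    return (uint8_t)((1u << p->nparts) - 1u);
}

/* Whole-register view: if any part is live, the allocator treats
   every part as occupied.  */
bool conflict_whole(const pseudo_live *a, const pseudo_live *b)
{
    uint8_t occ_a = a->live_mask ? full_mask(a) : 0;
    uint8_t occ_b = b->live_mask ? full_mask(b) : 0;
    return (occ_a & occ_b) != 0;
}

/* Subreg view: only the parts that are really live are occupied, so
   two pseudos can share a register group when their live parts are
   disjoint.  */
bool conflict_subreg(const pseudo_live *a, const pseudo_live *b)
{
    return (a->live_mask & b->live_mask) != 0;
}
```

For example, take a vint8m2_t value "a" whose high part only is still live (live_mask 0x2) and a new vint8m2_t
value "b" whose low part only has been written (live_mask 0x1). In this model conflict_whole reports a conflict,
so the two need separate register groups and a copy such as "vmv1r.v" between them, while conflict_subreg
reports no conflict, so they can be coalesced into the same group.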

3. After we decided to support subreg liveness tracking in RTL, we studied LLVM.
    LLVM actually has a standalone pass, called the register coalescer, that runs right before its linear scan RA (greedy).
    So, the first draft of our solution was to support register coalescing before RA, which is open source:
    riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next · riscv-collab/riscv-gcc (github.com)
    and which imitates the LLVM solution. However, we did not think that solution was elegant, so we consulted
    Vlad.  Vlad suggested that we should enhance IRA/LRA with subreg liveness tracking, which turned out to be
    a more reasonable and elegant approach.

So, after several experiments and investigations, Lehua dedicated himself to producing this series of patches.
We think Lehua's approach is a generic and optimal solution to these subreg problems.

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-11-11 23:33
To: Jeff Law
CC: Lehua Ding; gcc-patches; vmakarov; juzhe.zhong
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
Jeff Law <jeffreyalaw@gmail.com> writes:
> On 11/8/23 02:40, Richard Sandiford wrote:
>> Lehua Ding <lehua.ding@rivai.ai> writes:
>>> Hi,
>>>
>>> These patches try to support subreg coalesce feature in
>>> register allocation passes (ira and lra).
>> 
>> Thanks a lot for the series.  This is definitely something we've
>> needed for a while.
>> 
>> I probably won't be able to look at it in detail for a couple of weeks
>> (and the real review should come from Vlad anyway), but one initial
>> comment:
> Absolutely agreed on the above.
>
> The other thing to ponder.  Jivan and I have been banging on Joern's 
> sub-object tracking bits for a totally different problem in the RISC-V 
> space.  But there may be some overlap.
>
> Essentially Joern's code tracks liveness for a few chunks in registers. 
> bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
> propagating liveness from the destination through to the sources.  So 
> for example if we have
>
> (set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))
>
> If we had previously determined that only bits 0..15 were live in DEST, 
> then we'll propagate that into the source registers.
>
> The goal is to ultimately transform something like
>
> (set (dest:mode) (any_extend:mode (reg:narrower_mode)))
>
> into
>
> (set (dest:mode) (subreg:mode (reg:narrower_mode)))
>
> Where the latter typically will get simplified and propagated away.
>
>
> Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
> it from a correctness standpoint.  It'll also need the usual cleanups.
 
Ah, nice!  How configurable are the bit ranges?  We might be able to use
something similar to track lanes in a vector operation, to detect the
dead code in:
 
   ins v0.b[4], w0
   ...
   ins v0.b[4], w1
 
It sounds like the bit ranges you have now would do that for some
common/useful cases, even if it doesn't handle the general case.
 
Maybe dead lanes are better tracked at the gimple level though, not sure.
(But AArch64 might need to lower lane operations more than it does now if
we want gimple to handle it.)
 
Richard