>> Ah, nice!  How configurable are the bit ranges?
I think Lehua's patch is configurable with respect to bit ranges, since it
lets targets flexibly track subreg liveness according to REGMODE_NATURAL_SIZE:

+/* Return true if REGNO is a pseudo and MODE has a multi-register size.  */
+bool
+need_track_subreg (int regno, machine_mode reg_mode)
+{
+  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+  return maybe_gt (total_size, natural_size)
+	 && multiple_p (total_size, natural_size)
+	 && regno >= FIRST_PSEUDO_REGISTER;
+}
It depends on how the target defines REGMODE_NATURAL_SIZE.

If it returns the QImode size, his patch enables tracking of 8-bit (QImode-sized) subreg bit ranges.
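
To make the dependence on REGMODE_NATURAL_SIZE concrete, here is a rough, integer-only sketch of the check above. It is not the code from the patch: sizes are plain byte counts instead of poly_int64 (so maybe_gt becomes ">" and multiple_p becomes a remainder test), and kFirstPseudoRegister, need_track_subreg_sketch and tracked_parts are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FIRST_PSEUDO_REGISTER; the real value is
// target-specific.
constexpr int kFirstPseudoRegister = 128;

// Integer-only model of need_track_subreg.  TOTAL_SIZE plays the role of
// GET_MODE_SIZE (mode) and NATURAL_SIZE the role of
// REGMODE_NATURAL_SIZE (mode), both as constant byte counts.
constexpr bool
need_track_subreg_sketch (int regno, int64_t total_size, int64_t natural_size)
{
  return total_size > natural_size
	 && total_size % natural_size == 0
	 && regno >= kFirstPseudoRegister;
}

// Number of equally sized parts whose liveness would be tracked
// separately under this model; 1 means the register is tracked as a whole.
constexpr int64_t
tracked_parts (int regno, int64_t total_size, int64_t natural_size)
{
  return need_track_subreg_sketch (regno, total_size, natural_size)
	 ? total_size / natural_size
	 : 1;
}
```

Under this model, a 32-byte pseudo with a 16-byte natural size is split into 2 tracked parts, while a natural size of 1 byte (the QImode size) would split a 4-byte pseudo into 4 one-byte parts. Hard registers and modes that are not a whole multiple of the natural size are tracked as a whole.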


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-11-12 19:53
To: 钟居哲
CC: Jeff Law; 丁乐华; gcc-patches; vmakarov
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
钟居哲 <juzhe.zhong@rivai.ai> writes:
> Hi, Richard.
>
>>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>>> (But AArch64 might need to lower lane operations more than it does now if
>>> we want gimple to handle it.)
>
> We tried to address this issue at the GIMPLE level at the beginning.
> Tracking subreg lanes of tuple types may be enough for AArch64, since AArch64 only has tuple types.
> However, for RVV, that's not enough to address all issues.
> Consider the following situation:
> https://godbolt.org/z/fhTvEjvr8 
>
> You can see that, compared with LLVM, GCC emits many redundant "vmv1r.v" move instructions,
> because GCC is not able to track subreg liveness, whereas LLVM can.
>
> The reasons why tracking sub-lanes in GIMPLE cannot address these redundant moves for RVV:
>
> 1. RVV has tuple types like "vint8m1x2_t", which is essentially the same as AArch64's "svint8x2_t".
>     It is used by segment loads/stores, which are similar to the "ld2r" instruction in ARM SVE (vec_load_lanes/vec_store_lanes).
>     Supporting sub-lane tracking in GIMPLE can fix this situation for both RVV and ARM SVE.
>     
> 2. However, we do not only have "vint8m1x2_t"; we also have "vint8m2_t" (LMUL = 2), which also occupies 2 registers
>     but is not a tuple type. Instead, it is a simple vector type. Such types are used by all simple operations.
>     For example, "vadd" with vint8m1_t performs a PLUS operation on a single vector register, whereas the same
>     instruction "vadd" with vint8m2_t performs a PLUS operation on 2 vector registers.  We can't
>     define such types as tuple types, for the following reasons:
>     1). We also have tuple types for LMUL > 1; for example, "vint8m2x2_t" is a tuple type.
>          If we defined "vint8m2_t" as a tuple type, what would "vint8m2x2_t" be? A tuple of tuples, or
>          an array of arrays? That would make the types very strange.
>     2). The RVV intrinsics doc defines vint8m2x2_t as a tuple type, but vint8m2_t as a non-tuple type. We are not able
>          to change the documents.
>     3). Clang has supported RVV intrinsics for 3 years. vint8m2_t has not been a tuple type during that time and is widely
>          used, so changing the type definition would break the ecosystem.  So for compatibility, we are not able to define
>          LMUL > 1 types as tuple types.
>
> For these reasons, we should be able to access the high part and the low part of vint8m2_t; we provide
> vget to generate subreg accesses of the vector mode.
>
> So, at the discussion stage, we decided to address subpart accesses of vector modes in a more generic way,
> which is to support subreg liveness tracking at the RTL level, so that it addresses not only the issues that occur on ARM SVE,
> but also the issues for LMUL > 1.
>
> 3. After we decided to support subreg liveness tracking in RTL, we studied LLVM.
>     LLVM has a standalone pass, the register coalescer, that runs right before its linear-scan RA (greedy).
>     So the first draft of our solution supported register coalescing before RA, simulating the LLVM solution. It is open source:
>     riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next · riscv-collab/riscv-gcc (github.com)
>     However, we didn't think that solution was elegant, so we consulted
>     Vlad.  Vlad suggested that we enhance IRA/LRA with subreg liveness tracking, which turned out to be
>     a more reasonable and elegant approach.
>
> So, after several experiments and investigations, Lehua dedicated himself to producing this series of patches.
> We think Lehua's approach should be a generic and optimal solution to these subreg problems.
 
Ah, sorry, I caused a misunderstanding.  In the message quoted above,
I'd moved on from talking about tracking liveness of vectors in a tuple.
I was instead talking about tracking the liveness of individual lanes
in a single vector.
 
I was responding to Jeff's description of the bit-level liveness tracking
pass.  That pass solves a generic issue: redundant sign and zero extensions.
But it sounded like it could also be reused for tracking lanes of a vector
(by using different bit ranges from the ones that Jeff listed).
 
The thing that I was saying might be better done on gimple was tracking
lanes of an individual vector.  In other words, I was arguing against
my own question.
 
I should have changed the subject line when responding, sorry.
 
I wasn't suggesting that we should avoid subreg tracking in the RA.
That's definitely needed for AArch64, and in general.
 
Thanks,
Richard