>> Ah, nice!  How configurable are the bit ranges?

I think Lehua's patch is configurable for bit ranges, since his patch lets the target
flexibly track subreg liveness according to REGMODE_NATURAL_SIZE:

+/* Return true if REGNO is a pseudo and MODE is a multi-reg size.  */
+bool
+need_track_subreg (int regno, machine_mode reg_mode)
+{
+  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+  return maybe_gt (total_size, natural_size)
+         && multiple_p (total_size, natural_size)
+         && regno >= FIRST_PSEUDO_REGISTER;
+}

It depends on how the target configures the REGMODE_NATURAL_SIZE target hook.
If we return the QImode size, his patch enables tracking 8-bit subreg ranges.

juzhe.zhong@rivai.ai

From: Richard Sandiford
Date: 2023-11-12 19:53
To: 钟居哲
CC: Jeff Law; 丁乐华; gcc-patches; vmakarov
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce

钟居哲 writes:
> Hi, Richard.
>
> >>> Maybe dead lanes are better tracked at the gimple level though, not sure.
> >>> (But AArch64 might need to lower lane operations more than it does now if
> >>> we want gimple to handle it.)
>
> We were trying to address this issue at the GIMPLE level at the beginning.
> Tracking subreg lanes of tuple types may be enough for aarch64, since aarch64 only has tuple types.
> However, for RVV, that's not enough to address all issues.
> Consider the following situation:
> https://godbolt.org/z/fhTvEjvr8
>
> You can see that, compared with LLVM, GCC has many redundant mov instructions ("vmv1r.v"),
> since GCC is not able to track subreg liveness, whereas LLVM can.
>
> The reasons why tracking sub-lanes in GIMPLE cannot address these redundant move issues for RVV:
>
> 1. RVV has tuple types like "vint8m1x2_t", which is totally the same as aarch64 "svint8x1_t".
>    It is used by segment load/store, which is similar to the "ld2r" instruction in ARM SVE (vec_load_lanes/vec_store_lanes).
>    Supporting sub-lane tracking in GIMPLE can fix this situation for both RVV and ARM SVE.
>
> 2. However, we don't only have "vint8m1x2_t"; we also have "vint8m2_t" (LMUL = 2), which also occupies 2 registers
>    but is not a tuple type. Instead, it is a simple vector type. Such types are used by all simple operations.
>    For example, "vadd" with vint8m1_t does a PLUS operation on a single vector register, whereas the same
>    instruction "vadd" with vint8m2_t does a PLUS operation on 2 vector registers. We can't
>    define such types as tuple types for the following reasons:
>    1). We also have tuple types for LMUL > 1; for example, we also have "vint8m2x2_t" as a tuple type.
>        If we define "vint8m2_t" as a tuple type, what about "vint8m2x2_t"? A tuple of tuples, or
>        an array of arrays? It makes the type very strange.
>    2). The RVV intrinsic doc defines vint8m2x2_t as a tuple type, but vint8m2_t as not a tuple type. We are not able
>        to change the documents.
>    3). Clang has supported RVV intrinsics for 3 years; vint8m2_t has not been a tuple type for those 3 years and is widely
>        used, so changing the type definition would destroy the ecosystem. So for compatibility, we are not able to define
>        LMUL > 1 types as tuple types.
>
> For these reasons, we should be able to access the high part and the low part of vint8m2_t; we provide
> vget to generate subreg accesses of the vector mode.
>
> So, at the discussion stage, we decided to address subpart access of vector modes in a more generic way,
> which is to support subreg liveness tracking at the RTL level, so that it can not only address the issues that happen on ARM SVE,
> but also address the issues for LMUL > 1.
>
> 3. After we decided to support subreg liveness tracking in RTL, we studied LLVM.
>    Actually, LLVM has a standalone pass, the register coalescer, that runs right before their linear scan RA (greedy).
>    So, the first draft of our solution was to support register coalescing before RA, which is open source:
>    riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next · riscv-collab/riscv-gcc (github.com)
>    by simulating the LLVM solution.
> However, we don't think such a solution is elegant, and we have consulted
> Vlad. Vlad suggested we should enhance IRA/LRA with subreg liveness tracking, which turns out to be
> a more reasonable and elegant approach.
>
> So, after several experiments and investigations, Lehua dedicated himself to producing this series of patches.
> And we think Lehua's approach should be a generic and optimal solution to fix these generic subreg problems.

Ah, sorry, I caused a misunderstanding.  In the message quoted above,
I'd moved on from talking about tracking liveness of vectors in a tuple.
I was instead talking about tracking the liveness of individual lanes
in a single vector.

I was responding to Jeff's description of the bit-level liveness tracking
pass.  That pass solves a generic issue: redundant sign and zero extensions.
But it sounded like it could also be reused for tracking lanes of a vector
(by using different bit ranges from the ones that Jeff listed).

The thing that I was saying might be better done on gimple was tracking
lanes of an individual vector.  In other words, I was arguing against
my own question.

I should have changed the subject line when responding, sorry.

I wasn't suggesting that we should avoid subreg tracking in the RA.
That's definitely needed for AArch64, and in general.

Thanks,
Richard
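
For reference, the need_track_subreg predicate quoted at the top of this thread can be modelled with plain integers. The following is only a sketch, not the patch itself: it assumes constant byte sizes instead of poly_int64, and the names need_track_subreg_model, tracked_parts and kFirstPseudoRegister (with its value of 128) are hypothetical stand-ins, since the real FIRST_PSEUDO_REGISTER and REGMODE_NATURAL_SIZE are target-specific.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GCC's FIRST_PSEUDO_REGISTER (target-specific in GCC).
constexpr int kFirstPseudoRegister = 128;

// Simplified model of need_track_subreg: sizes are constant byte counts
// rather than poly_int64, so maybe_gt becomes ">" and multiple_p becomes "%".
constexpr bool need_track_subreg_model (int regno, std::int64_t total_size,
                                        std::int64_t natural_size)
{
  return natural_size > 0
         && total_size > natural_size
         && total_size % natural_size == 0
         && regno >= kFirstPseudoRegister;
}

// Number of natural-size pieces whose liveness would be tracked separately
// under this model; 1 means the register is tracked as a whole.
constexpr std::int64_t tracked_parts (int regno, std::int64_t total_size,
                                      std::int64_t natural_size)
{
  return need_track_subreg_model (regno, total_size, natural_size)
         ? total_size / natural_size
         : 1;
}

// A 32-byte pseudo with a 16-byte natural size splits into two tracked parts.
static_assert (tracked_parts (200, 32, 16) == 2, "two natural-size parts");
// A hard register is never split in this model.
static_assert (tracked_parts (5, 32, 16) == 1, "hard regs tracked whole");
```

Under this model, a smaller natural size gives finer granularity: a 4-byte pseudo with a 1-byte (QImode-sized) natural size would be tracked as four 8-bit pieces.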