From: Richard Sandiford
To: 钟居哲
Mail-Followup-To: 钟居哲, "Jeff Law", 丁乐华, gcc-patches, vmakarov,
 richard.sandiford@arm.com
Cc: "Jeff Law", 丁乐华, gcc-patches, vmakarov
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
References: <20231108034740.834590-1-lehua.ding@rivai.ai>
 <3d6ec0ee-6542-4b6a-a2cd-7fd54c136af9@gmail.com>
 <3D45AF37B3B11CB0+2023111209160809613957@rivai.ai>
Date: Sun, 12 Nov 2023 11:53:56 +0000
In-Reply-To: <3D45AF37B3B11CB0+2023111209160809613957@rivai.ai> ("钟居哲"'s message of "Sun, 12 Nov 2023 09:16:08 +0800")

钟居哲 writes:
> Hi, Richard.
>
>>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>>> (But AArch64 might need to lower lane operations more than it does now if
>>> we want gimple to handle it.)
>
> We were trying to address this issue at the GIMPLE level at the beginning.
> Tracking subreg lanes of tuple types may be enough for AArch64, since AArch64 only has tuple types.
> However, for RVV, that's not enough to address all the issues.
> Consider the following situation:
> https://godbolt.org/z/fhTvEjvr8
>
> You can see that, compared with LLVM, GCC emits many redundant move instructions ("vmv1r.v"),
> because GCC is not able to track subreg liveness, whereas LLVM can.
>
> The reasons why tracking sub-lanes in GIMPLE cannot address these redundant move issues for RVV:
>
> 1. RVV has tuple types like "vint8m1x2_t", which is totally the same as AArch64 "svint8x1_t".
>     It is used by segment load/store, which is similar to the "ld2r" instruction in ARM SVE (vec_load_lanes/vec_store_lanes).
>     Supporting sub-lane tracking in GIMPLE can fix this situation for both RVV and ARM SVE.
>
> 2. However, we do not only have "vint8m1x2_t"; we also have "vint8m2_t" (LMUL = 2), which also occupies 2 registers
>     but is not a tuple type; instead, it is a simple vector type. Such types are used by all simple operations.
>     For example, "vadd" with vint8m1_t does a PLUS operation on a single vector register, whereas the same
>     instruction "vadd" with vint8m2_t does a PLUS operation on 2 vector registers. We can't
>     define such types as tuple types for the following reasons:
>     1). We also have tuple types for LMUL > 1; for example, "vint8m2x2_t" is a tuple type.
>          If we define "vint8m2_t" as a tuple type, what about "vint8m2x2_t"? A tuple type of tuples, or
>          an array of arrays? It makes the type very strange.
>     2). The RVV intrinsic doc defines vint8m2x2_t as a tuple type, but vint8m2_t as a non-tuple type. We are not able
>          to change the documents.
>     3). Clang has supported RVV intrinsics for 3 years; vint8m2_t has not been a tuple type in those 3 years and is widely
>          used, so changing the type definition would break the ecosystem. So, for compatibility, we are not able to define
>          LMUL > 1 types as tuple types.
>
> For these reasons, we should be able to access the high part and the low part of vint8m2_t; we provide
> vget to generate subreg accesses of the vector mode.
>
> So, at the discussion stage, we decided to address subpart access of vector modes in a more generic way,
> which is to support subreg liveness tracking at the RTL level, so that it can address not only the issues that happen on ARM SVE,
> but also the issues for LMUL > 1.
>
> 3. After we decided to support subreg liveness tracking in RTL, we studied LLVM.
>     Actually, LLVM has a standalone pass, called the register coalescer, right before its linear-scan RA (greedy).
>     So the first draft of our solution supported register coalescing before RA, and it is open source:
>     riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next · riscv-collab/riscv-gcc (github.com)
>     It imitates the LLVM solution. However, we didn't think that solution was elegant, so we consulted
>     Vlad. Vlad suggested that we should enhance IRA/LRA with subreg liveness tracking, which turned out to be
>     a more reasonable and elegant approach.
>
> So, after several experiments and investigations, Lehua dedicated himself to producing this series of patches.
> And we think Lehua's approach should be a generic and optimal solution to fix these generic subreg problems.

Ah, sorry, I caused a misunderstanding.  In the message quoted above,
I'd moved on from talking about tracking liveness of vectors in a tuple.
I was instead talking about tracking the liveness of individual lanes
in a single vector.

I was responding to Jeff's description of the bit-level liveness tracking
pass.  That pass solves a generic issue: redundant sign and zero extensions.
But it sounded like it could also be reused for tracking lanes of a vector
(by using different bit ranges from the ones that Jeff listed).

The thing that I was saying might be better done on gimple was tracking
lanes of an individual vector.  In other words, I was arguing against
my own question.

I should have changed the subject line when responding, sorry.

I wasn't suggesting that we should avoid subreg tracking in the RA.
That's definitely needed for AArch64, and in general.

Thanks,
Richard