From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id F09153858D35; Wed,  8 Nov 2023 00:15:18 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F09153858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1699402518;
	bh=oJIDPz0sfW3S3A74m0+UnfM1n2JXM2PbtvdR+R3Il84=;
	h=From:To:Subject:Date:From;
	b=iX26Up4aJtiUA/F7tLmuW/O2eOhKY3ysxDHwDWRwUN33SwGUnvq7Tq/pJJSBZWUnv
	 /LEfp1i9+i3xX49W85/OeQF8/E3k/AQ/cfJmWWUHN5fPUU+phtSG4nyOumzOV/V4rn
	 8WH9WmPbTjBchczYjnCiL2O/XoVCT9tMW8MgFtF0=
From: "juzhe.zhong at rivai dot ai" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/112431] New: RISC-V GCC-15 feature: Support register overlap
 on widen RVV instructions
Date: Wed, 08 Nov 2023 00:15:18 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: juzhe.zhong at rivai dot ai
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-112431-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

            Bug ID: 112431
           Summary: RISC-V GCC-15 feature: Support register overlap on
                    widen RVV instructions
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

According to RVV ISA:

"The destination EEW is smaller than the source EEW and the overlap is in the
 lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi
 v0, v0, 3 is legal, but a destination of v1 is not)."
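
The narrowing rule quoted above can be sketched as a small register-range
check. This is only an illustration under stated assumptions:
narrow_overlap_ok is a hypothetical helper (not a GCC or LLVM function),
register groups are described by their first register number and the number
of whole registers they occupy, and groups are assumed to be aligned as the
ISA requires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC or LLVM.
   DST/SRC are the first register numbers of the destination and source
   groups; DST_REGS/SRC_REGS are the number of whole registers each group
   occupies.  For a narrowing instruction the source group is at least as
   large as the destination group.  */
bool
narrow_overlap_ok (unsigned dst, unsigned dst_regs,
                   unsigned src, unsigned src_regs)
{
  assert (dst_regs >= 1 && src_regs >= dst_regs);

  /* Disjoint groups are always fine.  */
  bool disjoint = dst + dst_regs <= src || src + src_regs <= dst;
  if (disjoint)
    return true;

  /* Otherwise the overlap must be in the lowest-numbered part of the
     source group, i.e. both groups start at the same register.  */
  return dst == src;
}
```

With the ISA example (LMUL=1, source v0-v1), narrow_overlap_ok (0, 1, 0, 2)
is true for a destination of v0, while narrow_overlap_ok (1, 1, 0, 2) is
false for a destination of v1.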

It is good that GCC already supports register overlap for narrowing
operations. Consider the following example:

#include "riscv_vector.h"
void f20 (uint16_t *base,uint8_t *out,size_t vl, size_t shift)
{
    vuint16m2_t src = __riscv_vle16_v_u16m2 (base, vl);
    /* Only v30-v31 remain free, so the load must target v30-v31.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29");

    vuint8m1_t v = __riscv_vnclipu_wx_u8m1(src,shift,0,vl);
    /* Only v30 remains free, so vnclipu must use DEST v30, which overlaps
       the lowest-numbered register of SRC (v30-v31).  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29", "v31");

    __riscv_vse8_v_u8m1 (out,v,vl);
}

https://gcc.godbolt.org/z/j98xejKh5

GCC does not spill any registers here, whereas LLVM does.

However, GCC fails to support register overlap for RVV widening operations,
which the RVV ISA also permits:

"The destination EEW is greater than the source EEW, the source EMUL is at
 least 1, and the overlap is in the highest-numbered part of the destination
 register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of
 v0, v2, or v4 is not)."
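
The widening rule quoted above can be sketched in the same register-range
style. Again this is only an illustration under stated assumptions:
widen_overlap_ok is a hypothetical helper (not a GCC or LLVM function),
groups are given by first register number and whole-register count, groups
are assumed to be aligned as the ISA requires, and a fractional source EMUL
is passed as an explicit flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC or LLVM.
   DST/SRC are the first register numbers of the destination and source
   groups; DST_REGS/SRC_REGS are the number of whole registers each group
   occupies.  SRC_EMUL_FRACTIONAL is true when the source EMUL is below 1
   (the source then still occupies one register).  */
bool
widen_overlap_ok (unsigned dst, unsigned dst_regs,
                  unsigned src, unsigned src_regs,
                  bool src_emul_fractional)
{
  assert (src_regs >= 1 && dst_regs >= src_regs);

  /* Disjoint groups are always fine.  */
  bool disjoint = dst + dst_regs <= src || src + src_regs <= dst;
  if (disjoint)
    return true;

  /* Overlap requires a source EMUL of at least 1.  */
  if (src_emul_fractional)
    return false;

  /* The overlap must be in the highest-numbered part of the destination
     group, i.e. both groups end at the same register.  */
  return src + src_regs == dst + dst_regs;
}
```

With the ISA example (LMUL=8, vzext.vf4, destination v0-v7, source EMUL=2),
widen_overlap_ok (0, 8, 6, 2, false) is true, while a source starting at v0,
v2, or v4 is rejected. For a destination of v30-v31 and a single source
register v31, widen_overlap_ok (30, 2, 31, 1, false) is also true.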

Consider the following case:

#include "riscv_vector.h"
void f20 (void *base,void *out,size_t vl, size_t shift)
{
    vuint16m1_t src = __riscv_vle16_v_u16m1 (base, vl);
    /* Only v31 remains free, so the load must target v31.  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29", "v30");

    vuint32m2_t v = __riscv_vwaddu_vv_u32m2(src,src,vl);
    /* Only v30-v31 remain free, so the vwaddu result must live in v30-v31,
       whose highest-numbered register overlaps SRC (v31).  */
    asm volatile("#" ::
                 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
                   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
                   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
                   "v26", "v27", "v28", "v29");

    __riscv_vse32_v_u32m2 (out,v,vl);
}

https://gcc.godbolt.org/z/h3cM9vhnY

Because GCC, like LLVM, marks the destination of RVV widening instructions as
early clobber, both compilers fail to overlap the registers.

GCC ASM:

f20:
        vsetvli zero,a2,e16,m1,ta,ma
        vle16.v v31,0(a0)
        vwaddu.vv       v2,v31,v31
        vmv2r.v v30,v2                   ----> Redundant mov instruction.
        vse32.v v30,0(a1)
        ret

We should be able to generate vwaddu.vv v30,v31,v31, which would eliminate the
redundant move instruction.

This issue will be fixed in GCC-15, since we don't have enough time in the
GCC-14 cycle.