From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 48BC33857C58; Wed, 13 Dec 2023 11:52:19 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 48BC33857C58
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1702468339;
	bh=aamKfNrxOWu5POEPufb389TRWv+qqv1vO9mQ1i9Jb9E=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=aBwTHF7o/Fs0hQ3gRTHMfeK3B6wEDaSNpskGSeGWlHPxR/ko4nbDL5W0ZoJDFTuRf
	 PZtuLCDCReKwc/zyWdp0A8GsoCJ1FsWV3CmtlbPZ4VoSltjqViNWRSGj219pQvrQfE
	 ZJruWy7zt+mh2rMCkskA43oE65cKNZmImxRVslXE=
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/111317] RISC-V: Incorrect COST model for RVV conversions
Date: Wed, 13 Dec 2023 11:52:18 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-111317-4-0sH7FU2YiQ@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-111317-4@http.gcc.gnu.org/bugzilla/>
References: <bug-111317-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111317
--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:f6d787c231905063dc3b55ce7028e348b74719be

commit r14-6488-gf6d787c231905063dc3b55ce7028e348b74719be
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Wed Dec 13 17:21:07 2023 +0800

    Middle-end: Adjust decrement IV style partial vectorization COST model

    Hi, before this patch, the RVV codegen for a simple conversion case was:

    foo:
            ble     a2,zero,.L8
            addiw   a5,a2,-1
            li      a4,6
            bleu    a5,a4,.L6
            srliw   a3,a2,3
            slli    a3,a3,3
            add     a3,a3,a0
            mv      a5,a0
            mv      a4,a1
            vsetivli        zero,8,e16,m1,ta,ma
    .L4:
            vle8.v  v2,0(a5)
            addi    a5,a5,8
            vzext.vf2       v1,v2
            vse16.v v1,0(a4)
            addi    a4,a4,16
            bne     a3,a5,.L4
            andi    a5,a2,-8
            beq     a2,a5,.L10
    .L3:
            slli    a4,a5,32
            srli    a4,a4,32
            subw    a2,a2,a5
            slli    a2,a2,32
            slli    a5,a4,1
            srli    a2,a2,32
            add     a0,a0,a4
            add     a1,a1,a5
            vsetvli zero,a2,e16,m1,ta,ma
            vle8.v  v2,0(a0)
            vzext.vf2       v1,v2
            vse16.v v1,0(a1)
    .L8:
            ret
    .L10:
            ret
    .L6:
            li      a5,0
            j       .L3

    This vectorization goes through the first loop:

            vsetivli        zero,8,e16,m1,ta,ma
    .L4:
            vle8.v  v2,0(a5)
            addi    a5,a5,8
            vzext.vf2       v1,v2
            vse16.v v1,0(a4)
            addi    a4,a4,16
            bne     a3,a5,.L4

    Each iteration processes 8 elements.
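    As a rough sketch, the pre-patch structure can be modelled in scalar C. This assumes the source loop zero-extends uint8_t elements into uint16_t elements (the pr111317.c test source is not shown here), and the function name widen_fixed_vf is hypothetical: a main loop that always handles 8 elements (.L4), then one length-controlled tail chunk for the n % 8 leftovers (.L3).

```c
#include <stdint.h>

/* Hypothetical scalar model of the pre-patch codegen, assuming the
   source loop zero-extends uint8_t elements into uint16_t elements. */
enum { FIXED_VF = 8 }; /* vsetivli zero,8,e16,m1,ta,ma */

/* Widens n elements and returns how many main-loop (.L4) iterations ran. */
int widen_fixed_vf(const uint8_t *src, uint16_t *dst, int n)
{
    int iters = 0;
    int i = 0;

    if (n <= 0)                              /* ble a2,zero,.L8 */
        return 0;

    /* Main loop (.L4): always exactly 8 elements per iteration. */
    for (; n - i >= FIXED_VF; i += FIXED_VF, iters++)
        for (int j = 0; j < FIXED_VF; j++)
            dst[i + j] = (uint16_t) src[i + j]; /* vle8.v + vzext.vf2 + vse16.v */

    /* Tail (.L3): one chunk covering the remaining n % 8 elements. */
    for (int j = i; j < n; j++)
        dst[j] = (uint16_t) src[j];

    return iters;
}
```

    The fixed factor of 8 is baked into the main loop, independent of the hardware VLEN.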

    For scalable vectorization, this is fine when VLEN = 128 bits.
    But whenever VLEN > 128 bits, it wastes CPU resources. For example, with
    VLEN = 256 bits, only half of the vector units are working and the other
    half is idle.
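    The waste can be quantified with the RVV definition VLMAX = LMUL * VLEN / SEW. A minimal sketch follows; vlmax and utilization_pct are hypothetical helper names, and only integer LMUL is handled.

```c
/* VLMAX = LMUL * VLEN / SEW, as defined by the RVV specification
   (integer LMUL only in this sketch). */
unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    return lmul * vlen_bits / sew_bits;
}

/* Percentage of an e16/m1 register group that is used when every
   iteration requests a fixed number of elements (8 in the loop above). */
unsigned utilization_pct(unsigned vlen_bits, unsigned requested)
{
    unsigned max = vlmax(vlen_bits, 16, 1);
    unsigned used = requested < max ? requested : max;
    return 100u * used / max;
}
```

    For VLEN = 128, 256 and 512 bits, utilization_pct(vlen, 8) gives 100, 50 and 25 respectively.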

    After investigation, I realized that I forgot to adjust the COST for
    SELECT_VL. So, adjust the COST for SELECT_VL style length vectorization
    from 3 to 2, since after this patch the length handling costs the two
    instructions marked below:

    foo:
            ble     a2,zero,.L5
    .L3:
            vsetvli a5,a2,e16,m1,ta,ma     -----> SELECT_VL cost.
            vle8.v  v2,0(a0)
            slli    a4,a5,1                -----> additional shift of the SELECT_VL result for memory address calculation.
            vzext.vf2       v1,v2
            sub     a2,a2,a5
            vse16.v v1,0(a1)
            add     a0,a0,a5
            add     a1,a1,a4
            bne     a2,zero,.L3
    .L5:
            ret
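    The post-patch decrement-IV loop can be sketched the same way, again assuming a uint8_t to uint16_t zero-extension. The name widen_select_vl and the vlmax parameter are hypothetical; vlmax stands in for the hardware VLMAX (8 for VLEN = 128, 16 for VLEN = 256 at e16/m1). vl = min(remaining, VLMAX) is one choice the RVV rules permit for vsetvli, not the only one.

```c
#include <stdint.h>

/* Hypothetical scalar model of the post-patch loop (.L3).
   Returns the number of loop iterations executed. */
int widen_select_vl(const uint8_t *src, uint16_t *dst, int n, int vlmax)
{
    int iters = 0;
    while (n > 0) {                  /* bne a2,zero,.L3 */
        /* vsetvli a5,a2,...: SELECT_VL picks this iteration's length. */
        int vl = n < vlmax ? n : vlmax;
        for (int j = 0; j < vl; j++)
            dst[j] = (uint16_t) src[j];  /* vle8.v + vzext.vf2 + vse16.v */
        n -= vl;                     /* sub a2,a2,a5: decrement the IV */
        src += vl;                   /* add a0,a0,a5 */
        dst += vl;                   /* add a1,a1,a4, a4 = vl << 1 bytes */
        iters++;
    }
    return iters;
}
```

    Because vl comes from the hardware at run time, the same loop uses all 16 lanes when VLEN = 256 and needs no separate tail.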

    This patch is a simple fix for an adjustment I previously forgot.

    Ok for trunk?

    If not, I am going to adjust the cost in the backend cost model.
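    To see why lowering the COST from 3 to 2 matters, here is a toy count, not GCC's actual computation in vect_estimate_min_profitable_iters. It assumes the COST is charged per vector per loop iteration and that the loop runs ceil(n / vf) iterations; length_update_overhead is a hypothetical name.

```c
/* Toy model (assumption): `per_iter` statements are charged per vector
   per iteration for length updates (3 before the patch, 2 after). */
long length_update_overhead(long n, long vf, long num_vectors, long per_iter)
{
    long iters = (n + vf - 1) / vf;   /* ceil(n / vf) for n >= 0, vf > 0 */
    return iters * num_vectors * per_iter;
}
```

    With n = 100, vf = 8 and one vector, the charged overhead drops from 39 to 26 statements, making the length-controlled loop look cheaper relative to the fixed-VF alternative.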

            PR target/111317

    gcc/ChangeLog:

            * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust
            COST for decrement IV.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.