[Bug c/111317] New: RISC-V: Incorrect COST model for RVV conversions

public inbox for gcc-bugs@sourceware.org

* [Bug c/111317] New: RISC-V: Incorrect COST model for RVV conversions
@ 2023-09-07  7:09 juzhe.zhong at rivai dot ai
  2023-09-12 14:29 ` [Bug target/111317] " rdapp at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-09-07  7:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111317

            Bug ID: 111317
           Summary: RISC-V: Incorrect COST model for RVV conversions
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

#include <stdint.h>

void foo (int32_t *__restrict a, int64_t * __restrict b, int n)
{
    for (int i = 0; i < n; i++)
      b[i] = (int64_t)a[i];
}

--param=riscv-autovec-preference=scalable -O3 -fopt-info-vec-missed:
Failed to vectorize:

<source>:5:23: missed: couldn't vectorize loop
<source>:6:24: missed: not vectorized: no vectype for stmt: _4 = *_3;

However, the loop is vectorized when -fno-vect-cost-model is passed.

We must adjust the COST model for RVV correctly.

* [Bug target/111317] RISC-V: Incorrect COST model for RVV conversions
From: rdapp at gcc dot gnu.org @ 2023-09-12 14:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111317

--- Comment #1 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I think the default cost model is not too bad for these simple cases.  Our
emitted instructions match gimple pretty well.

The thing we don't model is vsetvl.  We could ignore it under the assumption
that it is going to be rather cheap on most uarchs.

Something that needs to be fixed is the general costing used for
length-masking:

            /* Each may need two MINs and one MINUS to update lengths in body
               for next iteration.  */
            if (need_iterate_p)
              body_stmts += 3 * num_vectors;

We don't actually need min with vsetvl (they are our mins) so this would need
to be adjusted down, provided vsetvl is cheap.  
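
As a rough illustration of the costing described above, here is a minimal C sketch of how the per-vector length-update overhead feeds into the loop body cost. It is not the GCC implementation: `length_update_stmts` and its `stmts_per_vector` parameter are hypothetical names, and only the factor 3 (two MINs and one MINUS) is taken from the quoted snippet.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: number of extra body statements
   charged for updating the lengths of NUM_VECTORS length-controlled
   vectors.  STMTS_PER_VECTOR is 3 in the quoted snippet (two MINs and
   one MINUS); a target whose vsetvl already does the MIN work could
   plausibly be charged less.  */
static unsigned
length_update_stmts (unsigned num_vectors, bool need_iterate_p,
                     unsigned stmts_per_vector)
{
  if (!need_iterate_p)
    return 0;
  return stmts_per_vector * num_vectors;
}
```

With two vectors, lowering the per-vector charge from 3 to 2 lowers the overhead from 6 to 4 statements, which can be enough to change the profitability decision for short loop bodies.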

This is the scalar baseline:
.L3:
        lw      a5,0(a0)
        sd      a5,0(a1)
        addi    a0,a0,4
        addi    a1,a1,8
        bne     a4,a0,.L3

While this is what zvl128b would emit:
 .L3:
        vsetvli a5,a2,e8,mf8,ta,ma
        vle32.v v2,0(a0)
        vsetvli a4,zero,e64,m1,ta,ma
        vsext.vf2       v1,v2
        vsetvli zero,a2,e64,m1,ta,ma
        vse64.v v1,0(a1)
        slli    a4,a5,2
        add     a0,a0,a4
        slli    a4,a5,3
        add     a1,a1,a4
        sub     a2,a2,a5
        bne     a2,zero,.L3

With a vectorization factor of 2 (might effectively be higher of course but
possibly unknown at compile time) I'm not sure vectorization is always a win
and the costs actually reflect that.  If we disregard vsetvl for now we have 8
instructions in the vectorized loop and 2 * 4 instructions in the scalar loop
for the same amount of data.  Factoring in the vsetvls I'd say it's worse.
Once we statically know the VF is higher, we will vectorize.
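
The comparison above can be sketched in C as follows. This is only an instruction count read off the two listings (loop branch excluded), not a cycle model; the function and constant names are made up for illustration.

```c
/* Instruction counts read off the two loops quoted above, excluding the
   loop branch.  These are plain counts, not cycle estimates.  */
enum
{
  SCALAR_INSNS_PER_ELEMENT = 4, /* lw, sd, addi, addi */
  VECTOR_INSNS_PER_ITER = 8,    /* vle32, vsext, vse64, 2x slli, 2x add, sub */
  VSETVL_INSNS_PER_ITER = 3     /* the three vsetvli in the vector loop */
};

/* Instructions the scalar loop spends on VF elements.  */
static unsigned
scalar_insns (unsigned vf)
{
  return SCALAR_INSNS_PER_ELEMENT * vf;
}

/* Instructions one vector iteration spends on its VF elements;
   COUNT_VSETVL selects whether the vsetvli instructions are included.  */
static unsigned
vector_insns (int count_vsetvl)
{
  return VECTOR_INSNS_PER_ITER + (count_vsetvl ? VSETVL_INSNS_PER_ITER : 0);
}
```

With VF = 2 both loops come to 8 instructions, and the vector loop needs 11 once the vsetvli instructions are counted. At VF = 4 the scalar loop needs 16, so the vector loop is ahead on this measure.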

* [Bug target/111317] RISC-V: Incorrect COST model for RVV conversions
From: cvs-commit at gcc dot gnu.org @ 2023-12-13 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111317

--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:f6d787c231905063dc3b55ce7028e348b74719be

commit r14-6488-gf6d787c231905063dc3b55ce7028e348b74719be
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Wed Dec 13 17:21:07 2023 +0800

    Middle-end: Adjust decrement IV style partial vectorization COST model

    Hi, before this patch, a simple conversion case for RVV codegen:

    foo:
            ble     a2,zero,.L8
            addiw   a5,a2,-1
            li      a4,6
            bleu    a5,a4,.L6
            srliw   a3,a2,3
            slli    a3,a3,3
            add     a3,a3,a0
            mv      a5,a0
            mv      a4,a1
            vsetivli        zero,8,e16,m1,ta,ma
    .L4:
            vle8.v  v2,0(a5)
            addi    a5,a5,8
            vzext.vf2       v1,v2
            vse16.v v1,0(a4)
            addi    a4,a4,16
            bne     a3,a5,.L4
            andi    a5,a2,-8
            beq     a2,a5,.L10
    .L3:
            slli    a4,a5,32
            srli    a4,a4,32
            subw    a2,a2,a5
            slli    a2,a2,32
            slli    a5,a4,1
            srli    a2,a2,32
            add     a0,a0,a4
            add     a1,a1,a5
            vsetvli zero,a2,e16,m1,ta,ma
            vle8.v  v2,0(a0)
            vzext.vf2       v1,v2
            vse16.v v1,0(a1)
    .L8:
            ret
    .L10:
            ret
    .L6:
            li      a5,0
            j       .L3

    The vectorized code goes through the first loop:

            vsetivli        zero,8,e16,m1,ta,ma
    .L4:
            vle8.v  v2,0(a5)
            addi    a5,a5,8
            vzext.vf2       v1,v2
            vse16.v v1,0(a4)
            addi    a4,a4,16
            bne     a3,a5,.L4

    Each iteration processes 8 elements.

    For scalable vectorization this is fine when VLEN = 128 bits.
    But on a CPU with VLEN > 128 bits it wastes resources: with VLEN = 256 bits,
    for example, only half of the vector units are working and the other half is idle.

    After investigation, I realized that I forgot to adjust the COST for SELECT_VL.
    So, adjust the COST for SELECT_VL style length vectorization from 3 to 2,
    since the loop after this patch needs only the two statements marked below:

    foo:
            ble     a2,zero,.L5
    .L3:
            vsetvli a5,a2,e16,m1,ta,ma     -----> SELECT_VL cost.
            vle8.v  v2,0(a0)
            slli    a4,a5,1                -----> additional shift of the SELECT_VL result for memory address calculation.
            vzext.vf2       v1,v2
            sub     a2,a2,a5
            vse16.v v1,0(a1)
            add     a0,a0,a5
            add     a1,a1,a4
            bne     a2,zero,.L3
    .L5:
            ret

    This patch is a simple fix that I previously forgot.

    Ok for trunk?

    If not, I am going to adjust cost in backend cost model.

            PR target/111317

    gcc/ChangeLog:

            * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust
            for COST for decrement IV.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
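
To make the difference between the two code shapes in the commit message concrete, here is a small C model of their iteration counts. It is a sketch under stated assumptions, not compiler or hardware behaviour: the function names are invented, and SELECT_VL is modelled as min (remaining, VLMAX), whereas real hardware is allowed to pick other values in some cases.

```c
#include <stddef.h>

/* Iterations of the pre-patch shape: a main loop that always handles
   8 elements, plus at most one length-controlled tail iteration.  */
static size_t
iters_fixed8 (size_t n)
{
  return n / 8 + (n % 8 != 0);
}

/* Iterations of the post-patch shape, a decrementing-IV loop.
   ASSUMPTION: SELECT_VL is modelled as min (remaining, vlmax).
   VLEN_BITS must be at least 16 so that vlmax is nonzero.  */
static size_t
iters_select_vl (size_t n, size_t vlen_bits)
{
  size_t vlmax = vlen_bits / 16; /* e16, m1: VLEN / SEW elements */
  size_t iters = 0;
  while (n > 0)
    {
      size_t vl = n < vlmax ? n : vlmax; /* vsetvli a5,a2,e16,m1,ta,ma */
      n -= vl;                           /* sub a2,a2,a5 */
      iters++;
    }
  return iters;
}
```

For n = 100, the fixed-width shape takes 13 iterations regardless of VLEN. Under the assumption above, the SELECT_VL shape also takes 13 at VLEN = 128 but only 7 at VLEN = 256, which is the idle half of the vector unit being put to use.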

* [Bug target/111317] RISC-V: Incorrect COST model for RVV conversions
From: juzhe.zhong at rivai dot ai @ 2023-12-13 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111317

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed on the trunk.
