[Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction

* [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction
@ 2023-08-25 10:07 juzhe.zhong at rivai dot ai
  2023-08-25 11:18 ` [Bug c/111153] " rdapp at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-08-25 10:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

            Bug ID: 111153
           Summary: RISC-V: Incorrect Vector cost model for reduction
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Consider the following case:

#include <stdint.h>

#define DEF_REDUC_PLUS(TYPE)                    \
TYPE __attribute__ ((noinline, noclone))        \
reduc_plus_##TYPE (TYPE * __restrict a, int n)          \
{                                               \
  TYPE r = 0;                                   \
  for (int i = 0; i < n; ++i)                   \
    r += a[i];                                  \
  return r;                                     \
}

#define TEST_PLUS(T)                            \
  T (int32_t)                                   \

TEST_PLUS (DEF_REDUC_PLUS)
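
For reference, the macros above expand to a single function for int32_t. The sketch below writes that expansion out by hand and adds a small driver so the reduction can be checked on any host. The helper name check_reduc_plus and its 64-element buffer are hypothetical and not part of the report.

```c
#include <stdint.h>

/* What DEF_REDUC_PLUS (int32_t) expands to.  */
int32_t __attribute__ ((noinline, noclone))
reduc_plus_int32_t (int32_t *__restrict a, int n)
{
  int32_t r = 0;
  for (int i = 0; i < n; ++i)
    r += a[i];
  return r;
}

/* Hypothetical helper (not in the report): fill a buffer with 1..n and
   sum it through the function under test, so the result can be compared
   with n * (n + 1) / 2.  Returns -1 if n does not fit the buffer.  */
int32_t
check_reduc_plus (int n)
{
  int32_t buf[64];
  if (n < 0 || n > 64)
    return -1;
  for (int i = 0; i < n; ++i)
    buf[i] = i + 1;
  return reduc_plus_int32_t (buf, n);
}
```

Calling check_reduc_plus with a small n (5 or less) and with a larger n should exercise both the path that skips the fixed-width loop and the path that enters it in the first listing below.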

 -O3 --param=riscv-autovec-preference=scalable:

reduc_plus_int32_t:
        ble     a1,zero,.L8
        addiw   a5,a1,-1
        li      a4,4
        addi    sp,sp,-16
        mv      a2,a0
        sext.w  a3,a1
        bleu    a5,a4,.L9
        srliw   a4,a3,2
        slli    a4,a4,4
        mv      a5,a0
        add     a4,a4,a0
        vsetivli        zero,4,e32,m1,ta,ma
        vmv.v.i v1,0
        vse32.v v1,0(sp)
.L4:
        vle32.v v1,0(a5)
        vle32.v v2,0(sp)
        addi    a5,a5,16
        vadd.vv v1,v2,v1
        vse32.v v1,0(sp)
        bne     a4,a5,.L4
        ld      a5,0(sp)
        lw      a4,0(sp)
        andi    a1,a1,-4
        srai    a5,a5,32
        addw    a5,a4,a5
        lw      a4,8(sp)
        addw    a5,a5,a4
        ld      a4,8(sp)
        srai    a4,a4,32
        addw    a0,a5,a4
        beq     a3,a1,.L15
.L3:
        subw    a3,a3,a1
        slli    a5,a1,32
        slli    a3,a3,32
        srli    a3,a3,32
        srli    a5,a5,30
        add     a2,a2,a5
        vsetvli a5,a3,e8,mf4,tu,mu
        vsetvli a4,zero,e32,m1,ta,ma
        sub     a1,a3,a5
        vmv.v.i v1,0
        vsetvli zero,a3,e32,m1,tu,ma
        vle32.v v2,0(a2)
        vmv.v.v v1,v2
        bne     a3,a5,.L21
.L7:
        vsetvli a4,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a5,v1
        addw    a0,a0,a5
.L15:
        addi    sp,sp,16
        jr      ra
.L21:
        slli    a5,a5,2
        add     a2,a2,a5
        vsetvli zero,a1,e32,m1,tu,ma
        vle32.v v2,0(a2)
        vadd.vv v1,v1,v2
        j       .L7
.L8:
        li      a0,0
        ret
.L9:
        li      a1,0
        li      a0,0
        j       .L3

-O3 --param=riscv-autovec-preference=scalable -fno-vect-cost-model:
reduc_plus_int32_t:
        ble     a1,zero,.L4
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a1,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vadd.vv v1,v2,v1
        bne     a1,zero,.L3
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret

The current vector cost model generates inferior codegen.


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
@ 2023-08-25 11:18 ` rdapp at gcc dot gnu.org
  2023-09-13 12:39 ` rdapp at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-08-25 11:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #1 from Robin Dapp <rdapp at gcc dot gnu.org> ---
We seem to decide that a slightly more expensive loop (one instruction more)
without an epilogue is better than a loop with an epilogue.  This looks
intentional in the vectorizer cost estimation and is not specific to our lack
of a costing model.  Hmm..

The main loops are (VLA):
.L3:
        vsetvli a5,a1,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vadd.vv v1,v2,v1
        bne     a1,zero,.L3

vs (VLS):
.L4:
        vle32.v v1,0(a5)
        vle32.v v2,0(sp)
        addi    a5,a5,16
        vadd.vv v1,v2,v1
        vse32.v v1,0(sp)
        bne     a4,a5,.L4

This is doubly weird because of the spill of the accumulator.  We shouldn't be
generating this sequence but even if so, it should be more expensive.  This can
be achieved e.g. by the following example vectorizer cost function:

static int
riscv_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
                                 tree vectype,
                                 int misalign ATTRIBUTE_UNUSED)
{
  unsigned elements;

  switch (type_of_cost)
    {
      case scalar_stmt:
      case scalar_load:
      case scalar_store:
      case vector_stmt:
      case vector_gather_load:
      case vector_scatter_store:
      case vec_to_scalar:
      case scalar_to_vec:
      case cond_branch_not_taken:
      case vec_perm:
      case vec_promote_demote:
      case unaligned_load:
      case unaligned_store:
        return 1;

      case vector_load:
      case vector_store:
        return 3;

      case cond_branch_taken:
        return 3;

      case vec_construct:
        elements = estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
        return elements / 2 + 1;

      default:
        gcc_unreachable ();
    }
}
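
To see how such a table would rank the loops in this thread, here is a toy calculation. It is only a sketch under loud assumptions: it prices machine instructions one by one with the values from the example function above, whereas the real vectorizer costs GIMPLE statements, and it treats vsetvli and the address arithmetic as scalar statements. All toy_* names are made up for illustration.

```c
/* Toy model only; not how the vectorizer actually accumulates costs.  */
enum toy_cost_kind
{
  TOY_SCALAR_STMT,
  TOY_VECTOR_STMT,
  TOY_VECTOR_LOAD,
  TOY_VECTOR_STORE,
  TOY_COND_BRANCH_TAKEN,
  TOY_NUM_KINDS
};

/* Per-kind costs copied from the example cost function above.  */
static const int toy_cost[TOY_NUM_KINDS] = { 1, 1, 3, 3, 3 };

struct toy_loop
{
  int count[TOY_NUM_KINDS]; /* Instructions of each kind per iteration.  */
};

int
toy_loop_cost (const struct toy_loop *loop)
{
  int total = 0;
  for (int k = 0; k < TOY_NUM_KINDS; ++k)
    total += loop->count[k] * toy_cost[k];
  return total;
}

/* VLS loop with the spilled accumulator:
   addi, vadd.vv, 2x vle32.v, vse32.v, bne.  */
const struct toy_loop toy_vls_spill = { { 1, 1, 2, 1, 1 } };

/* VLS loop with the accumulator kept in a register:
   addi, vadd.vv, vle32.v, bne.  */
const struct toy_loop toy_vls_clean = { { 1, 1, 1, 0, 1 } };

/* VLA loop: vsetvli, slli, sub, add (assumed scalar), vadd.vv,
   vle32.v, bne.  */
const struct toy_loop toy_vla = { { 4, 1, 1, 0, 1 } };
```

Under these assumptions the spilling loop costs 14 per iteration, the VLA loop 11 and the spill-free VLS loop 8, so the spilling sequence becomes the most expensive choice while the spill-free loop stays the cheapest.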

For a proper loop like
        vle32.v v2,0(sp)
.L4:
        vle32.v v1,0(a5)
        addi    a5,a5,16
        vadd.vv v1,v2,v1
        bne     a4,a5,.L4
        vse32.v v1,0(sp)
I'm not so sure anymore.  For large n this could be preferable depending on the
vectorization factor and other things.


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
  2023-08-25 11:18 ` [Bug c/111153] " rdapp at gcc dot gnu.org
@ 2023-09-13 12:39 ` rdapp at gcc dot gnu.org
  2023-09-13 12:47 ` juzhe.zhong at rivai dot ai
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-13 12:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
With the current trunk we don't spill anymore:

(VLS)
.L4:
        vle32.v v2,0(a5)
        vadd.vv v1,v1,v2
        addi    a5,a5,16
        bne     a5,a4,.L4

Considering just that loop I'd say costing works as designed.  Even though the
epilog and boilerplate code seems "crude" the main loop is as short as it can
be and is IMHO preferable.

.L3:
        vsetvli a5,a1,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vadd.vv v1,v2,v1
        bne     a1,zero,.L3

This has 6 instructions (disregarding the jump) and can't be faster than the 3
instructions for the VLS loop.  Provided we iterate often enough the VLS loop
should always be a win.

Regarding "looking slow" - I think ideally we would have the VLS loop followed
directly by the VLA loop for the residual iterations and next to no additional
statements.  That would require changes in the vectorizer, though.

In total: I think the current behavior is reasonable.
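
The "iterate often enough" argument can be made concrete with a rough instruction-count model. This is a sketch, not a performance model: it counts only loop-body instructions (3 vs. 6, branch excluded, as above), assumes VLEN = 128 bits so both loops handle 4 int32_t elements per iteration, and lumps everything else on the VLS path into a hypothetical fixed_overhead parameter.

```c
#include <stdint.h>

/* Loop-body instructions per iteration, excluding the branch.  */
enum { VLS_BODY_INSNS = 3, VLA_BODY_INSNS = 6 };

/* Assumption: VLEN = 128 bits, i.e. 4 int32_t lanes per iteration.  */
enum { LANES = 4 };

int64_t
vls_main_loop_insns (int64_t n)
{
  return (n / LANES) * VLS_BODY_INSNS;
}

int64_t
vla_loop_insns (int64_t n)
{
  return ((n + LANES - 1) / LANES) * VLA_BODY_INSNS;
}

/* fixed_overhead is a hypothetical count of the extra instructions the
   VLS path executes outside its main loop (setup, reduction epilogue,
   residual handling).  Returns the smallest multiple of LANES, up to
   limit, at which the VLS path executes fewer instructions than the VLA
   loop, or -1 if there is none.  */
int64_t
vls_break_even (int64_t fixed_overhead, int64_t limit)
{
  for (int64_t n = LANES; n <= limit; n += LANES)
    if (vls_main_loop_insns (n) + fixed_overhead < vla_loop_insns (n))
      return n;
  return -1;
}
```

With an assumed overhead of 30 instructions the break-even point is n = 44. Instruction counts ignore latency and throughput, so this only illustrates the shape of the trade-off.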


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
  2023-08-25 11:18 ` [Bug c/111153] " rdapp at gcc dot gnu.org
  2023-09-13 12:39 ` rdapp at gcc dot gnu.org
@ 2023-09-13 12:47 ` juzhe.zhong at rivai dot ai
  2023-09-13 13:03 ` rdapp at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-09-13 12:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #2)
> With the current trunk we don't spill anymore:
> 
> (VLS)
> .L4:
> 	vle32.v	v2,0(a5)
> 	vadd.vv	v1,v1,v2
> 	addi	a5,a5,16
> 	bne	a5,a4,.L4
> 
> Considering just that loop I'd say costing works as designed.  Even though
> the epilog and boilerplate code seems "crude" the main loop is as short as
> it can be and is IMHO preferable.
> 
> .L3:
>         vsetvli a5,a1,e32,m1,tu,ma
>         slli    a4,a5,2
>         sub     a1,a1,a5
>         vle32.v v2,0(a0)
>         add     a0,a0,a4
>         vadd.vv v1,v2,v1
>         bne     a1,zero,.L3
> 
> This has 6 instructions (disregarding the jump) and can't be faster than the
> 3 instructions for the VLS loop.  Provided we iterate often enough the VLS
> loop should always be a win.
> 
> Regarding "looking slow" - I think ideally we would have the VLS loop
> followed directly by the VLA loop for the residual iterations and next to no
> additional statements.  That would require changes in the vectorizer, though.
> 
> In total: I think the current behavior is reasonable.

Oh. I see. I just checked it now.
.L4:
        vle32.v v2,0(a5)
        addi    a5,a5,16
        vadd.vv v1,v1,v2
        bne     a5,a4,.L4
        lui     a4,%hi(.LC0)
        lui     a5,%hi(.LC1)
        addi    a4,a4,%lo(.LC0)
        vlm.v   v0,0(a4)
        addi    a5,a5,%lo(.LC1)
        andi    a1,a1,-4
        vmv1r.v v2,v3
        vlm.v   v4,0(a5)
        vcompress.vm    v2,v1,v0
        vmv1r.v v0,v4
        vadd.vv v1,v2,v1
        vcompress.vm    v3,v1,v0
        vadd.vv v3,v3,v1
        vmv.x.s a0,v3
        sext.w  a0,a0
        beq     a3,a1,.L12

It seems that the codegen will be even better once we support VLS mode
reduction.

I agree that we should first take the VLS reduction and then move to the VLA
reduction for the rest.

But I wonder why ARM SVE doesn't use this approach, since it also has a VLS
mode (NEON/Advanced SIMD).
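
The shape being discussed, a fixed-width main loop followed by a loop for the residual elements, can be sketched in portable C. This only emulates the structure: four scalar accumulators stand in for one 128-bit vector register, and a plain scalar loop stands in for the VLA residual loop. The function name is made up for illustration.

```c
#include <stdint.h>

enum { VLS_LANES = 4 }; /* Assumption: 4 x int32_t, i.e. 128-bit vectors.  */

int32_t
reduc_plus_vls_then_tail (const int32_t *a, int n)
{
  /* Unsigned accumulators so that overflow wraps instead of being
     undefined behaviour.  */
  uint32_t acc[VLS_LANES] = { 0, 0, 0, 0 };
  int i = 0;

  /* Main loop: a fixed 4 lanes per iteration (vle32.v + vadd.vv).  */
  for (; i + VLS_LANES <= n; i += VLS_LANES)
    for (int l = 0; l < VLS_LANES; ++l)
      acc[l] += (uint32_t) a[i + l];

  /* Horizontal reduction of the lanes (what vredsum.vs would do).  */
  uint32_t r = 0;
  for (int l = 0; l < VLS_LANES; ++l)
    r += acc[l];

  /* Residual elements; a VLA loop would take this role on RVV.  */
  for (; i < n; ++i)
    r += (uint32_t) a[i];

  return (int32_t) r;
}
```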


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
                   ` (2 preceding siblings ...)
  2023-09-13 12:47 ` juzhe.zhong at rivai dot ai
@ 2023-09-13 13:03 ` rdapp at gcc dot gnu.org
  2023-09-18  8:25 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-13 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #4 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Yes, with VLS reduction this will improve.

On aarch64 + sve I see
loop inside costs: 2
This is similar to our VLS costs.

And their loop is indeed short:

        ld1w    z30.s, p7/z, [x0, x2, lsl 2]
        add     x2, x2, x3
        add     z31.s, p7/m, z31.s, z30.s
        whilelo p7.s, w2, w1
        b.any   .L3

Not much to be squeezed out with a VLS approach.  I guess that's why.
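
For readers unfamiliar with SVE, the loop above handles the tail with a predicate instead of an epilogue. A rough C emulation of that control flow follows. It assumes 4 lanes (128-bit vectors), and the final lane sum is an assumption, since the thread does not show the code after the loop. The names whilelo (as a C function) and reduc_plus_predicated are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

enum { SVE_LANES = 4 }; /* Assumption: 128-bit vectors.  */

/* Emulates whilelo: lane l is active while index i + l is below n.
   Returns true if any lane is active (the b.any condition).  */
static bool
whilelo (bool p[SVE_LANES], long long i, int n)
{
  bool any = false;
  for (int l = 0; l < SVE_LANES; ++l)
    {
      p[l] = i + l < n;
      any = any || p[l];
    }
  return any;
}

int32_t
reduc_plus_predicated (const int32_t *a, int n)
{
  uint32_t acc[SVE_LANES] = { 0 };
  bool p[SVE_LANES];
  long long i = 0;
  bool any = whilelo (p, i, n);

  while (any)
    {
      /* ld1w + add under the predicate: inactive lanes are untouched.  */
      for (int l = 0; l < SVE_LANES; ++l)
        if (p[l])
          acc[l] += (uint32_t) a[i + l];
      i += SVE_LANES;           /* add x2, x2, x3 */
      any = whilelo (p, i, n);  /* whilelo + b.any */
    }

  /* Assumed final reduction of the lanes (not shown in the thread).  */
  uint32_t r = 0;
  for (int l = 0; l < SVE_LANES; ++l)
    r += acc[l];
  return (int32_t) r;
}
```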


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
                   ` (3 preceding siblings ...)
  2023-09-13 13:03 ` rdapp at gcc dot gnu.org
@ 2023-09-18  8:25 ` cvs-commit at gcc dot gnu.org
  2023-12-14  6:51 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-09-18  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:fafd2502c5416fe4f69daf13224ab1efbf256a1c

commit r14-4086-gfafd2502c5416fe4f69daf13224ab1efbf256a1c
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Sun Sep 17 10:05:49 2023 +0800

    RISC-V: Support VLS modes reduction[PR111153]

    This patch supports VLS reduction vectorization.

    It can optimize the current reduction vectorization codegen with the
    current cost model.

    #define DEF_REDUC_PLUS(TYPE)                    \
    TYPE __attribute__ ((noinline, noclone))        \
    reduc_plus_##TYPE (TYPE * __restrict a, int n)          \
    {                                               \
      TYPE r = 0;                                   \
      for (int i = 0; i < n; ++i)                   \
        r += a[i];                                  \
      return r;                                     \
    }

    #define TEST_PLUS(T)                            \
      T (int32_t)                                   \

    TEST_PLUS (DEF_REDUC_PLUS)

    Before this patch:

            vle32.v v2,0(a5)
            addi    a5,a5,16
            vadd.vv v1,v1,v2
            bne     a5,a4,.L4
            lui     a4,%hi(.LC0)
            lui     a5,%hi(.LC1)
            addi    a4,a4,%lo(.LC0)
            vlm.v   v0,0(a4)
            addi    a5,a5,%lo(.LC1)
            andi    a1,a1,-4
            vmv1r.v v2,v3
            vlm.v   v4,0(a5)
            vcompress.vm    v2,v1,v0
            vmv1r.v v0,v4
            vadd.vv v1,v2,v1
            vcompress.vm    v3,v1,v0
            vadd.vv v3,v3,v1
            vmv.x.s a0,v3
            sext.w  a0,a0
            beq     a3,a1,.L12

    After this patch:

            vle32.v v2,0(a5)
            addi    a5,a5,16
            vadd.vv v1,v1,v2
            bne     a5,a4,.L4
            li      a5,0
            andi    a1,a1,-4
            vmv.s.x v2,a5
            vredsum.vs      v1,v1,v2
            vmv.x.s a0,v1
            beq     a3,a1,.L12

            PR target/111153

    gcc/ChangeLog:

            * config/riscv/autovec.md: Add VLS modes.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS mode reduction
            case.
            * gcc.target/riscv/rvv/autovec/vls/reduc-1.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-10.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-11.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-12.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-13.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-14.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-15.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-16.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-17.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-18.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-19.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-2.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-20.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-21.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-3.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-4.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-5.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-6.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-7.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-8.c: New test.
            * gcc.target/riscv/rvv/autovec/vls/reduc-9.c: New test.
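
The "before" and "after" epilogues in this commit message differ in how the four accumulator lanes are folded into one scalar. The C sketch below contrasts the two strategies. It assumes that the masks loaded from .LC0 and .LC1 (not shown in the thread) select the upper two lanes and then lane 1, which is the usual log-step pattern. Both function names are illustrative.

```c
#include <stdint.h>

/* "Before" style: two compress-and-add steps.  Assumed masks: first
   lanes 2 and 3, then lane 1 (the thread does not show .LC0/.LC1).  */
int32_t
fold_by_halves (const int32_t v[4])
{
  uint32_t t[4] = { (uint32_t) v[0], (uint32_t) v[1],
                    (uint32_t) v[2], (uint32_t) v[3] };

  /* Step 1: move lanes 2,3 down to lanes 0,1 and add.  */
  t[0] += t[2];
  t[1] += t[3];

  /* Step 2: move lane 1 down to lane 0 and add.  */
  t[0] += t[1];

  return (int32_t) t[0]; /* vmv.x.s reads lane 0.  */
}

/* "After" style: one horizontal sum seeded with a scalar, which is what
   vmv.s.x + vredsum.vs compute.  */
int32_t
reduce_sum (const int32_t v[4], int32_t seed)
{
  uint32_t r = (uint32_t) seed;
  for (int l = 0; l < 4; ++l)
    r += (uint32_t) v[l];
  return (int32_t) r;
}
```

Both produce the same value. The difference is that the second form maps to a single reduction instruction, with no mask loads from memory.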


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
                   ` (4 preceding siblings ...)
  2023-09-18  8:25 ` cvs-commit at gcc dot gnu.org
@ 2023-12-14  6:51 ` cvs-commit at gcc dot gnu.org
  2023-12-14  6:52 ` juzhe.zhong at rivai dot ai
  2023-12-15  0:29 ` cvs-commit at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-14  6:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:5e0f67b84a615ba186ab234a9bc43df0df5a50b6

commit r14-6528-g5e0f67b84a615ba186ab234a9bc43df0df5a50b6
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Dec 14 11:23:43 2023 +0800

    RISC-V: Add RVV builtin vectorization cost model

    This patch fixes PR111153:

            ble     a1,zero,.L8
            addiw   a5,a1,-1
            li      a4,4
            addi    sp,sp,-16
            mv      a2,a0
            sext.w  a3,a1
            bleu    a5,a4,.L9
            srliw   a4,a3,2
            slli    a4,a4,4
            mv      a5,a0
            add     a4,a4,a0
            vsetivli        zero,4,e32,m1,ta,ma
            vmv.v.i v1,0
            vse32.v v1,0(sp)
    .L4:
            # This loop always processes 4 elements per iteration, which is
            # fine for VLEN = 128 bits but wastes a large share of the
            # computation units when VLEN > 128 bits.
            vle32.v v1,0(a5)
            vle32.v v2,0(sp)
            addi    a5,a5,16
            vadd.vv v1,v2,v1
            vse32.v v1,0(sp)
            bne     a4,a5,.L4
            ld      a5,0(sp)
            lw      a4,0(sp)
            andi    a1,a1,-4
            srai    a5,a5,32
            addw    a5,a4,a5
            lw      a4,8(sp)
            addw    a5,a5,a4
            ld      a4,8(sp)
            srai    a4,a4,32
            addw    a0,a5,a4
            beq     a3,a1,.L15
    .L3:
            subw    a3,a3,a1
            slli    a5,a1,32
            slli    a3,a3,32
            srli    a3,a3,32
            srli    a5,a5,30
            add     a2,a2,a5
            vsetvli a5,a3,e8,mf4,tu,mu
            vsetvli a4,zero,e32,m1,ta,ma
            sub     a1,a3,a5
            vmv.v.i v1,0
            vsetvli zero,a3,e32,m1,tu,ma
            vle32.v v2,0(a2)
            vmv.v.v v1,v2
            bne     a3,a5,.L21
    .L7:
            vsetvli a4,zero,e32,m1,ta,ma
            vmv.s.x v2,zero
            vredsum.vs      v1,v1,v2
            vmv.x.s a5,v1
            addw    a0,a0,a5
    .L15:
            addi    sp,sp,16
            jr      ra
    .L21:
            slli    a5,a5,2
            add     a2,a2,a5
            vsetvli zero,a1,e32,m1,tu,ma
            vle32.v v2,0(a2)
            vadd.vv v1,v1,v2
            j       .L7
    .L8:
            li      a0,0
            ret
    .L9:
            li      a1,0
            li      a0,0
            j       .L3

    The root cause is that the RVV builtin vectorization cost model was missing.

    After this patch:

            ble     a1,zero,.L4
            vsetvli a5,zero,e32,m1,ta,ma
            vmv.v.i v1,0
    .L3:
            vsetvli a5,a1,e32,m1,tu,ma
            vle32.v v2,0(a0)
            slli    a4,a5,2
            sub     a1,a1,a5
            add     a0,a0,a4
            vadd.vv v1,v2,v1
            bne     a1,zero,.L3
            li      a5,0
            vsetivli        zero,1,e32,m1,ta,ma
            vmv.s.x v2,a5
            vsetvli a5,zero,e32,m1,ta,ma
            vredsum.vs      v1,v1,v2
            vmv.x.s a0,v1
            ret
    .L4:
            li      a0,0
            ret

            PR target/111153

    gcc/ChangeLog:

            * config/riscv/riscv-protos.h (struct common_vector_cost): New
            struct.
            (struct scalable_vector_cost): Ditto.
            (struct cpu_vector_cost): Ditto.
            * config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add
            RVV builtin vectorization cost.
            * config/riscv/riscv.cc (struct riscv_tune_param): Ditto.
            (get_common_costs): New function.
            (riscv_builtin_vectorization_cost): Ditto.
            (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New targethook.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.
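
The remark about wasted computation units can be quantified with simple arithmetic. The sketch below assumes e32 elements at LMUL = 1 and VLEN of at least 128 bits, as in the comparison above. The function names are hypothetical.

```c
#include <assert.h>

/* Number of int32_t lanes in one VLEN-bit register at LMUL = 1.  */
int
lanes_e32_m1 (int vlen_bits)
{
  assert (vlen_bits >= 128 && vlen_bits % 32 == 0);
  return vlen_bits / 32;
}

/* Percentage of the available lanes used by the fixed 4-element loop.  */
int
vls4_lane_utilization_percent (int vlen_bits)
{
  return 100 * 4 / lanes_e32_m1 (vlen_bits);
}

/* Main-loop iterations of the fixed 4-element loop for n elements.  */
long
vls4_iterations (long n)
{
  return n / 4;
}

/* Iterations of the VLA loop, assuming each one handles up to a full
   register of elements.  */
long
vla_iterations (long n, int vlen_bits)
{
  int lanes = lanes_e32_m1 (vlen_bits);
  return (n + lanes - 1) / lanes;
}
```

For n = 1024 and VLEN = 512 bits, the fixed loop runs 256 iterations at 25% lane utilization, while the VLA loop needs 64 iterations under these assumptions.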


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
                   ` (5 preceding siblings ...)
  2023-12-14  6:51 ` cvs-commit at gcc dot gnu.org
@ 2023-12-14  6:52 ` juzhe.zhong at rivai dot ai
  2023-12-15  0:29 ` cvs-commit at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-14  6:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed.


* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
  2023-08-25 10:07 [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction juzhe.zhong at rivai dot ai
                   ` (6 preceding siblings ...)
  2023-12-14  6:52 ` juzhe.zhong at rivai dot ai
@ 2023-12-15  0:29 ` cvs-commit at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-15  0:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:c7ef2189855a8cf12427a778cd5a31d42ddf6260

commit r14-6571-gc7ef2189855a8cf12427a778cd5a31d42ddf6260
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Dec 14 21:45:59 2023 +0800

    Middle-end: Do not model address cost for SELECT_VL style vectorization

    Following Richard's suggestion, we should not model address cost in the
    loop vectorizer for select_vl or decrement IV, since other vectorization
    styles don't do that.

    To make the cost model comparison apples to apples, this patch sets COST
    from 2 to 1, which turns out to give better codegen in various cases for
    RVV.

    OK for trunk?

            PR target/111153

    gcc/ChangeLog:

            * tree-vect-loop.cc (vect_estimate_min_profitable_iters):
            Remove address cost for select_vl/decrement IV.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: Moved to...
            * gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c: ...here.
            * gcc.dg/vect/costmodel/riscv/rvv/pr111153-1.c: New test.

