public inbox for gcc-bugs@sourceware.org
* [Bug c/111153] New: RISC-V: Incorrect Vector cost model for reduction
From: juzhe.zhong at rivai dot ai @ 2023-08-25 10:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153
Bug ID: 111153
Summary: RISC-V: Incorrect Vector cost model for reduction
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
Consider the following case:
#include <stdint.h>
#define DEF_REDUC_PLUS(TYPE) \
  TYPE __attribute__ ((noinline, noclone)) \
  reduc_plus_##TYPE (TYPE * __restrict a, int n) \
  { \
    TYPE r = 0; \
    for (int i = 0; i < n; ++i) \
      r += a[i]; \
    return r; \
  }
#define TEST_PLUS(T) \
  T (int32_t)
TEST_PLUS (DEF_REDUC_PLUS)
-O3 --param=riscv-autovec-preference=scalable:
reduc_plus_int32_t:
ble a1,zero,.L8
addiw a5,a1,-1
li a4,4
addi sp,sp,-16
mv a2,a0
sext.w a3,a1
bleu a5,a4,.L9
srliw a4,a3,2
slli a4,a4,4
mv a5,a0
add a4,a4,a0
vsetivli zero,4,e32,m1,ta,ma
vmv.v.i v1,0
vse32.v v1,0(sp)
.L4:
vle32.v v1,0(a5)
vle32.v v2,0(sp)
addi a5,a5,16
vadd.vv v1,v2,v1
vse32.v v1,0(sp)
bne a4,a5,.L4
ld a5,0(sp)
lw a4,0(sp)
andi a1,a1,-4
srai a5,a5,32
addw a5,a4,a5
lw a4,8(sp)
addw a5,a5,a4
ld a4,8(sp)
srai a4,a4,32
addw a0,a5,a4
beq a3,a1,.L15
.L3:
subw a3,a3,a1
slli a5,a1,32
slli a3,a3,32
srli a3,a3,32
srli a5,a5,30
add a2,a2,a5
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli a4,zero,e32,m1,ta,ma
sub a1,a3,a5
vmv.v.i v1,0
vsetvli zero,a3,e32,m1,tu,ma
vle32.v v2,0(a2)
vmv.v.v v1,v2
bne a3,a5,.L21
.L7:
vsetvli a4,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a5,v1
addw a0,a0,a5
.L15:
addi sp,sp,16
jr ra
.L21:
slli a5,a5,2
add a2,a2,a5
vsetvli zero,a1,e32,m1,tu,ma
vle32.v v2,0(a2)
vadd.vv v1,v1,v2
j .L7
.L8:
li a0,0
ret
.L9:
li a1,0
li a0,0
j .L3
-O3 --param=riscv-autovec-preference=scalable -fno-vect-cost-model:
reduc_plus_int32_t:
ble a1,zero,.L4
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetvli a3,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret
The current vector cost model generates inferior codegen.
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: rdapp at gcc dot gnu.org @ 2023-08-25 11:18 UTC (permalink / raw)
--- Comment #1 from Robin Dapp <rdapp at gcc dot gnu.org> ---
We seem to decide that a slightly more expensive loop (one instruction more)
without an epilogue is better than a loop with an epilogue. This looks
intentional in the vectorizer cost estimation and is not specific to our lack
of a costing model. Hmm..
The main loops are (VLA):
.L3:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vs (VLS):
.L4:
vle32.v v1,0(a5)
vle32.v v2,0(sp)
addi a5,a5,16
vadd.vv v1,v2,v1
vse32.v v1,0(sp)
bne a4,a5,.L4
This is doubly weird because of the spill of the accumulator. We shouldn't be
generating this sequence but even if so, it should be more expensive. This can
be achieved e.g. by the following example vectorizer cost function:
static int
riscv_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
                                  tree vectype,
                                  int misalign ATTRIBUTE_UNUSED)
{
  unsigned elements;
  switch (type_of_cost)
    {
    case scalar_stmt:
    case scalar_load:
    case scalar_store:
    case vector_stmt:
    case vector_gather_load:
    case vector_scatter_store:
    case vec_to_scalar:
    case scalar_to_vec:
    case cond_branch_not_taken:
    case vec_perm:
    case vec_promote_demote:
    case unaligned_load:
    case unaligned_store:
      return 1;
    case vector_load:
    case vector_store:
      return 3;
    case cond_branch_taken:
      return 3;
    case vec_construct:
      elements = estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
      return elements / 2 + 1;
    default:
      gcc_unreachable ();
    }
}
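To make the effect of these weights concrete, here is a minimal standalone sketch that applies them to the vector statements of the two loop bodies shown above. The toy_* names are hypothetical and are not GCC's real types. The real vectorizer costs GIMPLE statements rather than final instructions, so this only illustrates the arithmetic, not what GCC actually computes.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the statement kinds used above;
   this is not GCC's enum vect_cost_for_stmt.  */
enum toy_cost_kind { TOY_VECTOR_STMT, TOY_VECTOR_LOAD, TOY_VECTOR_STORE };

/* Same weights as the example cost function: vector loads and stores
   cost 3, every other statement costs 1.  */
static int
toy_cost (enum toy_cost_kind kind)
{
  switch (kind)
    {
    case TOY_VECTOR_LOAD:
    case TOY_VECTOR_STORE:
      return 3;
    default:
      return 1;
    }
}

/* Sum the per-statement costs of one loop body.  */
static int
toy_body_cost (const enum toy_cost_kind *stmts, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; ++i)
    total += toy_cost (stmts[i]);
  return total;
}

/* VLS loop with the spilled accumulator:
   vle32.v, vle32.v, vadd.vv, vse32.v.  */
static const enum toy_cost_kind vls_spill_body[] = {
  TOY_VECTOR_LOAD, TOY_VECTOR_LOAD, TOY_VECTOR_STMT, TOY_VECTOR_STORE
};

/* VLA loop: vle32.v, vadd.vv (scalar bookkeeping ignored).  */
static const enum toy_cost_kind vla_body[] = {
  TOY_VECTOR_LOAD, TOY_VECTOR_STMT
};
```

With these weights the spilling VLS body sums to 10 and the VLA body to 4, which is the direction the comparison should go.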
For a proper loop like
vle32.v v2,0(sp)
.L4:
vle32.v v1,0(a5)
addi a5,a5,16
vadd.vv v1,v2,v1
bne a4,a5,.L4
vse32.v v1,0(sp)
I'm not so sure anymore. For large n this could be preferable depending on the
vectorization factor and other things.
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: rdapp at gcc dot gnu.org @ 2023-09-13 12:39 UTC (permalink / raw)
--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
With the current trunk we don't spill anymore:
(VLS)
.L4:
vle32.v v2,0(a5)
vadd.vv v1,v1,v2
addi a5,a5,16
bne a5,a4,.L4
Considering just that loop I'd say costing works as designed. Even though the
epilog and boilerplate code seems "crude" the main loop is as short as it can
be and is IMHO preferable.
.L3:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
This has 6 instructions (disregarding the jump) and can't be faster than the 3
instructions for the VLS loop. Provided we iterate often enough the VLS loop
should always be a win.
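As a rough sketch of "provided we iterate often enough", the per-iteration counts quoted above can be put into a toy dynamic instruction count. The fixed_overhead parameter (prologue plus epilogue instructions of the VLS version) and the lanes parameter (elements per VLA iteration) are assumptions for illustration only. Instruction counts are also not cycle counts, so treat this as an estimate.

```c
/* Per-iteration counts from the two loops above (branch not counted).  */
enum { VLS_LANES = 4, VLS_BODY_INSNS = 3, VLA_BODY_INSNS = 6 };

/* Main-loop instructions of the VLS version for n elements (n >= 0),
   plus an assumed fixed prologue/epilogue overhead.  */
static long
vls_loop_insns (long n, long fixed_overhead)
{
  return VLS_BODY_INSNS * (n / VLS_LANES) + fixed_overhead;
}

/* Instructions of the VLA loop for n elements (n >= 0), assuming each
   iteration handles up to `lanes` elements (lanes > 0).  */
static long
vla_loop_insns (long n, long lanes)
{
  return VLA_BODY_INSNS * ((n + lanes - 1) / lanes);
}
```

With 4 lanes for both and a made-up overhead of 40, n = 8 favors the VLA loop (12 vs. 46) while n = 1024 favors the VLS loop (808 vs. 1536). With 16 lanes per VLA iteration the VLA count for n = 1024 drops to 384, so the comparison depends on the actual vector length.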
Regarding "looking slow" - I think ideally we would have the VLS loop followed
directly by the VLA loop for the residual iterations and next to no additional
statements. That would require changes in the vectorizer, though.
In total: I think the current behavior is reasonable.
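For illustration, the "VLS loop followed directly by the VLA loop for the residual iterations" shape could look like the following scalar C emulation. This is only a sketch of the control flow under the assumption of 4 x int32_t vectors; the function name is hypothetical and nothing here is what the vectorizer emits today.

```c
#include <stdint.h>

enum { LANES = 4 }; /* assumed: 4 x int32_t, i.e. one 128-bit vector */

static int32_t
reduc_plus_vls_then_vla (const int32_t *a, int n)
{
  int32_t acc[LANES] = { 0 }; /* models the vector accumulator */
  int i = 0;

  /* VLS main loop: always a full 4-element vector, no vsetvli needed.  */
  for (; i + LANES <= n; i += LANES)
    for (int l = 0; l < LANES; ++l)
      acc[l] += a[i + l];

  /* Residual handled VLA-style: a single pass with vl = n - i (< 4)
     active lanes; the inactive lanes keep their value (tail undisturbed).  */
  int vl = n > i ? n - i : 0;
  for (int l = 0; l < vl; ++l)
    acc[l] += a[i + l];

  /* Final horizontal reduction, as vredsum.vs would do.  */
  int32_t r = 0;
  for (int l = 0; l < LANES; ++l)
    r += acc[l];
  return r;
}
```

The point is that the residual pass accumulates into the same register as the main loop, so only one final reduction is needed.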
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: juzhe.zhong at rivai dot ai @ 2023-09-13 12:47 UTC (permalink / raw)
--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #2)
> With the current trunk we don't spill anymore:
>
> (VLS)
> .L4:
> vle32.v v2,0(a5)
> vadd.vv v1,v1,v2
> addi a5,a5,16
> bne a5,a4,.L4
>
> Considering just that loop I'd say costing works as designed. Even though
> the epilog and boilerplate code seems "crude" the main loop is as short as
> it can be and is IMHO preferable.
>
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma
> slli a4,a5,2
> sub a1,a1,a5
> vle32.v v2,0(a0)
> add a0,a0,a4
> vadd.vv v1,v2,v1
> bne a1,zero,.L3
>
> This has 6 instructions (disregarding the jump) and can't be faster than the
> 3 instructions for the VLS loop. Provided we iterate often enough the VLS
> loop should always be a win.
>
> Regarding "looking slow" - I think ideally we would have the VLS loop
> followed directly by the VLA loop for the residual iterations and next to no
> additional statements. That would require changes in the vectorizer, though.
>
> In total: I think the current behavior is reasonable.
Oh. I see. I just checked it now.
.L4:
vle32.v v2,0(a5)
addi a5,a5,16
vadd.vv v1,v1,v2
bne a5,a4,.L4
lui a4,%hi(.LC0)
lui a5,%hi(.LC1)
addi a4,a4,%lo(.LC0)
vlm.v v0,0(a4)
addi a5,a5,%lo(.LC1)
andi a1,a1,-4
vmv1r.v v2,v3
vlm.v v4,0(a5)
vcompress.vm v2,v1,v0
vmv1r.v v0,v4
vadd.vv v1,v2,v1
vcompress.vm v3,v1,v0
vadd.vv v3,v3,v1
vmv.x.s a0,v3
sext.w a0,a0
beq a3,a1,.L12
It seems that the codegen will be even better if we support VLS mode
reduction.
I agree that we should first take the VLS reduction and then move to the VLA
reduction.
But I wonder why ARM SVE doesn't use this approach, since it also has a VLS
mode (NEON/AdvSIMD).
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: rdapp at gcc dot gnu.org @ 2023-09-13 13:03 UTC (permalink / raw)
--- Comment #4 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Yes, with VLS reduction this will improve.
On aarch64 + sve I see
loop inside costs: 2
This is similar to our VLS costs.
And their loop is indeed short:
ld1w z30.s, p7/z, [x0, x2, lsl 2]
add x2, x2, x3
add z31.s, p7/m, z31.s, z30.s
whilelo p7.s, w2, w1
b.any .L3
Not much to be squeezed out with a VLS approach. I guess that's why.
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: cvs-commit at gcc dot gnu.org @ 2023-09-18 8:25 UTC (permalink / raw)
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:fafd2502c5416fe4f69daf13224ab1efbf256a1c
commit r14-4086-gfafd2502c5416fe4f69daf13224ab1efbf256a1c
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Sun Sep 17 10:05:49 2023 +0800
RISC-V: Support VLS modes reduction[PR111153]
This patch supports VLS reduction vectorization.
It improves the reduction vectorization codegen under the current cost
model.
#define DEF_REDUC_PLUS(TYPE) \
  TYPE __attribute__ ((noinline, noclone)) \
  reduc_plus_##TYPE (TYPE * __restrict a, int n) \
  { \
    TYPE r = 0; \
    for (int i = 0; i < n; ++i) \
      r += a[i]; \
    return r; \
  }
#define TEST_PLUS(T) \
  T (int32_t)
TEST_PLUS (DEF_REDUC_PLUS)
Before this patch:
vle32.v v2,0(a5)
addi a5,a5,16
vadd.vv v1,v1,v2
bne a5,a4,.L4
lui a4,%hi(.LC0)
lui a5,%hi(.LC1)
addi a4,a4,%lo(.LC0)
vlm.v v0,0(a4)
addi a5,a5,%lo(.LC1)
andi a1,a1,-4
vmv1r.v v2,v3
vlm.v v4,0(a5)
vcompress.vm v2,v1,v0
vmv1r.v v0,v4
vadd.vv v1,v2,v1
vcompress.vm v3,v1,v0
vadd.vv v3,v3,v1
vmv.x.s a0,v3
sext.w a0,a0
beq a3,a1,.L12
After this patch:
vle32.v v2,0(a5)
addi a5,a5,16
vadd.vv v1,v1,v2
bne a5,a4,.L4
li a5,0
andi a1,a1,-4
vmv.s.x v2,a5
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
beq a3,a1,.L12
PR target/111153
gcc/ChangeLog:
* config/riscv/autovec.md: Add VLS modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS mode reduction
case.
* gcc.target/riscv/rvv/autovec/vls/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-15.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-16.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-17.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-18.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-19.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-20.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-21.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-9.c: New test.
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: cvs-commit at gcc dot gnu.org @ 2023-12-14 6:51 UTC (permalink / raw)
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:5e0f67b84a615ba186ab234a9bc43df0df5a50b6
commit r14-6528-g5e0f67b84a615ba186ab234a9bc43df0df5a50b6
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Thu Dec 14 11:23:43 2023 +0800
RISC-V: Add RVV builtin vectorization cost model
This patch fixes PR111153:
ble a1,zero,.L8
addiw a5,a1,-1
li a4,4
addi sp,sp,-16
mv a2,a0
sext.w a3,a1
bleu a5,a4,.L9
srliw a4,a3,2
slli a4,a4,4
mv a5,a0
add a4,a4,a0
vsetivli zero,4,e32,m1,ta,ma
vmv.v.i v1,0
vse32.v v1,0(sp)
.L4:
vle32.v v1,0(a5)  # This loop always processes 4 elements, which is fine for
                  # VLEN = 128 bits but wastes a large share of the vector
                  # unit when VLEN > 128 bits.
vle32.v v2,0(sp)
addi a5,a5,16
vadd.vv v1,v2,v1
vse32.v v1,0(sp)
bne a4,a5,.L4
ld a5,0(sp)
lw a4,0(sp)
andi a1,a1,-4
srai a5,a5,32
addw a5,a4,a5
lw a4,8(sp)
addw a5,a5,a4
ld a4,8(sp)
srai a4,a4,32
addw a0,a5,a4
beq a3,a1,.L15
.L3:
subw a3,a3,a1
slli a5,a1,32
slli a3,a3,32
srli a3,a3,32
srli a5,a5,30
add a2,a2,a5
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli a4,zero,e32,m1,ta,ma
sub a1,a3,a5
vmv.v.i v1,0
vsetvli zero,a3,e32,m1,tu,ma
vle32.v v2,0(a2)
vmv.v.v v1,v2
bne a3,a5,.L21
.L7:
vsetvli a4,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a5,v1
addw a0,a0,a5
.L15:
addi sp,sp,16
jr ra
.L21:
slli a5,a5,2
add a2,a2,a5
vsetvli zero,a1,e32,m1,tu,ma
vle32.v v2,0(a2)
vadd.vv v1,v1,v2
j .L7
.L8:
li a0,0
ret
.L9:
li a1,0
li a0,0
j .L3
The root cause is that we were missing an RVV builtin vectorization cost model.
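As a small worked example of the VLEN remark in the listing above, the following sketch computes how much of an m1 vector register a fixed 4 x 32-bit operation actually uses. The function name is hypothetical, and the sketch assumes VLEN is at least 128 bits.

```c
/* Percentage (rounded down) of an m1 vector register used by a fixed
   4 x 32-bit VLS operation, for a given VLEN in bits.
   Assumes vlen_bits >= 128.  */
static int
vls_lane_utilization_percent (int vlen_bits)
{
  const int used_bits = 4 * 32; /* vsetivli zero,4,e32,m1 */
  return 100 * used_bits / vlen_bits;
}
```

For VLEN = 128 this gives 100, for 256 it gives 50, and for 512 it gives 25, whereas the VLA loop uses the full register whenever enough elements remain.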
After this patch:
ble a1,zero,.L4
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
slli a4,a5,2
sub a1,a1,a5
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
li a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli a5,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret
PR target/111153
gcc/ChangeLog:
* config/riscv/riscv-protos.h (struct common_vector_cost): New
struct.
(struct scalable_vector_cost): Ditto.
(struct cpu_vector_cost): Ditto.
* config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add RVV
builtin vectorization cost.
* config/riscv/riscv.cc (struct riscv_tune_param): Ditto.
(get_common_costs): New function.
(riscv_builtin_vectorization_cost): Ditto.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New targethook.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: juzhe.zhong at rivai dot ai @ 2023-12-14 6:52 UTC (permalink / raw)
JuzheZhong <juzhe.zhong at rivai dot ai> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
--- Comment #7 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed.
* [Bug c/111153] RISC-V: Incorrect Vector cost model for reduction
From: cvs-commit at gcc dot gnu.org @ 2023-12-15 0:29 UTC (permalink / raw)
--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:c7ef2189855a8cf12427a778cd5a31d42ddf6260
commit r14-6571-gc7ef2189855a8cf12427a778cd5a31d42ddf6260
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Thu Dec 14 21:45:59 2023 +0800
Middle-end: Do not model address cost for SELECT_VL style vectorization
Following Richard's suggestion, we should not model address cost in the loop
vectorizer for select_vl or decrement IV, since the other vectorization
styles don't do that either.
This makes the cost model comparison apples to apples.
This patch sets COST from 2 to 1, which turns out to give better codegen
in various cases for RVV.
OK for trunk?
PR target/111153
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Remove address cost for select_vl/decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: Moved to...
* gcc.dg/vect/costmodel/riscv/rvv/pr11153-2.c: ...here.
* gcc.dg/vect/costmodel/riscv/rvv/pr111153-1.c: New test.