Hi, Jeff.

That's odd. Maybe you should clean up your environment first?
Or perhaps the toolchain was not built correctly with this patch applied?

Compile options: --param=riscv-autovec-preference=scalable -O3 -ffast-math
Before this patch:
https://godbolt.org/z/Y5d44WMqs 

fail.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v2,0(a4)
vle32.v v1,0(a5)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v3,v2
vfwcvt.f.f.v v2,v1
vsetvli zero,t1,e32,mf2,ta,ma
vle32.v v5,0(a6)
vle32.v v4,0(a7)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v1,v5
vsetvli zero,zero,e64,m1,ta,ma
vfmul.vv v5,v2,v3
vfmul.vv v2,v1,v2
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v2,0(a1)
vse64.v v5,0(a0)
vsetvli t6,zero,e64,m1,ta,ma
vfmul.vv v1,v1,v3
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v2,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a2)
vsetvli t6,zero,e64,m1,ta,ma
slli t4,t1,2
slli t3,t1,3
vfmul.vv v1,v2,v3
sub t5,t5,t1
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

After this patch:
pass.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v1,0(a4)
vle32.v v3,0(a5)
vle32.v v2,0(a6)
vle32.v v4,0(a7)
vsetvli t6,zero,e32,mf2,ta,ma
vfwmul.vv v5,v3,v2
vfwmul.vv v6,v1,v3
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v6,0(a0)
vse64.v v5,0(a1)
vsetvli t6,zero,e32,mf2,ta,ma
slli t4,t1,2
slli t3,t1,3
vfwmul.vv v3,v2,v1
sub t5,t5,t1
vfwmul.vv v2,v1,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v3,0(a2)
vse64.v v2,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

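The codegen with this patch is clearly better: each vfwcvt.f.f.v + vfmul.vv sequence is combined into a single vfwmul.vv, and the number of vsetvli instructions in the loop drops from 11 to 5.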
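
For readers without the godbolt link open, here is a rough sketch of the kind of loop that gives this pattern. It is only a reconstruction read off the register usage in pass.s (four float loads from a4-a7, four double stores to a0-a3, the trip count passed on the stack), not the actual test source; the function name widen_mul and all parameter names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical kernel, reconstructed from pass.s above; the real
   testcase behind the godbolt link may differ.  Each output element
   is the double-precision product of two float inputs, which is the
   float -> double widening multiply that vfwmul.vv implements.  */
void
widen_mul (double *restrict out0, double *restrict out1,
           double *restrict out2, double *restrict out3,
           const float *restrict a, const float *restrict b,
           const float *restrict c, const float *restrict d, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* Both operands are extended before the multiply, so the
         extend + multiply can be combined into one widening op.  */
      out0[i] = (double) a[i] * (double) b[i];
      out1[i] = (double) b[i] * (double) c[i];
      out2[i] = (double) c[i] * (double) a[i];
      out3[i] = (double) a[i] * (double) d[i];
    }
}
```

In fail.s every (double) extension shows up as a separate vfwcvt.f.f.v followed by vfmul.vv at e64; in pass.s the pair becomes a single vfwmul.vv at e32.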

I have attached the .S files to this patch.

I am not claiming that this patch is the only solution.

You are welcome to propose another solution, as long as it produces the same codegen that this patch achieves.

Please make sure you are using a toolchain that was actually built with this patch.

Thanks.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 07:48
To: juzhe.zhong
CC: gcc-patches; kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/29/23 17:46, juzhe.zhong wrote:
> You should try the example check the codegen before and after the patch. 
> You will understand it.
I've already done that.  It makes _no_ difference on the godbolt example.
 
Jeff