You can see here:
https://godbolt.org/z/d78646hWb

The first case can't generate vfwmul.vv, but the second case succeeds.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
    (if_then_else:VNx2DF (unspec:VNx2BI [
                (const_vector:VNx2BI repeat [
                        (const_int 1 [0x1])
                    ])
                (reg:DI 153)
                (const_int 2 [0x2]) repeated x2
                (const_int 1 [0x1])
                (const_int 7 [0x7])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
                (reg:SI 69 N/A)
            ] UNSPEC_VPREDICATE)
        (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
            (reg:VNx2DF 148 [ vect__8.49 ]))
        (unspec:VNx2DF [
                (reg:SI 0 zero)
            ] UNSPEC_VUNDEF)))

This patch adds a combine pattern that matches this instruction.

juzhe.zhong@rivai.ai

From: Jeff Law
Date: 2023-06-29 00:24
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicate case:
> #define TEST_TYPE(TYPE1, TYPE2) \
>   __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
>     TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
>     TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b, \
>     TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
>   { \
>     for (int i = 0; i < n; i++) \
>       { \
>         dst[i] = (TYPE1) a[i] * (TYPE1) b[i]; \
>         dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i]; \
>         dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i]; \
>         dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i]; \
>       } \
>   }
>
> TEST_TYPE (double, float)
>
> Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
> So the combine PASS will first try to combine one of the combine extension, and then combine
> the other.
> The combine flow is as follows:
>
> Original IR:
> (set (reg 0) (float_extend: (reg 1))
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (reg 0) (reg 3))
>
> First step of combine:
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (float_extend: (reg 1) (reg 3))
>
> Second step of combine:
> (set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))
>
> So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
> which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).

Hmm, something doesn't make sense here.  Combine knows how to do a 3->1
combination.  I would expect to see the first step fail (substituting
just one operand), then a later step try to combine all three
instructions, substituting the extension for both input operands.

Can you pass along the .combine dump from the failing case?

Jeff
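
For anyone trying to reproduce the two shapes discussed above, here is a reduced sketch in C. It is not taken from the patch or from the godbolt link. The function names widen_mul_both, widen_mul_one and widen_mul_selftest are hypothetical. widen_mul_both extends both operands before the multiply (the form that should end up as vfwmul.vv), while widen_mul_one extends only one operand (the intermediate form after the first combine step). The self-test only checks host-side arithmetic; it says nothing about which instructions a given GCC version and set of flags will emit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced kernel: both float operands are widened to double
   before the multiply.  */
void
widen_mul_both (double *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (double) a[i] * (double) b[i];
}

/* Hypothetical variant: only one operand is widened, the other is already
   double.  This mirrors the intermediate "one extension" RTL form.  */
void
widen_mul_one (double *restrict dst, const float *restrict a,
               const double *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (double) a[i] * b[i];
}

/* Host-side sanity check: both forms must compute the same products.
   Returns 1 when the results match the hand-computed values.  */
int
widen_mul_selftest (void)
{
  const float a[4] = { 1.5f, 2.0f, -3.0f, 0.25f };
  const float b[4] = { 2.0f, 4.0f, 0.5f, 8.0f };
  double b_wide[4], both[4], one[4];

  for (size_t i = 0; i < 4; i++)
    b_wide[i] = (double) b[i];

  widen_mul_both (both, a, b, 4);
  widen_mul_one (one, a, b_wide, 4);

  for (size_t i = 0; i < 4; i++)
    assert (both[i] == one[i]);

  return both[0] == 3.0 && both[1] == 8.0
         && both[2] == -1.5 && both[3] == 2.0;
}
```

Compiling each function separately with a RISC-V vector-enabled GCC and comparing the generated assembly and the .combine dump is one way to see which of the two forms matches a pattern.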