You can see it here:

https://godbolt.org/z/d78646hWb 

The first case can't generate vfwmul.vv, but the second case succeeds.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
    (if_then_else:VNx2DF (unspec:VNx2BI [
                (const_vector:VNx2BI repeat [
                        (const_int 1 [0x1])
                    ])
                (reg:DI 153)
                (const_int 2 [0x2]) repeated x2
                (const_int 1 [0x1])
                (const_int 7 [0x7])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
                (reg:SI 69 N/A)
            ] UNSPEC_VPREDICATE)
        (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
            (reg:VNx2DF 148 [ vect__8.49 ]))
        (unspec:VNx2DF [
                (reg:SI 0 zero)
            ] UNSPEC_VUNDEF)))


This patch adds this combine pattern.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 00:24
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicated case:
> #define TEST_TYPE(TYPE1, TYPE2)                                                \
>    __attribute__ ((noipa)) void vwadd_##TYPE1##_##TYPE2 (                       \
>      TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
>      TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
>      TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
>    {                                                                            \
>      for (int i = 0; i < n; i++)                                                \
>        {                                                                        \
>          dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
>          dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
>          dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
>          dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
>        }                                                                        \
>    }
> 
> TEST_TYPE (double, float)
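A minimal C harness for exercising the kernel, written as a sketch rather than a testsuite entry. It assumes GCC (for `__attribute__ ((noipa))`) and assumes the function name is pasted as `vwadd_##TYPE1##_##TYPE2`, so `TEST_TYPE (double, float)` defines `vwadd_double_float`. The helper `check_vwadd_double_float`, the array size `N` and the input values are hypothetical; the helper only counts outputs that differ from a scalar widen-then-multiply of the same inputs.

```c
#include <assert.h>

/* Kernel from the quoted test, assuming the name is pasted as
   vwadd_##TYPE1##_##TYPE2 so TEST_TYPE (double, float) defines
   vwadd_double_float.  noipa is a GCC attribute.  */
#define TEST_TYPE(TYPE1, TYPE2)                                               \
  __attribute__ ((noipa)) void vwadd_##TYPE1##_##TYPE2 (                      \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,    \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,         \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                        \
  {                                                                           \
    for (int i = 0; i < n; i++)                                               \
      {                                                                       \
        dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                 \
        dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                               \
        dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                               \
        dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                               \
      }                                                                       \
  }

TEST_TYPE (double, float)

#define N 64

/* Hypothetical helper: run the kernel on N elements and count outputs that
   differ from a scalar widen-then-multiply reference.  */
static int
check_vwadd_double_float (void)
{
  float a[N], b[N], a2[N], b2[N];
  double dst[N], dst2[N], dst3[N], dst4[N];
  for (int i = 0; i < N; i++)
    {
      a[i] = 1.5f * i;
      b[i] = 0.25f - i;
      a2[i] = 3.0f + i;
      b2[i] = -0.5f * i;
    }
  vwadd_double_float (dst, dst2, dst3, dst4, a, b, a2, b2, N);

  int mismatches = 0;
  for (int i = 0; i < N; i++)
    {
      mismatches += dst[i] != (double) a[i] * (double) b[i];
      mismatches += dst2[i] != (double) a2[i] * (double) b[i];
      mismatches += dst3[i] != (double) a2[i] * (double) a[i];
      mismatches += dst4[i] != (double) a[i] * (double) b2[i];
    }
  return mismatches;
}
```

Calling `check_vwadd_double_float ()` should return 0. The exact `!=` comparison is deliberate: a float significand has 24 bits, so the product of two floats widened to double needs at most 48 bits and is exact in a double's 53-bit significand.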
> 
> In such a complicated case, the combine pass cannot combine the extensions of both operands at once.
> Instead, it first tries to combine one of the extensions and then combines
> the other. The combine flow is as follows:
> 
> Original IR:
> (set (reg 0) (float_extend (reg 1)))
> (set (reg 3) (float_extend (reg 2)))
> (set (reg 4) (mult (reg 0) (reg 3)))
> 
> First step of combine:
> (set (reg 3) (float_extend (reg 2)))
> (set (reg 4) (mult (float_extend (reg 1)) (reg 3)))
> 
> Second step of combine:
> (set (reg 4) (mult (float_extend (reg 1)) (float_extend (reg 2))))
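The two-step flow above can be replayed with a toy substitution over RTL-like trees. This is a hypothetical model for illustration only: `struct expr`, `make`, `toy_subst`, `print_expr` and `two_step_flow` are invented here and are not the code in GCC's combine.cc, and the names i1/i2/i3 for the three insns are an assumption of the sketch. It only shows which expression shapes the multiply's source takes after folding in one extension and then the other.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy RTL-like expression tree (hypothetical; not GCC's rtx).  */
enum toy_code { T_REG, T_FLOAT_EXTEND, T_MULT };

struct expr
{
  enum toy_code code;
  int regno;                 /* Register number, used only by T_REG.  */
  const struct expr *op[2];  /* T_FLOAT_EXTEND uses op[0]; T_MULT both.  */
};

static struct expr pool[64];
static int pool_used;

/* Allocate a node from a fixed pool (no freeing needed for a demo).  */
static const struct expr *
make (enum toy_code code, int regno, const struct expr *op0,
      const struct expr *op1)
{
  assert (pool_used < 64);
  struct expr *e = &pool[pool_used++];
  *e = (struct expr) { code, regno, { op0, op1 } };
  return e;
}

/* Copy X with every (reg REGNO) replaced by SRC, i.e. fold a def into
   its use.  */
static const struct expr *
toy_subst (const struct expr *x, int regno, const struct expr *src)
{
  if (x->code == T_REG)
    return x->regno == regno ? src : x;
  if (x->code == T_FLOAT_EXTEND)
    return make (T_FLOAT_EXTEND, 0, toy_subst (x->op[0], regno, src), NULL);
  return make (T_MULT, 0, toy_subst (x->op[0], regno, src),
               toy_subst (x->op[1], regno, src));
}

/* Append S to the NUL-terminated BUF of size LEN, truncating if full.  */
static void
append (char *buf, size_t len, const char *s)
{
  strncat (buf, s, len - strlen (buf) - 1);
}

/* Append X to BUF in RTL-like syntax.  */
static void
print_expr (const struct expr *x, char *buf, size_t len)
{
  size_t used = strlen (buf);
  if (x->code == T_REG)
    snprintf (buf + used, len - used, "(reg %d)", x->regno);
  else
    {
      append (buf, len, x->code == T_MULT ? "(mult " : "(float_extend ");
      print_expr (x->op[0], buf, len);
      if (x->code == T_MULT)
        {
          append (buf, len, " ");
          print_expr (x->op[1], buf, len);
        }
      append (buf, len, ")");
    }
}

/* Replay the flow on i1: (set (reg 0) (float_extend (reg 1))),
   i2: (set (reg 3) (float_extend (reg 2))), i3: (set (reg 4) (mult (reg 0)
   (reg 3))), storing i3's source after each step in STEP1 and STEP2.  */
static void
two_step_flow (char *step1, char *step2, size_t len)
{
  pool_used = 0;
  const struct expr *i1_src
    = make (T_FLOAT_EXTEND, 0, make (T_REG, 1, NULL, NULL), NULL);
  const struct expr *i2_src
    = make (T_FLOAT_EXTEND, 0, make (T_REG, 2, NULL, NULL), NULL);
  const struct expr *i3_src
    = make (T_MULT, 0, make (T_REG, 0, NULL, NULL), make (T_REG, 3, NULL, NULL));

  /* First step: fold i1 into i3 by replacing (reg 0).  */
  i3_src = toy_subst (i3_src, 0, i1_src);
  step1[0] = '\0';
  print_expr (i3_src, step1, len);

  /* Second step: fold i2 into the result by replacing (reg 3).  */
  i3_src = toy_subst (i3_src, 3, i2_src);
  step2[0] = '\0';
  print_expr (i3_src, step2, len);
}
```

After `two_step_flow (s1, s2, sizeof s1)`, `s1` holds `(mult (float_extend (reg 1)) (reg 3))` and `s2` holds `(mult (float_extend (reg 1)) (float_extend (reg 2)))`, matching the first and second steps.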
> 
> So, to enhance the combine optimization, we add a "pseudo vfwmul.wv" RTL pattern in autovec-opt.md,
> which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).
Hmm, something doesn't make sense here.  Combine knows how to do a 3->1 
combination.  I would expect to see the first step fail (substituting 
just one operand), then a later step try to combine all three 
instructions, substituting the extension for both input operands.
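The order Jeff describes (a 2->1 attempt that fails, then a 3->1 attempt that substitutes both extensions) can be sketched with the same kind of toy model. Everything here is hypothetical: `toy_recog` stands in for matching against the machine description and knows only two shapes, and `combine_toy` tries just the two substitutions named in the mail. Real combine applies many more checks (register liveness, costs and others), so the sketch does not explain why the quoted case fails; it only illustrates why the intermediate pattern should not be needed when the 3->1 attempt goes through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy RTL-like expression tree (hypothetical; not GCC's rtx).  */
enum toy_code { T_REG, T_FLOAT_EXTEND, T_MULT };

struct expr
{
  enum toy_code code;
  int regno;                 /* Register number, used only by T_REG.  */
  const struct expr *op[2];  /* T_FLOAT_EXTEND uses op[0]; T_MULT both.  */
};

static struct expr pool[64];
static int pool_used;

/* Allocate a node from a fixed pool (no freeing needed for a demo).  */
static const struct expr *
make (enum toy_code code, int regno, const struct expr *op0,
      const struct expr *op1)
{
  assert (pool_used < 64);
  struct expr *e = &pool[pool_used++];
  *e = (struct expr) { code, regno, { op0, op1 } };
  return e;
}

/* Copy X with every (reg REGNO) replaced by SRC.  */
static const struct expr *
toy_subst (const struct expr *x, int regno, const struct expr *src)
{
  if (x->code == T_REG)
    return x->regno == regno ? src : x;
  if (x->code == T_FLOAT_EXTEND)
    return make (T_FLOAT_EXTEND, 0, toy_subst (x->op[0], regno, src), NULL);
  return make (T_MULT, 0, toy_subst (x->op[0], regno, src),
               toy_subst (x->op[1], regno, src));
}

/* True if X is (float_extend (reg N)).  */
static bool
is_ext_reg (const struct expr *x)
{
  return x->code == T_FLOAT_EXTEND && x->op[0]->code == T_REG;
}

/* Toy recognizer: always accepts (mult (float_extend reg) (float_extend reg)),
   the vfwmul.vv shape; accepts the one-extension "pseudo" shape
   (mult (float_extend reg) reg) only when HAVE_PSEUDO is true.  */
static bool
toy_recog (const struct expr *x, bool have_pseudo)
{
  if (x->code != T_MULT || !is_ext_reg (x->op[0]))
    return false;
  if (is_ext_reg (x->op[1]))
    return true;
  return have_pseudo && x->op[1]->code == T_REG;
}

/* Try 2->1 (i1 into i3) first, then 3->1 (i1 and i2 into i3).  Return how
   many insns the first recognized attempt merged, or 0 if none matched.  */
static int
combine_toy (bool have_pseudo)
{
  pool_used = 0;
  /* i1: (set (reg 0) (float_extend (reg 1)))
     i2: (set (reg 3) (float_extend (reg 2)))
     i3: (set (reg 4) (mult (reg 0) (reg 3)))  */
  const struct expr *i1_src
    = make (T_FLOAT_EXTEND, 0, make (T_REG, 1, NULL, NULL), NULL);
  const struct expr *i2_src
    = make (T_FLOAT_EXTEND, 0, make (T_REG, 2, NULL, NULL), NULL);
  const struct expr *i3_src
    = make (T_MULT, 0, make (T_REG, 0, NULL, NULL), make (T_REG, 3, NULL, NULL));

  /* 2->1: substitute only i1's extension into the multiply.  */
  const struct expr *two = toy_subst (i3_src, 0, i1_src);
  if (toy_recog (two, have_pseudo))
    return 2;

  /* 3->1: additionally substitute i2's extension.  */
  const struct expr *three = toy_subst (two, 3, i2_src);
  if (toy_recog (three, have_pseudo))
    return 3;
  return 0;
}
```

`combine_toy (false)` returns 3 (the one-extension shape is rejected, the two-extension shape matches), and `combine_toy (true)` returns 2 because the pseudo pattern already accepts the intermediate shape.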
 
Can you pass along the .combine dump from the failing case?
 
Jeff