Hi, Robin.

>> Can you explain these two points (3 and 4, maybe 2) a bit in the comments?
>> I.e. what makes fma different from a normal insn?
You can take a lookt at vector.md. The ternary instruction pattern has 
operands[0] operands[1] operands[2] operands[3] operands[4] operands[5] :

operands[0] = operands[1] ? operands[2] * operands[3] + operands[4] : operands[5]
These operands are not necessary the same RTX but we should make them overlap.
Why have operands[5] ? Since we will have len_cond_fma.
So I want to lower simple fma pattern into patterns I define in vector.md.
operands[5] should be operands[1] if operands[1] overlap operand[0] --->vmacc
or operands[3] if operands[3] overlap operand[0] -->vmadd

>>We only have three alternatives here.
Address in V2.

>>We have a bit of naming overlap between "insn" an "op" already.  I would go
>>with just ternay_insn or tern_insn here.  That the insn_types have OP in
>>their name is unfortunate but let's keep that for now.
Ok


>>Can we call data_mode dest_mode here?  data_mode imho only makes sense in
>>the context of conditionals where we have a comparison mode and a data mode.
>>I mean you could argue we always have a data mode and a mask mode so the
>>naming makes sense again but then we should get rid of dest_mode.

ok

>> __restrict vs restrict.

ok

>>Why the difference here?  Why do we need to restrict the optimization here
>>anyway?
Ok


>>Btw. any reason why you don't include fms, vnmsac in the patch?  Wouldn't the
>>patterns be really similar or do you have other plans for those?  Not needed
>>for this patch, just curious.
I want to make patch small and simple enough to review. After this patch is merged,
I will post fms.

Thanks.


juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-05-26 18:16
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; pan2.li
Subject: Re: [PATCH] RISC-V: Add RVV FMA auto-vectorization support
Hi Juzhe,
 
> +;; We can't expand FMA for the following reasons:
 
But we do :)  We just haven't selected the proper alternative yet.
 
> +;; 1. Before RA, we don't know which multiply-add instruction is the ideal one.
> +;;    The vmacc is the ideal instruction when operands[3] overlaps operands[0].
> +;;    The vmadd is the ideal instruction when operands[1|2] overlaps operands[0].
> +;; 2. According to vector.md, the multiply-add patterns has 'merge' operand which
> +;;    is the operands[5]. Since operands[5] should overlap operands[0], this operand
> +;;    should be allocated the same regno as operands[1|2|3].
> +;; 3. The 'merge' operand is always a real merge operand and we don't allow undefined
> +;;    operand.
> +;; 3. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL operand.
 
Can you explain these two points (3 and 4, maybe 2) a bit in the comments?
I.e. what makes fma different from a normal insn?
 
> +(define_insn_and_split "*fma<mode>"
> +  [(set (match_operand:VI 0 "register_operand"     "=vr, vr, ?&vr")
> + (plus:VI
> +   (mult:VI
> +     (match_operand:VI 1 "register_operand" " %0, vr,   vr")
> +     (match_operand:VI 2 "register_operand" " vr, vr,   vr"))
> +   (match_operand:VI 3 "register_operand"   " vr,  0,   vr")))
> +   (clobber (match_scratch:SI 4 "=r,r,r"))]
> +  "TARGET_VECTOR"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    PUT_MODE (operands[4], Pmode);
> +    riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
> +    if (which_alternative == 3)
 
We only have three alternatives here.
 
> +      emit_insn (gen_rtx_SET (operands[0], operands[3]));
> +    rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
> +    riscv_vector::emit_vlmax_ternop_insn (code_for_pred_mul_plus (<MODE>mode),
> +   riscv_vector::RVV_TERNOP, ops, operands[4]);
> +    DONE;
> +  }
> +  [(set_attr "type" "vimuladd")
> +   (set_attr "mode" "<MODE>")])
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 36419c95bbd..86b2798fb5e 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -140,6 +140,7 @@ enum insn_type
>    RVV_MERGE_OP = 4,
>    RVV_CMP_OP = 4,
>    RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
> +  RVV_TERNOP = 5,
>  };
 
> +emit_vlmax_ternop_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
 
We have a bit of naming overlap between "insn" an "op" already.  I would go
with just ternay_insn or tern_insn here.  That the insn_types have OP in
their name is unfortunate but let's keep that for now. 
 
> +  machine_mode data_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = get_mask_mode (data_mode).require ();
> +  /* We have a maximum of 11 operands for RVV instruction patterns according to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
> +        /*FULLY_UNMASKED_P*/ true,
> +        /*USE_REAL_MERGE_P*/ true, /*HAS_AVL_P*/ true,
> +        /*VLMAX_P*/ true,
> +        /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
 
Can we call data_mode dest_mode here?  data_mode imho only makes sense in
the context of conditionals where we have a comparison mode and a data mode.
I mean you could argue we always have a data mode and a mask mode so the
naming makes sense again but then we should get rid of dest_mode.
 
> +  /* According to LRA mov pattern in vector.md, we have a clobber operand
> +     to be used ad VL operand.  */
> +  e.set_vl (vl);
 
How does the LRA mov pattern (mov_lra?) come into play here?  I know the same
line is already in emit_vlmax_insn but it also is odd there.  What we actually
do is pass either NULL as length (before lra/reload) or a pre-allocated scratch
that we can use as vlmax length.
 
> +#include <stdint-gcc.h>
> +
> +#define TEST_TYPE(TYPE)                                                        \
> +  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,            \
> +       TYPE *__restrict a,              \
> +       TYPE *__restrict b, int n)       \
 
> +#define TEST_TYPE(TYPE)                                                        \
> +  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *restrict dest1,            \
> +       TYPE *restrict dest2,            \
> +       TYPE *restrict dest3,            \
> +       TYPE *restrict src1,             \
> +       TYPE *restrict src2, int n)      
 
__restrict vs restrict.
 
> +int __attribute__ ((optimize (1))) main ()
> +int __attribute__ ((optimize (0))) main ()
Why the difference here?  Why do we need to restrict the optimization here
anyway?
 
Btw. any reason why you don't include fms, vnmsac in the patch?  Wouldn't the
patterns be really similar or do you have other plans for those?  Not needed
for this patch, just curious.
 
Regards
Robin