public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)
@ 2015-08-10 12:22 Robert Suchanek
  2015-08-27 13:03 ` Matthew Fortune
  2016-01-05 16:16 ` Robert Suchanek
  0 siblings, 2 replies; 13+ messages in thread
From: Robert Suchanek @ 2015-08-10 12:22 UTC (permalink / raw)
  To: Catherine_Moore, Matthew Fortune; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 16993 bytes --]

Hi,

This series of patches adds support for the MIPS SIMD Architecture (MSA).
It has undergone a few updates since the last review to address the comments in:

https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01777.html

The series is split into four parts:

0001 [MIPS] Add support for MIPS SIMD Architecture (MSA)
0002 [MIPS] Add pipeline description for MSA
0003 Add support to run auto-vectorization tests for multiple effective targets
0004 [MIPS] Add tests for MSA

There are a couple of things to mention here:
- there is a minor regression on the o32 ABI, AFAICS due to the lack of stack
  realignment; a patch will follow.  The vectorizer generates more unaligned
  accesses than the tests expect, and hence the checks fail.
- the series doesn't add cost modelling for auto-vectorization
- patch 0003 is independent but must go in before 0004.

Regards,
Robert

gcc/ChangeLog:

	* config.gcc: Add MSA header file for mips*-*-* target.
	* config/mips/constraints.md (YI, YC, YZ, Unv5, Uuv5, Uuv6, Ubv8):
	New constraints.
	* config/mips/mips-ftypes.def: Add function types for MSA builtins.
	* config/mips/mips-modes.def (V16QI, V8HI, V4SI, V2DI, V4SF, V2DF)
	(V32QI, V16HI, V8SI, V4DI, V8SF, V4DF): New modes.
	* config/mips/mips-msa.md: New file.
	* config/mips/mips-protos.h
	(mips_split_128bit_const_insns): New prototype.
	(mips_msa_idiv_insns): Likewise.
	(mips_split_128bit_move): Likewise.
	(mips_split_128bit_move_p): Likewise.
	(mips_split_msa_copy_d): Likewise.
	(mips_split_msa_insert_d): Likewise.
	(mips_split_msa_fill_d): Likewise.
	(mips_expand_msa_branch): Likewise.
	(mips_const_vector_same_val_p): Likewise.
	(mips_const_vector_same_byte_p): Likewise.
	(mips_const_vector_same_int_p): Likewise.
	(mips_const_vector_bitimm_set_p): Likewise.
	(mips_const_vector_bitimm_clr_p): Likewise.
	(mips_msa_output_division): Likewise.
	(mips_ldst_scaled_shift): Likewise.
	(mips_expand_vec_cond_expr): Likewise.
	* config/mips/mips.c (mips_const_vector_bitimm_set_p): New function.
	(mips_const_vector_bitimm_clr_p): Likewise.
	(mips_const_vector_same_val_p): Likewise.
	(mips_const_vector_same_byte_p): Likewise.
	(mips_const_vector_same_int_p): Likewise.
	(mips_symbol_insns): Forbid loading symbols via immediate for MSA.
	(mips_valid_offset_p): Limit offset to 10-bit for MSA loads and stores.
	(mips_valid_lo_sum_p): Forbid loading symbols via %lo(base) for MSA.
	(mips_lx_address_p): Add support for load indexed addresses for MSA.
	(mips_address_insns): Add calculation of instructions needed for
	stores and loads for MSA.
	(mips_const_insns): Move CONST_DOUBLE below CONST_VECTOR.  Handle
	CONST_VECTOR for MSA and let it fall through.
	(mips_ldst_scaled_shift): New function.
	(mips_subword_at_byte): Likewise.
	(mips_msa_idiv_insns): Likewise.
	(mips_legitimize_move): Validate MSA moves.
	(mips_rtx_costs): Add UNGE, UNGT, UNLE, UNLT cases.  Add calculation of
	costs for MSA division.
	(mips_split_move_p): Check if MSA moves need splitting.
	(mips_split_move): Split MSA moves if necessary.
	(mips_split_128bit_move_p): New function.
	(mips_split_128bit_move): Likewise.
	(mips_split_msa_copy_d): Likewise.
	(mips_split_msa_insert_d): Likewise.
	(mips_split_msa_fill_d): Likewise.
	(mips_output_move): Handle MSA moves.
	(mips_expand_msa_branch): New function.
	(mips_print_operand): Add 'E', 'B', 'w', 'v' modifiers.  Reinstate 'y'
	modifier.
	(mips_file_start): Add MSA .gnu_attribute.
	(mips_hard_regno_mode_ok_p): Allow TImode and 128-bit vectors in FPRs.
	(mips_hard_regno_nregs): Always return 1 for MSA supported mode.
	(mips_class_max_nregs): Add register size for MSA supported mode.
	(mips_cannot_change_mode_class): Allow conversion between MSA vector
	modes and TImode.
	(mips_mode_ok_for_mov_fmt_p): Allow MSA to use move.v instruction.
	(mips_secondary_reload_class): Force MSA loads/stores via memory.
	(mips_preferred_simd_mode): Add preferred modes for MSA.
	(mips_vector_mode_supported_p): Add MSA supported modes.
	(mips_autovectorize_vector_sizes): New function.
	(mips_msa_output_division): Likewise.
	(MSA_BUILTIN, MIPS_BUILTIN_DIRECT_NO_TARGET, MSA_NO_TARGET_BUILTIN):
	New macros.
	(CODE_FOR_msa_adds_s_b, CODE_FOR_msa_adds_s_h, CODE_FOR_msa_adds_s_w)
	(CODE_FOR_msa_adds_s_d, CODE_FOR_msa_adds_u_b, CODE_FOR_msa_adds_u_h)
	(CODE_FOR_msa_adds_u_w, CODE_FOR_msa_adds_u_d, CODE_FOR_msa_addv_b)
	(CODE_FOR_msa_addv_h, CODE_FOR_msa_addv_w, CODE_FOR_msa_addv_d)
	(CODE_FOR_msa_and_v, CODE_FOR_msa_bmnz_v, CODE_FOR_msa_bmz_v)
	(CODE_FOR_msa_bnz_v, CODE_FOR_msa_bz_v, CODE_FOR_msa_bsel_v)
	(CODE_FOR_msa_div_s_b, CODE_FOR_msa_div_s_h, CODE_FOR_msa_div_s_w)
	(CODE_FOR_msa_div_s_d, CODE_FOR_msa_div_u_b, CODE_FOR_msa_div_u_h)
	(CODE_FOR_msa_div_u_w, CODE_FOR_msa_div_u_d, CODE_FOR_msa_fadd_w)
	(CODE_FOR_msa_fadd_d, CODE_FOR_msa_ffint_s_w, CODE_FOR_msa_ffint_s_d)
	(CODE_FOR_msa_ffint_u_w, CODE_FOR_msa_ffint_u_d, CODE_FOR_msa_fsub_w)
	(CODE_FOR_msa_fsub_d, CODE_FOR_msa_fmul_w, CODE_FOR_msa_fmul_d)
	(CODE_FOR_msa_fdiv_w, CODE_FOR_msa_fdiv_d, CODE_FOR_msa_fmax_w)
	(CODE_FOR_msa_fmax_d, CODE_FOR_msa_fmax_a_w, CODE_FOR_msa_fmax_a_d)
	(CODE_FOR_msa_fmin_w, CODE_FOR_msa_fmin_d, CODE_FOR_msa_fmin_a_w)
	(CODE_FOR_msa_fmin_a_d, CODE_FOR_msa_fsqrt_w, CODE_FOR_msa_fsqrt_d)
	(CODE_FOR_msa_max_s_b, CODE_FOR_msa_max_s_h, CODE_FOR_msa_max_s_w)
	(CODE_FOR_msa_max_s_d, CODE_FOR_msa_max_u_b, CODE_FOR_msa_max_u_h)
	(CODE_FOR_msa_max_u_w, CODE_FOR_msa_max_u_d, CODE_FOR_msa_min_s_b)
	(CODE_FOR_msa_min_s_h, CODE_FOR_msa_min_s_w, CODE_FOR_msa_min_s_d)
	(CODE_FOR_msa_min_u_b, CODE_FOR_msa_min_u_h, CODE_FOR_msa_min_u_w)
	(CODE_FOR_msa_min_u_d, CODE_FOR_msa_mod_s_b, CODE_FOR_msa_mod_s_h)
	(CODE_FOR_msa_mod_s_w, CODE_FOR_msa_mod_s_d, CODE_FOR_msa_mod_u_b)
	(CODE_FOR_msa_mod_u_h, CODE_FOR_msa_mod_u_w, CODE_FOR_msa_mod_u_d)
	(CODE_FOR_msa_mod_s_b, CODE_FOR_msa_mod_s_h, CODE_FOR_msa_mod_s_w)
	(CODE_FOR_msa_mod_s_d, CODE_FOR_msa_mod_u_b, CODE_FOR_msa_mod_u_h)
	(CODE_FOR_msa_mod_u_w, CODE_FOR_msa_mod_u_d, CODE_FOR_msa_mulv_b)
	(CODE_FOR_msa_mulv_h, CODE_FOR_msa_mulv_w, CODE_FOR_msa_mulv_d)
	(CODE_FOR_msa_nlzc_b, CODE_FOR_msa_nlzc_h, CODE_FOR_msa_nlzc_w)
	(CODE_FOR_msa_nlzc_d, CODE_FOR_msa_nor_v, CODE_FOR_msa_or_v)
	(CODE_FOR_msa_pcnt_b, CODE_FOR_msa_pcnt_h, CODE_FOR_msa_pcnt_w)
	(CODE_FOR_msa_pcnt_d, CODE_FOR_msa_xor_v, CODE_FOR_msa_sll_b)
	(CODE_FOR_msa_sll_h, CODE_FOR_msa_sll_w, CODE_FOR_msa_sll_d)
	(CODE_FOR_msa_sra_b, CODE_FOR_msa_sra_h, CODE_FOR_msa_sra_w)
	(CODE_FOR_msa_sra_d, CODE_FOR_msa_srl_b, CODE_FOR_msa_srl_h)
	(CODE_FOR_msa_srl_w, CODE_FOR_msa_srl_d, CODE_FOR_msa_subv_b)
	(CODE_FOR_msa_subv_h, CODE_FOR_msa_subv_w, CODE_FOR_msa_subv_d)
	(CODE_FOR_msa_move_v, CODE_FOR_msa_vshf_b, CODE_FOR_msa_vshf_h)
	(CODE_FOR_msa_vshf_w, CODE_FOR_msa_vshf_d, CODE_FOR_msa_ilvod_d)
	(CODE_FOR_msa_ilvev_d, CODE_FOR_msa_pckod_d, CODE_FOR_msa_pckdev_d)
	(CODE_FOR_msa_ldi_b, CODE_FOR_msa_ldi_hi, CODE_FOR_msa_ldi_w)
	(CODE_FOR_msa_ldi_d, CODE_FOR_msa_cast_to_vector_float)
	(CODE_FOR_msa_cast_to_vector_double, CODE_FOR_msa_cast_to_scalar_float)
	(CODE_FOR_msa_cast_to_scalar_double): New code_aliasing macros.
	(mips_builtins): Add MSA sll_b, sll_h, sll_w, sll_d, slli_b, slli_h,
	slli_w, slli_d, sra_b, sra_h, sra_w, sra_d, srai_b, srai_h, srai_w,
	srai_d, srar_b, srar_h, srar_w, srar_d, srari_b, srari_h, srari_w,
	srari_d, srl_b, srl_h, srl_w, srl_d, srli_b, srli_h, srli_w, srli_d,
	srlr_b, srlr_h, srlr_w, srlr_d, srlri_b, srlri_h, srlri_w, srlri_d,
	bclr_b, bclr_h, bclr_w, bclr_d, bclri_b, bclri_h, bclri_w, bclri_d,
	bset_b, bset_h, bset_w, bset_d, bseti_b, bseti_h, bseti_w, bseti_d,
	bneg_b, bneg_h, bneg_w, bneg_d, bnegi_b, bnegi_h, bnegi_w, bnegi_d,
	binsl_b, binsl_h, binsl_w, binsl_d, binsli_b, binsli_h, binsli_w,
	binsli_d, binsr_b, binsr_h, binsr_w, binsr_d, binsri_b, binsri_h,
	binsri_w, binsri_d, addv_b, addv_h, addv_w, addv_d, addvi_b, addvi_h,
	addvi_w, addvi_d, subv_b, subv_h, subv_w, subv_d, subvi_b, subvi_h,
	subvi_w, subvi_d, max_s_b, max_s_h, max_s_w, max_s_d, maxi_s_b,
	maxi_s_h, maxi_s_w, maxi_s_d, max_u_b, max_u_h, max_u_w, max_u_d,
	maxi_u_b, maxi_u_h, maxi_u_w, maxi_u_d, min_s_b, min_s_h, min_s_w,
	min_s_d, mini_s_b, mini_s_h, mini_s_w, mini_s_d, min_u_b, min_u_h,
	min_u_w, min_u_d, mini_u_b, mini_u_h, mini_u_w, mini_u_d, max_a_b,
	max_a_h, max_a_w, max_a_d, min_a_b, min_a_h, min_a_w, min_a_d, ceq_b,
	ceq_h, ceq_w, ceq_d, ceqi_b, ceqi_h, ceqi_w, ceqi_d, clt_s_b, clt_s_h,
	clt_s_w, clt_s_d, clti_s_b, clti_s_h, clti_s_w, clti_s_d, clt_u_b,
	clt_u_h, clt_u_w, clt_u_d, clti_u_b, clti_u_h, clti_u_w, clti_u_d,
	cle_s_b, cle_s_h, cle_s_w, cle_s_d, clei_s_b, clei_s_h, clei_s_w,
	clei_s_d, cle_u_b, cle_u_h, cle_u_w, cle_u_d, clei_u_b, clei_u_h,
	clei_u_w, clei_u_d, ld_b, ld_h, ld_w, ld_d, st_b, st_h, st_w, st_d,
	sat_s_b, sat_s_h, sat_s_w, sat_s_d, sat_u_b, sat_u_h, sat_u_w, sat_u_d,
	add_a_b, add_a_h, add_a_w, add_a_d, adds_a_b, adds_a_h, adds_a_w,
	adds_a_d, adds_s_b, adds_s_h, adds_s_w, adds_s_d, adds_u_b, adds_u_h,
	adds_u_w, adds_u_d, ave_s_b, ave_s_h, ave_s_w, ave_s_d, ave_u_b,
	ave_u_h, ave_u_w, ave_u_d, aver_s_b, aver_s_h, aver_s_w, aver_s_d,
	aver_u_b, aver_u_h, aver_u_w, aver_u_d, subs_s_b, subs_s_h, subs_s_w,
	subs_s_d, subs_u_b, subs_u_h, subs_u_w, subs_u_d, subsuu_s_b,
	subsuu_s_h, subsuu_s_w, subsuu_s_d, subsus_u_b, subsus_u_h, subsus_u_w,
	subsus_u_d, asub_s_b, asub_s_h, asub_s_w, asub_s_d, asub_u_b, asub_u_h,
	asub_u_w, asub_u_d, mulv_b, mulv_h, mulv_w, mulv_d, maddv_b, maddv_h,
	maddv_w, maddv_d, msubv_b, msubv_h, msubv_w, msubv_d, div_s_b,
	div_s_h, div_s_w, div_s_d, div_u_b, div_u_h, div_u_w, div_u_d,
	hadd_s_h, hadd_s_w, hadd_s_d, hadd_u_h, hadd_u_w, hadd_u_d, hsub_s_h,
	hsub_s_w, hsub_s_d, hsub_u_h, hsub_u_w, hsub_u_d, mod_s_b, mod_s_h,
	mod_s_w, mod_s_d, mod_u_b, mod_u_h, mod_u_w, mod_u_d, dotp_s_h,
	dotp_s_w, dotp_s_d, dotp_u_h, dotp_u_w, dotp_u_d, dpadd_s_h, dpadd_s_w,
	dpadd_s_d, dpadd_u_h, dpadd_u_w, dpadd_u_d, dpsub_s_h, dpsub_s_w,
	dpsub_s_d, dpsub_u_h, dpsub_u_w, dpsub_u_d, sld_b, sld_h, sld_w, sld_d,
	sldi_b, sldi_h, sldi_w, sldi_d, splat_b, splat_h, splat_w, splat_d,
	splati_b, splati_h, splati_w, splati_d, pckev_b, pckev_h, pckev_w,
	pckev_d, pckod_b, pckod_h, pckod_w, pckod_d, ilvl_b, ilvl_h, ilvl_w,
	ilvl_d, ilvr_b, ilvr_h, ilvr_w, ilvr_d, ilvev_b, ilvev_h, ilvev_w,
	ilvev_d, ilvod_b, ilvod_h, ilvod_w, ilvod_d, vshf_b, vshf_h, vshf_w,
	vshf_d, and_v, andi_b, or_v, ori_b, nor_v, nori_b, xor_v, xori_b,
	bmnz_v, bmnzi_b, bmz_v, bmzi_b, bsel_v, bseli_b, shf_b, shf_h, shf_w,
	bnz_v, bz_v, fill_b, fill_h, fill_w, fill_d, pcnt_b, pcnt_h, pcnt_w,
	pcnt_d, nloc_b, nloc_h, nloc_w, nloc_d, nlzc_b, nlzc_h, nlzc_w, nlzc_d,
	copy_s_b, copy_s_h, copy_s_w, copy_s_d, copy_u_b, copy_u_h, copy_u_w,
	copy_u_d, insert_b, insert_h, insert_w, insert_d, insve_b, insve_h,
	insve_w, insve_d, bnz_b, bnz_h, bnz_w, bnz_d, bz_b, bz_h, bz_w, bz_d,
	ldi_b, ldi_h, ldi_w, ldi_d, fcaf_w, fcaf_d, fcor_w, fcor_d, fcun_w,
	fcun_d, fcune_w, fcune_d, fcueq_w, fcueq_d, fceq_w, fceq_d, fcne_w,
	fcne_d, fclt_w, fclt_d, fcult_w, fcult_d, fcle_w, fcle_d, fcule_w,
	fcule_d, fsaf_w, fsaf_d, fsor_w, fsor_d, fsun_w, fsun_d, fsune_w,
	fsune_d, fsueq_w, fsueq_d, fseq_w, fseq_d, fsne_w, fsne_d, fslt_w,
	fslt_d, fsult_w, fsult_d, fsle_w, fsle_d, fsule_w, fsule_d, fadd_w,
	fadd_d, fsub_w, fsub_d, fmul_w, fmul_d, fdiv_w, fdiv_d, fmadd_w,
	fmadd_d, fmsub_w, fmsub_d, fexp2_w, fexp2_d, fexdo_h, fexdo_w, ftq_h,
	ftq_w, fmin_w, fmin_d, fmin_a_w, fmin_a_d, fmax_w, fmax_d, fmax_a_w,
	fmax_a_d, mul_q_h, mul_q_w, mulr_q_h, mulr_q_w, madd_q_h, madd_q_w,
	maddr_q_h, maddr_q_w, msub_q_h, msub_q_w, msubr_q_h, msubr_q_w,
	fclass_w, fclass_d, fsqrt_w, fsqrt_d, frcp_w, frcp_d, frint_w, frint_d,
	frsqrt_w, frsqrt_d, flog2_w, flog2_d, fexupl_w, fexupl_d, fexupr_w,
	fexupr_d, ffql_w, ffql_d, ffqr_w, ffqr_d, ftint_s_w, ftint_s_d,
	ftint_u_w, ftint_u_d, ftrunc_s_w, ftrunc_s_d, ftrunc_u_w, ftrunc_u_d,
	ffint_s_w, ffint_s_d, ffint_u_w, ffint_u_d, ctcmsa, cfcmsa, move_v,
	cast_to_vector_float, cast_to_vector_double, cast_to_scalar_float,
	cast_to_scalar_double builtins.
	(mips_get_builtin_decl_index): New array.
	(MIPS_ATYPE_QI, MIPS_ATYPE_HI, MIPS_ATYPE_V2DI, MIPS_ATYPE_V4SI)
	(MIPS_ATYPE_V8HI, MIPS_ATYPE_V16QI, MIPS_ATYPE_V2DF, MIPS_ATYPE_V4SF)
	(MIPS_ATYPE_UV2DI, MIPS_ATYPE_UV4SI, MIPS_ATYPE_UV8HI)
	(MIPS_ATYPE_UV16QI): New.
	(mips_init_builtins): Initialize mips_get_builtin_decl_index array.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define target hook.
	(mips_expand_builtin_insn): Swap operands for
	CODE_FOR_msa_ilv{l,r}_{b,h,w,d}, CODE_FOR_msa_{ilv,pck}{ev,od}_{b,h,w}.
	(mips_set_compression_mode): Disallow MSA with MIPS16 code.
	(mips_option_override): -mmsa requires -mfp64 and -mhard-float.  These
	are set implicitly and an error is reported if overridden.
	(MAX_VECT_LEN): Increase maximum length of a vector to 16 bytes.
	(TARGET_SCHED_REASSOCIATION_WIDTH): Define target hook.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Likewise.
	(mips_expand_vec_unpack): Add support for MSA.
	(mips_expand_vector_init): Likewise.
	(mips_expand_vi_constant): Use CONST0_RTX (element_mode) instead of
	const0_rtx.
	(mips_expand_msa_cmp): New function.
	(mips_expand_vec_cond_expr): Likewise.
	* config/mips/mips.h
	(TARGET_CPU_CPP_BUILTINS): Add __mips_msa and __mips_msa_width.
	(OPTION_DEFAULT_SPECS): Ignore --with-fp-32 if -mmsa is specified.
	(ASM_SPEC): Pass mmsa and mno-msa to the assembler.
	(ISA_HAS_MSA): New macro.
	(UNITS_PER_MSA_REG): Likewise.
	(BITS_PER_MSA_REG): Likewise.
	(MAX_FIXED_MODE_SIZE): Redefine using TARGET_MSA.
	(BIGGEST_ALIGNMENT): Likewise.
	(MSA_REG_FIRST): New macro.
	(MSA_REG_LAST): Likewise.
	(MSA_REG_NUM): Likewise.
	(MSA_REG_P): Likewise.
	(MSA_REG_RTX_P): Likewise.
	(MSA_SUPPORTED_MODE_P): Likewise.
	(HARD_REGNO_CALL_PART_CLOBBERED): Redefine using TARGET_MSA.
	(MOVE_MAX): Likewise.
	(MAX_MOVE_MAX): Redefine to 16 bytes.
	(ADDITIONAL_REGISTER_NAMES): Add named registers $w0-$w31.
	* config/mips/mips.md: Include mips-msa.md.
	(alu_type): Add simd_add.
	(mode): Add V2DI, V4SI, V8HI, V16QI, V2DF, V4SF.
	(type): Add simd_div, simd_fclass, simd_flog2, simd_fadd, simd_fcvt,
	simd_fmul, simd_fmadd, simd_fdiv, simd_bitins, simd_bitmov,
	simd_insert, simd_sld, simd_mul, simd_fcmp, simd_fexp2, simd_int_arith,
	simd_bit, simd_shift, simd_splat, simd_fill, simd_permute, simd_shf,
	simd_sat, simd_pcnt, simd_copy, simd_branch, simd_cmsa, simd_fminmax,
	simd_logic, simd_move, simd_load, simd_store.  Choose "multi" for moves
	for "qword_mode".
	(qword_mode): New attribute.
	(insn_count): Add instruction count for quad moves.  Increase the count
	for MIPS SIMD division.
	(UNITMODE): Add UNITMODEs for vector types.
	* config/mips/mips.opt (mmsa): New option.
	* config/mips/msa.h: New file.
	* config/mips/mti-elf.h: Don't infer -mfpxx if -mmsa is specified.
	* config/mips/mti-linux.h: Likewise.
	* config/mips/predicates.md
	(const_msa_branch_operand): New predicate.
	(const_uimm3_operand): Likewise.
	(const_uimm4_operand): Likewise.
	(const_uimm5_operand): Likewise.
	(const_uimm8_operand): Likewise.
	(const_imm5_operand): Likewise.
	(aq10b_operand): Likewise.
	(aq10h_operand): Likewise.
	(aq10w_operand): Likewise.
	(aq10d_operand): Likewise.
	(const_m1_operand): Likewise.
	(reg_or_m1_operand): Likewise.
	(const_exp_2_operand): Likewise.
	(const_exp_4_operand): Likewise.
	(const_exp_8_operand): Likewise.
	(const_exp_16_operand): Likewise.
	(const_vector_same_byte_operand): Likewise.
	(const_vector_same_ximm5_operand): Likewise.
	(const_vector_same_uimm5_operand): Likewise.
	(const_vector_same_uimm6_operand): Likewise.
	(const_vector_same_uimm8_operand): Likewise.
	(const_vector_same_cmpsimm4_operand): Likewise.
	(const_vector_same_cmpuimm4_operand): Likewise.
	(const_vector_same_v2di_set_operand): Likewise.
	(const_vector_same_v2di_clr_operand): Likewise.
	(const_vector_same_v4si_set_operand): Likewise.
	(const_vector_same_v4si_clr_operand): Likewise.
	(const_vector_same_v8hi_set_operand): Likewise.
	(const_vector_same_v8hi_clr_operand): Likewise.
	(const_vector_same_v16qi_set_operand): Likewise.
	(const_vector_same_v16qi_clr_operand): Likewise.
	(reg_or_vector_same_byte_operand): Likewise.
	(reg_or_vector_same_ximm5_operand): Likewise.
	(reg_or_vector_same_uimm6_operand): Likewise.
	(reg_or_vector_same_v2di_set_operand): Likewise.
	(reg_or_vector_same_v2di_clr_operand): Likewise.
	(reg_or_vector_same_v4si_set_operand): Likewise.
	(reg_or_vector_same_v4si_clr_operand): Likewise.
	(reg_or_vector_same_v8hi_set_operand): Likewise.
	(reg_or_vector_same_v8hi_clr_operand): Likewise.
	(reg_or_vector_same_v16qi_set_operand): Likewise.
	(reg_or_vector_same_v16qi_clr_operand): Likewise.
	* doc/extend.texi (MIPS SIMD Architecture Functions): New section.
	* doc/invoke.texi (-mmsa): Document new option.

[-- Attachment #2: 0001-MIPS-Add-support-for-MIPS-SIMD-Architecture-MSA.tgz --]
[-- Type: application/x-compressed, Size: 45970 bytes --]

* RE: [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)
@ 2016-01-05 16:15 Robert Suchanek
  0 siblings, 0 replies; 13+ messages in thread
From: Robert Suchanek @ 2016-01-05 16:15 UTC (permalink / raw)
  To: Matthew Fortune, Catherine_Moore; +Cc: gcc-patches

Hi,

Comments inlined.

> >+;; The attribute gives half modes for vector modes.
> >+(define_mode_attr VHMODE
> >+  [(V8HI "V16QI")
> >+   (V4SI "V8HI")
> >+   (V2DI "V4SI")
> >+   (V2DF "V4SF")])
> >+
> >+;; The attribute gives double modes for vector modes.
> >+(define_mode_attr VDMODE
> >+  [(V4SI "V2DI")
> >+   (V8HI "V4SI")
> >+   (V16QI "V8HI")])
> 
> Presumably there is a reason why this is not a mirror of VHMODE. I.e. it does
> not have floating point modes?

This is a mistake. The floating point mode in VHMODE is never used. Removed.
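
For illustration (this is a sketch, not taken from the updated patch), the
cleaned-up attribute would then mirror VDMODE with integer modes only:

```lisp
;; Illustrative sketch of the cleaned-up attribute: the unused
;; floating-point mapping (V2DF "V4SF") is dropped, so VHMODE becomes
;; an exact mirror of VDMODE.
(define_mode_attr VHMODE
  [(V8HI "V16QI")
   (V4SI "V8HI")
   (V2DI "V4SI")])
```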

> >+;; The attribute gives half modes with same number of elements for vector
> modes.
> >+(define_mode_attr TRUNCMODE
> >+  [(V8HI "V8QI")
> >+   (V4SI "V4HI")
> >+   (V2DI "V2SI")])
> >+
> >+;; This attribute gives the mode of the result for "copy_s_b, copy_u_b" etc.
> >+(define_mode_attr RES
> >+  [(V2DF "DF")
> >+   (V4SF "SF")
> >+   (V2DI "DI")
> >+   (V4SI "SI")
> >+   (V8HI "SI")
> >+   (V16QI "SI")])
> 
> Perhaps prefix these with a 'V' to clarify that they are vector mode attributes.

Done.


> >+;; This attribute gives define_insn suffix for MSA instructions with need
> 
> with => that need

Fixed.

> >+;; distinction between integer and floating point.
> >+(define_mode_attr msafmt_f
> >+  [(V2DF "d_f")
> >+   (V4SF "w_f")
> >+   (V2DI "d")
> >+   (V4SI "w")
> >+   (V8HI "h")
> >+   (V16QI "b")])
> >+
> >+;; To represent bitmask needed for vec_merge using "const_<bitmask>_operand".
> 
> Commenting style is different here. Everything else starts with: This attribute
> ...

Changed.
> 
> >+(define_mode_attr bitmask
> >+  [(V2DF "exp_2")
> >+   (V4SF "exp_4")
> >+   (V2DI "exp_2")
> >+   (V4SI "exp_4")
> >+   (V8HI "exp_8")
> >+   (V16QI "exp_16")])
> >+
> >+;; This attribute used to form an immediate operand constraint using
> 
> used to => is used to

Fixed.

> 
> >+;; "const_<bitimm>_operand".
> >+(define_mode_attr bitimm
> >+  [(V16QI "uimm3")
> >+   (V8HI  "uimm4")
> >+   (V4SI  "uimm5")
> >+   (V2DI  "uimm6")
> >+  ])
> >+
> 
> >+(define_expand "fixuns_trunc<FMSA:mode><mode_i>2"
> >+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> >+	(unsigned_fix:<VIMODE> (match_operand:FMSA 1 "register_operand" "f")))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_ftrunc_u_<msafmt> (operands[0], operands[1]));
> >+  DONE;
> >+})
> 
> The msa_ftrunc_u_* define_insns should just be renamed to use the standard
> pattern names and, more importantly, standard RTL not UNSPEC.

Renamed, and the define_expands removed. I also replaced FINT with VIMODE and
removed the FINT mode attribute to avoid duplication.
> 
> >+
> >+(define_expand "fix_trunc<FMSA:mode><mode_i>2"
> >+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> >+	(fix:<VIMODE> (match_operand:FMSA 1 "register_operand" "f")))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_ftrunc_s_<msafmt> (operands[0], operands[1]));
> >+  DONE;
> >+})
> 
> Likewise.
> 
> >+
> >+(define_expand "vec_pack_trunc_v2df"
> >+  [(set (match_operand:V4SF 0 "register_operand")
> >+	(vec_concat:V4SF
> >+	  (float_truncate:V2SF (match_operand:V2DF 1 "register_operand"))
> >+	  (float_truncate:V2SF (match_operand:V2DF 2 "register_operand"))))]
> >+  "ISA_HAS_MSA"
> >+  "")
> 
> Rename msa_fexdo_w to vec_pack_trunc_v2df.
> 
> I see that fexdo has a 'halfword' variant which creates a half-float. What
> else can operate on half-float and should this really be recorded as an
> HFmode?
> 
> >+(define_expand "vec_unpacks_hi_v4sf"
> >+  [(set (match_operand:V2DF 0 "register_operand" "=f")
> >+	(float_extend:V2DF
> >+	  (vec_select:V2SF
> >+	    (match_operand:V4SF 1 "register_operand" "f")
> >+	    (parallel [(const_int 0) (const_int 1)])
> 
> If we swap the (parallel) for a match_operand 2...
> 
> >+	  )))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  if (BYTES_BIG_ENDIAN)
> >+    emit_insn (gen_msa_fexupr_d (operands[0], operands[1]));
> >+  else
> >+    emit_insn (gen_msa_fexupl_d (operands[0], operands[1]));
> 
> Then these two could change to set up operands[2] with either
> a parallel of 0/1 or 2/3 and then...
> 
> You could change the fexupr_d and fexupl_d insn patterns to use normal RTL
> that select the appropriate elements (either 0/1 and 2/3).
> 
> >+  DONE;
> 
> Which means the RTL in the pattern would be used to expand this and
> you would remove the DONE. As it stands the pattern on this expand
> is simply never used.

Reworked. The parallel expression for operands[2] is now generated.
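
As a rough sketch of that rework (illustrative only; the exact element choice
per endianness is an assumption, not from the posted patch), the expander could
build the parallel itself and pass it to the insn through operands[2]:

```lisp
;; Illustrative sketch: pick element pair 0/1 or 2/3 depending on
;; endianness and let the insn pattern select them via operands[2].
(define_expand "vec_unpacks_hi_v4sf"
  [(set (match_operand:V2DF 0 "register_operand" "=f")
	(float_extend:V2DF
	  (vec_select:V2SF
	    (match_operand:V4SF 1 "register_operand" "f")
	    (match_operand 2 "" ""))))]
  "ISA_HAS_MSA"
{
  /* Which element pair forms the "high" half depends on endianness;
     the choice here is illustrative.  */
  int base = BYTES_BIG_ENDIAN ? 0 : 2;
  operands[2] = gen_rtx_PARALLEL (VOIDmode,
				  gen_rtvec (2, GEN_INT (base),
					     GEN_INT (base + 1)));
})
```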

> 
> >+})
> >+
> >+(define_expand "vec_unpacks_lo_v4sf"
> >+  [(set (match_operand:V2DF 0 "register_operand" "=f")
> >+	(float_extend:V2DF
> >+	  (vec_select:V2SF
> >+	    (match_operand:V4SF 1 "register_operand" "f")
> >+	    (parallel [(const_int 0) (const_int 1)])
> >+	  )))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  if (BYTES_BIG_ENDIAN)
> >+    emit_insn (gen_msa_fexupl_d (operands[0], operands[1]));
> >+  else
> >+    emit_insn (gen_msa_fexupr_d (operands[0], operands[1]));
> >+  DONE;
> >+})
> 
> Likewise but inverted.

As above.

> 
> >+
> >+(define_expand "vec_unpacks_hi_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> 
> Not much point in having the (set) in here as it would be illegal anyway
> if it were actually expanded. Just list the two operands. (same throughout)
>

Done.
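
The simplified form, per the suggestion, would list only the operands
(a sketch mirroring the original expander body):

```lisp
;; Sketch: no (set ...) wrapper, just the bare operands, since the RTL
;; template of this expander is never actually emitted -- the C body
;; always finishes with DONE.
(define_expand "vec_unpacks_hi_<mode>"
  [(match_operand:<VDMODE> 0 "register_operand")
   (match_operand:IMSA_WHB 1 "register_operand")]
  "ISA_HAS_MSA"
{
  mips_expand_vec_unpack (operands, false/*unsigned_p*/, true/*high_p*/);
  DONE;
})
```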
 
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, false/*unsigned_p*/, true/*high_p*/);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_unpacks_lo_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, false/*unsigned_p*/, false/*high_p*/);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_unpacku_hi_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, true/*unsigned_p*/, true/*high_p*/);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_unpacku_lo_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, true/*unsigned_p*/, false/*high_p*/);
> >+  DONE;
> >+})
> >+
> 
> >+(define_expand "vec_set<mode>"
> >+  [(match_operand:IMSA 0 "register_operand")
> >+   (match_operand:<UNITMODE> 1 "reg_or_0_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_insert_<msafmt>_insn (operands[0], operands[1],
> >+					   operands[0],
> >+					   GEN_INT(1 << INTVAL (operands[2]))));
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_set<mode>"
> >+  [(match_operand:FMSA 0 "register_operand")
> >+   (match_operand:<UNITMODE> 1 "register_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_insve_<msafmt_f>_s (operands[0], operands[0],
> >+					 GEN_INT(1 << INTVAL (operands[2])),
> >+					 operands[1]));
> >+  DONE;
> >+})
> >+
> >+(define_expand "vcondu<MSA:mode><IMSA:mode>"
> >+  [(set (match_operand:MSA 0 "register_operand")
> >+	(if_then_else:MSA
> >+	  (match_operator 3 ""
> >+	    [(match_operand:IMSA 4 "register_operand")
> >+	     (match_operand:IMSA 5 "register_operand")])
> >+	  (match_operand:MSA 1 "reg_or_m1_operand")
> >+	  (match_operand:MSA 2 "reg_or_0_operand")))]
> >+  "ISA_HAS_MSA
> >+   && (GET_MODE_NUNITS (<MSA:MODE>mode) == GET_MODE_NUNITS
> (<IMSA:MODE>mode))"
> >+{
> >+  mips_expand_vec_cond_expr (<MSA:MODE>mode, <MSA:VIMODE>mode, operands);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vcond<MSA:mode><MSA_2:mode>"
> >+  [(set (match_operand:MSA 0 "register_operand")
> >+	(if_then_else:MSA
> >+	  (match_operator 3 ""
> >+	    [(match_operand:MSA_2 4 "register_operand")
> >+	     (match_operand:MSA_2 5 "register_operand")])
> >+	  (match_operand:MSA 1 "reg_or_m1_operand")
> >+	  (match_operand:MSA 2 "reg_or_0_operand")))]
> >+  "ISA_HAS_MSA
> >+   && (GET_MODE_NUNITS (<MSA:MODE>mode) == GET_MODE_NUNITS (<MSA:MODE>mode))"
> 
> Bug: This compares MSA to MSA. the second should be MSA_2.

Fixed.
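
For reference, a sketch of the corrected guard, comparing the two distinct
iterators:

```lisp
  "ISA_HAS_MSA
   && (GET_MODE_NUNITS (<MSA:MODE>mode)
       == GET_MODE_NUNITS (<MSA_2:MODE>mode))"
```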

> 
> >+{
> >+  mips_expand_vec_cond_expr (<MSA:MODE>mode, <MSA:VIMODE>mode, operands);
> >+  DONE;
> >+})
> >+
> >+;; Note used directly by builtins but via the following define_expand.
> >+(define_insn "msa_insert_<msafmt>_insn"
> >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> >+	(vec_merge:IMSA
> >+	  (vec_duplicate:IMSA
> >+	    (match_operand:<UNITMODE> 1 "reg_or_0_operand" "dJ"))
> >+	  (match_operand:IMSA 2 "register_operand" "0")
> >+	  (match_operand 3 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insert.<msafmt>\t%w0[%y3],%z1"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> 
> Rename to remove _insn, see below. V2DI mode should have # as its pattern
> for 32-bit targets to ensure it gets split.

Done.

> 
> >+
> >+;; Expand builtin for HImode and QImode which takes SImode.
> >+(define_expand "msa_insert_<msafmt>"
> >+  [(match_operand:IMSA 0 "register_operand")
> >+   (match_operand:IMSA 1 "register_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")
> >+   (match_operand:<RES> 3 "reg_or_0_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  if ((GET_MODE_SIZE (<UNITMODE>mode) < GET_MODE_SIZE (<RES>mode))
> >+      && (REG_P (operands[3]) || (GET_CODE (operands[3]) == SUBREG
> >+				  && REG_P (SUBREG_REG (operands[3])))))
> >+    operands[3] = lowpart_subreg (<UNITMODE>mode, operands[3], <RES>mode);
> >+  emit_insn (gen_msa_insert_<msafmt>_insn (operands[0], operands[3],
> >+					   operands[1],
> >+					   GEN_INT(1 << INTVAL (operands[2]))));
> >+  DONE;
> >+})
> >+
> 
> Lets do this during mips_expand_builtin_insn like ilvl and friends. Having
> expanders
> simply to map the builtins to real instructions doesn't seem very useful.

Done.
> 
> >+(define_expand "msa_insert_<msafmt_f>"
> >+  [(match_operand:FMSA 0 "register_operand")
> >+   (match_operand:FMSA 1 "register_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")
> >+   (match_operand:<UNITMODE> 3 "reg_or_0_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_insert_<msafmt_f>_insn (operands[0], operands[3],
> >+					     operands[1],
> >+					     GEN_INT(1 << INTVAL (operands[2]))));
> >+  DONE;
> >+})
> 
> Likewise.
> 
> >+
> >+(define_insn "msa_insert_<msafmt_f>_insn"
> >+  [(set (match_operand:FMSA 0 "register_operand" "=f")
> >+	(vec_merge:FMSA
> >+	  (vec_duplicate:FMSA
> >+	    (match_operand:<UNITMODE> 1 "register_operand" "d"))
> >+	  (match_operand:FMSA 2 "register_operand" "0")
> >+	  (match_operand 3 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insert.<msafmt>\t%w0[%y3],%z1"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> 
> Rename to remove _insn, see above. V2DF mode should have # for 32-bit targets
> to
> ensure it gets split.

Likewise.
> 
> >+
> >+(define_split
> >+  [(set (match_operand:MSA_D 0 "register_operand")
> >+	(vec_merge:MSA_D
> >+	  (vec_duplicate:MSA_D
> >+	    (match_operand:<UNITMODE> 1 "<MSA_D:msa_d>_operand"))
> >+	  (match_operand:MSA_D 2 "register_operand")
> >+	  (match_operand 3 "const_<bitmask>_operand")))]
> >+  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> >+  [(const_int 0)]
> >+{
> >+  mips_split_msa_insert_d (operands[0], operands[2], operands[3],
> operands[1]);
> >+  DONE;
> >+})
> 
> ...
> 
> >+(define_expand "msa_insve_<msafmt_f>"
> >+  [(set (match_operand:MSA 0 "register_operand")
> >+	(vec_merge:MSA
> >+	  (vec_duplicate:MSA
> >+	    (vec_select:<UNITMODE>
> >+	      (match_operand:MSA 3 "register_operand")
> >+	      (parallel [(const_int 0)])))
> >+	  (match_operand:MSA 1 "register_operand")
> >+	  (match_operand 2 "const_<indeximm>_operand")))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  operands[2] = GEN_INT (1 << INTVAL (operands[2]));
> >+})
> 
> Like for insert patterns do this in mips_expand_builtin_insn and rename the
> instruction below to remove _insn.

Done.
> 
> >+(define_insn "msa_insve_<msafmt_f>_insn"
> >+  [(set (match_operand:MSA 0 "register_operand" "=f")
> >+	(vec_merge:MSA
> >+	  (vec_duplicate:MSA
> >+	    (vec_select:<UNITMODE>
> >+	      (match_operand:MSA 3 "register_operand" "f")
> >+	      (parallel [(const_int 0)])))
> >+	  (match_operand:MSA 1 "register_operand" "0")
> >+	  (match_operand 2 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insve.<msafmt>\t%w0[%y2],%w3[0]"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> >+
> >+;; Operand 3 is a scalar.
> >+(define_insn "msa_insve_<msafmt>_f_s"
> 
> It would be clearer to have <msafmt_f> instead of <msafmt>_f. The 's' here
> is for scalar, I believe. Perhaps spell it out as 'scalar'.

OK.
> 
> >+  [(set (match_operand:FMSA 0 "register_operand" "=f")
> >+	(vec_merge:FMSA
> >+	  (vec_duplicate:FMSA
> >+	    (match_operand:<UNITMODE> 3 "register_operand" "f"))
> >+	  (match_operand:FMSA 1 "register_operand" "0")
> >+	  (match_operand 2 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insve.<msafmt>\t%w0[%y2],%w3[0]"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> 
> >+;; Note that copy_s.d and copy_s.d_f will be split later if !TARGET_64BIT.
> >+(define_insn "msa_copy_s_<msafmt_f>"
> >+  [(set (match_operand:<RES> 0 "register_operand" "=d")
> >+	(sign_extend:<RES>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA 1 "register_operand" "f")
> >+	    (parallel [(match_operand 2 "const_<indeximm>_operand" "")]))))]
> >+  "ISA_HAS_MSA"
> >+  "copy_s.<msafmt>\t%0,%w1[%2]"
> >+  [(set_attr "type" "simd_copy")
> >+   (set_attr "mode" "<MODE>")])
> 
> I think the splits should be explicit and therefore generate # for the
> two patterns that will be split for !TARGET_64BIT. The sign_extend should
> only be present for V8HI and V16QI modes. There could be value in adding
> widening patterns to extend to DImode on 64-bit targets but it may not
> trigger much.
> 
> >+(define_split
> >+  [(set (match_operand:<UNITMODE> 0 "register_operand")
> >+	(sign_extend:<UNITMODE>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA_D 1 "register_operand")
> >+	    (parallel [(match_operand 2 "const_0_or_1_operand")]))))]
> >+  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> >+  [(const_int 0)]
> >+{
> >+  mips_split_msa_copy_d (operands[0], operands[1], operands[2],
> >+			 gen_msa_copy_s_w);
> >+  DONE;
> >+})
> >+
> >+;; Note that copy_u.d and copy_u.d_f will be split later if !TARGET_64BIT.
> >+(define_insn "msa_copy_u_<msafmt_f>"
> >+  [(set (match_operand:<RES> 0 "register_operand" "=d")
> >+	(zero_extend:<RES>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA 1 "register_operand" "f")
> >+	    (parallel [(match_operand 2 "const_<indeximm>_operand" "")]))))]
> >+  "ISA_HAS_MSA"
> >+  "copy_u.<msafmt>\t%0,%w1[%2]"
> >+  [(set_attr "type" "simd_copy")
> >+   (set_attr "mode" "<MODE>")])
> 
> Likewise on all counts, except that V2SI mode should not be included at all
> here, as we need to use copy_s for V2SI->SImode on 64-bit targets to get
> the correct canonical result.
> 
> >+(define_split
> >+  [(set (match_operand:<UNITMODE> 0 "register_operand")
> >+	(zero_extend:<UNITMODE>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA_D 1 "register_operand")
> >+	    (parallel [(match_operand 2 "const_0_or_1_operand")]))))]
> >+  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> >+  [(const_int 0)]
> >+{
> >+  mips_split_msa_copy_d (operands[0], operands[1], operands[2],
> >+			 gen_msa_copy_u_w);
> 
> This should use copy_s so that when the 32-bit code runs on a 64-bit
> architecture the data stays in canonical form.  copy_u should never be
> used on a 32-bit target; I don't think it should even exist in the MSA32
> spec, actually.  I am discussing this with our architecture team and will
> update the thread with the outcome.
> 
> >+  DONE;
> >+})

Refactored throughout.

> 
> = mips.c =
> 
> >+/* Expand VEC_COND_EXPR, where:
> >+   MODE is the mode of the result,
> >+   VIMODE is the equivalent integer mode,
> >+   OPERANDS are the operands of the VEC_COND_EXPR.  */
> >+
> >+void
> >+mips_expand_vec_cond_expr (machine_mode mode, machine_mode vimode,
> >+			   rtx *operands)
> >+{
> >+  rtx cond = operands[3];
> >+  rtx cmp_op0 = operands[4];
> >+  rtx cmp_op1 = operands[5];
> >+  rtx cmp_res = gen_reg_rtx (vimode);
> >+
> >+  mips_expand_msa_cmp (cmp_res, GET_CODE (cond), cmp_op0, cmp_op1);
> >+
> >+  /* We handle the following cases:
> >+     1) r = a CMP b ? -1 : 0
> >+     2) r = a CMP b ? -1 : v
> >+     3) r = a CMP b ?  v : 0
> >+     4) r = a CMP b ? v1 : v2  */
> >+
> >+  /* Case (1) above.  We only move the results.  */
> >+  if (operands[1] == CONSTM1_RTX (vimode)
> >+      && operands[2] == CONST0_RTX (vimode))
> >+    emit_move_insn (operands[0], cmp_res);
> >+  else
> >+    {
> >+      rtx src1 = gen_reg_rtx (vimode);
> >+      rtx src2 = gen_reg_rtx (vimode);
> >+      rtx mask = gen_reg_rtx (vimode);
> >+      rtx bsel;
> >+
> >+      /* Move the vector result to use it as a mask.  */
> >+      emit_move_insn (mask, cmp_res);
> >+
> >+      if (register_operand (operands[1], mode))
> >+	{
> >+	  rtx xop1 = operands[1];
> >+	  if (mode != vimode)
> >+	    {
> >+	      xop1 = gen_reg_rtx (vimode);
> >+	      emit_move_insn (xop1, gen_rtx_SUBREG (vimode, operands[1], 0));
> >+	    }
> >+	  emit_move_insn (src1, xop1);
> >+	}
> >+      else
> >+	/* Case (2), if the code below doesn't move the mask to src2.  */
> >+	emit_move_insn (src1, mask);
> 
> Please assert that operands[1] is constm1.

Added.

> 
> >+
> >+      if (register_operand (operands[2], mode))
> >+	{
> >+	  rtx xop2 = operands[2];
> >+	  if (mode != vimode)
> >+	    {
> >+	      xop2 = gen_reg_rtx (vimode);
> >+	      emit_move_insn (xop2, gen_rtx_SUBREG (vimode, operands[2], 0));
> >+	    }
> >+	  emit_move_insn (src2, xop2);
> >+	}
> >+      else
> >+	/* Case (3), if the code above didn't move the mask to src1.  */
> >+	emit_move_insn (src2, mask);
> 
> Please assert that operands[2] is const0.

Done.
> 
> >+
> >+      /* We deal with case (4) if the mask wasn't moved to either src1
> >+	 or src2.  In any case, we eventually do vector mask-based copy.  */
> >+      bsel = gen_rtx_UNSPEC (vimode, gen_rtvec (3, mask, src2, src1),
> >+			     UNSPEC_MSA_BSEL_V);
> >+      /* The result is placed back to a register with the mask.  */
> >+      emit_insn (gen_rtx_SET (mask, bsel));
> 
> I guess you expand like this, instead of using gen_*, to avoid having to
> select the function based on vimode.

Correct. I thought it would be slightly cleaner to expand this way.

> 
> >+      emit_move_insn (operands[0], gen_rtx_SUBREG (mode, mask, 0));
> >+    }
> >+}
> >+
> 
> Thanks,
> Matthew

Regards,
Robert

Thread overview: 13+ messages
2015-08-10 12:22 [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA) Robert Suchanek
2015-08-27 13:03 ` Matthew Fortune
2016-01-05 16:15   ` Robert Suchanek
2016-01-05 16:16 ` Robert Suchanek
2016-04-04 22:22   ` Matthew Fortune
2016-05-05 15:13     ` Robert Suchanek
2016-05-06 15:04       ` Matthew Fortune
2016-05-09 12:22         ` Robert Suchanek
     [not found] <B5E67142681B53468FAF6B7C31356562441AF59F@hhmail02.hh.imgtec.org>
2015-09-13  9:56 ` Matthew Fortune
2015-10-09 14:45   ` Matthew Fortune
2016-01-05 16:16     ` Robert Suchanek
2016-01-11 13:26       ` Matthew Fortune
2016-01-05 16:15 Robert Suchanek
