public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)
@ 2015-08-10 12:22 Robert Suchanek
  2015-08-27 13:03 ` Matthew Fortune
  2016-01-05 16:16 ` Robert Suchanek
  0 siblings, 2 replies; 13+ messages in thread
From: Robert Suchanek @ 2015-08-10 12:22 UTC (permalink / raw)
  To: Catherine_Moore, Matthew Fortune; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 16993 bytes --]

Hi,

This series of patches adds support for the MIPS SIMD Architecture (MSA).
It has undergone a few updates since the last review to address the comments in:

https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01777.html

The series is split into four parts:

0001 [MIPS] Add support for MIPS SIMD Architecture (MSA)
0002 [MIPS] Add pipeline description for MSA
0003 Add support to run auto-vectorization tests for multiple effective targets
0004 [MIPS] Add tests for MSA

There are a couple of things to mention here:
- there is a minor regression on the o32 ABI, AFAICS due to the lack of stack
  realignment; a patch will follow.  The vectorizer generates more unaligned
  accesses than the tests expect, and hence the checks fail.
- the series doesn't add cost modelling for auto-vectorization
- patch 0003 is independent but must go in before 0004.

Regards,
Robert

gcc/ChangeLog:

	* config.gcc: Add MSA header file for mips*-*-* target.
	* config/mips/constraints.md (YI, YC, YZ, Unv5, Uuv5, Uuv6, Ubv8):
	New constraints.
	* config/mips/mips-ftypes.def: Add function types for MSA builtins.
	* config/mips/mips-modes.def (V16QI, V8HI, V4SI, V2DI, V4SF, V2DF)
	(V32QI, V16HI, V8SI, V4DI, V8SF, V4DF): New modes.
	* config/mips/mips-msa.md: New file.
	* config/mips/mips-protos.h
	(mips_split_128bit_const_insns): New prototype.
	(mips_msa_idiv_insns): Likewise.
	(mips_split_128bit_move): Likewise.
	(mips_split_128bit_move_p): Likewise.
	(mips_split_msa_copy_d): Likewise.
	(mips_split_msa_insert_d): Likewise.
	(mips_split_msa_fill_d): Likewise.
	(mips_expand_msa_branch): Likewise.
	(mips_const_vector_same_val_p): Likewise.
	(mips_const_vector_same_byte_p): Likewise.
	(mips_const_vector_same_int_p): Likewise.
	(mips_const_vector_bitimm_set_p): Likewise.
	(mips_const_vector_bitimm_clr_p): Likewise.
	(mips_msa_output_division): Likewise.
	(mips_ldst_scaled_shift): Likewise.
	(mips_expand_vec_cond_expr): Likewise.
	* config/mips/mips.c (mips_const_vector_bitimm_set_p): New function.
	(mips_const_vector_bitimm_clr_p): Likewise.
	(mips_const_vector_same_val_p): Likewise.
	(mips_const_vector_same_byte_p): Likewise.
	(mips_const_vector_same_int_p): Likewise.
	(mips_symbol_insns): Forbid loading symbols via immediate for MSA.
	(mips_valid_offset_p): Limit offset to 10-bit for MSA loads and stores.
	(mips_valid_lo_sum_p): Forbid loading symbols via %lo(base) for MSA.
	(mips_lx_address_p): Add support for load indexed addresses for MSA.
	(mips_address_insns): Add calculation of instructions needed for
	stores and loads for MSA.
	(mips_const_insns): Move CONST_DOUBLE below CONST_VECTOR.  Handle
	CONST_VECTOR for MSA and let it fall through.
	(mips_ldst_scaled_shift): New function.
	(mips_subword_at_byte): Likewise.
	(mips_msa_idiv_insns): Likewise.
	(mips_legitimize_move): Validate MSA moves.
	(mips_rtx_costs): Add UNGE, UNGT, UNLE, UNLT cases.  Add calculation of
	costs for MSA division.
	(mips_split_move_p): Check if MSA moves need splitting.
	(mips_split_move): Split MSA moves if necessary.
	(mips_split_128bit_move_p): New function.
	(mips_split_128bit_move): Likewise.
	(mips_split_msa_copy_d): Likewise.
	(mips_split_msa_insert_d): Likewise.
	(mips_split_msa_fill_d): Likewise.
	(mips_output_move): Handle MSA moves.
	(mips_expand_msa_branch): New function.
	(mips_print_operand): Add 'E', 'B', 'w', 'v' modifiers.  Reinstate 'y'
	modifier.
	(mips_file_start): Add MSA .gnu_attribute.
	(mips_hard_regno_mode_ok_p): Allow TImode and 128-bit vectors in FPRs.
	(mips_hard_regno_nregs): Always return 1 for MSA supported mode.
	(mips_class_max_nregs): Add register size for MSA supported mode.
	(mips_cannot_change_mode_class): Allow conversion between MSA vector
	modes and TImode.
	(mips_mode_ok_for_mov_fmt_p): Allow MSA to use move.v instruction.
	(mips_secondary_reload_class): Force MSA loads/stores via memory.
	(mips_preferred_simd_mode): Add preferred modes for MSA.
	(mips_vector_mode_supported_p): Add MSA supported modes.
	(mips_autovectorize_vector_sizes): New function.
	(mips_msa_output_division): Likewise.
	(MSA_BUILTIN, MIPS_BUILTIN_DIRECT_NO_TARGET, MSA_NO_TARGET_BUILTIN):
	New macros.
	(CODE_FOR_msa_adds_s_b, CODE_FOR_msa_adds_s_h, CODE_FOR_msa_adds_s_w)
	(CODE_FOR_msa_adds_s_d, CODE_FOR_msa_adds_u_b, CODE_FOR_msa_adds_u_h)
	(CODE_FOR_msa_adds_u_w, CODE_FOR_msa_adds_u_d, CODE_FOR_msa_addv_b)
	(CODE_FOR_msa_addv_h, CODE_FOR_msa_addv_w, CODE_FOR_msa_addv_d)
	(CODE_FOR_msa_and_v, CODE_FOR_msa_bmnz_v, CODE_FOR_msa_bmz_v)
	(CODE_FOR_msa_bnz_v, CODE_FOR_msa_bz_v, CODE_FOR_msa_bsel_v)
	(CODE_FOR_msa_div_s_b, CODE_FOR_msa_div_s_h, CODE_FOR_msa_div_s_w)
	(CODE_FOR_msa_div_s_d, CODE_FOR_msa_div_u_b, CODE_FOR_msa_div_u_h)
	(CODE_FOR_msa_div_u_w, CODE_FOR_msa_div_u_d, CODE_FOR_msa_fadd_w)
	(CODE_FOR_msa_fadd_d, CODE_FOR_msa_ffint_s_w, CODE_FOR_msa_ffint_s_d)
	(CODE_FOR_msa_ffint_u_w, CODE_FOR_msa_ffint_u_d, CODE_FOR_msa_fsub_w)
	(CODE_FOR_msa_fsub_d, CODE_FOR_msa_fmul_w, CODE_FOR_msa_fmul_d)
	(CODE_FOR_msa_fdiv_w, CODE_FOR_msa_fdiv_d, CODE_FOR_msa_fmax_w)
	(CODE_FOR_msa_fmax_d, CODE_FOR_msa_fmax_a_w, CODE_FOR_msa_fmax_a_d)
	(CODE_FOR_msa_fmin_w, CODE_FOR_msa_fmin_d, CODE_FOR_msa_fmin_a_w)
	(CODE_FOR_msa_fmin_a_d, CODE_FOR_msa_fsqrt_w, CODE_FOR_msa_fsqrt_d)
	(CODE_FOR_msa_max_s_b, CODE_FOR_msa_max_s_h, CODE_FOR_msa_max_s_w)
	(CODE_FOR_msa_max_s_d, CODE_FOR_msa_max_u_b, CODE_FOR_msa_max_u_h)
	(CODE_FOR_msa_max_u_w, CODE_FOR_msa_max_u_d, CODE_FOR_msa_min_s_b)
	(CODE_FOR_msa_min_s_h, CODE_FOR_msa_min_s_w, CODE_FOR_msa_min_s_d)
	(CODE_FOR_msa_min_u_b, CODE_FOR_msa_min_u_h, CODE_FOR_msa_min_u_w)
	(CODE_FOR_msa_min_u_d, CODE_FOR_msa_mod_s_b, CODE_FOR_msa_mod_s_h)
	(CODE_FOR_msa_mod_s_w, CODE_FOR_msa_mod_s_d, CODE_FOR_msa_mod_u_b)
	(CODE_FOR_msa_mod_u_h, CODE_FOR_msa_mod_u_w, CODE_FOR_msa_mod_u_d)
	(CODE_FOR_msa_mod_s_b, CODE_FOR_msa_mod_s_h, CODE_FOR_msa_mod_s_w)
	(CODE_FOR_msa_mod_s_d, CODE_FOR_msa_mod_u_b, CODE_FOR_msa_mod_u_h)
	(CODE_FOR_msa_mod_u_w, CODE_FOR_msa_mod_u_d, CODE_FOR_msa_mulv_b)
	(CODE_FOR_msa_mulv_h, CODE_FOR_msa_mulv_w, CODE_FOR_msa_mulv_d)
	(CODE_FOR_msa_nlzc_b, CODE_FOR_msa_nlzc_h, CODE_FOR_msa_nlzc_w)
	(CODE_FOR_msa_nlzc_d, CODE_FOR_msa_nor_v, CODE_FOR_msa_or_v)
	(CODE_FOR_msa_pcnt_b, CODE_FOR_msa_pcnt_h, CODE_FOR_msa_pcnt_w)
	(CODE_FOR_msa_pcnt_d, CODE_FOR_msa_xor_v, CODE_FOR_msa_sll_b)
	(CODE_FOR_msa_sll_h, CODE_FOR_msa_sll_w, CODE_FOR_msa_sll_d)
	(CODE_FOR_msa_sra_b, CODE_FOR_msa_sra_h, CODE_FOR_msa_sra_w)
	(CODE_FOR_msa_sra_d, CODE_FOR_msa_srl_b, CODE_FOR_msa_srl_h)
	(CODE_FOR_msa_srl_w, CODE_FOR_msa_srl_d, CODE_FOR_msa_subv_b)
	(CODE_FOR_msa_subv_h, CODE_FOR_msa_subv_w, CODE_FOR_msa_subv_d)
	(CODE_FOR_msa_move_v, CODE_FOR_msa_vshf_b, CODE_FOR_msa_vshf_h)
	(CODE_FOR_msa_vshf_w, CODE_FOR_msa_vshf_d, CODE_FOR_msa_ilvod_d)
	(CODE_FOR_msa_ilvev_d, CODE_FOR_msa_pckod_d, CODE_FOR_msa_pckdev_d)
	(CODE_FOR_msa_ldi_b, CODE_FOR_msa_ldi_hi, CODE_FOR_msa_ldi_w)
	(CODE_FOR_msa_ldi_d, CODE_FOR_msa_cast_to_vector_float)
	(CODE_FOR_msa_cast_to_vector_double, CODE_FOR_msa_cast_to_scalar_float)
	(CODE_FOR_msa_cast_to_scalar_double): New code_aliasing macros.
	(mips_builtins): Add MSA sll_b, sll_h, sll_w, sll_d, slli_b, slli_h,
	slli_w, slli_d, sra_b, sra_h, sra_w, sra_d, srai_b, srai_h, srai_w,
	srai_d, srar_b, srar_h, srar_w, srar_d, srari_b, srari_h, srari_w,
	srari_d, srl_b, srl_h, srl_w, srl_d, srli_b, srli_h, srli_w, srli_d,
	srlr_b, srlr_h, srlr_w, srlr_d, srlri_b, srlri_h, srlri_w, srlri_d,
	bclr_b, bclr_h, bclr_w, bclr_d, bclri_b, bclri_h, bclri_w, bclri_d,
	bset_b, bset_h, bset_w, bset_d, bseti_b, bseti_h, bseti_w, bseti_d,
	bneg_b, bneg_h, bneg_w, bneg_d, bnegi_b, bnegi_h, bnegi_w, bnegi_d,
	binsl_b, binsl_h, binsl_w, binsl_d, binsli_b, binsli_h, binsli_w,
	binsli_d, binsr_b, binsr_h, binsr_w, binsr_d, binsri_b, binsri_h,
	binsri_w, binsri_d, addv_b, addv_h, addv_w, addv_d, addvi_b, addvi_h,
	addvi_w, addvi_d, subv_b, subv_h, subv_w, subv_d, subvi_b, subvi_h,
	subvi_w, subvi_d, max_s_b, max_s_h, max_s_w, max_s_d, maxi_s_b,
	maxi_s_h, maxi_s_w, maxi_s_d, max_u_b, max_u_h, max_u_w, max_u_d,
	maxi_u_b, maxi_u_h, maxi_u_w, maxi_u_d, min_s_b, min_s_h, min_s_w,
	min_s_d, mini_s_b, mini_s_h, mini_s_w, mini_s_d, min_u_b, min_u_h,
	min_u_w, min_u_d, mini_u_b, mini_u_h, mini_u_w, mini_u_d, max_a_b,
	max_a_h, max_a_w, max_a_d, min_a_b, min_a_h, min_a_w, min_a_d, ceq_b,
	ceq_h, ceq_w, ceq_d, ceqi_b, ceqi_h, ceqi_w, ceqi_d, clt_s_b, clt_s_h,
	clt_s_w, clt_s_d, clti_s_b, clti_s_h, clti_s_w, clti_s_d, clt_u_b,
	clt_u_h, clt_u_w, clt_u_d, clti_u_b, clti_u_h, clti_u_w, clti_u_d,
	cle_s_b, cle_s_h, cle_s_w, cle_s_d, clei_s_b, clei_s_h, clei_s_w,
	clei_s_d, cle_u_b, cle_u_h, cle_u_w, cle_u_d, clei_u_b, clei_u_h,
	clei_u_w, clei_u_d, ld_b, ld_h, ld_w, ld_d, st_b, st_h, st_w, st_d,
	sat_s_b, sat_s_h, sat_s_w, sat_s_d, sat_u_b, sat_u_h, sat_u_w, sat_u_d,
	add_a_b, add_a_h, add_a_w, add_a_d, adds_a_b, adds_a_h, adds_a_w,
	adds_a_d, adds_s_b, adds_s_h, adds_s_w, adds_s_d, adds_u_b, adds_u_h,
	adds_u_w, adds_u_d, ave_s_b, ave_s_h, ave_s_w, ave_s_d, ave_u_b,
	ave_u_h, ave_u_w, ave_u_d, aver_s_b, aver_s_h, aver_s_w, aver_s_d,
	aver_u_b, aver_u_h, aver_u_w, aver_u_d, subs_s_b, subs_s_h, subs_s_w,
	subs_s_d, subs_u_b, subs_u_h, subs_u_w, subs_u_d, subsuu_s_b,
	subsuu_s_h, subsuu_s_w, subsuu_s_d, subsus_u_b, subsus_u_h, subsus_u_w,
	subsus_u_d, asub_s_b, asub_s_h, asub_s_w, asub_s_d, asub_u_b, asub_u_h,
	asub_u_w, asub_u_d, mulv_b, mulv_h, mulv_w, mulv_d, maddv_b, maddv_h,
	maddv_w, maddv_d, msubv_b, msubv_h, msubv_w, msubv_d, div_s_b,
	div_s_h, div_s_w, div_s_d, div_u_b, div_u_h, div_u_w, div_u_d,
	hadd_s_h, hadd_s_w, hadd_s_d, hadd_u_h, hadd_u_w, hadd_u_d, hsub_s_h,
	hsub_s_w, hsub_s_d, hsub_u_h, hsub_u_w, hsub_u_d, mod_s_b, mod_s_h,
	mod_s_w, mod_s_d, mod_u_b, mod_u_h, mod_u_w, mod_u_d, dotp_s_h,
	dotp_s_w, dotp_s_d, dotp_u_h, dotp_u_w, dotp_u_d, dpadd_s_h, dpadd_s_w,
	dpadd_s_d, dpadd_u_h, dpadd_u_w, dpadd_u_d, dpsub_s_h, dpsub_s_w,
	dpsub_s_d, dpsub_u_h, dpsub_u_w, dpsub_u_d, sld_b, sld_h, sld_w, sld_d,
	sldi_b, sldi_h, sldi_w, sldi_d, splat_b, splat_h, splat_w, splat_d,
	splati_b, splati_h, splati_w, splati_d, pckev_b, pckev_h, pckev_w,
	pckev_d, pckod_b, pckod_h, pckod_w, pckod_d, ilvl_b, ilvl_h, ilvl_w,
	ilvl_d, ilvr_b, ilvr_h, ilvr_w, ilvr_d, ilvev_b, ilvev_h, ilvev_w,
	ilvev_d, ilvod_b, ilvod_h, ilvod_w, ilvod_d, vshf_b, vshf_h, vshf_w,
	vshf_d, and_v, andi_b, or_v, ori_b, nor_v, nori_b, xor_v, xori_b,
	bmnz_v, bmnzi_b, bmz_v, bmzi_b, bsel_v, bseli_b, shf_b, shf_h, shf_w,
	bnz_v, bz_v, fill_b, fill_h, fill_w, fill_d, pcnt_b, pcnt_h, pcnt_w,
	pcnt_d, nloc_b, nloc_h, nloc_w, nloc_d, nlzc_b, nlzc_h, nlzc_w, nlzc_d,
	copy_s_b, copy_s_h, copy_s_w, copy_s_d, copy_u_b, copy_u_h, copy_u_w,
	copy_u_d, insert_b, insert_h, insert_w, insert_d, insve_b, insve_h,
	insve_w, insve_d, bnz_b, bnz_h, bnz_w, bnz_d, bz_b, bz_h, bz_w, bz_d,
	ldi_b, ldi_h, ldi_w, ldi_d, fcaf_w, fcaf_d, fcor_w, fcor_d, fcun_w,
	fcun_d, fcune_w, fcune_d, fcueq_w, fcueq_d, fceq_w, fceq_d, fcne_w,
	fcne_d, fclt_w, fclt_d, fcult_w, fcult_d, fcle_w, fcle_d, fcule_w,
	fcule_d, fsaf_w, fsaf_d, fsor_w, fsor_d, fsun_w, fsun_d, fsune_w,
	fsune_d, fsueq_w, fsueq_d, fseq_w, fseq_d, fsne_w, fsne_d, fslt_w,
	fslt_d, fsult_w, fsult_d, fsle_w, fsle_d, fsule_w, fsule_d, fadd_w,
	fadd_d, fsub_w, fsub_d, fmul_w, fmul_d, fdiv_w, fdiv_d, fmadd_w,
	fmadd_d, fmsub_w, fmsub_d, fexp2_w, fexp2_d, fexdo_h, fexdo_w, ftq_h,
	ftq_w, fmin_w, fmin_d, fmin_a_w, fmin_a_d, fmax_w, fmax_d, fmax_a_w,
	fmax_a_d, mul_q_h, mul_q_w, mulr_q_h, mulr_q_w, madd_q_h, madd_q_w,
	maddr_q_h, maddr_q_w, msub_q_h, msub_q_w, msubr_q_h, msubr_q_w,
	fclass_w, fclass_d, fsqrt_w, fsqrt_d, frcp_w, frcp_d, frint_w, frint_d,
	frsqrt_w, frsqrt_d, flog2_w, flog2_d, fexupl_w, fexupl_d, fexupr_w,
	fexupr_d, ffql_w, ffql_d, ffqr_w, ffqr_d, ftint_s_w, ftint_s_d,
	ftint_u_w, ftint_u_d, ftrunc_s_w, ftrunc_s_d, ftrunc_u_w, ftrunc_u_d,
	ffint_s_w, ffint_s_d, ffint_u_w, ffint_u_d, ctcmsa, cfcmsa, move_v,
	cast_to_vector_float, cast_to_vector_double, cast_to_scalar_float,
	cast_to_scalar_double builtins.
	(mips_get_builtin_decl_index): New array.
	(MIPS_ATYPE_QI, MIPS_ATYPE_HI, MIPS_ATYPE_V2DI, MIPS_ATYPE_V4SI)
	(MIPS_ATYPE_V8HI, MIPS_ATYPE_V16QI, MIPS_ATYPE_V2DF, MIPS_ATYPE_V4SF)
	(MIPS_ATYPE_UV2DI, MIPS_ATYPE_UV4SI, MIPS_ATYPE_UV8HI)
	(MIPS_ATYPE_UV16QI): New.
	(mips_init_builtins): Initialize mips_get_builtin_decl_index array.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define target hook.
	(mips_expand_builtin_insn): Swap operands for
	CODE_FOR_msa_ilv{l,r}_{b,h,w,d}, CODE_FOR_msa_{ilv,pck}{ev,od}_{b,h,w}.
	(mips_set_compression_mode): Disallow MSA with MIPS16 code.
	(mips_option_override): -mmsa requires -mfp64 and -mhard-float.  These
	are set implicitly and an error is reported if overridden.
	(MAX_VECT_LEN): Increase maximum length of a vector to 16 bytes.
	(TARGET_SCHED_REASSOCIATION_WIDTH): Define target hook.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Likewise.
	(mips_expand_vec_unpack): Add support for MSA.
	(mips_expand_vector_init): Likewise.
	(mips_expand_vi_constant): Use CONST0_RTX (element_mode) instead of
	const0_rtx.
	(mips_expand_msa_cmp): New function.
	(mips_expand_vec_cond_expr): Likewise.
	* config/mips/mips.h
	(TARGET_CPU_CPP_BUILTINS): Add __mips_msa and __mips_msa_width.
	(OPTION_DEFAULT_SPECS): Ignore --with-fp-32 if -mmsa is specified.
	(ASM_SPEC): Pass mmsa and mno-msa to the assembler.
	(ISA_HAS_MSA): New macro.
	(UNITS_PER_MSA_REG): Likewise.
	(BITS_PER_MSA_REG): Likewise.
	(MAX_FIXED_MODE_SIZE): Redefine using TARGET_MSA.
	(BIGGEST_ALIGNMENT): Likewise.
	(MSA_REG_FIRST): New macro.
	(MSA_REG_LAST): Likewise.
	(MSA_REG_NUM): Likewise.
	(MSA_REG_P): Likewise.
	(MSA_REG_RTX_P): Likewise.
	(MSA_SUPPORTED_MODE_P): Likewise.
	(HARD_REGNO_CALL_PART_CLOBBERED): Redefine using TARGET_MSA.
	(MOVE_MAX): Likewise.
	(MAX_MOVE_MAX): Redefine to 16 bytes.
	(ADDITIONAL_REGISTER_NAMES): Add named registers $w0-$w31.
	* config/mips/mips.md: Include mips-msa.md.
	(alu_type): Add simd_add.
	(mode): Add V2DI, V4SI, V8HI, V16QI, V2DF, V4SF.
	(type): Add simd_div, simd_fclass, simd_flog2, simd_fadd, simd_fcvt,
	simd_fmul, simd_fmadd, simd_fdiv, simd_bitins, simd_bitmov,
	simd_insert, simd_sld, simd_mul, simd_fcmp, simd_fexp2, simd_int_arith,
	simd_bit, simd_shift, simd_splat, simd_fill, simd_permute, simd_shf,
	simd_sat, simd_pcnt, simd_copy, simd_branch, simd_cmsa, simd_fminmax,
	simd_logic, simd_move, simd_load, simd_store.  Choose "multi" for moves
	for "qword_mode".
	(qword_mode): New attribute.
	(insn_count): Add instruction count for quad moves.  Increase the count
	for MIPS SIMD division.
	(UNITMODE): Add UNITMODEs for vector types.
	* config/mips/mips.opt (mmsa): New option.
	* config/mips/msa.h: New file.
	* config/mips/mti-elf.h: Don't infer -mfpxx if -mmsa is specified.
	* config/mips/mti-linux.h: Likewise.
	* config/mips/predicates.md
	(const_msa_branch_operand): New predicate.
	(const_uimm3_operand): Likewise.
	(const_uimm4_operand): Likewise.
	(const_uimm5_operand): Likewise.
	(const_uimm8_operand): Likewise.
	(const_imm5_operand): Likewise.
	(aq10b_operand): Likewise.
	(aq10h_operand): Likewise.
	(aq10w_operand): Likewise.
	(aq10d_operand): Likewise.
	(const_m1_operand): Likewise.
	(reg_or_m1_operand): Likewise.
	(const_exp_2_operand): Likewise.
	(const_exp_4_operand): Likewise.
	(const_exp_8_operand): Likewise.
	(const_exp_16_operand): Likewise.
	(const_vector_same_byte_operand): Likewise.
	(const_vector_same_ximm5_operand): Likewise.
	(const_vector_same_uimm5_operand): Likewise.
	(const_vector_same_uimm6_operand): Likewise.
	(const_vector_same_uimm8_operand): Likewise.
	(const_vector_same_cmpsimm4_operand): Likewise.
	(const_vector_same_cmpuimm4_operand): Likewise.
	(const_vector_same_v2di_set_operand): Likewise.
	(const_vector_same_v2di_clr_operand): Likewise.
	(const_vector_same_v4si_set_operand): Likewise.
	(const_vector_same_v4si_clr_operand): Likewise.
	(const_vector_same_v8hi_set_operand): Likewise.
	(const_vector_same_v8hi_clr_operand): Likewise.
	(const_vector_same_v16qi_set_operand): Likewise.
	(const_vector_same_v16qi_clr_operand): Likewise.
	(reg_or_vector_same_byte_operand): Likewise.
	(reg_or_vector_same_ximm5_operand): Likewise.
	(reg_or_vector_same_uimm6_operand): Likewise.
	(reg_or_vector_same_v2di_set_operand): Likewise.
	(reg_or_vector_same_v2di_clr_operand): Likewise.
	(reg_or_vector_same_v4si_set_operand): Likewise.
	(reg_or_vector_same_v4si_clr_operand): Likewise.
	(reg_or_vector_same_v8hi_set_operand): Likewise.
	(reg_or_vector_same_v8hi_clr_operand): Likewise.
	(reg_or_vector_same_v16qi_set_operand): Likewise.
	(reg_or_vector_same_v16qi_clr_operand): Likewise.
	* doc/extend.texi (MIPS SIMD Architecture Functions): New section.
	* doc/invoke.texi (-mmsa): Document new option.

[-- Attachment #2: 0001-MIPS-Add-support-for-MIPS-SIMD-Architecture-MSA.tgz --]
[-- Type: application/x-compressed, Size: 45970 bytes --]

* RE: [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)
@ 2016-01-05 16:15 Robert Suchanek
  0 siblings, 0 replies; 13+ messages in thread
From: Robert Suchanek @ 2016-01-05 16:15 UTC (permalink / raw)
  To: Matthew Fortune, Catherine_Moore; +Cc: gcc-patches

Hi,

Comments inlined.

> >+;; The attribute gives half modes for vector modes.
> >+(define_mode_attr VHMODE
> >+  [(V8HI "V16QI")
> >+   (V4SI "V8HI")
> >+   (V2DI "V4SI")
> >+   (V2DF "V4SF")])
> >+
> >+;; The attribute gives double modes for vector modes.
> >+(define_mode_attr VDMODE
> >+  [(V4SI "V2DI")
> >+   (V8HI "V4SI")
> >+   (V16QI "V8HI")])
> 
> Presumably there is a reason why this is not a mirror of VHMODE. I.e. it does
> not have floating point modes?

This is a mistake. The floating point mode in VHMODE is never used. Removed.
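
For illustration (this is a sketch, not taken from the updated patch), the
cleaned-up attribute would then mirror VDMODE with integer modes only:

```lisp
;; Illustrative sketch of the cleaned-up attribute: the unused
;; floating-point mapping (V2DF "V4SF") is dropped, so VHMODE becomes
;; an exact mirror of VDMODE.
(define_mode_attr VHMODE
  [(V8HI "V16QI")
   (V4SI "V8HI")
   (V2DI "V4SI")])
```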

> >+;; The attribute gives half modes with same number of elements for vector
> modes.
> >+(define_mode_attr TRUNCMODE
> >+  [(V8HI "V8QI")
> >+   (V4SI "V4HI")
> >+   (V2DI "V2SI")])
> >+
> >+;; This attribute gives the mode of the result for "copy_s_b, copy_u_b" etc.
> >+(define_mode_attr RES
> >+  [(V2DF "DF")
> >+   (V4SF "SF")
> >+   (V2DI "DI")
> >+   (V4SI "SI")
> >+   (V8HI "SI")
> >+   (V16QI "SI")])
> 
> Perhaps prefix these with a 'V' to clarify that they are vector mode attributes.

Done.


> >+;; This attribute gives define_insn suffix for MSA instructions with need
> 
> with => that need

Fixed.

> >+;; distinction between integer and floating point.
> >+(define_mode_attr msafmt_f
> >+  [(V2DF "d_f")
> >+   (V4SF "w_f")
> >+   (V2DI "d")
> >+   (V4SI "w")
> >+   (V8HI "h")
> >+   (V16QI "b")])
> >+
> >+;; To represent bitmask needed for vec_merge using "const_<bitmask>_operand".
> 
> Commenting style is different here. Everything else starts with: This attribute
> ...

Changed.
> 
> >+(define_mode_attr bitmask
> >+  [(V2DF "exp_2")
> >+   (V4SF "exp_4")
> >+   (V2DI "exp_2")
> >+   (V4SI "exp_4")
> >+   (V8HI "exp_8")
> >+   (V16QI "exp_16")])
> >+
> >+;; This attribute used to form an immediate operand constraint using
> 
> used to => is used to

Fixed.

> 
> >+;; "const_<bitimm>_operand".
> >+(define_mode_attr bitimm
> >+  [(V16QI "uimm3")
> >+   (V8HI  "uimm4")
> >+   (V4SI  "uimm5")
> >+   (V2DI  "uimm6")
> >+  ])
> >+
> 
> >+(define_expand "fixuns_trunc<FMSA:mode><mode_i>2"
> >+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> >+	(unsigned_fix:<VIMODE> (match_operand:FMSA 1 "register_operand" "f")))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_ftrunc_u_<msafmt> (operands[0], operands[1]));
> >+  DONE;
> >+})
> 
> The msa_ftrunc_u_* define_insns should just be renamed to use the standard
> pattern names and, more importantly, standard RTL not UNSPEC.

Renamed, and the define_expands removed. I also replaced FINT with VIMODE and
removed the FINT mode attribute to avoid duplication.
> 
> >+
> >+(define_expand "fix_trunc<FMSA:mode><mode_i>2"
> >+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> >+	(fix:<VIMODE> (match_operand:FMSA 1 "register_operand" "f")))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_ftrunc_s_<msafmt> (operands[0], operands[1]));
> >+  DONE;
> >+})
> 
> Likewise.
> 
> >+
> >+(define_expand "vec_pack_trunc_v2df"
> >+  [(set (match_operand:V4SF 0 "register_operand")
> >+	(vec_concat:V4SF
> >+	  (float_truncate:V2SF (match_operand:V2DF 1 "register_operand"))
> >+	  (float_truncate:V2SF (match_operand:V2DF 2 "register_operand"))))]
> >+  "ISA_HAS_MSA"
> >+  "")
> 
> Rename msa_fexdo_w to vec_pack_trunc_v2df.
> 
> I see that fexdo has a 'halfword' variant which creates a half-float. What
> else can operate on half-float and should this really be recorded as an
> HFmode?
> 
> >+(define_expand "vec_unpacks_hi_v4sf"
> >+  [(set (match_operand:V2DF 0 "register_operand" "=f")
> >+	(float_extend:V2DF
> >+	  (vec_select:V2SF
> >+	    (match_operand:V4SF 1 "register_operand" "f")
> >+	    (parallel [(const_int 0) (const_int 1)])
> 
> If we swap the (parallel) for a match_operand 2...
> 
> >+	  )))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  if (BYTES_BIG_ENDIAN)
> >+    emit_insn (gen_msa_fexupr_d (operands[0], operands[1]));
> >+  else
> >+    emit_insn (gen_msa_fexupl_d (operands[0], operands[1]));
> 
> Then these two could change to set up operands[2] with either
> a parallel of 0/1 or 2/3 and then...
> 
> You could change the fexupr_d and fexupl_d insn patterns to use normal RTL
> that select the appropriate elements (either 0/1 and 2/3).
> 
> >+  DONE;
> 
> Which means the RTL in the pattern would be used to expand this and
> you would remove the DONE. As it stands the pattern on this expand
> is simply never used.

Reworked. The parallel expression for operands[2] is now generated.
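
As a rough sketch of that rework (illustrative only; the exact element choice
per endianness is an assumption, not from the posted patch), the expander could
build the parallel itself and pass it to the insn through operands[2]:

```lisp
;; Illustrative sketch: pick element pair 0/1 or 2/3 depending on
;; endianness and let the insn pattern select them via operands[2].
(define_expand "vec_unpacks_hi_v4sf"
  [(set (match_operand:V2DF 0 "register_operand" "=f")
	(float_extend:V2DF
	  (vec_select:V2SF
	    (match_operand:V4SF 1 "register_operand" "f")
	    (match_operand 2 "" ""))))]
  "ISA_HAS_MSA"
{
  /* Which element pair forms the "high" half depends on endianness;
     the choice here is illustrative.  */
  int base = BYTES_BIG_ENDIAN ? 0 : 2;
  operands[2] = gen_rtx_PARALLEL (VOIDmode,
				  gen_rtvec (2, GEN_INT (base),
					     GEN_INT (base + 1)));
})
```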

> 
> >+})
> >+
> >+(define_expand "vec_unpacks_lo_v4sf"
> >+  [(set (match_operand:V2DF 0 "register_operand" "=f")
> >+	(float_extend:V2DF
> >+	  (vec_select:V2SF
> >+	    (match_operand:V4SF 1 "register_operand" "f")
> >+	    (parallel [(const_int 0) (const_int 1)])
> >+	  )))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  if (BYTES_BIG_ENDIAN)
> >+    emit_insn (gen_msa_fexupl_d (operands[0], operands[1]));
> >+  else
> >+    emit_insn (gen_msa_fexupr_d (operands[0], operands[1]));
> >+  DONE;
> >+})
> 
> Likewise but inverted.

As above.

> 
> >+
> >+(define_expand "vec_unpacks_hi_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> 
> Not much point in having the (set) in here as it would be illegal anyway
> if it were actually expanded. Just list the two operands. (same throughout)
>

Done.
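
The simplified form, per the suggestion, would list only the operands
(a sketch mirroring the original expander body):

```lisp
;; Sketch: no (set ...) wrapper, just the bare operands, since the RTL
;; template of this expander is never actually emitted -- the C body
;; always finishes with DONE.
(define_expand "vec_unpacks_hi_<mode>"
  [(match_operand:<VDMODE> 0 "register_operand")
   (match_operand:IMSA_WHB 1 "register_operand")]
  "ISA_HAS_MSA"
{
  mips_expand_vec_unpack (operands, false/*unsigned_p*/, true/*high_p*/);
  DONE;
})
```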
 
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, false/*unsigned_p*/, true/*high_p*/);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_unpacks_lo_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, false/*unsigned_p*/, false/*high_p*/);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_unpacku_hi_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, true/*unsigned_p*/, true/*high_p*/);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_unpacku_lo_<mode>"
> >+  [(set (match_operand:<VDMODE> 0 "register_operand")
> >+	(match_operand:IMSA_WHB 1 "register_operand"))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  mips_expand_vec_unpack (operands, true/*unsigned_p*/, false/*high_p*/);
> >+  DONE;
> >+})
> >+
> 
> >+(define_expand "vec_set<mode>"
> >+  [(match_operand:IMSA 0 "register_operand")
> >+   (match_operand:<UNITMODE> 1 "reg_or_0_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_insert_<msafmt>_insn (operands[0], operands[1],
> >+					   operands[0],
> >+					   GEN_INT(1 << INTVAL (operands[2]))));
> >+  DONE;
> >+})
> >+
> >+(define_expand "vec_set<mode>"
> >+  [(match_operand:FMSA 0 "register_operand")
> >+   (match_operand:<UNITMODE> 1 "register_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_insve_<msafmt_f>_s (operands[0], operands[0],
> >+					 GEN_INT(1 << INTVAL (operands[2])),
> >+					 operands[1]));
> >+  DONE;
> >+})
> >+
> >+(define_expand "vcondu<MSA:mode><IMSA:mode>"
> >+  [(set (match_operand:MSA 0 "register_operand")
> >+	(if_then_else:MSA
> >+	  (match_operator 3 ""
> >+	    [(match_operand:IMSA 4 "register_operand")
> >+	     (match_operand:IMSA 5 "register_operand")])
> >+	  (match_operand:MSA 1 "reg_or_m1_operand")
> >+	  (match_operand:MSA 2 "reg_or_0_operand")))]
> >+  "ISA_HAS_MSA
> >+   && (GET_MODE_NUNITS (<MSA:MODE>mode) == GET_MODE_NUNITS
> (<IMSA:MODE>mode))"
> >+{
> >+  mips_expand_vec_cond_expr (<MSA:MODE>mode, <MSA:VIMODE>mode, operands);
> >+  DONE;
> >+})
> >+
> >+(define_expand "vcond<MSA:mode><MSA_2:mode>"
> >+  [(set (match_operand:MSA 0 "register_operand")
> >+	(if_then_else:MSA
> >+	  (match_operator 3 ""
> >+	    [(match_operand:MSA_2 4 "register_operand")
> >+	     (match_operand:MSA_2 5 "register_operand")])
> >+	  (match_operand:MSA 1 "reg_or_m1_operand")
> >+	  (match_operand:MSA 2 "reg_or_0_operand")))]
> >+  "ISA_HAS_MSA
> >+   && (GET_MODE_NUNITS (<MSA:MODE>mode) == GET_MODE_NUNITS (<MSA:MODE>mode))"
> 
> Bug: This compares MSA to MSA. the second should be MSA_2.

Fixed.
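
For reference, a sketch of the corrected guard, comparing the two distinct
iterators:

```lisp
  "ISA_HAS_MSA
   && (GET_MODE_NUNITS (<MSA:MODE>mode)
       == GET_MODE_NUNITS (<MSA_2:MODE>mode))"
```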

> 
> >+{
> >+  mips_expand_vec_cond_expr (<MSA:MODE>mode, <MSA:VIMODE>mode, operands);
> >+  DONE;
> >+})
> >+
> >+;; Note used directly by builtins but via the following define_expand.
> >+(define_insn "msa_insert_<msafmt>_insn"
> >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> >+	(vec_merge:IMSA
> >+	  (vec_duplicate:IMSA
> >+	    (match_operand:<UNITMODE> 1 "reg_or_0_operand" "dJ"))
> >+	  (match_operand:IMSA 2 "register_operand" "0")
> >+	  (match_operand 3 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insert.<msafmt>\t%w0[%y3],%z1"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> 
> Rename to remove _insn, see below. V2DI mode should have # as its pattern
> for 32-bit targets to ensure it gets split.

Done.

> 
> >+
> >+;; Expand builtin for HImode and QImode which takes SImode.
> >+(define_expand "msa_insert_<msafmt>"
> >+  [(match_operand:IMSA 0 "register_operand")
> >+   (match_operand:IMSA 1 "register_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")
> >+   (match_operand:<RES> 3 "reg_or_0_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  if ((GET_MODE_SIZE (<UNITMODE>mode) < GET_MODE_SIZE (<RES>mode))
> >+      && (REG_P (operands[3]) || (GET_CODE (operands[3]) == SUBREG
> >+				  && REG_P (SUBREG_REG (operands[3])))))
> >+    operands[3] = lowpart_subreg (<UNITMODE>mode, operands[3], <RES>mode);
> >+  emit_insn (gen_msa_insert_<msafmt>_insn (operands[0], operands[3],
> >+					   operands[1],
> >+					   GEN_INT(1 << INTVAL (operands[2]))));
> >+  DONE;
> >+})
> >+
> 
> Lets do this during mips_expand_builtin_insn like ilvl and friends. Having
> expanders
> simply to map the builtins to real instructions doesn't seem very useful.

Done.
> 
> >+(define_expand "msa_insert_<msafmt_f>"
> >+  [(match_operand:FMSA 0 "register_operand")
> >+   (match_operand:FMSA 1 "register_operand")
> >+   (match_operand 2 "const_<indeximm>_operand")
> >+   (match_operand:<UNITMODE> 3 "reg_or_0_operand")]
> >+  "ISA_HAS_MSA"
> >+{
> >+  emit_insn (gen_msa_insert_<msafmt_f>_insn (operands[0], operands[3],
> >+					     operands[1],
> >+					     GEN_INT(1 << INTVAL (operands[2]))));
> >+  DONE;
> >+})
> 
> Likewise.
> 
> >+
> >+(define_insn "msa_insert_<msafmt_f>_insn"
> >+  [(set (match_operand:FMSA 0 "register_operand" "=f")
> >+	(vec_merge:FMSA
> >+	  (vec_duplicate:FMSA
> >+	    (match_operand:<UNITMODE> 1 "register_operand" "d"))
> >+	  (match_operand:FMSA 2 "register_operand" "0")
> >+	  (match_operand 3 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insert.<msafmt>\t%w0[%y3],%z1"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> 
> Rename to remove _insn, see above. V2DF mode should have # for 32-bit targets
> to
> ensure it gets split.

Likewise.
> 
> >+
> >+(define_split
> >+  [(set (match_operand:MSA_D 0 "register_operand")
> >+	(vec_merge:MSA_D
> >+	  (vec_duplicate:MSA_D
> >+	    (match_operand:<UNITMODE> 1 "<MSA_D:msa_d>_operand"))
> >+	  (match_operand:MSA_D 2 "register_operand")
> >+	  (match_operand 3 "const_<bitmask>_operand")))]
> >+  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> >+  [(const_int 0)]
> >+{
> >+  mips_split_msa_insert_d (operands[0], operands[2], operands[3],
> operands[1]);
> >+  DONE;
> >+})
> 
> ...
> 
> >+(define_expand "msa_insve_<msafmt_f>"
> >+  [(set (match_operand:MSA 0 "register_operand")
> >+	(vec_merge:MSA
> >+	  (vec_duplicate:MSA
> >+	    (vec_select:<UNITMODE>
> >+	      (match_operand:MSA 3 "register_operand")
> >+	      (parallel [(const_int 0)])))
> >+	  (match_operand:MSA 1 "register_operand")
> >+	  (match_operand 2 "const_<indeximm>_operand")))]
> >+  "ISA_HAS_MSA"
> >+{
> >+  operands[2] = GEN_INT (1 << INTVAL (operands[2]));
> >+})
> 
> Like for insert patterns do this in mips_expand_builtin_insn and rename the
> instruction below to remove _insn.

Done.
> 
> >+(define_insn "msa_insve_<msafmt_f>_insn"
> >+  [(set (match_operand:MSA 0 "register_operand" "=f")
> >+	(vec_merge:MSA
> >+	  (vec_duplicate:MSA
> >+	    (vec_select:<UNITMODE>
> >+	      (match_operand:MSA 3 "register_operand" "f")
> >+	      (parallel [(const_int 0)])))
> >+	  (match_operand:MSA 1 "register_operand" "0")
> >+	  (match_operand 2 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insve.<msafmt>\t%w0[%y2],%w3[0]"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> >+
> >+;; Operand 3 is a scalar.
> >+(define_insn "msa_insve_<msafmt>_f_s"
> 
> It would be clearer to have <msafmt_f> instead of <msafmt>_f. The 's' here
> is for scalar, I believe. Perhaps spell it out as 'scalar'.

OK.
> 
> >+  [(set (match_operand:FMSA 0 "register_operand" "=f")
> >+	(vec_merge:FMSA
> >+	  (vec_duplicate:FMSA
> >+	    (match_operand:<UNITMODE> 3 "register_operand" "f"))
> >+	  (match_operand:FMSA 1 "register_operand" "0")
> >+	  (match_operand 2 "const_<bitmask>_operand" "")))]
> >+  "ISA_HAS_MSA"
> >+  "insve.<msafmt>\t%w0[%y2],%w3[0]"
> >+  [(set_attr "type" "simd_insert")
> >+   (set_attr "mode" "<MODE>")])
> 
> >+;; Note that copy_s.d and copy_s.d_f will be split later if !TARGET_64BIT.
> >+(define_insn "msa_copy_s_<msafmt_f>"
> >+  [(set (match_operand:<RES> 0 "register_operand" "=d")
> >+	(sign_extend:<RES>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA 1 "register_operand" "f")
> >+	    (parallel [(match_operand 2 "const_<indeximm>_operand" "")]))))]
> >+  "ISA_HAS_MSA"
> >+  "copy_s.<msafmt>\t%0,%w1[%2]"
> >+  [(set_attr "type" "simd_copy")
> >+   (set_attr "mode" "<MODE>")])
> 
> I think the splits should be explicit and therefore generate # for the
> two patterns that will be split for !TARGET_64BIT. The sign_extend should
> only be present for V8HI and V16QI modes. There could be value in adding
> widening patterns to extend to DImode on 64-bit targets but it may not
> trigger much.
> 
> >+(define_split
> >+  [(set (match_operand:<UNITMODE> 0 "register_operand")
> >+	(sign_extend:<UNITMODE>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA_D 1 "register_operand")
> >+	    (parallel [(match_operand 2 "const_0_or_1_operand")]))))]
> >+  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> >+  [(const_int 0)]
> >+{
> >+  mips_split_msa_copy_d (operands[0], operands[1], operands[2],
> >+			 gen_msa_copy_s_w);
> >+  DONE;
> >+})
> >+
> >+;; Note that copy_u.d and copy_u.d_f will be split later if !TARGET_64BIT.
> >+(define_insn "msa_copy_u_<msafmt_f>"
> >+  [(set (match_operand:<RES> 0 "register_operand" "=d")
> >+	(zero_extend:<RES>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA 1 "register_operand" "f")
> >+	    (parallel [(match_operand 2 "const_<indeximm>_operand" "")]))))]
> >+  "ISA_HAS_MSA"
> >+  "copy_u.<msafmt>\t%0,%w1[%2]"
> >+  [(set_attr "type" "simd_copy")
> >+   (set_attr "mode" "<MODE>")])
> 
> Likewise on all counts, except that V2SI mode should not be included at all
> here, as we need to use copy_s for V2SI->SImode on 64-bit targets to get
> the correct canonical result.
> 
> >+(define_split
> >+  [(set (match_operand:<UNITMODE> 0 "register_operand")
> >+	(zero_extend:<UNITMODE>
> >+	  (vec_select:<UNITMODE>
> >+	    (match_operand:MSA_D 1 "register_operand")
> >+	    (parallel [(match_operand 2 "const_0_or_1_operand")]))))]
> >+  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> >+  [(const_int 0)]
> >+{
> >+  mips_split_msa_copy_d (operands[0], operands[1], operands[2],
> >+			 gen_msa_copy_u_w);
> 
> This should use copy_s so that when the 32-bit code runs on a 64-bit
> architecture the data stays in canonical form.  copy_u should never be
> used on a 32-bit target; I don't think it should even exist in the MSA32
> spec, actually.  I am discussing this with our architecture team and will
> update the thread with the outcome.
> 
> >+  DONE;
> >+})

Refactored throughout.

> 
> = mips.c =
> 
> >+/* Expand VEC_COND_EXPR, where:
> >+   MODE is the mode of the result,
> >+   VIMODE is the equivalent integer mode,
> >+   OPERANDS are the operands of the VEC_COND_EXPR.  */
> >+
> >+void
> >+mips_expand_vec_cond_expr (machine_mode mode, machine_mode vimode,
> >+			   rtx *operands)
> >+{
> >+  rtx cond = operands[3];
> >+  rtx cmp_op0 = operands[4];
> >+  rtx cmp_op1 = operands[5];
> >+  rtx cmp_res = gen_reg_rtx (vimode);
> >+
> >+  mips_expand_msa_cmp (cmp_res, GET_CODE (cond), cmp_op0, cmp_op1);
> >+
> >+  /* We handle the following cases:
> >+     1) r = a CMP b ? -1 : 0
> >+     2) r = a CMP b ? -1 : v
> >+     3) r = a CMP b ?  v : 0
> >+     4) r = a CMP b ? v1 : v2  */
> >+
> >+  /* Case (1) above.  We only move the results.  */
> >+  if (operands[1] == CONSTM1_RTX (vimode)
> >+      && operands[2] == CONST0_RTX (vimode))
> >+    emit_move_insn (operands[0], cmp_res);
> >+  else
> >+    {
> >+      rtx src1 = gen_reg_rtx (vimode);
> >+      rtx src2 = gen_reg_rtx (vimode);
> >+      rtx mask = gen_reg_rtx (vimode);
> >+      rtx bsel;
> >+
> >+      /* Move the vector result to use it as a mask.  */
> >+      emit_move_insn (mask, cmp_res);
> >+
> >+      if (register_operand (operands[1], mode))
> >+	{
> >+	  rtx xop1 = operands[1];
> >+	  if (mode != vimode)
> >+	    {
> >+	      xop1 = gen_reg_rtx (vimode);
> >+	      emit_move_insn (xop1, gen_rtx_SUBREG (vimode, operands[1], 0));
> >+	    }
> >+	  emit_move_insn (src1, xop1);
> >+	}
> >+      else
> >+	/* Case (2), if the code below doesn't move the mask to src2.  */
> >+	emit_move_insn (src1, mask);
> 
> Please assert that operands[1] is constm1.

Added.

> 
> >+
> >+      if (register_operand (operands[2], mode))
> >+	{
> >+	  rtx xop2 = operands[2];
> >+	  if (mode != vimode)
> >+	    {
> >+	      xop2 = gen_reg_rtx (vimode);
> >+	      emit_move_insn (xop2, gen_rtx_SUBREG (vimode, operands[2], 0));
> >+	    }
> >+	  emit_move_insn (src2, xop2);
> >+	}
> >+      else
> >+	/* Case (3), if the code above didn't move the mask to src1.  */
> >+	emit_move_insn (src2, mask);
> 
> Please assert that operands[2] is const0.

Done.
> 
> >+
> >+      /* We deal with case (4) if the mask wasn't moved to either src1
> >+	 or src2.  In any case, we eventually do vector mask-based copy.  */
> >+      bsel = gen_rtx_UNSPEC (vimode, gen_rtvec (3, mask, src2, src1),
> >+			     UNSPEC_MSA_BSEL_V);
> >+      /* The result is placed back to a register with the mask.  */
> >+      emit_insn (gen_rtx_SET (mask, bsel));
> 
> I guess you expand like this, instead of using gen_*, to avoid having to
> select the function based on vimode.

Correct. I thought it would be slightly cleaner to expand this way.

> 
> >+      emit_move_insn (operands[0], gen_rtx_SUBREG (mode, mask, 0));
> >+    }
> >+}
> >+
> 
> Thanks,
> Matthew

Regards,
Robert

Thread overview: 13+ messages
2015-08-10 12:22 [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA) Robert Suchanek
2015-08-27 13:03 ` Matthew Fortune
2016-01-05 16:15   ` Robert Suchanek
2016-01-05 16:16 ` Robert Suchanek
2016-04-04 22:22   ` Matthew Fortune
2016-05-05 15:13     ` Robert Suchanek
2016-05-06 15:04       ` Matthew Fortune
2016-05-09 12:22         ` Robert Suchanek
     [not found] <B5E67142681B53468FAF6B7C31356562441AF59F@hhmail02.hh.imgtec.org>
2015-09-13  9:56 ` Matthew Fortune
2015-10-09 14:45   ` Matthew Fortune
2016-01-05 16:16     ` Robert Suchanek
2016-01-11 13:26       ` Matthew Fortune
2016-01-05 16:15 Robert Suchanek
