public inbox for gcc-patches@gcc.gnu.org
* [PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types
@ 2021-10-22 14:48 Jonathan Wright
  2021-10-22 15:13 ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Wright @ 2021-10-22 14:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 30028 bytes --]

Hi,

Until now, GCC has used large integer machine modes (OI, CI and XI)
to model Neon vector-tuple types. This is suboptimal for many
reasons, the most notable being:

 1) Large integer modes are opaque and modifying one vector in the
    tuple requires a lot of inefficient set/get gymnastics. The
    result is many superfluous move instructions.
 2) Large integer modes do not map well to types that are tuples of
    64-bit vectors - we need additional zero-padding, which again
    results in superfluous move instructions.

This patch adds new machine modes that better model the C-level Neon
vector-tuple types. The approach is somewhat similar to that already
used for SVE vector-tuple types.
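
As a concrete illustration (a sketch, not part of the patch itself):
with the new modes, the int32x4x2_t result of vld2q_s32 is carried in
its own V2x4SI machine mode rather than in an opaque OImode, so
extracting one vector of the pair no longer needs get/set gymnastics:

#include <arm_neon.h>

/* pair.val[0] is now a direct access into a V2x4SI-mode value
   instead of an extraction from an opaque OImode value.  */
int32x4_t
load_first (const int32_t *ptr)
{
  int32x4x2_t pair = vld2q_s32 (ptr);
  return pair.val[0];
}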

All of the AArch64 backend patterns and builtins that manipulate Neon
vector tuples are updated to use the new machine modes. This has the
effect of significantly reducing the amount of boilerplate code in
the arm_neon.h header.

While this patch improves the quality of the generated code in many
instances, there is still room for significant improvement, which
will be attempted in subsequent patches.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
aarch64_be-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-08-09  Jonathan Wright  <jonathan.wright@arm.com>
            Richard Sandiford  <richard.sandiford@arm.com>

	* config/aarch64/aarch64-builtins.c (v2x8qi_UP): Define.
	(v2x4hi_UP): Likewise.
	(v2x4hf_UP): Likewise.
	(v2x4bf_UP): Likewise.
	(v2x2si_UP): Likewise.
	(v2x2sf_UP): Likewise.
	(v2x1di_UP): Likewise.
	(v2x1df_UP): Likewise.
	(v2x16qi_UP): Likewise.
	(v2x8hi_UP): Likewise.
	(v2x8hf_UP): Likewise.
	(v2x8bf_UP): Likewise.
	(v2x4si_UP): Likewise.
	(v2x4sf_UP): Likewise.
	(v2x2di_UP): Likewise.
	(v2x2df_UP): Likewise.
	(v3x8qi_UP): Likewise.
	(v3x4hi_UP): Likewise.
	(v3x4hf_UP): Likewise.
	(v3x4bf_UP): Likewise.
	(v3x2si_UP): Likewise.
	(v3x2sf_UP): Likewise.
	(v3x1di_UP): Likewise.
	(v3x1df_UP): Likewise.
	(v3x16qi_UP): Likewise.
	(v3x8hi_UP): Likewise.
	(v3x8hf_UP): Likewise.
	(v3x8bf_UP): Likewise.
	(v3x4si_UP): Likewise.
	(v3x4sf_UP): Likewise.
	(v3x2di_UP): Likewise.
	(v3x2df_UP): Likewise.
	(v4x8qi_UP): Likewise.
	(v4x4hi_UP): Likewise.
	(v4x4hf_UP): Likewise.
	(v4x4bf_UP): Likewise.
	(v4x2si_UP): Likewise.
	(v4x2sf_UP): Likewise.
	(v4x1di_UP): Likewise.
	(v4x1df_UP): Likewise.
	(v4x16qi_UP): Likewise.
	(v4x8hi_UP): Likewise.
	(v4x8hf_UP): Likewise.
	(v4x8bf_UP): Likewise.
	(v4x4si_UP): Likewise.
	(v4x4sf_UP): Likewise.
	(v4x2di_UP): Likewise.
	(v4x2df_UP): Likewise.
	(TYPES_GETREGP): Delete.
	(TYPES_SETREGP): Likewise.
	(TYPES_LOADSTRUCT_U): Define.
	(TYPES_LOADSTRUCT_P): Likewise.
	(TYPES_LOADSTRUCT_LANE_U): Likewise.
	(TYPES_LOADSTRUCT_LANE_P): Likewise.
	(TYPES_STORE1P): Move for consistency.
	(TYPES_STORESTRUCT_U): Define.
	(TYPES_STORESTRUCT_P): Likewise.
	(TYPES_STORESTRUCT_LANE_U): Likewise.
	(TYPES_STORESTRUCT_LANE_P): Likewise.
	(aarch64_simd_tuple_types): Define.
	(aarch64_lookup_simd_builtin_type): Handle tuple type lookup.
	(aarch64_init_simd_builtin_functions): Update frontend lookup
	for builtin functions after handling arm_neon.h pragma.
	(register_tuple_type): Manually set modes of single-integer
	tuple types. Record tuple types.
	* config/aarch64/aarch64-modes.def
	(ADV_SIMD_D_REG_STRUCT_MODES): Define D-register tuple modes.
	(ADV_SIMD_Q_REG_STRUCT_MODES): Define Q-register tuple modes.
	(SVE_MODES): Give single-vector modes priority over vector-
	tuple modes.
	(VECTOR_MODES_WITH_PREFIX): Set partial-vector mode order to
	be after all single-vector modes.
	* config/aarch64/aarch64-simd-builtins.def: Update builtin
	generator macros to reflect modifications to the backend
	patterns.
	* config/aarch64/aarch64-simd.md (aarch64_simd_ld2<mode>):
	Use vector-tuple mode iterator and rename to...
	(aarch64_simd_ld2<vstruct_elt>): This.
	(aarch64_simd_ld2r<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_ld2r<vstruct_elt>): This.
	(aarch64_vec_load_lanesoi_lane<mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_vec_load_lanes<mode>_lane<vstruct_elt>): This.
	(vec_load_lanesoi<mode>): Use vector-tuple mode iterator and
	rename to...
	(vec_load_lanes<mode><vstruct_elt>): This.
	(aarch64_simd_st2<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_st2<vstruct_elt>): This.
	(aarch64_vec_store_lanesoi_lane<mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_vec_store_lanes<mode>_lane<vstruct_elt>): This.
	(vec_store_lanesoi<mode>): Use vector-tuple mode iterator and
	rename to...
	(vec_store_lanes<mode><vstruct_elt>): This.
	(aarch64_simd_ld3<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_ld3<vstruct_elt>): This.
	(aarch64_simd_ld3r<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_ld3r<vstruct_elt>): This.
	(aarch64_vec_load_lanesci_lane<mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_vec_load_lanes<mode>_lane<vstruct_elt>): This.
	(vec_load_lanesci<mode>): Use vector-tuple mode iterator and
	rename to...
	(vec_load_lanes<mode><vstruct_elt>): This.
	(aarch64_simd_st3<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_st3<vstruct_elt>): This.
	(aarch64_vec_store_lanesci_lane<mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_vec_store_lanes<mode>_lane<vstruct_elt>): This.
	(vec_store_lanesci<mode>): Use vector-tuple mode iterator and
	rename to...
	(vec_store_lanes<mode><vstruct_elt>): This.
	(aarch64_simd_ld4<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_ld4<vstruct_elt>): This.
	(aarch64_simd_ld4r<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_ld4r<vstruct_elt>): This.
	(aarch64_vec_load_lanesxi_lane<mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_vec_load_lanes<mode>_lane<vstruct_elt>): This.
	(vec_load_lanesxi<mode>): Use vector-tuple mode iterator and
	rename to...
	(vec_load_lanes<mode><vstruct_elt>): This.
	(aarch64_simd_st4<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_simd_st4<vstruct_elt>): This.
	(aarch64_vec_store_lanesxi_lane<mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_vec_store_lanes<mode>_lane<vstruct_elt>): This.
	(vec_store_lanesxi<mode>): Use vector-tuple mode iterator and
	rename to...
	(vec_store_lanes<mode><vstruct_elt>): This.
	(mov<mode>): Define for Neon vector-tuple modes.
	(aarch64_ld1x3<VALLDIF:mode>): Use vector-tuple mode iterator
	and rename to...
	(aarch64_ld1x3<vstruct_elt>): This.
	(aarch64_ld1_x3_<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_ld1_x3_<vstruct_elt>): This.
	(aarch64_ld1x4<VALLDIF:mode>): Use vector-tuple mode iterator
	and rename to...
	(aarch64_ld1x4<vstruct_elt>): This.
	(aarch64_ld1_x4_<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_ld1_x4_<vstruct_elt>): This.
	(aarch64_st1x2<VALLDIF:mode>): Use vector-tuple mode iterator
	and rename to...
	(aarch64_st1x2<vstruct_elt>): This.
	(aarch64_st1_x2_<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_st1_x2_<vstruct_elt>): This.
	(aarch64_st1x3<VALLDIF:mode>): Use vector-tuple mode iterator
	and rename to...
	(aarch64_st1x3<vstruct_elt>): This.
	(aarch64_st1_x3_<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_st1_x3_<vstruct_elt>): This.
	(aarch64_st1x4<VALLDIF:mode>): Use vector-tuple mode iterator
	and rename to...
	(aarch64_st1x4<vstruct_elt>): This.
	(aarch64_st1_x4_<mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_st1_x4_<vstruct_elt>): This.
	(*aarch64_mov<mode>): Define for vector-tuple modes.
	(*aarch64_be_mov<mode>): Likewise.
	(aarch64_ld<VSTRUCT:nregs>r<VALLDIF:mode>): Use vector-tuple
	mode iterator and rename to...
	(aarch64_ld<nregs>r<vstruct_elt>): This.
	(aarch64_ld2<mode>_dreg): Use vector-tuple mode iterator and
	rename to...
	(aarch64_ld2<vstruct_elt>_dreg): This.
	(aarch64_ld3<mode>_dreg): Use vector-tuple mode iterator and
	rename to...
	(aarch64_ld3<vstruct_elt>_dreg): This.
	(aarch64_ld4<mode>_dreg): Use vector-tuple mode iterator and
	rename to...
	(aarch64_ld4<vstruct_elt>_dreg): This.
	(aarch64_ld<VSTRUCT:nregs><VDC:mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_ld<nregs><vstruct_elt>): This.
	(aarch64_ld<VSTRUCT:nregs><VQ:mode>): Use vector-tuple mode
	iterator and rename to aarch64_ld<nregs><vstruct_elt>.
	(aarch64_ld1x2<VQ:mode>): Delete.
	(aarch64_ld1x2<VDC:mode>): Use vector-tuple mode iterator and
	rename to...
	(aarch64_ld1x2<vstruct_elt>): This.
	(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use vector-
	tuple mode iterator and rename to...
	(aarch64_ld<nregs>_lane<vstruct_elt>): This.
	(aarch64_get_dreg<VSTRUCT:mode><VDC:mode>): Delete.
	(aarch64_get_qreg<VSTRUCT:mode><VQ:mode>): Likewise.
	(aarch64_st2<mode>_dreg): Use vector-tuple mode iterator and
	rename to...
	(aarch64_st2<vstruct_elt>_dreg): This.
	(aarch64_st3<mode>_dreg): Use vector-tuple mode iterator and
	rename to...
	(aarch64_st3<vstruct_elt>_dreg): This.
	(aarch64_st4<mode>_dreg): Use vector-tuple mode iterator and
	rename to...
	(aarch64_st4<vstruct_elt>_dreg): This.
	(aarch64_st<VSTRUCT:nregs><VDC:mode>): Use vector-tuple mode
	iterator and rename to...
	(aarch64_st<nregs><vstruct_elt>): This.
	(aarch64_st<VSTRUCT:nregs><VQ:mode>): Use vector-tuple mode
	iterator and rename to aarch64_st<nregs><vstruct_elt>.
	(aarch64_st<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use vector-
	tuple mode iterator and rename to...
	(aarch64_st<nregs>_lane<vstruct_elt>): This.
	(aarch64_set_qreg<VSTRUCT:mode><VQ:mode>): Delete.
	(aarch64_simd_ld1<mode>_x2): Use vector-tuple mode iterator
	and rename to...
	(aarch64_simd_ld1<vstruct_elt>_x2): This.
	* config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
	Refactor to include new vector-tuple modes.
	(aarch64_classify_vector_mode): Add cases for new vector-
	tuple modes.
	(aarch64_advsimd_partial_struct_mode_p): Define.
	(aarch64_advsimd_full_struct_mode_p): Likewise.
	(aarch64_advsimd_vector_array_mode): Likewise.
	(aarch64_sve_data_mode): Change location in file.
	(aarch64_array_mode): Handle case of Neon vector-tuple modes.
	(aarch64_hard_regno_nregs): Handle case of partial Neon
	vector structures.
	(aarch64_classify_address): Refactor to include handling of
	Neon vector-tuple modes.
	(aarch64_print_operand): Print "d" for "%R" when the operand
	is a partial Neon vector structure.
	(aarch64_expand_vec_perm_1): Use new vector-tuple mode.
	(aarch64_modes_tieable_p): Prevent tying Neon partial struct
	modes with scalar machine modes larger than 8 bytes.
	(aarch64_can_change_mode_class): Don't allow changes between
	partial and full Neon vector-structure modes.
	* config/aarch64/arm_neon.h (vst2_lane_f16): Use updated
	builtin and remove boilerplate code for opaque mode.
	(vst2_lane_f32): Likewise.
	(vst2_lane_f64): Likewise.
	(vst2_lane_p8): Likewise.
	(vst2_lane_p16): Likewise.
	(vst2_lane_p64): Likewise.
	(vst2_lane_s8): Likewise.
	(vst2_lane_s16): Likewise.
	(vst2_lane_s32): Likewise.
	(vst2_lane_s64): Likewise.
	(vst2_lane_u8): Likewise.
	(vst2_lane_u16): Likewise.
	(vst2_lane_u32): Likewise.
	(vst2_lane_u64): Likewise.
	(vst2q_lane_f16): Likewise.
	(vst2q_lane_f32): Likewise.
	(vst2q_lane_f64): Likewise.
	(vst2q_lane_p8): Likewise.
	(vst2q_lane_p16): Likewise.
	(vst2q_lane_p64): Likewise.
	(vst2q_lane_s8): Likewise.
	(vst2q_lane_s16): Likewise.
	(vst2q_lane_s32): Likewise.
	(vst2q_lane_s64): Likewise.
	(vst2q_lane_u8): Likewise.
	(vst2q_lane_u16): Likewise.
	(vst2q_lane_u32): Likewise.
	(vst2q_lane_u64): Likewise.
	(vst3_lane_f16): Likewise.
	(vst3_lane_f32): Likewise.
	(vst3_lane_f64): Likewise.
	(vst3_lane_p8): Likewise.
	(vst3_lane_p16): Likewise.
	(vst3_lane_p64): Likewise.
	(vst3_lane_s8): Likewise.
	(vst3_lane_s16): Likewise.
	(vst3_lane_s32): Likewise.
	(vst3_lane_s64): Likewise.
	(vst3_lane_u8): Likewise.
	(vst3_lane_u16): Likewise.
	(vst3_lane_u32): Likewise.
	(vst3_lane_u64): Likewise.
	(vst3q_lane_f16): Likewise.
	(vst3q_lane_f32): Likewise.
	(vst3q_lane_f64): Likewise.
	(vst3q_lane_p8): Likewise.
	(vst3q_lane_p16): Likewise.
	(vst3q_lane_p64): Likewise.
	(vst3q_lane_s8): Likewise.
	(vst3q_lane_s16): Likewise.
	(vst3q_lane_s32): Likewise.
	(vst3q_lane_s64): Likewise.
	(vst3q_lane_u8): Likewise.
	(vst3q_lane_u16): Likewise.
	(vst3q_lane_u32): Likewise.
	(vst3q_lane_u64): Likewise.
	(vst4_lane_f16): Likewise.
	(vst4_lane_f32): Likewise.
	(vst4_lane_f64): Likewise.
	(vst4_lane_p8): Likewise.
	(vst4_lane_p16): Likewise.
	(vst4_lane_p64): Likewise.
	(vst4_lane_s8): Likewise.
	(vst4_lane_s16): Likewise.
	(vst4_lane_s32): Likewise.
	(vst4_lane_s64): Likewise.
	(vst4_lane_u8): Likewise.
	(vst4_lane_u16): Likewise.
	(vst4_lane_u32): Likewise.
	(vst4_lane_u64): Likewise.
	(vst4q_lane_f16): Likewise.
	(vst4q_lane_f32): Likewise.
	(vst4q_lane_f64): Likewise.
	(vst4q_lane_p8): Likewise.
	(vst4q_lane_p16): Likewise.
	(vst4q_lane_p64): Likewise.
	(vst4q_lane_s8): Likewise.
	(vst4q_lane_s16): Likewise.
	(vst4q_lane_s32): Likewise.
	(vst4q_lane_s64): Likewise.
	(vst4q_lane_u8): Likewise.
	(vst4q_lane_u16): Likewise.
	(vst4q_lane_u32): Likewise.
	(vst4q_lane_u64): Likewise.
	(vtbl3_s8): Likewise.
	(vtbl3_u8): Likewise.
	(vtbl3_p8): Likewise.
	(vtbl4_s8): Likewise.
	(vtbl4_u8): Likewise.
	(vtbl4_p8): Likewise.
	(vld1_u8_x3): Likewise.
	(vld1_s8_x3): Likewise.
	(vld1_u16_x3): Likewise.
	(vld1_s16_x3): Likewise.
	(vld1_u32_x3): Likewise.
	(vld1_s32_x3): Likewise.
	(vld1_u64_x3): Likewise.
	(vld1_s64_x3): Likewise.
	(vld1_f16_x3): Likewise.
	(vld1_f32_x3): Likewise.
	(vld1_f64_x3): Likewise.
	(vld1_p8_x3): Likewise.
	(vld1_p16_x3): Likewise.
	(vld1_p64_x3): Likewise.
	(vld1q_u8_x3): Likewise.
	(vld1q_s8_x3): Likewise.
	(vld1q_u16_x3): Likewise.
	(vld1q_s16_x3): Likewise.
	(vld1q_u32_x3): Likewise.
	(vld1q_s32_x3): Likewise.
	(vld1q_u64_x3): Likewise.
	(vld1q_s64_x3): Likewise.
	(vld1q_f16_x3): Likewise.
	(vld1q_f32_x3): Likewise.
	(vld1q_f64_x3): Likewise.
	(vld1q_p8_x3): Likewise.
	(vld1q_p16_x3): Likewise.
	(vld1q_p64_x3): Likewise.
	(vld1_u8_x2): Likewise.
	(vld1_s8_x2): Likewise.
	(vld1_u16_x2): Likewise.
	(vld1_s16_x2): Likewise.
	(vld1_u32_x2): Likewise.
	(vld1_s32_x2): Likewise.
	(vld1_u64_x2): Likewise.
	(vld1_s64_x2): Likewise.
	(vld1_f16_x2): Likewise.
	(vld1_f32_x2): Likewise.
	(vld1_f64_x2): Likewise.
	(vld1_p8_x2): Likewise.
	(vld1_p16_x2): Likewise.
	(vld1_p64_x2): Likewise.
	(vld1q_u8_x2): Likewise.
	(vld1q_s8_x2): Likewise.
	(vld1q_u16_x2): Likewise.
	(vld1q_s16_x2): Likewise.
	(vld1q_u32_x2): Likewise.
	(vld1q_s32_x2): Likewise.
	(vld1q_u64_x2): Likewise.
	(vld1q_s64_x2): Likewise.
	(vld1q_f16_x2): Likewise.
	(vld1q_f32_x2): Likewise.
	(vld1q_f64_x2): Likewise.
	(vld1q_p8_x2): Likewise.
	(vld1q_p16_x2): Likewise.
	(vld1q_p64_x2): Likewise.
	(vld1_s8_x4): Likewise.
	(vld1q_s8_x4): Likewise.
	(vld1_s16_x4): Likewise.
	(vld1q_s16_x4): Likewise.
	(vld1_s32_x4): Likewise.
	(vld1q_s32_x4): Likewise.
	(vld1_u8_x4): Likewise.
	(vld1q_u8_x4): Likewise.
	(vld1_u16_x4): Likewise.
	(vld1q_u16_x4): Likewise.
	(vld1_u32_x4): Likewise.
	(vld1q_u32_x4): Likewise.
	(vld1_f16_x4): Likewise.
	(vld1q_f16_x4): Likewise.
	(vld1_f32_x4): Likewise.
	(vld1q_f32_x4): Likewise.
	(vld1_p8_x4): Likewise.
	(vld1q_p8_x4): Likewise.
	(vld1_p16_x4): Likewise.
	(vld1q_p16_x4): Likewise.
	(vld1_s64_x4): Likewise.
	(vld1_u64_x4): Likewise.
	(vld1_p64_x4): Likewise.
	(vld1q_s64_x4): Likewise.
	(vld1q_u64_x4): Likewise.
	(vld1q_p64_x4): Likewise.
	(vld1_f64_x4): Likewise.
	(vld1q_f64_x4): Likewise.
	(vld2_s64): Likewise.
	(vld2_u64): Likewise.
	(vld2_f64): Likewise.
	(vld2_s8): Likewise.
	(vld2_p8): Likewise.
	(vld2_p64): Likewise.
	(vld2_s16): Likewise.
	(vld2_p16): Likewise.
	(vld2_s32): Likewise.
	(vld2_u8): Likewise.
	(vld2_u16): Likewise.
	(vld2_u32): Likewise.
	(vld2_f16): Likewise.
	(vld2_f32): Likewise.
	(vld2q_s8): Likewise.
	(vld2q_p8): Likewise.
	(vld2q_s16): Likewise.
	(vld2q_p16): Likewise.
	(vld2q_p64): Likewise.
	(vld2q_s32): Likewise.
	(vld2q_s64): Likewise.
	(vld2q_u8): Likewise.
	(vld2q_u16): Likewise.
	(vld2q_u32): Likewise.
	(vld2q_u64): Likewise.
	(vld2q_f16): Likewise.
	(vld2q_f32): Likewise.
	(vld2q_f64): Likewise.
	(vld3_s64): Likewise.
	(vld3_u64): Likewise.
	(vld3_f64): Likewise.
	(vld3_s8): Likewise.
	(vld3_p8): Likewise.
	(vld3_s16): Likewise.
	(vld3_p16): Likewise.
	(vld3_s32): Likewise.
	(vld3_u8): Likewise.
	(vld3_u16): Likewise.
	(vld3_u32): Likewise.
	(vld3_f16): Likewise.
	(vld3_f32): Likewise.
	(vld3_p64): Likewise.
	(vld3q_s8): Likewise.
	(vld3q_p8): Likewise.
	(vld3q_s16): Likewise.
	(vld3q_p16): Likewise.
	(vld3q_s32): Likewise.
	(vld3q_s64): Likewise.
	(vld3q_u8): Likewise.
	(vld3q_u16): Likewise.
	(vld3q_u32): Likewise.
	(vld3q_u64): Likewise.
	(vld3q_f16): Likewise.
	(vld3q_f32): Likewise.
	(vld3q_f64): Likewise.
	(vld3q_p64): Likewise.
	(vld4_s64): Likewise.
	(vld4_u64): Likewise.
	(vld4_f64): Likewise.
	(vld4_s8): Likewise.
	(vld4_p8): Likewise.
	(vld4_s16): Likewise.
	(vld4_p16): Likewise.
	(vld4_s32): Likewise.
	(vld4_u8): Likewise.
	(vld4_u16): Likewise.
	(vld4_u32): Likewise.
	(vld4_f16): Likewise.
	(vld4_f32): Likewise.
	(vld4_p64): Likewise.
	(vld4q_s8): Likewise.
	(vld4q_p8): Likewise.
	(vld4q_s16): Likewise.
	(vld4q_p16): Likewise.
	(vld4q_s32): Likewise.
	(vld4q_s64): Likewise.
	(vld4q_u8): Likewise.
	(vld4q_u16): Likewise.
	(vld4q_u32): Likewise.
	(vld4q_u64): Likewise.
	(vld4q_f16): Likewise.
	(vld4q_f32): Likewise.
	(vld4q_f64): Likewise.
	(vld4q_p64): Likewise.
	(vld2_dup_s8): Likewise.
	(vld2_dup_s16): Likewise.
	(vld2_dup_s32): Likewise.
	(vld2_dup_f16): Likewise.
	(vld2_dup_f32): Likewise.
	(vld2_dup_f64): Likewise.
	(vld2_dup_u8): Likewise.
	(vld2_dup_u16): Likewise.
	(vld2_dup_u32): Likewise.
	(vld2_dup_p8): Likewise.
	(vld2_dup_p16): Likewise.
	(vld2_dup_p64): Likewise.
	(vld2_dup_s64): Likewise.
	(vld2_dup_u64): Likewise.
	(vld2q_dup_s8): Likewise.
	(vld2q_dup_p8): Likewise.
	(vld2q_dup_s16): Likewise.
	(vld2q_dup_p16): Likewise.
	(vld2q_dup_s32): Likewise.
	(vld2q_dup_s64): Likewise.
	(vld2q_dup_u8): Likewise.
	(vld2q_dup_u16): Likewise.
	(vld2q_dup_u32): Likewise.
	(vld2q_dup_u64): Likewise.
	(vld2q_dup_f16): Likewise.
	(vld2q_dup_f32): Likewise.
	(vld2q_dup_f64): Likewise.
	(vld2q_dup_p64): Likewise.
	(vld3_dup_s64): Likewise.
	(vld3_dup_u64): Likewise.
	(vld3_dup_f64): Likewise.
	(vld3_dup_s8): Likewise.
	(vld3_dup_p8): Likewise.
	(vld3_dup_s16): Likewise.
	(vld3_dup_p16): Likewise.
	(vld3_dup_s32): Likewise.
	(vld3_dup_u8): Likewise.
	(vld3_dup_u16): Likewise.
	(vld3_dup_u32): Likewise.
	(vld3_dup_f16): Likewise.
	(vld3_dup_f32): Likewise.
	(vld3_dup_p64): Likewise.
	(vld3q_dup_s8): Likewise.
	(vld3q_dup_p8): Likewise.
	(vld3q_dup_s16): Likewise.
	(vld3q_dup_p16): Likewise.
	(vld3q_dup_s32): Likewise.
	(vld3q_dup_s64): Likewise.
	(vld3q_dup_u8): Likewise.
	(vld3q_dup_u16): Likewise.
	(vld3q_dup_u32): Likewise.
	(vld3q_dup_u64): Likewise.
	(vld3q_dup_f16): Likewise.
	(vld3q_dup_f32): Likewise.
	(vld3q_dup_f64): Likewise.
	(vld3q_dup_p64): Likewise.
	(vld4_dup_s64): Likewise.
	(vld4_dup_u64): Likewise.
	(vld4_dup_f64): Likewise.
	(vld4_dup_s8): Likewise.
	(vld4_dup_p8): Likewise.
	(vld4_dup_s16): Likewise.
	(vld4_dup_p16): Likewise.
	(vld4_dup_s32): Likewise.
	(vld4_dup_u8): Likewise.
	(vld4_dup_u16): Likewise.
	(vld4_dup_u32): Likewise.
	(vld4_dup_f16): Likewise.
	(vld4_dup_f32): Likewise.
	(vld4_dup_p64): Likewise.
	(vld4q_dup_s8): Likewise.
	(vld4q_dup_p8): Likewise.
	(vld4q_dup_s16): Likewise.
	(vld4q_dup_p16): Likewise.
	(vld4q_dup_s32): Likewise.
	(vld4q_dup_s64): Likewise.
	(vld4q_dup_u8): Likewise.
	(vld4q_dup_u16): Likewise.
	(vld4q_dup_u32): Likewise.
	(vld4q_dup_u64): Likewise.
	(vld4q_dup_f16): Likewise.
	(vld4q_dup_f32): Likewise.
	(vld4q_dup_f64): Likewise.
	(vld4q_dup_p64): Likewise.
	(vld2_lane_u8): Likewise.
	(vld2_lane_u16): Likewise.
	(vld2_lane_u32): Likewise.
	(vld2_lane_u64): Likewise.
	(vld2_lane_s8): Likewise.
	(vld2_lane_s16): Likewise.
	(vld2_lane_s32): Likewise.
	(vld2_lane_s64): Likewise.
	(vld2_lane_f16): Likewise.
	(vld2_lane_f32): Likewise.
	(vld2_lane_f64): Likewise.
	(vld2_lane_p8): Likewise.
	(vld2_lane_p16): Likewise.
	(vld2_lane_p64): Likewise.
	(vld2q_lane_u8): Likewise.
	(vld2q_lane_u16): Likewise.
	(vld2q_lane_u32): Likewise.
	(vld2q_lane_u64): Likewise.
	(vld2q_lane_s8): Likewise.
	(vld2q_lane_s16): Likewise.
	(vld2q_lane_s32): Likewise.
	(vld2q_lane_s64): Likewise.
	(vld2q_lane_f16): Likewise.
	(vld2q_lane_f32): Likewise.
	(vld2q_lane_f64): Likewise.
	(vld2q_lane_p8): Likewise.
	(vld2q_lane_p16): Likewise.
	(vld2q_lane_p64): Likewise.
	(vld3_lane_u8): Likewise.
	(vld3_lane_u16): Likewise.
	(vld3_lane_u32): Likewise.
	(vld3_lane_u64): Likewise.
	(vld3_lane_s8): Likewise.
	(vld3_lane_s16): Likewise.
	(vld3_lane_s32): Likewise.
	(vld3_lane_s64): Likewise.
	(vld3_lane_f16): Likewise.
	(vld3_lane_f32): Likewise.
	(vld3_lane_f64): Likewise.
	(vld3_lane_p8): Likewise.
	(vld3_lane_p16): Likewise.
	(vld3_lane_p64): Likewise.
	(vld3q_lane_u8): Likewise.
	(vld3q_lane_u16): Likewise.
	(vld3q_lane_u32): Likewise.
	(vld3q_lane_u64): Likewise.
	(vld3q_lane_s8): Likewise.
	(vld3q_lane_s16): Likewise.
	(vld3q_lane_s32): Likewise.
	(vld3q_lane_s64): Likewise.
	(vld3q_lane_f16): Likewise.
	(vld3q_lane_f32): Likewise.
	(vld3q_lane_f64): Likewise.
	(vld3q_lane_p8): Likewise.
	(vld3q_lane_p16): Likewise.
	(vld3q_lane_p64): Likewise.
	(vld4_lane_u8): Likewise.
	(vld4_lane_u16): Likewise.
	(vld4_lane_u32): Likewise.
	(vld4_lane_u64): Likewise.
	(vld4_lane_s8): Likewise.
	(vld4_lane_s16): Likewise.
	(vld4_lane_s32): Likewise.
	(vld4_lane_s64): Likewise.
	(vld4_lane_f16): Likewise.
	(vld4_lane_f32): Likewise.
	(vld4_lane_f64): Likewise.
	(vld4_lane_p8): Likewise.
	(vld4_lane_p16): Likewise.
	(vld4_lane_p64): Likewise.
	(vld4q_lane_u8): Likewise.
	(vld4q_lane_u16): Likewise.
	(vld4q_lane_u32): Likewise.
	(vld4q_lane_u64): Likewise.
	(vld4q_lane_s8): Likewise.
	(vld4q_lane_s16): Likewise.
	(vld4q_lane_s32): Likewise.
	(vld4q_lane_s64): Likewise.
	(vld4q_lane_f16): Likewise.
	(vld4q_lane_f32): Likewise.
	(vld4q_lane_f64): Likewise.
	(vld4q_lane_p8): Likewise.
	(vld4q_lane_p16): Likewise.
	(vld4q_lane_p64): Likewise.
	(vqtbl2_s8): Likewise.
	(vqtbl2_u8): Likewise.
	(vqtbl2_p8): Likewise.
	(vqtbl2q_s8): Likewise.
	(vqtbl2q_u8): Likewise.
	(vqtbl2q_p8): Likewise.
	(vqtbl3_s8): Likewise.
	(vqtbl3_u8): Likewise.
	(vqtbl3_p8): Likewise.
	(vqtbl3q_s8): Likewise.
	(vqtbl3q_u8): Likewise.
	(vqtbl3q_p8): Likewise.
	(vqtbl4_s8): Likewise.
	(vqtbl4_u8): Likewise.
	(vqtbl4_p8): Likewise.
	(vqtbl4q_s8): Likewise.
	(vqtbl4q_u8): Likewise.
	(vqtbl4q_p8): Likewise.
	(vqtbx2_s8): Likewise.
	(vqtbx2_u8): Likewise.
	(vqtbx2_p8): Likewise.
	(vqtbx2q_s8): Likewise.
	(vqtbx2q_u8): Likewise.
	(vqtbx2q_p8): Likewise.
	(vqtbx3_s8): Likewise.
	(vqtbx3_u8): Likewise.
	(vqtbx3_p8): Likewise.
	(vqtbx3q_s8): Likewise.
	(vqtbx3q_u8): Likewise.
	(vqtbx3q_p8): Likewise.
	(vqtbx4_s8): Likewise.
	(vqtbx4_u8): Likewise.
	(vqtbx4_p8): Likewise.
	(vqtbx4q_s8): Likewise.
	(vqtbx4q_u8): Likewise.
	(vqtbx4q_p8): Likewise.
	(vst1_s64_x2): Likewise.
	(vst1_u64_x2): Likewise.
	(vst1_f64_x2): Likewise.
	(vst1_s8_x2): Likewise.
	(vst1_p8_x2): Likewise.
	(vst1_s16_x2): Likewise.
	(vst1_p16_x2): Likewise.
	(vst1_s32_x2): Likewise.
	(vst1_u8_x2): Likewise.
	(vst1_u16_x2): Likewise.
	(vst1_u32_x2): Likewise.
	(vst1_f16_x2): Likewise.
	(vst1_f32_x2): Likewise.
	(vst1_p64_x2): Likewise.
	(vst1q_s8_x2): Likewise.
	(vst1q_p8_x2): Likewise.
	(vst1q_s16_x2): Likewise.
	(vst1q_p16_x2): Likewise.
	(vst1q_s32_x2): Likewise.
	(vst1q_s64_x2): Likewise.
	(vst1q_u8_x2): Likewise.
	(vst1q_u16_x2): Likewise.
	(vst1q_u32_x2): Likewise.
	(vst1q_u64_x2): Likewise.
	(vst1q_f16_x2): Likewise.
	(vst1q_f32_x2): Likewise.
	(vst1q_f64_x2): Likewise.
	(vst1q_p64_x2): Likewise.
	(vst1_s64_x3): Likewise.
	(vst1_u64_x3): Likewise.
	(vst1_f64_x3): Likewise.
	(vst1_s8_x3): Likewise.
	(vst1_p8_x3): Likewise.
	(vst1_s16_x3): Likewise.
	(vst1_p16_x3): Likewise.
	(vst1_s32_x3): Likewise.
	(vst1_u8_x3): Likewise.
	(vst1_u16_x3): Likewise.
	(vst1_u32_x3): Likewise.
	(vst1_f16_x3): Likewise.
	(vst1_f32_x3): Likewise.
	(vst1_p64_x3): Likewise.
	(vst1q_s8_x3): Likewise.
	(vst1q_p8_x3): Likewise.
	(vst1q_s16_x3): Likewise.
	(vst1q_p16_x3): Likewise.
	(vst1q_s32_x3): Likewise.
	(vst1q_s64_x3): Likewise.
	(vst1q_u8_x3): Likewise.
	(vst1q_u16_x3): Likewise.
	(vst1q_u32_x3): Likewise.
	(vst1q_u64_x3): Likewise.
	(vst1q_f16_x3): Likewise.
	(vst1q_f32_x3): Likewise.
	(vst1q_f64_x3): Likewise.
	(vst1q_p64_x3): Likewise.
	(vst1_s8_x4): Likewise.
	(vst1q_s8_x4): Likewise.
	(vst1_s16_x4): Likewise.
	(vst1q_s16_x4): Likewise.
	(vst1_s32_x4): Likewise.
	(vst1q_s32_x4): Likewise.
	(vst1_u8_x4): Likewise.
	(vst1q_u8_x4): Likewise.
	(vst1_u16_x4): Likewise.
	(vst1q_u16_x4): Likewise.
	(vst1_u32_x4): Likewise.
	(vst1q_u32_x4): Likewise.
	(vst1_f16_x4): Likewise.
	(vst1q_f16_x4): Likewise.
	(vst1_f32_x4): Likewise.
	(vst1q_f32_x4): Likewise.
	(vst1_p8_x4): Likewise.
	(vst1q_p8_x4): Likewise.
	(vst1_p16_x4): Likewise.
	(vst1q_p16_x4): Likewise.
	(vst1_s64_x4): Likewise.
	(vst1_u64_x4): Likewise.
	(vst1_p64_x4): Likewise.
	(vst1q_s64_x4): Likewise.
	(vst1q_u64_x4): Likewise.
	(vst1q_p64_x4): Likewise.
	(vst1_f64_x4): Likewise.
	(vst1q_f64_x4): Likewise.
	(vst2_s64): Likewise.
	(vst2_u64): Likewise.
	(vst2_f64): Likewise.
	(vst2_s8): Likewise.
	(vst2_p8): Likewise.
	(vst2_s16): Likewise.
	(vst2_p16): Likewise.
	(vst2_s32): Likewise.
	(vst2_u8): Likewise.
	(vst2_u16): Likewise.
	(vst2_u32): Likewise.
	(vst2_f16): Likewise.
	(vst2_f32): Likewise.
	(vst2_p64): Likewise.
	(vst2q_s8): Likewise.
	(vst2q_p8): Likewise.
	(vst2q_s16): Likewise.
	(vst2q_p16): Likewise.
	(vst2q_s32): Likewise.
	(vst2q_s64): Likewise.
	(vst2q_u8): Likewise.
	(vst2q_u16): Likewise.
	(vst2q_u32): Likewise.
	(vst2q_u64): Likewise.
	(vst2q_f16): Likewise.
	(vst2q_f32): Likewise.
	(vst2q_f64): Likewise.
	(vst2q_p64): Likewise.
	(vst3_s64): Likewise.
	(vst3_u64): Likewise.
	(vst3_f64): Likewise.
	(vst3_s8): Likewise.
	(vst3_p8): Likewise.
	(vst3_s16): Likewise.
	(vst3_p16): Likewise.
	(vst3_s32): Likewise.
	(vst3_u8): Likewise.
	(vst3_u16): Likewise.
	(vst3_u32): Likewise.
	(vst3_f16): Likewise.
	(vst3_f32): Likewise.
	(vst3_p64): Likewise.
	(vst3q_s8): Likewise.
	(vst3q_p8): Likewise.
	(vst3q_s16): Likewise.
	(vst3q_p16): Likewise.
	(vst3q_s32): Likewise.
	(vst3q_s64): Likewise.
	(vst3q_u8): Likewise.
	(vst3q_u16): Likewise.
	(vst3q_u32): Likewise.
	(vst3q_u64): Likewise.
	(vst3q_f16): Likewise.
	(vst3q_f32): Likewise.
	(vst3q_f64): Likewise.
	(vst3q_p64): Likewise.
	(vst4_s64): Likewise.
	(vst4_u64): Likewise.
	(vst4_f64): Likewise.
	(vst4_s8): Likewise.
	(vst4_p8): Likewise.
	(vst4_s16): Likewise.
	(vst4_p16): Likewise.
	(vst4_s32): Likewise.
	(vst4_u8): Likewise.
	(vst4_u16): Likewise.
	(vst4_u32): Likewise.
	(vst4_f16): Likewise.
	(vst4_f32): Likewise.
	(vst4_p64): Likewise.
	(vst4q_s8): Likewise.
	(vst4q_p8): Likewise.
	(vst4q_s16): Likewise.
	(vst4q_p16): Likewise.
	(vst4q_s32): Likewise.
	(vst4q_s64): Likewise.
	(vst4q_u8): Likewise.
	(vst4q_u16): Likewise.
	(vst4q_u32): Likewise.
	(vst4q_u64): Likewise.
	(vst4q_f16): Likewise.
	(vst4q_f32): Likewise.
	(vst4q_f64): Likewise.
	(vst4q_p64): Likewise.
	(vtbx4_s8): Likewise.
	(vtbx4_u8): Likewise.
	(vtbx4_p8): Likewise.
	(vld1_bf16_x2): Likewise.
	(vld1q_bf16_x2): Likewise.
	(vld1_bf16_x3): Likewise.
	(vld1q_bf16_x3): Likewise.
	(vld1_bf16_x4): Likewise.
	(vld1q_bf16_x4): Likewise.
	(vld2_bf16): Likewise.
	(vld2q_bf16): Likewise.
	(vld2_dup_bf16): Likewise.
	(vld2q_dup_bf16): Likewise.
	(vld3_bf16): Likewise.
	(vld3q_bf16): Likewise.
	(vld3_dup_bf16): Likewise.
	(vld3q_dup_bf16): Likewise.
	(vld4_bf16): Likewise.
	(vld4q_bf16): Likewise.
	(vld4_dup_bf16): Likewise.
	(vld4q_dup_bf16): Likewise.
	(vst1_bf16_x2): Likewise.
	(vst1q_bf16_x2): Likewise.
	(vst1_bf16_x3): Likewise.
	(vst1q_bf16_x3): Likewise.
	(vst1_bf16_x4): Likewise.
	(vst1q_bf16_x4): Likewise.
	(vst2_bf16): Likewise.
	(vst2q_bf16): Likewise.
	(vst3_bf16): Likewise.
	(vst3q_bf16): Likewise.
	(vst4_bf16): Likewise.
	(vst4q_bf16): Likewise.
	(vld2_lane_bf16): Likewise.
	(vld2q_lane_bf16): Likewise.
	(vld3_lane_bf16): Likewise.
	(vld3q_lane_bf16): Likewise.
	(vld4_lane_bf16): Likewise.
	(vld4q_lane_bf16): Likewise.
	(vst2_lane_bf16): Likewise.
	(vst2q_lane_bf16): Likewise.
	(vst3_lane_bf16): Likewise.
	(vst3q_lane_bf16): Likewise.
	(vst4_lane_bf16): Likewise.
	(vst4q_lane_bf16): Likewise.
	* config/aarch64/geniterators.sh: Modify iterator regex to
	match new vector-tuple modes.
	* config/aarch64/iterators.md (insn_count): Extend mode
	attribute with vector-tuple type information.
	(nregs): Likewise.
	(Vendreg): Likewise.
	(Vetype): Likewise.
	(Vtype): Likewise.
	(VSTRUCT_2D): New mode iterator.
	(VSTRUCT_2DNX): Likewise.
	(VSTRUCT_2DX): Likewise.
	(VSTRUCT_2Q): Likewise.
	(VSTRUCT_2QD): Likewise.
	(VSTRUCT_3D): Likewise.
	(VSTRUCT_3DNX): Likewise.
	(VSTRUCT_3DX): Likewise.
	(VSTRUCT_3Q): Likewise.
	(VSTRUCT_3QD): Likewise.
	(VSTRUCT_4D): Likewise.
	(VSTRUCT_4DNX): Likewise.
	(VSTRUCT_4DX): Likewise.
	(VSTRUCT_4Q): Likewise.
	(VSTRUCT_4QD): Likewise.
	(VSTRUCT_D): Likewise.
	(VSTRUCT_Q): Likewise.
	(VSTRUCT_QD): Likewise.
	(VSTRUCT_ELT): New mode attribute.
	(vstruct_elt): Likewise.
	* genmodes.c (VECTOR_MODE): Add default prefix and order
	parameters.
	(VECTOR_MODE_WITH_PREFIX): Define.
	(make_vector_mode): Add mode prefix and order parameters.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c:
	Relax incorrect register number requirement.
	* gcc.target/aarch64/sve/pcs/struct_3_256.c: Accept
	equivalent codegen with fmov.

[-- Attachment #2: rb14782.patch.zip --]
[-- Type: application/zip, Size: 38522 bytes --]


* Re: [PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types
  2021-10-22 14:48 [PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types Jonathan Wright
@ 2021-10-22 15:13 ` Richard Sandiford
  2021-11-02 11:19   ` [PATCH 4/6 V2] " Jonathan Wright
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2021-10-22 15:13 UTC (permalink / raw)
  To: Jonathan Wright; +Cc: gcc-patches, Kyrylo Tkachov

Thanks a lot for doing this.

Jonathan Wright <Jonathan.Wright@arm.com> writes:
> @@ -763,9 +839,16 @@ aarch64_lookup_simd_builtin_type (machine_mode mode,
>      return aarch64_simd_builtin_std_type (mode, q);
>  
>    for (i = 0; i < nelts; i++)
> -    if (aarch64_simd_types[i].mode == mode
> -	&& aarch64_simd_types[i].q == q)
> -      return aarch64_simd_types[i].itype;
> +    {
> +      if (aarch64_simd_types[i].mode == mode
> +	  && aarch64_simd_types[i].q == q)
> +	return aarch64_simd_types[i].itype;
> +      else if (aarch64_simd_tuple_types[i][0] != NULL_TREE)

Very minor (sorry for not noticing earlier), but: the “else” is
redundant here.
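
I.e. something like this (untested sketch, just dropping the "else"):

  for (i = 0; i < nelts; i++)
    {
      if (aarch64_simd_types[i].mode == mode
          && aarch64_simd_types[i].q == q)
        return aarch64_simd_types[i].itype;
      if (aarch64_simd_tuple_types[i][0] != NULL_TREE)
        for (int j = 0; j < 3; j++)
          if (TYPE_MODE (aarch64_simd_tuple_types[i][j]) == mode
              && aarch64_simd_types[i].q == q)
            return aarch64_simd_tuple_types[i][j];
    }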

> +	for (int j = 0; j < 3; j++)
> +	  if (TYPE_MODE (aarch64_simd_tuple_types[i][j]) == mode
> +	      && aarch64_simd_types[i].q == q)
> +	    return aarch64_simd_tuple_types[i][j];
> +    }
>  
>    return NULL_TREE;
>  }
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 48eddf64e05afe3788abfa05141f6544a9323ea1..0aa185b67ff13d40c87db0449aec312929ff5387 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -6636,162 +6636,165 @@
>  
>  ;; Patterns for vector struct loads and stores.
>  
> -(define_insn "aarch64_simd_ld2<mode>"
> -  [(set (match_operand:OI 0 "register_operand" "=w")
> -	(unspec:OI [(match_operand:OI 1 "aarch64_simd_struct_operand" "Utv")
> -		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -		   UNSPEC_LD2))]
> +(define_insn "aarch64_simd_ld2<vstruct_elt>"
> +  [(set (match_operand:VSTRUCT_2Q 0 "register_operand" "=w")
> +	(unspec:VSTRUCT_2Q [
> +	  (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand" "Utv")]
> +	  UNSPEC_LD2))]
>    "TARGET_SIMD"
>    "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
>    [(set_attr "type" "neon_load2_2reg<q>")]
>  )
>  
> -(define_insn "aarch64_simd_ld2r<mode>"
> -  [(set (match_operand:OI 0 "register_operand" "=w")
> -       (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> -                   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
> -                  UNSPEC_LD2_DUP))]
> +(define_insn "aarch64_simd_ld2r<vstruct_elt>"
> +  [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
> +	(unspec:VSTRUCT_2QD [
> +	  (match_operand:VSTRUCT_2QD 1 "aarch64_simd_struct_operand" "Utv")]
> +          UNSPEC_LD2_DUP))]

Sorry again for missing this, but the ld2rs, ld3rs and ld4rs should
keep their BLKmode arguments, since they only access 2, 3 or 4
scalar memory elements.
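
I.e. keep the memory operand as BLK, along the lines of (a sketch
only, modulo the exact asm template and type attribute):

  (define_insn "aarch64_simd_ld2r<vstruct_elt>"
    [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
          (unspec:VSTRUCT_2QD [
            (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
            UNSPEC_LD2_DUP))]
    "TARGET_SIMD"
    "ld2r\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
    [(set_attr "type" "neon_load2_all_lanes<q>")]
  )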

> @@ -7515,10 +7605,10 @@
>  )
>  
>  (define_insn_and_split "aarch64_combinev16qi"
> -  [(set (match_operand:OI 0 "register_operand" "=w")
> -	(unspec:OI [(match_operand:V16QI 1 "register_operand" "w")
> -		    (match_operand:V16QI 2 "register_operand" "w")]
> -		   UNSPEC_CONCAT))]
> +  [(set (match_operand:V2x16QI 0 "register_operand" "=w")
> +	(unspec:V2x16QI [(match_operand:V16QI 1 "register_operand" "w")
> +			 (match_operand:V16QI 2 "register_operand" "w")]
> +			UNSPEC_CONCAT))]

Just realised that we can now make this a vec_concat, since the
modes are finally self-consistent.

No need to do that though, either way is fine.

Looks good otherwise.

Richard


* Re: [PATCH 4/6 V2] aarch64: Add machine modes for Neon vector-tuple types
  2021-10-22 15:13 ` Richard Sandiford
@ 2021-11-02 11:19   ` Jonathan Wright
  2021-11-02 12:34     ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Wright @ 2021-11-02 11:19 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 3869 bytes --]

Hi,

Each of the comments on the previous version of the patch has been
addressed.

Ok for master?

Thanks,
Jonathan


[-- Attachment #2: rb14782.patch.zip --]
[-- Type: application/zip, Size: 38525 bytes --]


* Re: [PATCH 4/6 V2] aarch64: Add machine modes for Neon vector-tuple types
  2021-11-02 11:19   ` [PATCH 4/6 V2] " Jonathan Wright
@ 2021-11-02 12:34     ` Richard Sandiford
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Sandiford @ 2021-11-02 12:34 UTC (permalink / raw)
  To: Jonathan Wright; +Cc: gcc-patches, Kyrylo Tkachov

Jonathan Wright <Jonathan.Wright@arm.com> writes:
> Each of the comments on the previous version of the patch have been
> addressed.

Thanks.

I realise I was wrong with the vcombine thing: it's only vec_concat
for LE, not for BE.  Sorry for the screw-up.

The patch is OK with that part reverted to your original version.

Richard
