* [1/4] [AArch64] SVE backend support
2017-11-03 17:45 [0/4] [AArch64] Add SVE support Richard Sandiford
@ 2017-11-03 17:48 ` Richard Sandiford
2018-01-05 11:41 ` Richard Sandiford
2017-11-03 17:50 ` [2/4] [AArch64] Testsuite markup for SVE Richard Sandiford
` (3 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2017-11-03 17:48 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
[-- Attachment #1: Type: text/plain, Size: 15840 bytes --]
This patch adds support for ARM's Scalable Vector Extension.
The patch just contains the core features that work with the
current vectoriser framework; later patches will add extra
capabilities to both the target-independent code and AArch64 code.
The patch doesn't include:
- support for unwinding frames whose size depends on the vector length
- modelling the effect of __tls_get_addr on the SVE registers
These are handled by later patches instead.
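For context (this is illustrative, not code from the patch), the kind of loop the current vectoriser framework targets with SVE is a simple predicated, strip-mined loop. A minimal scalar model, using a fixed stand-in VL for the runtime vector length:

```c
#include <assert.h>

/* Scalar sketch of VL-agnostic vectorisation: VL stands in for the
   runtime SVE vector length (in elements); the inner loop models the
   lanes of one predicated vector iteration.  Names are illustrative. */
enum { VL = 4 };

void vadd(int *restrict a, const int *restrict b, long n)
{
  for (long i = 0; i < n; i += VL)      /* one vector iteration */
    for (long k = 0; k < VL; k++)       /* lanes, guarded as a predicate would */
      if (i + k < n)
        a[i + k] += b[i + k];
}
```

Because the trip count need not be a multiple of VL, the lane guard handles the tail without a scalar epilogue, which is the property that makes the code vector-length agnostic.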
Some notes:
- The copyright years for aarch64-sve.md start at 2009 because some of
the code is based on aarch64.md, which also starts from then.
- The patch inserts spaces between items in the AArch64 section
of sourcebuild.texi. This matches at least the surrounding
architectures and looks a little nicer in the info output.
- aarch64-sve.md includes a pattern:
while_ult<GPI:mode><PRED_ALL:mode>
A later patch adds a matching "while_ult" optab, but the pattern
is also needed by the predicate vec_duplicate expander.
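As a rough scalar model of the semantics behind that pattern (the function name and signature here are illustrative, not taken from the patch), WHILELO sets lane k of the predicate whenever i + k is still below the unsigned loop bound:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the WHILELO semantics behind while_ult: lane k of the
   result predicate is true iff (i + k) < n, using unsigned compares.
   Purely illustrative. */
void whilelo(uint64_t i, uint64_t n, bool *pred, int lanes)
{
  for (int k = 0; k < lanes; k++)
    pred[k] = (i + (uint64_t) k) < n;
}
```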
2017-11-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes. Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
(VEC_ANY_DATA, VEC_STRUCT): New constants.
(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
(aarch64_sve_data_mode_p, aarch64_pred_mode, aarch64_get_mask_mode):
New functions.
(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
(aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
predicate modes and predicate registers. Explicitly restrict
GPRs to modes of 16 bytes or smaller. Only allow FP registers
to store a vector mode if it is recognized by
aarch64_classify_vector_mode.
(aarch64_regmode_natural_size): New function.
(aarch64_hard_regno_caller_save_mode): Return the original mode
for predicates.
(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
functions.
(aarch64_add_offset): Add a temp2 parameter. Assert that temp1
does not overlap dest if the function is frame-related. Handle
SVE constants.
(aarch64_split_add_offset): New function.
(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
them to aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
and update call to aarch64_sub_sp.
(aarch64_add_cfa_expression): New function.
(aarch64_expand_prologue): Pass extra temporary registers to the
functions above. Handle the case in which we need to emit new
DW_CFA_expressions for registers that were originally saved
relative to the stack pointer, but now have to be expressed
relative to the frame pointer.
(aarch64_output_mi_thunk): Pass extra temporary registers to the
functions above.
(aarch64_expand_epilogue): Likewise. Prevent inheritance of
IP0 and IP1 values for SVE frames.
(aarch64_expand_vec_series): New function.
(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
Handle SVE constants.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
(offset_9bit_signed_scaled_p): New functions.
(aarch64_replicate_bitmask_imm): New function.
(aarch64_bitmask_imm): Use it.
(aarch64_cannot_force_const_mem): Reject expressions involving
a CONST_POLY_INT. Update call to aarch64_classify_symbol.
(aarch64_classify_index): Handle SVE indices, by requiring
a plain register index with a scale that matches the element size.
(aarch64_classify_address): Handle SVE addresses. Assert that
the mode of the address is VOIDmode or an integer mode.
Update call to aarch64_classify_symbol.
(aarch64_classify_symbolic_expression): Update call to
aarch64_classify_symbol.
(aarch64_const_vec_all_same_in_range_p): Extend to VEC_DUPLICATE
constants by using const_vec_duplicate_p.
(aarch64_const_vec_all_in_range_p): New function.
(aarch64_print_vector_float_operand): Likewise.
(aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
"vN" for FP registers with SVE modes. Handle (const ...) vectors
and the FP immediates 1.0 and 0.5.
(aarch64_print_operand_address): Use ADDR_QUERY_ANY. Handle
SVE addresses.
(aarch64_regno_regclass): Handle predicate registers.
(aarch64_secondary_reload): Handle big-endian reloads of SVE
data modes.
(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
(aarch64_convert_sve_vector_bits): New function.
(aarch64_override_options): Use it to handle -msve-vector-bits=.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
Handle SVE vector and predicate modes. Accept VL-based constants
that need only one temporary register. Only call
aarch64_constant_address_p if the constant is a scalar integer.
(aarch64_conditional_register_usage): Mark the predicate registers
as fixed if SVE isn't available.
(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
Return true for SVE vector and predicate modes.
(aarch64_simd_container_mode): Take the number of bits as a poly_int64
rather than an unsigned int. Handle SVE modes.
(aarch64_preferred_simd_mode): Update call accordingly. Handle
SVE modes.
(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
if SVE is enabled.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): New functions.
(aarch64_sve_valid_immediate): New function.
(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
Explicitly reject structure modes. Check for INDEX constants.
Handle PTRUE and PFALSE constants.
(aarch64_check_zero_based_sve_index_immediate): New function.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
vector modes. Accept constants in the range of CNT[BHWD].
(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
ask for an Advanced SIMD mode.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
(aarch64_simd_vector_alignment): Handle SVE predicates.
(aarch64_vectorize_preferred_vector_alignment): New function.
(aarch64_simd_vector_alignment_reachable): Use it instead of
the vector size.
(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
functions.
(MAX_VECT_LEN): Delete.
(expand_vec_perm_d): Add a vec_flags field.
(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext): Don't apply a big-endian lane correction
for SVE modes.
(aarch64_evpc_rev): Use a predicated operation for SVE.
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
MAX_VECT_LEN.
(aarch64_evpc_sve_tbl): New function.
(aarch64_expand_vec_perm_const_1): Handle SVE permutes too,
using aarch64_evpc_sve_tbl rather than aarch64_evpc_tbl.
(aarch64_expand_vec_perm_const): Initialize vec_flags.
(aarch64_vectorize_vec_perm_const_ok): Likewise.
(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond): New functions.
(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
of aarch64_vector_mode_p.
(aarch64_dwarf_poly_indeterminate_value): New function.
(aarch64_compute_pressure_classes): Likewise.
(aarch64_can_change_mode_class): Likewise.
(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/constraints.md (Upa, Upl, Uad, Ual, Utr, Utw, Di)
(Dm, Dv, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vfa, vfm, vfn): New
constraints.
(Dn, Dl, Dr): Accept const as well as const_vector.
(Dz): Likewise. Compare against CONST0_RTX.
* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
of "vector" where appropriate.
(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
(v_int_equiv): Extend to SVE modes.
(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
mode attributes.
(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
(SVE_COND_FP_CMP): New int iterators.
(perm_hilo): Handle the new unpack unspecs.
(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
attributes.
* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
(aarch64_equality_operator, aarch64_constant_vector_operand)
(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
(aarch64_sve_nonimmediate_operand): Likewise.
(aarch64_sve_general_operand): Likewise.
(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
(aarch64_sve_float_arith_immediate): Likewise.
(aarch64_sve_float_arith_with_sub_immediate): Likewise.
(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
(aarch64_sve_float_arith_operand): Likewise.
(aarch64_sve_float_arith_with_sub_operand): Likewise.
(aarch64_sve_float_mul_operand): Likewise.
(aarch64_sve_vec_perm_operand): Likewise.
(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
(aarch64_mov_operand): Accept const_poly_int and const_vector.
(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
as well as const_vector.
(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
in file. Use CONST0_RTX and CONSTM1_RTX.
(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
Use aarch64_simd_imm_zero.
* config/aarch64/aarch64-sve.md: New file.
* config/aarch64/aarch64.md: Include it.
(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
(movqi, movhi): Pass CONST_POLY_INT operands through
aarch64_expand_mov_immediate.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
CNT[BHSD] immediates.
(movti): Split CONST_POLY_INT moves into two halves.
(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
Split additions that need a temporary here if the destination
is the stack pointer.
(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
(*add<mode>3_poly_1): New instruction.
(set_clobber_cc): New expander.
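To make the CNT[BHSD] and ADDVL/ADDPL immediates above concrete, here is a scalar model based on the SVE architecture rather than on code from the patch: VG is the vector length in 64-bit granules, so a vector is 8*VG bytes and a predicate is VG bytes.

```c
#include <assert.h>
#include <stdint.h>

/* VG = SVE vector length in 64-bit granules; the vector is 8*VG bytes
   and a predicate is VG bytes (one predicate bit per byte lane).
   Illustrative architectural sketch, not patch code. */
static uint64_t cntd(uint64_t vg) { return vg; }      /* 64-bit elements */
static uint64_t cntw(uint64_t vg) { return 2 * vg; }  /* 32-bit elements */
static uint64_t cnth(uint64_t vg) { return 4 * vg; }  /* 16-bit elements */
static uint64_t cntb(uint64_t vg) { return 8 * vg; }  /* bytes */

/* ADDVL adds a multiple of the vector length in bytes;
   ADDPL adds a multiple of the predicate length in bytes.  */
static int64_t addvl(int64_t x, int64_t imm, uint64_t vg)
{ return x + imm * (int64_t) cntb(vg); }
static int64_t addpl(int64_t x, int64_t imm, uint64_t vg)
{ return x + imm * (int64_t) vg; }
```

This is why VL-sized stack adjustments in the prologue/epilogue code above can be expressed with a single ADDVL/ADDPL rather than a runtime multiply.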
[-- Attachment #2: sve-01-main.diff.gz --]
[-- Type: application/gzip, Size: 55524 bytes --]
* Re: [1/4] [AArch64] SVE backend support
2017-11-03 17:48 ` [1/4] [AArch64] SVE backend support Richard Sandiford
@ 2018-01-05 11:41 ` Richard Sandiford
2018-01-10 19:19 ` James Greenhalgh
0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2018-01-05 11:41 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
[-- Attachment #1: Type: text/plain, Size: 16539 bytes --]
Here's the patch updated to apply on top of the v8.4 and
__builtin_load_no_speculate support. It also handles the new
vec_perm_indices and CONST_VECTOR encoding and uses VNx... names
for the SVE modes.
Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch adds support for ARM's Scalable Vector Extension.
> The patch just contains the core features that work with the
> current vectoriser framework; later patches will add extra
> capabilities to both the target-independent code and AArch64 code.
> The patch doesn't include:
>
> - support for unwinding frames whose size depends on the vector length
> - modelling the effect of __tls_get_addr on the SVE registers
>
> These are handled by later patches instead.
>
> Some notes:
>
> - The copyright years for aarch64-sve.md start at 2009 because some of
> the code is based on aarch64.md, which also starts from then.
>
> - The patch inserts spaces between items in the AArch64 section
> of sourcebuild.texi. This matches at least the surrounding
> architectures and looks a little nicer in the info output.
>
> - aarch64-sve.md includes a pattern:
>
> while_ult<GPI:mode><PRED_ALL:mode>
>
> A later patch adds a matching "while_ult" optab, but the pattern
> is also needed by the predicate vec_duplicate expander.
2018-01-05 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes. Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
(VEC_ANY_DATA, VEC_STRUCT): New constants.
(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
(aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
(aarch64_get_mask_mode): New functions.
(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
(aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
predicate modes and predicate registers. Explicitly restrict
GPRs to modes of 16 bytes or smaller. Only allow FP registers
to store a vector mode if it is recognized by
aarch64_classify_vector_mode.
(aarch64_regmode_natural_size): New function.
(aarch64_hard_regno_caller_save_mode): Return the original mode
for predicates.
(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
functions.
(aarch64_add_offset): Add a temp2 parameter. Assert that temp1
does not overlap dest if the function is frame-related. Handle
SVE constants.
(aarch64_split_add_offset): New function.
(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
them to aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
and update call to aarch64_sub_sp.
(aarch64_add_cfa_expression): New function.
(aarch64_expand_prologue): Pass extra temporary registers to the
functions above. Handle the case in which we need to emit new
DW_CFA_expressions for registers that were originally saved
relative to the stack pointer, but now have to be expressed
relative to the frame pointer.
(aarch64_output_mi_thunk): Pass extra temporary registers to the
functions above.
(aarch64_expand_epilogue): Likewise. Prevent inheritance of
IP0 and IP1 values for SVE frames.
(aarch64_expand_vec_series): New function.
(aarch64_expand_sve_widened_duplicate): Likewise.
(aarch64_expand_sve_const_vector): Likewise.
(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
Handle SVE constants. Use emit_move_insn to move a force_const_mem
into the register, rather than emitting a SET directly.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
(offset_9bit_signed_scaled_p): New functions.
(aarch64_replicate_bitmask_imm): New function.
(aarch64_bitmask_imm): Use it.
(aarch64_cannot_force_const_mem): Reject expressions involving
a CONST_POLY_INT. Update call to aarch64_classify_symbol.
(aarch64_classify_index): Handle SVE indices, by requiring
a plain register index with a scale that matches the element size.
(aarch64_classify_address): Handle SVE addresses. Assert that
the mode of the address is VOIDmode or an integer mode.
Update call to aarch64_classify_symbol.
(aarch64_classify_symbolic_expression): Update call to
aarch64_classify_symbol.
(aarch64_const_vec_all_in_range_p): New function.
(aarch64_print_vector_float_operand): Likewise.
(aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
"vN" for FP registers with SVE modes. Handle (const ...) vectors
and the FP immediates 1.0 and 0.5.
(aarch64_print_address_internal): Handle SVE addresses.
(aarch64_print_operand_address): Use ADDR_QUERY_ANY.
(aarch64_regno_regclass): Handle predicate registers.
(aarch64_secondary_reload): Handle big-endian reloads of SVE
data modes.
(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
(aarch64_convert_sve_vector_bits): New function.
(aarch64_override_options): Use it to handle -msve-vector-bits=.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
Handle SVE vector and predicate modes. Accept VL-based constants
that need only one temporary register, and VL offsets that require
no temporary registers.
(aarch64_conditional_register_usage): Mark the predicate registers
as fixed if SVE isn't available.
(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
Return true for SVE vector and predicate modes.
(aarch64_simd_container_mode): Take the number of bits as a poly_int64
rather than an unsigned int. Handle SVE modes.
(aarch64_preferred_simd_mode): Update call accordingly. Handle
SVE modes.
(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
if SVE is enabled.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): New functions.
(aarch64_sve_valid_immediate): New function.
(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
Explicitly reject structure modes. Check for INDEX constants.
Handle PTRUE and PFALSE constants.
(aarch64_check_zero_based_sve_index_immediate): New function.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
vector modes. Accept constants in the range of CNT[BHWD].
(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
ask for an Advanced SIMD mode.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
(aarch64_simd_vector_alignment): Handle SVE predicates.
(aarch64_vectorize_preferred_vector_alignment): New function.
(aarch64_simd_vector_alignment_reachable): Use it instead of
the vector size.
(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
functions.
(MAX_VECT_LEN): Delete.
(expand_vec_perm_d): Add a vec_flags field.
(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext): Don't apply a big-endian lane correction
for SVE modes.
(aarch64_evpc_rev): Rename to...
(aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE.
(aarch64_evpc_rev_global): New function.
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
MAX_VECT_LEN.
(aarch64_evpc_sve_tbl): New function.
(aarch64_expand_vec_perm_const_1): Update after rename of
aarch64_evpc_rev. Handle SVE permutes too, trying
aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
than aarch64_evpc_tbl.
(aarch64_vectorize_vec_perm_const): Initialize vec_flags.
(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond): New functions.
(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
of aarch64_vector_mode_p.
(aarch64_dwarf_poly_indeterminate_value): New function.
(aarch64_compute_pressure_classes): Likewise.
(aarch64_can_change_mode_class): Likewise.
(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
(Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
constraints.
(Dn, Dl, Dr): Accept const as well as const_vector.
(Dz): Likewise. Compare against CONST0_RTX.
* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
of "vector" where appropriate.
(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
(v_int_equiv): Extend to SVE modes.
(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
mode attributes.
(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
(SVE_COND_FP_CMP): New int iterators.
(perm_hilo): Handle the new unpack unspecs.
(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
attributes.
* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
(aarch64_equality_operator, aarch64_constant_vector_operand)
(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
(aarch64_sve_nonimmediate_operand): Likewise.
(aarch64_sve_general_operand): Likewise.
(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
(aarch64_sve_float_arith_immediate): Likewise.
(aarch64_sve_float_arith_with_sub_immediate): Likewise.
(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
(aarch64_sve_float_arith_operand): Likewise.
(aarch64_sve_float_arith_with_sub_operand): Likewise.
(aarch64_sve_float_mul_operand): Likewise.
(aarch64_sve_vec_perm_operand): Likewise.
(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
(aarch64_mov_operand): Accept const_poly_int and const_vector.
(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
as well as const_vector.
(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
in file. Use CONST0_RTX and CONSTM1_RTX.
(aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes.
(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
Use aarch64_simd_imm_zero.
* config/aarch64/aarch64-sve.md: New file.
* config/aarch64/aarch64.md: Include it.
(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
(sve): New attribute.
(enabled): Disable instructions with the sve attribute unless
TARGET_SVE.
(movqi, movhi): Pass CONST_POLY_INT operands through
aarch64_expand_mov_immediate.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
CNT[BHSD] immediates.
(movti): Split CONST_POLY_INT moves into two halves.
(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
Split additions that need a temporary here if the destination
is the stack pointer.
(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
(*add<mode>3_poly_1): New instruction.
(set_clobber_cc): New expander.
[-- Attachment #2: sve-01-main.diff.gz --]
[-- Type: application/gzip, Size: 58140 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [1/4] [AArch64] SVE backend support
2018-01-05 11:41 ` Richard Sandiford
@ 2018-01-10 19:19 ` James Greenhalgh
2018-01-10 19:55 ` Richard Sandiford
0 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2018-01-10 19:19 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
On Fri, Jan 05, 2018 at 11:41:25AM +0000, Richard Sandiford wrote:
> Here's the patch updated to apply on top of the v8.4 and
> __builtin_load_no_speculate support. It also handles the new
> vec_perm_indices and CONST_VECTOR encoding and uses VNx... names
> for the SVE modes.
>
> Richard Sandiford <richard.sandiford@linaro.org> writes:
> > This patch adds support for ARM's Scalable Vector Extension.
> > The patch just contains the core features that work with the
> > current vectoriser framework; later patches will add extra
> > capabilities to both the target-independent code and AArch64 code.
> > The patch doesn't include:
> >
> > - support for unwinding frames whose size depends on the vector length
> > - modelling the effect of __tls_get_addr on the SVE registers
> >
> > These are handled by later patches instead.
> >
> > Some notes:
> >
> > - The copyright years for aarch64-sve.md start at 2009 because some of
> > the code is based on aarch64.md, which also starts from then.
> >
> > - The patch inserts spaces between items in the AArch64 section
> > of sourcebuild.texi. This matches at least the surrounding
> > architectures and looks a little nicer in the info output.
> >
> > - aarch64-sve.md includes a pattern:
> >
> > while_ult<GPI:mode><PRED_ALL:mode>
> >
> > A later patch adds a matching "while_ult" optab, but the pattern
> > is also needed by the predicate vec_duplicate expander.
I'm keen to take this. The code is good quality overall, I'm confident in your
reputation and implementation. There are some parts of the design that I'm
less happy about, but pragmatically, we should take this now to get the
behaviour correct, and look to optimise, refactor, and clean-up in future.
Sorry it took me a long time to get to the review. I've got no meaningful
design concerns here, and certainly nothing so critical that we couldn't
fix it after the fact in GCC 9 and up.
That said...
> (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
I'm not a big fan of these sorts of functions which return a char* where
we've dumped the text we want to print out in the short term. The interface
(fill a static char[] we can then leak on return) is pretty ugly.
One consideration for future work would be refactoring out aarch64.c - it is
getting to be too big for my liking (near 18,000 lines).
> (aarch64_expand_sve_mem_move)
Do we have a good description of how SVE big-endian vectors work, <snip more
comments - I found the detailed comment at the top of aarch64-sve.md>
The sort of comment you write later ("see the comment at the head of
aarch64-sve.md for details") would also be useful here as a reference.
> aarch64_get_reg_raw_mode
Do we assert/warn anywhere for users of __builtin_apply that they are
fundamentally unsupported?
> offset_4bit_signed_scaled_p
So much code duplication here and in similar functions. Would a single
interface (unsigned bits, bool signed, bool scaled) let you avoid the many
identical functions?
> aarch64_evpc_rev_local
I'm likely missing something obvious, but what is the distinction you're
drawing between global and local? Could you comment it?
> aarch64-sve.md - scheduling types
None of the instructions here have types for scheduling. That's going to
make for a future headache. Adding them to the existing scheduling types
is going to cause all sorts of trouble when building GCC (we already have
too many types for some compilers to handle the structure!). We'll need
to find a solution to how we'll direct scheduling for SVE.
> aarch64-sve.md - predicated operands
It is a shame this ends up being so ugly and requiring UNSPEC_MERGE_PTRUE
everywhere. That will block a lot of useful optimisation.
Otherwise, this is OK for trunk. I'm happy to take it as is, and have the
above suggestions applied as follow-ups if you think they are worth doing.
Thanks,
James
* Re: [1/4] [AArch64] SVE backend support
2018-01-10 19:19 ` James Greenhalgh
@ 2018-01-10 19:55 ` Richard Sandiford
0 siblings, 0 replies; 18+ messages in thread
From: Richard Sandiford @ 2018-01-10 19:55 UTC (permalink / raw)
To: James Greenhalgh; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
Thanks for the review!
James Greenhalgh <james.greenhalgh@arm.com> writes:
> On Fri, Jan 05, 2018 at 11:41:25AM +0000, Richard Sandiford wrote:
>> Here's the patch updated to apply on top of the v8.4 and
>> __builtin_load_no_speculate support. It also handles the new
>> vec_perm_indices and CONST_VECTOR encoding and uses VNx... names
>> for the SVE modes.
>>
>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> > This patch adds support for ARM's Scalable Vector Extension.
>> > The patch just contains the core features that work with the
>> > current vectoriser framework; later patches will add extra
>> > capabilities to both the target-independent code and AArch64 code.
>> > The patch doesn't include:
>> >
>> > - support for unwinding frames whose size depends on the vector length
>> > - modelling the effect of __tls_get_addr on the SVE registers
>> >
>> > These are handled by later patches instead.
>> >
>> > Some notes:
>> >
>> > - The copyright years for aarch64-sve.md start at 2009 because some of
>> > the code is based on aarch64.md, which also starts from then.
>> >
>> > - The patch inserts spaces between items in the AArch64 section
>> > of sourcebuild.texi. This matches at least the surrounding
>> > architectures and looks a little nicer in the info output.
>> >
>> > - aarch64-sve.md includes a pattern:
>> >
>> > while_ult<GPI:mode><PRED_ALL:mode>
>> >
>> > A later patch adds a matching "while_ult" optab, but the pattern
>> > is also needed by the predicate vec_duplicate expander.
>
> I'm keen to take this. The code is good quality overall, I'm confident in your
> reputation and implementation. There are some parts of the design that I'm
> less happy about, but pragmatically, we should take this now to get the
> behaviour correct, and look to optimise, refactor, and clean-up in future.
>
> Sorry it took me a long time to get to the review. I've got no meaningful
> design concerns here, and certainly nothing so critical that we couldn't
> fix it after the fact in GCC 9 and up.
>
> That said...
>
>> (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
>
> I'm not a big fan of these sorts of functions which return a char* where
> we've dumped the text we want to print out in the short term. The interface
> (fill a static char[] we can then leak on return) is pretty ugly.
Yeah, it's not pretty, but I think the various possible ways of doing
the addition do justify using output functions here. The distinction
between INC[BHWD], DEC[BHWD], ADDVL and ADDPL doesn't really affect
anything other than the final output, so it isn't something that
should be exposed as different constraints (for example).
We should probably "just" have a nicer interface for target code
to construct instruction format strings.
> One consideration for future work would be refactoring out aarch64.c - it is
> getting to be too big for my liking (near 18,000 lines).
>
>> (aarch64_expand_sve_mem_move)
>
> Do we have a good description of how SVE big-endian vectors work, <snip more
> comments - I found the detailed comment at the top of aarch64-sve.md>
>
> The sort of comment you write later ("see the comment at the head of
> aarch64-sve.md for details") would also be useful here as a reference.
Ah, yeah, will add a reference there too.
>> aarch64_get_reg_raw_mode
>
> Do we assert/warn anywhere for users of __builtin_apply that they are
> fundamentally unsupported?
Not as far as I know. FWIW, this doesn't affect SVE (yet), because we
don't yet support any types that would be passed in the SVE-specific
part of the registers.
>> offset_4bit_signed_scaled_p
>
> So much code duplication here and in similar functions. Would a single
> interface (unsigned bits, bool signed, bool scaled) let you avoid the many
> identical functions?
We just kept to the existing style here. I agree it might be a good idea
to consolidate them, but personally I'd prefer to keep the signed/scaled
distinction in the function name, since it's more readable than booleans
and shorter than a new enum.
>> aarch64_evpc_rev_local
>
> I'm likely missing something obvious, but what is the distinction you're
> drawing between global and local? Could you comment it?
"global" reverses the whole vector: the first and last elements switch
places. "local" reverses within groups of N consecutive elements but
not between them.
But yet again names are probably my downfall here. :-) I'm happy to
call them something else instead. Either way I'll expand the comments.
>> aarch64-sve.md - scheduling types
>
> None of the instructions here have types for scheduling. That's going to
> make for a future headache. Adding them to the existing scheduling types
> is going to cause all sorts of trouble when building GCC (we already have
> too many types for some compilers to handle the structure!). We'll need
> to find a solution to how we'll direct scheduling for SVE.
Yeah. I didn't want to add scheduling attributes now without scheduling
descriptions to go with them, since there's no way of knowing what the
division should be.
>> aarch64-sve.md - predicated operands
>
> It is a shame this ends up being so ugly and requiring UNSPEC_MERGE_PTRUE
> everywhere. That will block a lot of useful optimisation.
I don't think it blocks many in practice (at least, not the kind that
really do belong in RTL rather than gimple). Most instructions map
directly to an optab and those that don't do combine OK in the
UNSPEC_MERGE_PTRUE form (e.g. AND + NOT -> BIC).
> Otherwise, this is OK for trunk. I'm happy to take it as is, and have the
> above suggestions applied as follow-ups if you think they are worth doing.
Thanks. If we can reach quick agreement about the offset checks then
I'll roll in that change.
Richard
* [2/4] [AArch64] Testsuite markup for SVE
2017-11-03 17:45 [0/4] [AArch64] Add SVE support Richard Sandiford
2017-11-03 17:48 ` [1/4] [AArch64] SVE backend support Richard Sandiford
@ 2017-11-03 17:50 ` Richard Sandiford
2018-01-06 17:58 ` James Greenhalgh
2017-11-03 17:51 ` [3/4] [AArch64] SVE tests Richard Sandiford
` (2 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2017-11-03 17:50 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch adds new target selectors for SVE and updates existing
selectors accordingly. It also XFAILs some tests that don't yet
work for some SVE modes; most of these go away with follow-on
vectorisation enhancements.
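A hypothetical test file using the new selectors might look like this (the
directive names match the procs added below; the test body itself is
illustrative only):

```c
/* Run only on hardware whose SVE vectors are exactly 256 bits wide,
   via the new check_effective_target_aarch64_sve256_hw selector, and
   compile with a matching fixed vector length.  */
/* { dg-do run { target aarch64_sve256_hw } } */
/* { dg-options "-O2 -msve-vector-bits=256" } */

int a[64];

int
main (void)
{
  for (int i = 0; i < 64; ++i)
    a[i] = i;   /* a loop the vectoriser can handle at VL256 */
  return 0;
}
```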
2017-11-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sve)
(aarch64_sve_bits, check_effective_target_aarch64_sve_hw)
(aarch64_sve_hw_bits, check_effective_target_aarch64_sve256_hw):
New procedures.
(check_effective_target_vect_perm): Handle SVE.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_element_align_preferred): Likewise.
(check_effective_target_vect_align_stack_vars): Likewise.
(check_effective_target_vect_load_lanes): Likewise.
(check_effective_target_vect_masked_store): Likewise.
(available_vector_sizes): Use aarch64_sve_bits for SVE.
* gcc.dg/vect/tree-vect.h (VECTOR_BITS): Define appropriately
for SVE.
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add SVE XFAIL.
* gcc.dg/vect/bb-slp-pr69907.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-25.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/vect-114.c: Likewise.
* gcc.dg/vect/vect-119.c: Likewise.
* gcc.dg/vect/vect-cselim-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-2.c: Likewise.
* gcc.dg/vect/vect-live-slp-3.c: Likewise.
* gcc.dg/vect/vect-mult-const-pattern-1.c: Likewise.
* gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-1.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp 2017-11-03 17:22:13.533564036 +0000
+++ gcc/testsuite/lib/target-supports.exp 2017-11-03 17:24:09.475993817 +0000
@@ -3350,6 +3350,35 @@ proc check_effective_target_aarch64_litt
}]
}
+# Return 1 if this is an AArch64 target supporting SVE.
+proc check_effective_target_aarch64_sve { } {
+ if { ![istarget aarch64*-*-*] } {
+ return 0
+ }
+ return [check_no_compiler_messages aarch64_sve assembly {
+ #if !defined (__ARM_FEATURE_SVE)
+ #error FOO
+ #endif
+ }]
+}
+
+# Return the size in bits of an SVE vector, or 0 if the size is variable.
+proc aarch64_sve_bits { } {
+ return [check_cached_effective_target aarch64_sve_bits {
+ global tool
+
+ set src dummy[pid].c
+ set f [open $src "w"]
+ puts $f "int bits = __ARM_FEATURE_SVE_BITS;"
+ close $f
+ set output [${tool}_target_compile $src "" preprocess ""]
+ file delete $src
+
+ regsub {.*bits = ([^;]*);.*} $output {\1} bits
+ expr { $bits }
+ }]
+}
+
# Return 1 if this is a compiler supporting ARC atomic operations
proc check_effective_target_arc_atomic { } {
return [check_no_compiler_messages arc_atomic assembly {
@@ -4275,6 +4304,49 @@ proc check_effective_target_arm_neon_hw
} [add_options_for_arm_neon ""]]
}
+# Return true if this is an AArch64 target that can run SVE code.
+
+proc check_effective_target_aarch64_sve_hw { } {
+ if { ![istarget aarch64*-*-*] } {
+ return 0
+ }
+ return [check_runtime aarch64_sve_hw_available {
+ int
+ main (void)
+ {
+ asm volatile ("ptrue p0.b");
+ return 0;
+ }
+ }]
+}
+
+# Return true if this is an AArch64 target that can run SVE code and
+# if its SVE vectors have exactly BITS bits.
+
+proc aarch64_sve_hw_bits { bits } {
+ if { ![check_effective_target_aarch64_sve_hw] } {
+ return 0
+ }
+ return [check_runtime aarch64_sve${bits}_hw [subst {
+ int
+ main (void)
+ {
+ int res;
+ asm volatile ("cntd %0" : "=r" (res));
+ if (res * 64 != $bits)
+ __builtin_abort ();
+ return 0;
+ }
+ }]]
+}
+
+# Return true if this is an AArch64 target that can run SVE code and
+# if its SVE vectors have exactly 256 bits.
+
+proc check_effective_target_aarch64_sve256_hw { } {
+ return [aarch64_sve_hw_bits 256]
+}
+
proc check_effective_target_arm_neonv2_hw { } {
return [check_runtime arm_neon_hwv2_available {
#include "arm_neon.h"
@@ -5531,7 +5603,8 @@ proc check_effective_target_vect_perm {
} else {
set et_vect_perm_saved($et_index) 0
if { [is-effective-target arm_neon]
- || [istarget aarch64*-*-*]
+ || ([istarget aarch64*-*-*]
+ && ![check_effective_target_vect_variable_length])
|| [istarget powerpc*-*-*]
|| [istarget spu-*-*]
|| [istarget i?86-*-*] || [istarget x86_64-*-*]
@@ -5636,7 +5709,8 @@ proc check_effective_target_vect_perm_by
if { ([is-effective-target arm_neon]
&& [is-effective-target arm_little_endian])
|| ([istarget aarch64*-*-*]
- && [is-effective-target aarch64_little_endian])
+ && [is-effective-target aarch64_little_endian]
+ && ![check_effective_target_vect_variable_length])
|| [istarget powerpc*-*-*]
|| [istarget spu-*-*]
|| ([istarget mips-*.*]
@@ -5675,7 +5749,8 @@ proc check_effective_target_vect_perm_sh
if { ([is-effective-target arm_neon]
&& [is-effective-target arm_little_endian])
|| ([istarget aarch64*-*-*]
- && [is-effective-target aarch64_little_endian])
+ && [is-effective-target aarch64_little_endian]
+ && ![check_effective_target_vect_variable_length])
|| [istarget powerpc*-*-*]
|| [istarget spu-*-*]
|| ([istarget mips*-*-*]
@@ -5735,7 +5810,8 @@ proc check_effective_target_vect_widen_s
} else {
set et_vect_widen_sum_hi_to_si_pattern_saved($et_index) 0
if { [istarget powerpc*-*-*]
- || [istarget aarch64*-*-*]
+ || ([istarget aarch64*-*-*]
+ && ![check_effective_target_aarch64_sve])
|| [is-effective-target arm_neon]
|| [istarget ia64-*-*] } {
set et_vect_widen_sum_hi_to_si_pattern_saved($et_index) 1
@@ -5847,7 +5923,8 @@ proc check_effective_target_vect_widen_m
set et_vect_widen_mult_qi_to_hi_saved($et_index) 0
}
if { [istarget powerpc*-*-*]
- || [istarget aarch64*-*-*]
+ || ([istarget aarch64*-*-*]
+ && ![check_effective_target_aarch64_sve])
|| [is-effective-target arm_neon]
|| ([istarget s390*-*-*]
&& [check_effective_target_s390_vx]) } {
@@ -5885,7 +5962,8 @@ proc check_effective_target_vect_widen_m
if { [istarget powerpc*-*-*]
|| [istarget spu-*-*]
|| [istarget ia64-*-*]
- || [istarget aarch64*-*-*]
+ || ([istarget aarch64*-*-*]
+ && ![check_effective_target_aarch64_sve])
|| [istarget i?86-*-*] || [istarget x86_64-*-*]
|| [is-effective-target arm_neon]
|| ([istarget s390*-*-*]
@@ -6347,12 +6425,16 @@ proc check_effective_target_vect_natural
# alignment during vectorization.
proc check_effective_target_vect_element_align_preferred { } {
- return [check_effective_target_vect_variable_length]
+ return [expr { [check_effective_target_aarch64_sve]
+ && [check_effective_target_vect_variable_length] }]
}
# Return 1 if we can align stack data to the preferred vector alignment.
proc check_effective_target_vect_align_stack_vars { } {
+ if { [check_effective_target_aarch64_sve] } {
+ return [check_effective_target_vect_variable_length]
+ }
return 1
}
@@ -6424,7 +6506,8 @@ proc check_effective_target_vect_load_la
} else {
set et_vect_load_lanes 0
if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok])
- || [istarget aarch64*-*-*] } {
+ || ([istarget aarch64*-*-*]
+ && ![check_effective_target_aarch64_sve]) } {
set et_vect_load_lanes 1
}
}
@@ -6436,7 +6519,7 @@ proc check_effective_target_vect_load_la
# Return 1 if the target supports vector masked stores.
proc check_effective_target_vect_masked_store { } {
- return 0
+ return [check_effective_target_aarch64_sve]
}
# Return 1 if the target supports vector conditional operations, 0 otherwise.
@@ -6704,6 +6787,9 @@ foreach N {2 3 4 8} {
proc available_vector_sizes { } {
set result {}
if { [istarget aarch64*-*-*] } {
+ if { [check_effective_target_aarch64_sve] } {
+ lappend result [aarch64_sve_bits]
+ }
lappend result 128 64
} elseif { [istarget arm*-*-*]
&& [check_effective_target_arm_neon_ok] } {
Index: gcc/testsuite/gcc.dg/vect/tree-vect.h
===================================================================
--- gcc/testsuite/gcc.dg/vect/tree-vect.h 2017-11-03 17:21:09.761094925 +0000
+++ gcc/testsuite/gcc.dg/vect/tree-vect.h 2017-11-03 17:24:09.472993942 +0000
@@ -76,4 +76,12 @@ check_vect (void)
signal (SIGILL, SIG_DFL);
}
-#define VECTOR_BITS 128
+#if defined (__ARM_FEATURE_SVE)
+# if __ARM_FEATURE_SVE_BITS == 0
+# define VECTOR_BITS 1024
+# else
+# define VECTOR_BITS __ARM_FEATURE_SVE_BITS
+# endif
+#else
+# define VECTOR_BITS 128
+#endif
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 2017-02-23 19:54:09.000000000 +0000
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 2017-11-03 17:24:09.471993983 +0000
@@ -25,4 +25,4 @@ foo ()
but the loop reads only one element at a time, and DOM cannot resolve these.
The same happens on powerpc depending on the SIMD support available. */
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* } || { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* } || { { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } || aarch64_sve } } } } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c 2017-11-03 17:21:09.758095046 +0000
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c 2017-11-03 17:24:09.471993983 +0000
@@ -17,4 +17,6 @@ void foo(unsigned *p1, unsigned short *p
p1[n] = p2[n * 2];
}
-/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" } } */
+/* Disable for SVE because for long or variable-length vectors we don't
+ get an unrolled epilogue loop. */
+/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { ! aarch64_sve } } } } */
Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c 2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c 2017-11-03 17:24:09.471993983 +0000
@@ -51,4 +51,7 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" } } */
+/* Requires reverse for variable-length SVE, which is implemented
by a later patch. Until then we report it twice, once for SVE and
once for 128-bit Advanced SIMD. */
+/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c 2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c 2017-11-03 17:24:09.472993942 +0000
@@ -183,4 +183,7 @@ int main ()
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" } } */
+/* f4 requires reverse for SVE, which is implemented by a later patch.
+ Until then we report it twice, once for SVE and once for 128-bit
+ Advanced SIMD. */
+/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-23.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-23.c 2017-11-03 17:21:09.742095692 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-23.c 2017-11-03 17:24:09.472993942 +0000
@@ -107,6 +107,8 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided8 && { ! { vect_no_align} } } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided8 || vect_no_align } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_perm } } } } */
+/* We fail to vectorize the second loop with variable-length SVE but
+ fall back to 128-bit vectors, which does use SLP. */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_perm } xfail aarch64_sve } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-25.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-25.c 2017-11-03 17:21:09.814092784 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-25.c 2017-11-03 17:24:09.472993942 +0000
@@ -57,4 +57,6 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail { { ! vect_unaligned_possible } || { ! vect_natural_alignment } } } } } */
+/* Needs store_lanes for SVE, otherwise falls back to Advanced SIMD.
+ Will be fixed when SVE LOAD_LANES support is added. */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail { { { ! vect_unaligned_possible } || { ! vect_natural_alignment } } && { ! { aarch64_sve && vect_variable_length } } } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-perm-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-perm-5.c 2017-11-03 17:21:09.792093672 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-perm-5.c 2017-11-03 17:24:09.472993942 +0000
@@ -104,7 +104,9 @@ int main (int argc, const char* argv[])
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && {! vect_load_lanes } } } } } */
+/* Fails for variable-length SVE because we fall back to Advanced SIMD
+ and use LD3/ST3. Will be fixed when SVE LOAD_LANES support is added. */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && {! vect_load_lanes } } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } */
/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-perm-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-perm-6.c 2017-11-03 17:21:09.792093672 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-perm-6.c 2017-11-03 17:24:09.472993942 +0000
@@ -103,7 +103,9 @@ int main (int argc, const char* argv[])
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && {! vect_load_lanes } } } } } */
+/* Fails for variable-length SVE because we fall back to Advanced SIMD
+ and use LD3/ST3. Will be fixed when SVE LOAD_LANES support is added. */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && {! vect_load_lanes } } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_load_lanes } } } */
/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } */
/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-perm-9.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-perm-9.c 2017-11-03 17:21:09.792093672 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-perm-9.c 2017-11-03 17:24:09.472993942 +0000
@@ -57,10 +57,11 @@ int main (int argc, const char* argv[])
return 0;
}
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" { target { ! { vect_perm_short || vect_load_lanes } } } } } */
+/* Fails for variable-length SVE because we fall back to Advanced SIMD
+ and use LD3/ST3. Will be fixed when SVE LOAD_LANES support is added. */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" { target { ! { vect_perm_short || vect_load_lanes } } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_perm_short || vect_load_lanes } } } } */
/* { dg-final { scan-tree-dump-times "permutation requires at least three vectors" 1 "vect" { target { vect_perm_short && { ! vect_perm3_short } } } } } */
/* { dg-final { scan-tree-dump-not "permutation requires at least three vectors" "vect" { target vect_perm3_short } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { { ! vect_perm3_short } || vect_load_lanes } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm3_short && { ! vect_load_lanes } } } } } */
-
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-3.c 2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-3.c 2017-11-03 17:24:09.472993942 +0000
@@ -58,4 +58,7 @@ int main (void)
/* The initialization loop in main also gets vectorized. */
/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_short_mult && { vect_widen_sum_hi_to_si && vect_unpack } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_widen_sum_hi_to_si_pattern || { ! vect_unpack } } } } } */
+/* We can't yet create the necessary SLP constant vector for variable-length
+ SVE and so fall back to Advanced SIMD. This means that we repeat each
+ analysis note. */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_widen_sum_hi_to_si_pattern || { { ! vect_unpack } || { aarch64_sve && vect_variable_length } } } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-114.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-114.c 2015-06-02 23:53:35.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-114.c 2017-11-03 17:24:09.473993900 +0000
@@ -34,6 +34,9 @@ int main (void)
return main1 ();
}
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_perm } } } } } */
+/* Requires reverse for SVE, which is implemented by a later patch.
+ Until then we fall back to Advanced SIMD and successfully vectorize
+ the loop. */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_perm } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-119.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-119.c 2015-08-05 22:28:34.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-119.c 2017-11-03 17:24:09.473993900 +0000
@@ -25,4 +25,7 @@ unsigned int foo (const unsigned int x[O
return sum;
}
-/* { dg-final { scan-tree-dump-times "Detected interleaving load of size 2" 1 "vect" } } */
+/* Requires load-lanes for SVE, which is implemented by a later patch.
+ Until then we report it twice, once for SVE and once for 128-bit
+ Advanced SIMD. */
+/* { dg-final { scan-tree-dump-times "Detected interleaving load of size 2" 1 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-cselim-1.c 2017-11-03 17:21:09.849091370 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-cselim-1.c 2017-11-03 17:24:09.473993900 +0000
@@ -83,4 +83,6 @@ main (void)
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect_masked_store } xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_masked_store } } } } */
+/* Fails for variable-length SVE because we can't yet handle the
+ interleaved load. This is fixed by a later patch. */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_masked_store xfail { aarch64_sve && vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c 2016-11-22 21:16:10.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c 2017-11-03 17:24:09.473993900 +0000
@@ -69,4 +69,7 @@ main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" } } */
+/* We can't yet create the necessary SLP constant vector for variable-length
+ SVE and so fall back to Advanced SIMD. This means that we repeat each
+ analysis note. */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c 2016-11-22 21:16:10.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c 2017-11-03 17:24:09.473993900 +0000
@@ -63,4 +63,7 @@ main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 2 "vect" } } */
+/* We can't yet create the necessary SLP constant vector for variable-length
+ SVE and so fall back to Advanced SIMD. This means that we repeat each
+ analysis note. */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 2 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c 2016-11-22 21:16:10.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c 2017-11-03 17:24:09.473993900 +0000
@@ -70,4 +70,7 @@ main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" } } */
+/* We can't yet create the necessary SLP constant vector for variable-length
+ SVE and so fall back to Advanced SIMD. This means that we repeat each
+ analysis note. */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c 2016-11-22 21:16:10.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c 2017-11-03 17:24:09.473993900 +0000
@@ -37,5 +37,5 @@ main (void)
return 0;
}
-/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* xfail aarch64_sve } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c 2016-11-22 21:16:10.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c 2017-11-03 17:24:09.473993900 +0000
@@ -36,5 +36,5 @@ main (void)
return 0;
}
-/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* xfail aarch64_sve } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2016-01-13 13:48:42.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c 2017-11-03 17:24:09.473993900 +0000
@@ -59,6 +59,8 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
+/* Requires LD4 for variable-length SVE. Until that's supported we fall
+ back to Advanced SIMD, which does have widening shifts. */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2017-11-03 17:21:09.762094884 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c 2017-11-03 17:24:09.474993858 +0000
@@ -63,7 +63,9 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
+/* Requires LD4 for variable-length SVE. Until that's supported we fall
+ back to Advanced SIMD, which does have widening shifts. */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2017-11-03 17:21:09.763094844 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 2017-11-03 17:24:09.474993858 +0000
@@ -59,7 +59,9 @@ int main (void)
return 0;
}
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
+/* Requires LD4 for variable-length SVE. Until that's supported we fall
+ back to Advanced SIMD, which does have widening shifts. */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2016-01-13 13:48:42.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c 2017-11-03 17:24:09.474993858 +0000
@@ -63,6 +63,8 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
+/* Requires LD4 for variable-length SVE. Until that's supported we fall
+ back to Advanced SIMD, which does have widening shifts. */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2017-11-03 17:21:09.763094844 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c 2017-11-03 17:24:09.474993858 +0000
@@ -67,7 +67,9 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
+/* Requires LD4 for variable-length SVE. Until that's supported we fall
+ back to Advanced SIMD, which does have widening shifts. */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } xfail { aarch64_sve && vect_variable_length } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
^ permalink raw reply [flat|nested] 18+ messages in thread
* [3/4] [AArch64] SVE tests
2017-11-03 17:45 [0/4] [AArch64] Add SVE support Richard Sandiford
2017-11-03 17:48 ` [1/4] [AArch64] SVE backend support Richard Sandiford
2017-11-03 17:50 ` [2/4] [AArch64] Testsuite markup for SVE Richard Sandiford
@ 2017-11-03 17:51 ` Richard Sandiford
2018-01-06 18:06 ` James Greenhalgh
2017-11-03 17:52 ` [4/4] SVE unwinding Richard Sandiford
2017-11-24 16:34 ` [0/4] [AArch64] Add SVE support Richard Sandiford
4 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2017-11-03 17:51 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
[-- Attachment #1: Type: text/plain, Size: 8897 bytes --]
This patch adds gcc.target/aarch64 tests for SVE, and forces some
existing Advanced SIMD tests to use -march=armv8-a.
2017-11-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/testsuite/
* gcc.target/aarch64/bic_imm_1.c: Force -march=armv8-a.
* gcc.target/aarch64/fmaxmin.c: Likewise.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
* gcc.target/aarch64/orr_imm_1.c: Likewise.
* gcc.target/aarch64/pr62178.c: Likewise.
* gcc.target/aarch64/pr71727-2.c: Likewise.
* gcc.target/aarch64/saddw-1.c: Likewise.
* gcc.target/aarch64/saddw-2.c: Likewise.
* gcc.target/aarch64/uaddw-1.c: Likewise.
* gcc.target/aarch64/uaddw-2.c: Likewise.
* gcc.target/aarch64/uaddw-3.c: Likewise.
* gcc.target/aarch64/vect-add-sub-cond.c: Likewise.
* gcc.target/aarch64/vect-compile.c: Likewise.
* gcc.target/aarch64/vect-faddv-compile.c: Likewise.
* gcc.target/aarch64/vect-fcm-eq-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-ge-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.
* gcc.target/aarch64/vect-fmax-fmin-compile.c: Likewise.
* gcc.target/aarch64/vect-fmaxv-fminv-compile.c: Likewise.
* gcc.target/aarch64/vect-fmovd-zero.c: Likewise.
* gcc.target/aarch64/vect-fmovd.c: Likewise.
* gcc.target/aarch64/vect-fmovf-zero.c: Likewise.
* gcc.target/aarch64/vect-fmovf.c: Likewise.
* gcc.target/aarch64/vect-fp-compile.c: Likewise.
* gcc.target/aarch64/vect-ld1r-compile-fp.c: Likewise.
* gcc.target/aarch64/vect-ld1r-compile.c: Likewise.
* gcc.target/aarch64/vect-movi.c: Likewise.
* gcc.target/aarch64/vect-mull-compile.c: Likewise.
* gcc.target/aarch64/vect-reduc-or_1.c: Likewise.
* gcc.target/aarch64/vect-vaddv.c: Likewise.
* gcc.target/aarch64/vect_saddl_1.c: Likewise.
* gcc.target/aarch64/vect_smlal_1.c: Likewise.
* gcc.target/aarch64/vector_initialization_nostack.c: XFAIL for
fixed-length SVE.
* gcc.target/aarch64/sve_arith_1.c: New test.
* gcc.target/aarch64/sve_const_pred_1.C: Likewise.
* gcc.target/aarch64/sve_const_pred_2.C: Likewise.
* gcc.target/aarch64/sve_const_pred_3.C: Likewise.
* gcc.target/aarch64/sve_const_pred_4.C: Likewise.
* gcc.target/aarch64/sve_cvtf_signed_1.c: Likewise.
* gcc.target/aarch64/sve_cvtf_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve_cvtf_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve_cvtf_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve_dup_imm_1.c: Likewise.
* gcc.target/aarch64/sve_dup_imm_1_run.c: Likewise.
* gcc.target/aarch64/sve_dup_lane_1.c: Likewise.
* gcc.target/aarch64/sve_ext_1.c: Likewise.
* gcc.target/aarch64/sve_ext_2.c: Likewise.
* gcc.target/aarch64/sve_extract_1.c: Likewise.
* gcc.target/aarch64/sve_extract_2.c: Likewise.
* gcc.target/aarch64/sve_extract_3.c: Likewise.
* gcc.target/aarch64/sve_extract_4.c: Likewise.
* gcc.target/aarch64/sve_fabs_1.c: Likewise.
* gcc.target/aarch64/sve_fcvtz_signed_1.c: Likewise.
* gcc.target/aarch64/sve_fcvtz_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve_fcvtz_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve_fcvtz_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve_fdiv_1.c: Likewise.
* gcc.target/aarch64/sve_fdup_1.c: Likewise.
* gcc.target/aarch64/sve_fdup_1_run.c: Likewise.
* gcc.target/aarch64/sve_fmad_1.c: Likewise.
* gcc.target/aarch64/sve_fmla_1.c: Likewise.
* gcc.target/aarch64/sve_fmls_1.c: Likewise.
* gcc.target/aarch64/sve_fmsb_1.c: Likewise.
* gcc.target/aarch64/sve_fmul_1.c: Likewise.
* gcc.target/aarch64/sve_fneg_1.c: Likewise.
* gcc.target/aarch64/sve_fnmad_1.c: Likewise.
* gcc.target/aarch64/sve_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve_fnmls_1.c: Likewise.
* gcc.target/aarch64/sve_fnmsb_1.c: Likewise.
* gcc.target/aarch64/sve_fp_arith_1.c: Likewise.
* gcc.target/aarch64/sve_frinta_1.c: Likewise.
* gcc.target/aarch64/sve_frinti_1.c: Likewise.
* gcc.target/aarch64/sve_frintm_1.c: Likewise.
* gcc.target/aarch64/sve_frintp_1.c: Likewise.
* gcc.target/aarch64/sve_frintx_1.c: Likewise.
* gcc.target/aarch64/sve_frintz_1.c: Likewise.
* gcc.target/aarch64/sve_fsqrt_1.c: Likewise.
* gcc.target/aarch64/sve_fsubr_1.c: Likewise.
* gcc.target/aarch64/sve_index_1.c: Likewise.
* gcc.target/aarch64/sve_index_1_run.c: Likewise.
* gcc.target/aarch64/sve_ld1r_1.c: Likewise.
* gcc.target/aarch64/sve_load_const_offset_1.c: Likewise.
* gcc.target/aarch64/sve_load_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve_logical_1.c: Likewise.
* gcc.target/aarch64/sve_loop_add_1.c: Likewise.
* gcc.target/aarch64/sve_loop_add_1_run.c: Likewise.
* gcc.target/aarch64/sve_mad_1.c: Likewise.
* gcc.target/aarch64/sve_maxmin_1.c: Likewise.
* gcc.target/aarch64/sve_maxmin_1_run.c: Likewise.
* gcc.target/aarch64/sve_maxmin_strict_1.c: Likewise.
* gcc.target/aarch64/sve_maxmin_strict_1_run.c: Likewise.
* gcc.target/aarch64/sve_mla_1.c: Likewise.
* gcc.target/aarch64/sve_mls_1.c: Likewise.
* gcc.target/aarch64/sve_mov_rr_1.c: Likewise.
* gcc.target/aarch64/sve_msb_1.c: Likewise.
* gcc.target/aarch64/sve_mul_1.c: Likewise.
* gcc.target/aarch64/sve_neg_1.c: Likewise.
* gcc.target/aarch64/sve_nlogical_1.c: Likewise.
* gcc.target/aarch64/sve_nlogical_1_run.c: Likewise.
* gcc.target/aarch64/sve_pack_1.c: Likewise.
* gcc.target/aarch64/sve_pack_1_run.c: Likewise.
* gcc.target/aarch64/sve_pack_fcvt_signed_1.c: Likewise.
* gcc.target/aarch64/sve_pack_fcvt_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve_pack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve_pack_fcvt_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve_pack_float_1.c: Likewise.
* gcc.target/aarch64/sve_pack_float_1_run.c: Likewise.
* gcc.target/aarch64/sve_popcount_1.c: Likewise.
* gcc.target/aarch64/sve_popcount_1_run.c: Likewise.
* gcc.target/aarch64/sve_reduc_1.c: Likewise.
* gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
* gcc.target/aarch64/sve_reduc_2.c: Likewise.
* gcc.target/aarch64/sve_reduc_2_run.c: Likewise.
* gcc.target/aarch64/sve_reduc_3.c: Likewise.
* gcc.target/aarch64/sve_revb_1.c: Likewise.
* gcc.target/aarch64/sve_revh_1.c: Likewise.
* gcc.target/aarch64/sve_revw_1.c: Likewise.
* gcc.target/aarch64/sve_shift_1.c: Likewise.
* gcc.target/aarch64/sve_single_1.c: Likewise.
* gcc.target/aarch64/sve_single_2.c: Likewise.
* gcc.target/aarch64/sve_single_3.c: Likewise.
* gcc.target/aarch64/sve_single_4.c: Likewise.
* gcc.target/aarch64/sve_store_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve_subr_1.c: Likewise.
* gcc.target/aarch64/sve_trn1_1.c: Likewise.
* gcc.target/aarch64/sve_trn2_1.c: Likewise.
* gcc.target/aarch64/sve_unpack_fcvt_signed_1.c: Likewise.
* gcc.target/aarch64/sve_unpack_fcvt_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve_unpack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve_unpack_fcvt_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve_unpack_float_1.c: Likewise.
* gcc.target/aarch64/sve_unpack_float_1_run.c: Likewise.
* gcc.target/aarch64/sve_unpack_signed_1.c: Likewise.
* gcc.target/aarch64/sve_unpack_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve_unpack_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve_unpack_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve_uzp1_1.c: Likewise.
* gcc.target/aarch64/sve_uzp1_1_run.c: Likewise.
* gcc.target/aarch64/sve_uzp2_1.c: Likewise.
* gcc.target/aarch64/sve_uzp2_1_run.c: Likewise.
* gcc.target/aarch64/sve_vcond_1.C: Likewise.
* gcc.target/aarch64/sve_vcond_1_run.C: Likewise.
* gcc.target/aarch64/sve_vcond_2.c: Likewise.
* gcc.target/aarch64/sve_vcond_2_run.c: Likewise.
* gcc.target/aarch64/sve_vcond_3.c: Likewise.
* gcc.target/aarch64/sve_vcond_4.c: Likewise.
* gcc.target/aarch64/sve_vcond_4_run.c: Likewise.
* gcc.target/aarch64/sve_vcond_5.c: Likewise.
* gcc.target/aarch64/sve_vcond_5_run.c: Likewise.
* gcc.target/aarch64/sve_vcond_6.c: Likewise.
* gcc.target/aarch64/sve_vcond_6_run.c: Likewise.
* gcc.target/aarch64/sve_vec_init_1.c: Likewise.
* gcc.target/aarch64/sve_vec_init_1_run.c: Likewise.
* gcc.target/aarch64/sve_vec_init_2.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_1.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_1_run.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_1_overrange_run.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_const_1.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_const_1_overrun.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_const_1_run.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_const_single_1.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_const_single_1_run.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_single_1.c: Likewise.
* gcc.target/aarch64/sve_vec_perm_single_1_run.c: Likewise.
* gcc.target/aarch64/sve_zip1_1.c: Likewise.
* gcc.target/aarch64/sve_zip2_1.c: Likewise.
[-- Attachment #2: sve-03-tests.diff.gz --]
[-- Type: application/gzip, Size: 31572 bytes --]
* Re: [3/4] [AArch64] SVE tests
2017-11-03 17:51 ` [3/4] [AArch64] SVE tests Richard Sandiford
@ 2018-01-06 18:06 ` James Greenhalgh
2018-01-06 19:13 ` Richard Sandiford
0 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2018-01-06 18:06 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Nov 03, 2017 at 05:50:54PM +0000, Richard Sandiford wrote:
> This patch adds gcc.target/aarch64 tests for SVE, and forces some
> existing Advanced SIMD tests to use -march=armv8-a.
I'm going to assume that these new testcases are broadly sensible, and not
spend any significant time looking at them.
I'm not completely happy forcing the architecture to Armv8-a - it would be
useful for our testing coverage if users who have configured with other
architecture variants had these tests execute in those environments. That
way we'd check we still do the right thing once we have an implicit
-march=armv8.2-a.
However, as we don't have a good way to make that happen (other than maybe
only forcing the arch if we are in a configuration wired for SVE?) I'm
happy with this patch as a compromise for now.
OK, but a modification to cover the above point would make me happier.
Thanks,
James
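For reference, the kind of change the first part of the ChangeLog describes (forcing existing Advanced SIMD tests to -march=armv8-a) can be sketched as below. The function body and scan pattern are illustrative only, not taken from the attached patch; the point is that pinning the architecture keeps an implicit -march=armv8.2-a+sve default from changing which vector ISA the scan-assembler directives see.

```c
/* Sketch of a pinned Advanced SIMD testcase.  Pinning -march=armv8-a
   means a GCC configured with a default of, say, armv8.2-a+sve still
   vectorizes this loop with Advanced SIMD rather than SVE.  */
/* { dg-do compile } */
/* { dg-options "-O3 -march=armv8-a" } */

#define N 1024

void
add_arrays (int *__restrict a, int *__restrict b, int *__restrict c)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] + c[i];
}

/* With the architecture fixed at armv8-a, the loop would be expected to
   use Advanced SIMD adds; the exact pattern here is illustrative.  */
/* { dg-final { scan-assembler {\tadd\tv[0-9]+\.4s} } } */
```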
* Re: [3/4] [AArch64] SVE tests
2018-01-06 18:06 ` James Greenhalgh
@ 2018-01-06 19:13 ` Richard Sandiford
[not found] ` <20180107165948.GA13800@arm.com>
0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2018-01-06 19:13 UTC (permalink / raw)
To: James Greenhalgh; +Cc: gcc-patches, richard.earnshaw, marcus.shawcroft, nd
James Greenhalgh <james.greenhalgh@arm.com> writes:
> On Fri, Nov 03, 2017 at 05:50:54PM +0000, Richard Sandiford wrote:
>> This patch adds gcc.target/aarch64 tests for SVE, and forces some
>> existing Advanced SIMD tests to use -march=armv8-a.
>
> I'm going to assume that these new testcases are broadly sensible, and not
> spend any significant time looking at them.
>
> I'm not completely happy forcing the architecture to Armv8-a - it would be
> useful for our testing coverage if users which have configured with other
> architecture variants had this test execute in those environments. That
> way we'd check we still do the right thing once we have an implicit
> -march=armv8.2-a .
>
> However, as we don't have a good way to make that happen (other than maybe
> only forcing the arch if we are in a configuration wired for SVE?) I'm
> happy with this patch as a compromise for now.
Would something like LLVM's -mattr be useful? Then we could have
-mattr=+nosve without having to change the base architecture.
I suppose we'd need to be careful about how it interacts with -march
though, so it probably isn't GCC 8 material. I'll try only forcing
the arch when we're compiling for SVE, like you say.
Not strictly related, but do you think it's OK to require binutils 2.28+
when testing GCC (rather than simply building it)? When trying with an
older OS the other day, I realised that the SVE dg-do assemble tests
would fail for 2.27 and earlier. We'd need something like:
/* { dg-do assemble { aarch64_sve_asm } } */
if we wanted to support older binutils.
Thanks,
Richard
* [4/4] SVE unwinding
2017-11-03 17:45 [0/4] [AArch64] Add SVE support Richard Sandiford
` (2 preceding siblings ...)
2017-11-03 17:51 ` [3/4] [AArch64] SVE tests Richard Sandiford
@ 2017-11-03 17:52 ` Richard Sandiford
2017-11-10 10:58 ` James Greenhalgh
2017-11-24 16:34 ` [0/4] [AArch64] Add SVE support Richard Sandiford
4 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2017-11-03 17:52 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch adds support for unwinding frames that use the SVE
pseudo VG register. We want this register to act like a normal
register if the CFI explicitly sets it, but want to provide a
default value otherwise. Computing the default value requires
an SVE target, so we only want to compute it on demand.
aarch64_vg uses a hard-coded .inst in order to avoid a build
dependency on binutils 2.28 or later.
2017-11-03 Richard Sandiford <richard.sandiford@linaro.org>
libgcc/
* config/aarch64/value-unwind.h (aarch64_vg): New function.
(DWARF_LAZY_REGISTER_VALUE): Define.
* unwind-dw2.c (_Unwind_GetGR): Use DWARF_LAZY_REGISTER_VALUE
to provide a fallback register value.
gcc/testsuite/
* g++.target/aarch64/aarch64.exp: New harness.
* g++.target/aarch64/sve_catch_1.C: New test.
* g++.target/aarch64/sve_catch_2.C: Likewise.
* g++.target/aarch64/sve_catch_3.C: Likewise.
* g++.target/aarch64/sve_catch_4.C: Likewise.
* g++.target/aarch64/sve_catch_5.C: Likewise.
* g++.target/aarch64/sve_catch_6.C: Likewise.
Index: libgcc/config/aarch64/value-unwind.h
===================================================================
--- libgcc/config/aarch64/value-unwind.h 2017-02-23 19:53:58.000000000 +0000
+++ libgcc/config/aarch64/value-unwind.h 2017-11-03 17:24:20.172023500 +0000
@@ -23,3 +23,19 @@
#if defined __aarch64__ && !defined __LP64__
# define REG_VALUE_IN_UNWIND_CONTEXT
#endif
+
+/* Return the value of the pseudo VG register. This should only be
+ called if we know this is an SVE host. */
+static inline int
+aarch64_vg (void)
+{
+ register int vg asm ("x0");
+ /* CNTD X0. */
+ asm (".inst 0x04e0e3e0" : "=r" (vg));
+ return vg;
+}
+
+/* Lazily provide a value for VG, so that we don't try to execute SVE
+ instructions unless we know they're needed. */
+#define DWARF_LAZY_REGISTER_VALUE(REGNO, VALUE) \
+ ((REGNO) == AARCH64_DWARF_VG && ((*VALUE) = aarch64_vg (), 1))
Index: libgcc/unwind-dw2.c
===================================================================
--- libgcc/unwind-dw2.c 2017-02-23 19:54:02.000000000 +0000
+++ libgcc/unwind-dw2.c 2017-11-03 17:24:20.172023500 +0000
@@ -216,12 +216,12 @@ _Unwind_IsExtendedContext (struct _Unwin
|| (context->flags & EXTENDED_CONTEXT_BIT));
}
\f
-/* Get the value of register INDEX as saved in CONTEXT. */
+/* Get the value of register REGNO as saved in CONTEXT. */
inline _Unwind_Word
-_Unwind_GetGR (struct _Unwind_Context *context, int index)
+_Unwind_GetGR (struct _Unwind_Context *context, int regno)
{
- int size;
+ int size, index;
_Unwind_Context_Reg_Val val;
#ifdef DWARF_ZERO_REG
@@ -229,7 +229,7 @@ _Unwind_GetGR (struct _Unwind_Context *c
return 0;
#endif
- index = DWARF_REG_TO_UNWIND_COLUMN (index);
+ index = DWARF_REG_TO_UNWIND_COLUMN (regno);
gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
size = dwarf_reg_size_table[index];
val = context->reg[index];
@@ -237,6 +237,14 @@ _Unwind_GetGR (struct _Unwind_Context *c
if (_Unwind_IsExtendedContext (context) && context->by_value[index])
return _Unwind_Get_Unwind_Word (val);
+#ifdef DWARF_LAZY_REGISTER_VALUE
+ {
+ _Unwind_Word value;
+ if (DWARF_LAZY_REGISTER_VALUE (regno, &value))
+ return value;
+ }
+#endif
+
/* This will segfault if the register hasn't been saved. */
if (size == sizeof(_Unwind_Ptr))
return * (_Unwind_Ptr *) (_Unwind_Internal_Ptr) val;
Index: gcc/testsuite/g++.target/aarch64/aarch64.exp
===================================================================
--- /dev/null 2017-11-03 10:40:07.002381728 +0000
+++ gcc/testsuite/g++.target/aarch64/aarch64.exp 2017-11-03 17:24:20.171023116 +0000
@@ -0,0 +1,38 @@
+# Specific regression driver for AArch64.
+# Copyright (C) 2009-2017 Free Software Foundation, Inc.
+# Contributed by ARM Ltd.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>. */
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+ return
+}
+
+# Load support procs.
+load_lib g++-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] "" ""
+
+# All done.
+dg-finish
Index: gcc/testsuite/g++.target/aarch64/sve_catch_1.C
===================================================================
--- /dev/null 2017-11-03 10:40:07.002381728 +0000
+++ gcc/testsuite/g++.target/aarch64/sve_catch_1.C 2017-11-03 17:24:20.171023116 +0000
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fopenmp-simd -fno-omit-frame-pointer" } */
+/* { dg-options "-O3 -fopenmp-simd -fno-omit-frame-pointer -march=armv8-a+sve" { target aarch64_sve_hw } } */
+
+/* Invoke X (P##n) for n in [0, 7]. */
+#define REPEAT8(X, P) \
+ X (P##0) X (P##1) X (P##2) X (P##3) X (P##4) X (P##5) X (P##6) X (P##7)
+
+/* Invoke X (n) for all octal n in [0, 39]. */
+#define REPEAT40(X) \
+ REPEAT8 (X, 0) REPEAT8 (X, 1) REPEAT8 (X, 2) REPEAT8 (X, 3) REPEAT8 (X, 4)
+
+volatile int testi;
+
+/* Throw to f3. */
+void __attribute__ ((weak))
+f1 (int x[40][100], int *y)
+{
+ /* A wild write to x and y. */
+ asm volatile ("" ::: "memory");
+ if (y[testi] == x[testi][testi])
+ throw 100;
+}
+
+/* Expect vector work to be done, with spilling of vector registers. */
+void __attribute__ ((weak))
+f2 (int x[40][100], int *y)
+{
+ /* Try to force some spilling. */
+#define DECLARE(N) int y##N = y[N];
+ REPEAT40 (DECLARE);
+ for (int j = 0; j < 20; ++j)
+ {
+ f1 (x, y);
+#pragma omp simd
+ for (int i = 0; i < 100; ++i)
+ {
+#define INC(N) x[N][i] += y##N;
+ REPEAT40 (INC);
+ }
+ }
+}
+
+/* Catch an exception thrown from f1, via f2. */
+void __attribute__ ((weak))
+f3 (int x[40][100], int *y, int *z)
+{
+ volatile int extra = 111;
+ try
+ {
+ f2 (x, y);
+ }
+ catch (int val)
+ {
+ *z = val + extra;
+ }
+}
+
+static int x[40][100];
+static int y[40];
+static int z;
+
+int
+main (void)
+{
+ f3 (x, y, &z);
+ if (z != 211)
+ __builtin_abort ();
+ return 0;
+}
Index: gcc/testsuite/g++.target/aarch64/sve_catch_2.C
===================================================================
--- /dev/null 2017-11-03 10:40:07.002381728 +0000
+++ gcc/testsuite/g++.target/aarch64/sve_catch_2.C 2017-11-03 17:24:20.171023116 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fopenmp-simd -fomit-frame-pointer" } */
+/* { dg-options "-O3 -fopenmp-simd -fomit-frame-pointer -march=armv8-a+sve" { target aarch64_sve_hw } } */
+
+#include "sve_catch_1.C"
Index: gcc/testsuite/g++.target/aarch64/sve_catch_3.C
===================================================================
--- /dev/null 2017-11-03 10:40:07.002381728 +0000
+++ gcc/testsuite/g++.target/aarch64/sve_catch_3.C 2017-11-03 17:24:20.171023116 +0000
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fopenmp-simd -fno-omit-frame-pointer" } */
+/* { dg-options "-O3 -fopenmp-simd -fno-omit-frame-pointer -march=armv8-a+sve" { target aarch64_sve_hw } } */
+
+/* Invoke X (P##n) for n in [0, 7]. */
+#define REPEAT8(X, P) \
+ X (P##0) X (P##1) X (P##2) X (P##3) X (P##4) X (P##5) X (P##6) X (P##7)
+
+/* Invoke X (n) for all octal n in [0, 39]. */
+#define REPEAT40(X) \
+ REPEAT8 (X, 0) REPEAT8 (X, 1) REPEAT8 (X, 2) REPEAT8 (X, 3) REPEAT8 (X, 4)
+
+volatile int testi, sink;
+
+/* Take 2 stack arguments and throw to f3. */
+void __attribute__ ((weak))
+f1 (int x[40][100], int *y, int z1, int z2, int z3, int z4,
+ int z5, int z6, int z7, int z8)
+{
+ /* A wild write to x and y. */
+ sink = z1;
+ sink = z2;
+ sink = z3;
+ sink = z4;
+ sink = z5;
+ sink = z6;
+ sink = z7;
+ sink = z8;
+ asm volatile ("" ::: "memory");
+ if (y[testi] == x[testi][testi])
+ throw 100;
+}
+
+/* Expect vector work to be done, with spilling of vector registers. */
+void __attribute__ ((weak))
+f2 (int x[40][100], int *y)
+{
+ /* Try to force some spilling. */
+#define DECLARE(N) int y##N = y[N];
+ REPEAT40 (DECLARE);
+ for (int j = 0; j < 20; ++j)
+ {
+ f1 (x, y, 1, 2, 3, 4, 5, 6, 7, 8);
+#pragma omp simd
+ for (int i = 0; i < 100; ++i)
+ {
+#define INC(N) x[N][i] += y##N;
+ REPEAT40 (INC);
+ }
+ }
+}
+
+/* Catch an exception thrown from f1, via f2. */
+void __attribute__ ((weak))
+f3 (int x[40][100], int *y, int *z)
+{
+ volatile int extra = 111;
+ try
+ {
+ f2 (x, y);
+ }
+ catch (int val)
+ {
+ *z = val + extra;
+ }
+}
+
+static int x[40][100];
+static int y[40];
+static int z;
+
+int
+main (void)
+{
+ f3 (x, y, &z);
+ if (z != 211)
+ __builtin_abort ();
+ return 0;
+}
Index: gcc/testsuite/g++.target/aarch64/sve_catch_4.C
===================================================================
--- /dev/null 2017-11-03 10:40:07.002381728 +0000
+++ gcc/testsuite/g++.target/aarch64/sve_catch_4.C 2017-11-03 17:24:20.171023116 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fopenmp-simd -fomit-frame-pointer" } */
+/* { dg-options "-O3 -fopenmp-simd -fomit-frame-pointer -march=armv8-a+sve" { target aarch64_sve_hw } } */
+
+#include "sve_catch_3.C"
Index: gcc/testsuite/g++.target/aarch64/sve_catch_5.C
===================================================================
--- /dev/null 2017-11-03 10:40:07.002381728 +0000
+++ gcc/testsuite/g++.target/aarch64/sve_catch_5.C 2017-11-03 17:24:20.172023500 +0000
@@ -0,0 +1,82 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fopenmp-simd -fno-omit-frame-pointer" } */
+/* { dg-options "-O3 -fopenmp-simd -fno-omit-frame-pointer -march=armv8-a+sve" { target aarch64_sve_hw } } */
+
+/* Invoke X (P##n) for n in [0, 7]. */
+#define REPEAT8(X, P) \
+ X (P##0) X (P##1) X (P##2) X (P##3) X (P##4) X (P##5) X (P##6) X (P##7)
+
+/* Invoke X (n) for all octal n in [0, 39]. */
+#define REPEAT40(X) \
+ REPEAT8 (X, 0) REPEAT8 (X, 1) REPEAT8 (X, 2) REPEAT8 (X, 3) REPEAT8 (X, 4)
+
+volatile int testi, sink;
+volatile void *ptr;
+
+/* Take 2 stack arguments and throw to f3. */
+void __attribute__ ((weak))
+f1 (int x[40][100], int *y, int z1, int z2, int z3, int z4,
+ int z5, int z6, int z7, int z8)
+{
+ /* A wild write to x and y. */
+ sink = z1;
+ sink = z2;
+ sink = z3;
+ sink = z4;
+ sink = z5;
+ sink = z6;
+ sink = z7;
+ sink = z8;
+ asm volatile ("" ::: "memory");
+ if (y[testi] == x[testi][testi])
+ throw 100;
+}
+
+/* Expect vector work to be done, with spilling of vector registers. */
+void __attribute__ ((weak))
+f2 (int x[40][100], int *y)
+{
+ /* Create a true variable-sized frame. */
+ ptr = __builtin_alloca (testi + 40);
+ /* Try to force some spilling. */
+#define DECLARE(N) int y##N = y[N];
+ REPEAT40 (DECLARE);
+ for (int j = 0; j < 20; ++j)
+ {
+ f1 (x, y, 1, 2, 3, 4, 5, 6, 7, 8);
+#pragma omp simd
+ for (int i = 0; i < 100; ++i)
+ {
+#define INC(N) x[N][i] += y##N;
+ REPEAT40 (INC);
+ }
+ }
+}
+
+/* Catch an exception thrown from f1, via f2. */
+void __attribute__ ((weak))
+f3 (int x[40][100], int *y, int *z)
+{
+ volatile int extra = 111;
+ try
+ {
+ f2 (x, y);
+ }
+ catch (int val)
+ {
+ *z = val + extra;
+ }
+}
+
+static int x[40][100];
+static int y[40];
+static int z;
+
+int
+main (void)
+{
+ f3 (x, y, &z);
+ if (z != 211)
+ __builtin_abort ();
+ return 0;
+}
Index: gcc/testsuite/g++.target/aarch64/sve_catch_6.C
===================================================================
--- /dev/null 2017-11-03 10:40:07.002381728 +0000
+++ gcc/testsuite/g++.target/aarch64/sve_catch_6.C 2017-11-03 17:24:20.172023500 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fopenmp-simd -fomit-frame-pointer" } */
+/* { dg-options "-O3 -fopenmp-simd -fomit-frame-pointer -march=armv8-a+sve" { target aarch64_sve_hw } } */
+
+#include "sve_catch_5.C"
* Re: [4/4] SVE unwinding
2017-11-03 17:52 ` [4/4] SVE unwinding Richard Sandiford
@ 2017-11-10 10:58 ` James Greenhalgh
0 siblings, 0 replies; 18+ messages in thread
From: James Greenhalgh @ 2017-11-10 10:58 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Nov 03, 2017 at 05:52:05PM +0000, Richard Sandiford wrote:
> This patch adds support for unwinding frames that use the SVE
> pseudo VG register. We want this register to act like a normal
> register if the CFI explicitly sets it, but want to provide a
> default value otherwise. Computing the default value requires
> an SVE target, so we only want to compute it on demand.
>
> aarch64_vg uses a hard-coded .inst in order to avoid a build
> dependency on binutils 2.28 or later.
I think the new hook needs documenting in tm.texi, particularly as it
implies a conditional write to VALUE.
I think there is precedent for this practice: for example,
DWARF_REG_TO_UNWIND_COLUMN and REG_VALUE_IN_UNWIND_CONTEXT are defined
in libgcc/config and documented in tm.texi.
Otherwise, the AArch64 parts of this are OK. You might need to wait for
someone to OK the unwind-dw2.c part.
Thanks,
James
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
> 2017-11-03 Richard Sandiford <richard.sandiford@linaro.org>
>
> libgcc/
> * config/aarch64/value-unwind.h (aarch64_vg): New function.
> (DWARF_LAZY_REGISTER_VALUE): Define.
> * unwind-dw2.c (_Unwind_GetGR): Use DWARF_LAZY_REGISTER_VALUE
> to provide a fallback register value.
>
> gcc/testsuite/
> * g++.target/aarch64/aarch64.exp: New harness.
> * g++.target/aarch64/sve_catch_1.C: New test.
> * g++.target/aarch64/sve_catch_2.C: Likewise.
> * g++.target/aarch64/sve_catch_3.C: Likewise.
> * g++.target/aarch64/sve_catch_4.C: Likewise.
> * g++.target/aarch64/sve_catch_5.C: Likewise.
> * g++.target/aarch64/sve_catch_6.C: Likewise.
>
* Re: [0/4] [AArch64] Add SVE support
2017-11-03 17:45 [0/4] [AArch64] Add SVE support Richard Sandiford
` (3 preceding siblings ...)
2017-11-03 17:52 ` [4/4] SVE unwinding Richard Sandiford
@ 2017-11-24 16:34 ` Richard Sandiford
2018-01-06 18:09 ` James Greenhalgh
4 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2017-11-24 16:34 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
[-- Attachment #1: Type: text/plain, Size: 17607 bytes --]
Richard Sandiford <richard.sandiford@linaro.org> writes:
> This series adds support for ARM's Scalable Vector Extension.
> More details on SVE can be found here:
>
> https://developer.arm.com/products/architecture/a-profile/docs/arm-architecture-reference-manual-supplement-armv8-a
>
> There are four parts for ease of review, but it probably makes
> sense to commit them as one patch.
>
> The series plugs SVE into the current vectorisation framework without
> adding any new features to the framework itself. This means for example
> that vector loops still handle full vectors, with a scalar epilogue loop
> being needed for the rest. Later patches add support for other features
> like fully-predicated loops.
>
> The patches build on top of the various series that I've already posted.
> Sorry that there were so many, and thanks again for all the reviews.
>
> Tested on aarch64-linux-gnu without SVE and aarch64-linux-gnu with SVE
> (in the default vector-length agnostic mode). Also tested with
> -msve-vector-bits=256 and -msve-vector-bits=512 to select 256-bit
> and 512-bit SVE registers.
Here's an update based on an off-list discussion with the maintainers.
Changes since v1:
- Changed the names of the modes from 256-bit vectors to "VNx"
+ a 128-bit mode name, e.g. V32QI -> VNx16QI.
- Added an "sve" attribute and used it in the "enabled" attribute.
This allows generic aarch64.md patterns to disable things related
to SVE on non-SVE targets; previously this was implicit through the
constraints.
- Improved the consistency of the constraint names, specifically:
Ua?: addition constraints (already used for Uaa)
Us?: general scalar constraints (already used for various other scalars)
Ut?: memory constraints (unchanged from v1)
vs?: vector SVE constraints (mostly unchanged, but now includes FP
as well as integer constraints)
There's still the general "Dm" (minus one) constraint, for consistency
with "Dz" (zero).
- Added missing register descriptions above FIXED_REGISTERS.
- "should"/"is expected to" -> "must".
- Added more commentary to things like regmode_natural_size.
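The "VNx" naming change can be illustrated with a small sketch (the helper below is hypothetical, not GCC code): a mode such as VNx16QI describes a variable number of 128-bit granules, each holding 16 QImode (byte) elements, so the total element count scales with the chosen vector length.

```python
# Illustrative helper for the "VNx" mode-naming scheme described
# above.  VNx16QI = a variable number of 128-bit granules, each
# containing 16 one-byte (QImode) elements.
def sve_mode_units(elem_bits, vector_bits):
    """Total elements in an SVE data mode at a given vector length.

    SVE vector lengths are multiples of 128 bits, so the count is
    (vector_bits / 128) granules times (128 / elem_bits) elements
    per granule.
    """
    assert vector_bits % 128 == 0 and 128 % elem_bits == 0
    return (vector_bits // 128) * (128 // elem_bits)

# VNx16QI: 16 byte elements per 128-bit granule.
print(sve_mode_units(8, 128))   # minimum vector length: 16 elements
print(sve_mode_units(8, 256))   # -msve-vector-bits=256: 32 elements
```

This is why the old 256-bit-based name V32QI becomes VNx16QI: the "16" now refers to the fixed per-granule count rather than a particular vector length.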
I also did a before and after comparison of the testsuite output
for base AArch64 (but using the new FIRST_PSEUDO_REGISTER definition
to avoid changes to hash values). There were no differences.
Thanks,
Richard
2017-11-24 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes. Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
(VEC_ANY_DATA, VEC_STRUCT): New constants.
(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
(aarch64_sve_data_mode_p, aarch64_pred_mode, aarch64_get_mask_mode):
New functions.
(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
(aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
predicate modes and predicate registers. Explicitly restrict
GPRs to modes of 16 bytes or smaller. Only allow FP registers
to store a vector mode if it is recognized by
aarch64_classify_vector_mode.
(aarch64_regmode_natural_size): New function.
(aarch64_hard_regno_caller_save_mode): Return the original mode
for predicates.
(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
functions.
(aarch64_add_offset): Add a temp2 parameter. Assert that temp1
does not overlap dest if the function is frame-related. Handle
SVE constants.
(aarch64_split_add_offset): New function.
(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
them aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
and update call to aarch64_sub_sp.
(aarch64_add_cfa_expression): New function.
(aarch64_expand_prologue): Pass extra temporary registers to the
functions above. Handle the case in which we need to emit new
DW_CFA_expressions for registers that were originally saved
relative to the stack pointer, but now have to be expressed
relative to the frame pointer.
(aarch64_output_mi_thunk): Pass extra temporary registers to the
functions above.
(aarch64_expand_epilogue): Likewise. Prevent inheritance of
IP0 and IP1 values for SVE frames.
(aarch64_expand_vec_series): New function.
(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
Handle SVE constants. Use emit_move_insn to move a force_const_mem
into the register, rather than emitting a SET directly.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
(offset_9bit_signed_scaled_p): New functions.
(aarch64_replicate_bitmask_imm): New function.
(aarch64_bitmask_imm): Use it.
(aarch64_cannot_force_const_mem): Reject expressions involving
a CONST_POLY_INT. Update call to aarch64_classify_symbol.
(aarch64_classify_index): Handle SVE indices, by requiring
a plain register index with a scale that matches the element size.
(aarch64_classify_address): Handle SVE addresses. Assert that
the mode of the address is VOIDmode or an integer mode.
Update call to aarch64_classify_symbol.
(aarch64_classify_symbolic_expression): Update call to
aarch64_classify_symbol.
(aarch64_const_vec_all_same_in_range_p): Extend to VEC_DUPLICATE
constants by using const_vec_duplicate_p.
(aarch64_const_vec_all_in_range_p): New function.
(aarch64_print_vector_float_operand): Likewise.
(aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
"vN" for FP registers with SVE modes. Handle (const ...) vectors
and the FP immediates 1.0 and 0.5.
(aarch64_print_operand_address): Use ADDR_QUERY_ANY. Handle
SVE addresses.
(aarch64_regno_regclass): Handle predicate registers.
(aarch64_secondary_reload): Handle big-endian reloads of SVE
data modes.
(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
(aarch64_convert_sve_vector_bits): New function.
(aarch64_override_options): Use it to handle -msve-vector-bits=.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
Handle SVE vector and predicate modes. Accept VL-based constants
that need only one temporary register, and VL offsets that require
no temporary registers.
(aarch64_conditional_register_usage): Mark the predicate registers
as fixed if SVE isn't available.
(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
Return true for SVE vector and predicate modes.
(aarch64_simd_container_mode): Take the number of bits as a poly_int64
rather than an unsigned int. Handle SVE modes.
(aarch64_preferred_simd_mode): Update call accordingly. Handle
SVE modes.
(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
if SVE is enabled.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): New functions.
(aarch64_sve_valid_immediate): New function.
(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
Explicitly reject structure modes. Check for INDEX constants.
Handle PTRUE and PFALSE constants.
(aarch64_check_zero_based_sve_index_immediate): New function.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
vector modes. Accept constants in the range of CNT[BHWD].
(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
ask for an Advanced SIMD mode.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
(aarch64_simd_vector_alignment): Handle SVE predicates.
(aarch64_vectorize_preferred_vector_alignment): New function.
(aarch64_simd_vector_alignment_reachable): Use it instead of
the vector size.
(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
functions.
(MAX_VECT_LEN): Delete.
(expand_vec_perm_d): Add a vec_flags field.
(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext): Don't apply a big-endian lane correction
for SVE modes.
(aarch64_evpc_rev): Rename to...
(aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE.
(aarch64_evpc_rev_global): New function.
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
MAX_VECT_LEN.
(aarch64_evpc_sve_tbl): New function.
(aarch64_expand_vec_perm_const_1): Update after rename of
aarch64_evpc_rev. Handle SVE permutes too, trying
aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
than aarch64_evpc_tbl.
(aarch64_expand_vec_perm_const): Initialize vec_flags.
(aarch64_vectorize_vec_perm_const_ok): Likewise.
(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond): New functions.
(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
of aarch64_vector_mode_p.
(aarch64_dwarf_poly_indeterminate_value): New function.
(aarch64_compute_pressure_classes): Likewise.
(aarch64_can_change_mode_class): Likewise.
(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
(Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
constraints.
(Dn, Dl, Dr): Accept const as well as const_vector.
(Dz): Likewise. Compare against CONST0_RTX.
* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
of "vector" where appropriate.
(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
(v_int_equiv): Extend to SVE modes.
(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
mode attributes.
(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
(SVE_COND_FP_CMP): New int iterators.
(perm_hilo): Handle the new unpack unspecs.
(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
attributes.
* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
(aarch64_equality_operator, aarch64_constant_vector_operand)
(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
(aarch64_sve_nonimmediate_operand): Likewise.
(aarch64_sve_general_operand): Likewise.
(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
(aarch64_sve_float_arith_immediate): Likewise.
(aarch64_sve_float_arith_with_sub_immediate): Likewise.
(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
(aarch64_sve_float_arith_operand): Likewise.
(aarch64_sve_float_arith_with_sub_operand): Likewise.
(aarch64_sve_float_mul_operand): Likewise.
(aarch64_sve_vec_perm_operand): Likewise.
(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
(aarch64_mov_operand): Accept const_poly_int and const_vector.
(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
as well as const_vector.
(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
in file. Use CONST0_RTX and CONSTM1_RTX.
(aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes.
(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
Use aarch64_simd_imm_zero.
* config/aarch64/aarch64-sve.md: New file.
* config/aarch64/aarch64.md: Include it.
(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
(sve): New attribute.
(enabled): Disable instructions with the sve attribute unless
TARGET_SVE.
(movqi, movhi): Pass CONST_POLY_INT operands through
aarch64_expand_mov_immediate.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
CNT[BHSD] immediates.
(movti): Split CONST_POLY_INT moves into two halves.
(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
Split additions that need a temporary here if the destination
is the stack pointer.
(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
(*add<mode>3_poly_1): New instruction.
(set_clobber_cc): New expander.
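The "sve" attribute and its use in "enabled" (see the entries above) can be sketched roughly as follows. This is an illustrative machine-description fragment based on the description in this ChangeLog, not the exact text of the committed patch; the attribute values and the use of match_test on TARGET_SVE are assumptions:

```lisp
;; Mark instructions (or individual alternatives) that need SVE.
;; The default is "no", so existing patterns are unaffected.
(define_attr "sve" "no,yes" (const_string "no"))

;; An alternative tagged sve=yes is only enabled when the target
;; actually supports SVE; otherwise it is ignored by reload etc.
(define_attr "enabled" "no,yes"
  (if_then_else (ior (eq_attr "sve" "no")
                     (match_test "TARGET_SVE"))
                (const_string "yes")
                (const_string "no")))
```

The effect is that generic aarch64.md patterns can carry SVE-specific alternatives without those alternatives being considered on non-SVE targets, rather than relying implicitly on the constraints.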
[-- Attachment #2: sve-01-main.diff.gz --]
[-- Type: application/gzip, Size: 56871 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [0/4] [AArch64] Add SVE support
2017-11-24 16:34 ` [0/4] [AArch64] Add SVE support Richard Sandiford
@ 2018-01-06 18:09 ` James Greenhalgh
2018-01-06 19:39 ` Richard Sandiford
0 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2018-01-06 18:09 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
On Fri, Nov 24, 2017 at 03:59:58PM +0000, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
> > This series adds support for ARM's Scalable Vector Extension.
> > More details on SVE can be found here:
> >
> > https://developer.arm.com/products/architecture/a-profile/docs/arm-architecture-reference-manual-supplement-armv8-a
> >
> > There are four parts for ease of review, but it probably makes
> > sense to commit them as one patch.
> >
> > The series plugs SVE into the current vectorisation framework without
> > adding any new features to the framework itself. This means for example
> > that vector loops still handle full vectors, with a scalar epilogue loop
> > being needed for the rest. Later patches add support for other features
> > like fully-predicated loops.
> >
> > The patches build on top of the various series that I've already posted.
> > Sorry that there were so many, and thanks again for all the reviews.
> >
> > Tested on aarch64-linux-gnu without SVE and aarch64-linux-gnu with SVE
> > (in the default vector-length agnostic mode). Also tested with
> > -msve-vector-bits=256 and -msve-vector-bits=512 to select 256-bit
> > and 512-bit SVE registers.
>
> Here's an update based on an off-list discussion with the maintainers.
> Changes since v1:
>
> - Changed the names of the modes from 256-bit vectors to "VNx"
> + a 128-bit mode name, e.g. V32QI -> VNx16QI.
>
> - Added an "sve" attribute and used it in the "enabled" attribute.
> This allows generic aarch64.md patterns to disable things related
> to SVE on non-SVE targets; previously this was implicit through the
> constraints.
>
> - Improved the consistency of the constraint names, specifically:
>
> Ua?: addition constraints (already used for Uaa)
> Us?: general scalar constraints (already used for various other scalars)
> Ut?: memory constraints (unchanged from v1)
> vs?: vector SVE constraints (mostly unchanged, but now includes FP
> as well as integer constraints)
>
> There's still the general "Dm" (minus one) constraint, for consistency
> with "Dz" (zero).
>
> - Added missing register descriptions above FIXED_REGISTERS.
>
> - "should"/"is expected to" -> "must".
>
> - Added more commentary to things like regmode_natural_size.
>
> I also did a before and after comparison of the testsuite output
> for base AArch64 (but using the new FIRST_PSEUDO_REGISTER definition
> to avoid changes to hash values). There were no differences.
I seem to have lost 4/4 in my mailer. Would you mind pinging it if I have
any action to take? Also, please ping any other SVE parts I've missed that
you haven't pinged in recent days.
I'll get to 1/4 in good time, but at 5000+ lines, it will need at least
another day! I'd like to OK everything around it which is outstanding, then
build up the courage for the big patch!
Thanks,
James
* Re: [0/4] [AArch64] Add SVE support
2018-01-06 18:09 ` James Greenhalgh
@ 2018-01-06 19:39 ` Richard Sandiford
[not found] ` <20180107210818.GQ6993@arm.com>
0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2018-01-06 19:39 UTC (permalink / raw)
To: James Greenhalgh; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
James Greenhalgh <james.greenhalgh@arm.com> writes:
> On Fri, Nov 24, 2017 at 03:59:58PM +0000, Richard Sandiford wrote:
>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> > This series adds support for ARM's Scalable Vector Extension.
>> > More details on SVE can be found here:
>> >
>> > https://developer.arm.com/products/architecture/a-profile/docs/arm-architecture-reference-manual-supplement-armv8-a
>> >
>> > There are four parts for ease of review, but it probably makes
>> > sense to commit them as one patch.
>> >
>> > The series plugs SVE into the current vectorisation framework without
>> > adding any new features to the framework itself. This means for example
>> > that vector loops still handle full vectors, with a scalar epilogue loop
>> > being needed for the rest. Later patches add support for other features
>> > like fully-predicated loops.
>> >
>> > The patches build on top of the various series that I've already posted.
>> > Sorry that there were so many, and thanks again for all the reviews.
>> >
>> > Tested on aarch64-linux-gnu without SVE and aarch64-linux-gnu with SVE
>> > (in the default vector-length agnostic mode). Also tested with
>> > -msve-vector-bits=256 and -msve-vector-bits=512 to select 256-bit
>> > and 512-bit SVE registers.
>>
>> Here's an update based on an off-list discussion with the maintainers.
>> Changes since v1:
>>
>> - Changed the names of the modes from 256-bit vectors to "VNx"
>> + a 128-bit mode name, e.g. V32QI -> VNx16QI.
>>
>> - Added an "sve" attribute and used it in the "enabled" attribute.
>> This allows generic aarch64.md patterns to disable things related
>> to SVE on non-SVE targets; previously this was implicit through the
>> constraints.
>>
>> - Improved the consistency of the constraint names, specifically:
>>
>> Ua?: addition constraints (already used for Uaa)
>> Us?: general scalar constraints (already used for various other scalars)
>> Ut?: memory constraints (unchanged from v1)
>> vs?: vector SVE constraints (mostly unchanged, but now includes FP
>> as well as integer constraints)
>>
>> There's still the general "Dm" (minus one) constraint, for consistency
>> with "Dz" (zero).
>>
>> - Added missing register descriptions above FIXED_REGISTERS.
>>
>> - "should"/"is expected to" -> "must".
>>
>> - Added more commentary to things like regmode_natural_size.
>>
>> I also did a before and after comparison of the testsuite output
>> for base AArch64 (but using the new FIRST_PSEUDO_REGISTER definition
>> to avoid changes to hash values). There were no differences.
>
> I seem to have lost 4/4 in my mailer. Would you mind pinging it if I have
> any action to take? Also, please ping any other SVE parts I've missed that
> you haven't pinged in recent days.
4/4 was the unwinder support, which you've already reviewed (thanks):
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00251.html
There are two other AArch64 patches that I'll ping in a sec.
There are also quite a few patches that add target-independent
support for something and also add corresponding SVE code
to config/aarch64 and/or code quality tests to gcc.target/aarch64.
I think the full list of those is:
Patches with config/aarch64 code:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02066.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02068.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01484.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01485.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01491.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01494.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01497.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01506.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01570.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01575.html
Patches with gcc.target/aarch64 tests but no config/aarch64 changes,
with the tests being in the spirit of the ones added in the original
SVE patch:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00752.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01446.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01489.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01490.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01498.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01499.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01572.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01573.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01577.html
The target-independent pieces have already been reviewed (except where
I'll ping separately).
Thanks,
Richard