Date: Wed, 8 Nov 2023 16:03:36 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 4/4] Refactor x86 decl based scatter vectorization, prepare SLP
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Message-Id: <20231108150336.96BCC133F5@imap2.suse-dmz.suse.de>
SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: The following refactors the x86 decl based scatter vectorization similar to what I did to the gather path. This prepares scatters for SLP as well, mainly single-lane since there are multiple missing bits to support multi-lane scatters. Tested extensively on the SLP-only branch which has the ability to force SLP even for single lanes. Bootstrap and regtest running on x86_64-unknown-linux-gnu. PR tree-optimization/111133 * tree-vect-stmts.cc (vect_build_scatter_store_calls): Remove and refactor to ... (vect_build_one_scatter_store_call): ... this new function. (vectorizable_store): Use vect_check_scalar_mask to record the SLP node for the mask operand. Code generate scatters with builtin decls from the main scatter vectorization path and prepare that for SLP. --- gcc/tree-vect-stmts.cc | 683 ++++++++++++++++++++--------------------- 1 file changed, 326 insertions(+), 357 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 913a4fb08ed..f41b4825a6a 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -2703,238 +2703,87 @@ vect_build_one_gather_load_call (vec_info *vinfo, stmt_vec_info stmt_info, } /* Build a scatter store call while vectorizing STMT_INFO. Insert new - instructions before GSI and add them to VEC_STMT. GS_INFO describes - the scatter store operation. If the store is conditional, MASK is the - unvectorized condition, otherwise MASK is null. */ + instructions before GSI. GS_INFO describes the scatter store operation. + PTR is the base pointer, OFFSET the vectorized offsets and OPRND the + vectorized data to store. + If the store is conditional, MASK is the vectorized condition, otherwise + MASK is null. */ -static void -vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info, - gimple_stmt_iterator *gsi, gimple **vec_stmt, - gather_scatter_info *gs_info, tree mask, - stmt_vector_for_cost *cost_vec) +static gimple * +vect_build_one_scatter_store_call (vec_info *vinfo, stmt_vec_info stmt_info, + gimple_stmt_iterator *gsi, + gather_scatter_info *gs_info, + tree ptr, tree offset, tree oprnd, tree mask) { - loop_vec_info loop_vinfo = dyn_cast (vinfo); - tree vectype = STMT_VINFO_VECTYPE (stmt_info); - poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); - int ncopies = vect_get_num_copies (loop_vinfo, vectype); - enum { NARROW, NONE, WIDEN } modifier; - poly_uint64 scatter_off_nunits - = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype); - - /* FIXME: Keep the previous costing way in vect_model_store_cost by - costing N scalar stores, but it should be tweaked to use target - specific costs on related scatter store calls. 
*/ - if (cost_vec) - { - tree op = vect_get_store_rhs (stmt_info); - enum vect_def_type dt; - gcc_assert (vect_is_simple_use (op, vinfo, &dt)); - unsigned int inside_cost, prologue_cost = 0; - if (dt == vect_constant_def || dt == vect_external_def) - prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, - stmt_info, 0, vect_prologue); - unsigned int assumed_nunits = vect_nunits_for_cost (vectype); - inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits, - scalar_store, stmt_info, 0, vect_body); - - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_store_cost: inside_cost = %d, " - "prologue_cost = %d .\n", - inside_cost, prologue_cost); - return; - } - - tree perm_mask = NULL_TREE, mask_halfvectype = NULL_TREE; - if (known_eq (nunits, scatter_off_nunits)) - modifier = NONE; - else if (known_eq (nunits * 2, scatter_off_nunits)) - { - modifier = WIDEN; - - /* Currently gathers and scatters are only supported for - fixed-length vectors. */ - unsigned int count = scatter_off_nunits.to_constant (); - vec_perm_builder sel (count, count, 1); - for (unsigned i = 0; i < (unsigned int) count; ++i) - sel.quick_push (i | (count / 2)); - - vec_perm_indices indices (sel, 1, count); - perm_mask = vect_gen_perm_mask_checked (gs_info->offset_vectype, indices); - gcc_assert (perm_mask != NULL_TREE); - } - else if (known_eq (nunits, scatter_off_nunits * 2)) - { - modifier = NARROW; - - /* Currently gathers and scatters are only supported for - fixed-length vectors. */ - unsigned int count = nunits.to_constant (); - vec_perm_builder sel (count, count, 1); - for (unsigned i = 0; i < (unsigned int) count; ++i) - sel.quick_push (i | (count / 2)); - - vec_perm_indices indices (sel, 2, count); - perm_mask = vect_gen_perm_mask_checked (vectype, indices); - gcc_assert (perm_mask != NULL_TREE); - ncopies *= 2; - - if (mask) - mask_halfvectype = truth_type_for (gs_info->offset_vectype); - } - else - gcc_unreachable (); - tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl)); tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl)); - tree ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + /* tree ptrtype = TREE_VALUE (arglist); */ arglist = TREE_CHAIN (arglist); tree masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); tree idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); tree scaletype = TREE_VALUE (arglist); - gcc_checking_assert (TREE_CODE (masktype) == INTEGER_TYPE && TREE_CODE (rettype) == VOID_TYPE); - tree ptr = fold_convert (ptrtype, gs_info->base); - if (!is_gimple_min_invariant (ptr)) + tree mask_arg = NULL_TREE; + if (mask) { - gimple_seq seq; - ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE); - class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - edge pe = loop_preheader_edge (loop); - basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq); - gcc_assert (!new_bb); + mask_arg = mask; + tree optype = TREE_TYPE (mask_arg); + tree utype; + if (TYPE_MODE (masktype) == TYPE_MODE (optype)) + utype = masktype; + else + utype = lang_hooks.types.type_for_mode (TYPE_MODE (optype), 1); + tree var = vect_get_new_ssa_name (utype, vect_scalar_var); + mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_arg); + gassign *new_stmt + = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + mask_arg = var; + if (!useless_type_conversion_p (masktype, utype)) + { + gcc_assert 
(TYPE_PRECISION (utype) <= TYPE_PRECISION (masktype)); + tree var = vect_get_new_ssa_name (masktype, vect_scalar_var); + new_stmt = gimple_build_assign (var, NOP_EXPR, mask_arg); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + mask_arg = var; + } } - - tree mask_arg = NULL_TREE; - if (mask == NULL_TREE) + else { mask_arg = build_int_cst (masktype, -1); mask_arg = vect_init_vector (vinfo, stmt_info, mask_arg, masktype, NULL); } - tree scale = build_int_cst (scaletype, gs_info->scale); - - auto_vec vec_oprnds0; - auto_vec vec_oprnds1; - auto_vec vec_masks; - if (mask) + tree src = oprnd; + if (!useless_type_conversion_p (srctype, TREE_TYPE (src))) { - tree mask_vectype = truth_type_for (vectype); - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == NARROW ? ncopies / 2 : ncopies, - mask, &vec_masks, mask_vectype); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)), + TYPE_VECTOR_SUBPARTS (srctype))); + tree var = vect_get_new_ssa_name (srctype, vect_simple_var); + src = build1 (VIEW_CONVERT_EXPR, srctype, src); + gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + src = var; } - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == WIDEN ? ncopies / 2 : ncopies, - gs_info->offset, &vec_oprnds0); - tree op = vect_get_store_rhs (stmt_info); - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == NARROW ? ncopies / 2 : ncopies, op, - &vec_oprnds1); - tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE; - tree mask_op = NULL_TREE; - tree src, vec_mask; - for (int j = 0; j < ncopies; ++j) + tree op = offset; + if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) { - if (modifier == WIDEN) - { - if (j & 1) - op = permute_vec_elements (vinfo, vec_oprnd0, vec_oprnd0, perm_mask, - stmt_info, gsi); - else - op = vec_oprnd0 = vec_oprnds0[j / 2]; - src = vec_oprnd1 = vec_oprnds1[j]; - if (mask) - mask_op = vec_mask = vec_masks[j]; - } - else if (modifier == NARROW) - { - if (j & 1) - src = permute_vec_elements (vinfo, vec_oprnd1, vec_oprnd1, - perm_mask, stmt_info, gsi); - else - src = vec_oprnd1 = vec_oprnds1[j / 2]; - op = vec_oprnd0 = vec_oprnds0[j]; - if (mask) - mask_op = vec_mask = vec_masks[j / 2]; - } - else - { - op = vec_oprnd0 = vec_oprnds0[j]; - src = vec_oprnd1 = vec_oprnds1[j]; - if (mask) - mask_op = vec_mask = vec_masks[j]; - } - - if (!useless_type_conversion_p (srctype, TREE_TYPE (src))) - { - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)), - TYPE_VECTOR_SUBPARTS (srctype))); - tree var = vect_get_new_ssa_name (srctype, vect_simple_var); - src = build1 (VIEW_CONVERT_EXPR, srctype, src); - gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - src = var; - } - - if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) - { - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)), - TYPE_VECTOR_SUBPARTS (idxtype))); - tree var = vect_get_new_ssa_name (idxtype, vect_simple_var); - op = build1 (VIEW_CONVERT_EXPR, idxtype, op); - gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - op = var; - } - - if (mask) - { - tree utype; - mask_arg = mask_op; - if (modifier == NARROW) - { - tree var - = vect_get_new_ssa_name (mask_halfvectype, vect_simple_var); - gassign *new_stmt - = gimple_build_assign (var, - (j & 1) ? 
VEC_UNPACK_HI_EXPR - : VEC_UNPACK_LO_EXPR, - mask_op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_arg = var; - } - tree optype = TREE_TYPE (mask_arg); - if (TYPE_MODE (masktype) == TYPE_MODE (optype)) - utype = masktype; - else - utype = lang_hooks.types.type_for_mode (TYPE_MODE (optype), 1); - tree var = vect_get_new_ssa_name (utype, vect_scalar_var); - mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_arg); - gassign *new_stmt - = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_arg = var; - if (!useless_type_conversion_p (masktype, utype)) - { - gcc_assert (TYPE_PRECISION (utype) <= TYPE_PRECISION (masktype)); - tree var = vect_get_new_ssa_name (masktype, vect_scalar_var); - new_stmt = gimple_build_assign (var, NOP_EXPR, mask_arg); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_arg = var; - } - } - - gcall *new_stmt - = gimple_build_call (gs_info->decl, 5, ptr, mask_arg, op, src, scale); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)), + TYPE_VECTOR_SUBPARTS (idxtype))); + tree var = vect_get_new_ssa_name (idxtype, vect_simple_var); + op = build1 (VIEW_CONVERT_EXPR, idxtype, op); + gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op); vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + op = var; } - *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; + + tree scale = build_int_cst (scaletype, gs_info->scale); + gcall *new_stmt + = gimple_build_call (gs_info->decl, 5, ptr, mask_arg, op, src, scale); + return new_stmt; } /* Prepare the base and offset in GS_INFO for vectorization. @@ -8209,6 +8058,7 @@ vectorizable_store (vec_info *vinfo, /* Is vectorizable store? */ tree mask = NULL_TREE, mask_vectype = NULL_TREE; + slp_tree mask_node = NULL; if (gassign *assign = dyn_cast (stmt_info->stmt)) { tree scalar_dest = gimple_assign_lhs (assign); @@ -8240,7 +8090,8 @@ vectorizable_store (vec_info *vinfo, (call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); if (mask_index >= 0 && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index, - &mask, NULL, &mask_dt, &mask_vectype)) + &mask, &mask_node, &mask_dt, + &mask_vectype)) return false; } @@ -8409,13 +8260,7 @@ vectorizable_store (vec_info *vinfo, ensure_base_align (dr_info); - if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) - { - vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info, - mask, cost_vec); - return true; - } - else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3) + if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3) { gcc_assert (memory_access_type == VMAT_CONTIGUOUS); gcc_assert (!slp); @@ -9052,7 +8897,7 @@ vectorizable_store (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER) { - gcc_assert (!slp && !grouped_store); + gcc_assert (!grouped_store); auto_vec vec_offsets; unsigned int inside_cost = 0, prologue_cost = 0; for (j = 0; j < ncopies; j++) @@ -9068,22 +8913,22 @@ vectorizable_store (vec_info *vinfo, /* Since the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN is of size 1. 
*/ gcc_assert (group_size == 1); - op = vect_get_store_rhs (first_stmt_info); - vect_get_vec_defs_for_operand (vinfo, first_stmt_info, - ncopies, op, gvec_oprnds[0]); - vec_oprnd = (*gvec_oprnds[0])[0]; - dr_chain.quick_push (vec_oprnd); + if (slp_node) + vect_get_slp_defs (op_node, gvec_oprnds[0]); + else + vect_get_vec_defs_for_operand (vinfo, first_stmt_info, + ncopies, op, gvec_oprnds[0]); if (mask) { - vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, - mask, &vec_masks, - mask_vectype); - vec_mask = vec_masks[0]; + if (slp_node) + vect_get_slp_defs (mask_node, &vec_masks); + else + vect_get_vec_defs_for_operand (vinfo, stmt_info, + ncopies, + mask, &vec_masks, + mask_vectype); } - /* We should have catched mismatched types earlier. */ - gcc_assert ( - useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd))); if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, slp_node, &gs_info, @@ -9099,156 +8944,280 @@ vectorizable_store (vec_info *vinfo, else if (!costing_p) { gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); - vec_oprnd = (*gvec_oprnds[0])[j]; - dr_chain[0] = vec_oprnd; - if (mask) - vec_mask = vec_masks[j]; if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)) dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi, stmt_info, bump); } new_stmt = NULL; - unsigned HOST_WIDE_INT align; - tree final_mask = NULL_TREE; - tree final_len = NULL_TREE; - tree bias = NULL_TREE; - if (!costing_p) - { - if (loop_masks) - final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, - ncopies, vectype, j); - if (vec_mask) - final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, - final_mask, vec_mask, gsi); - } - - if (gs_info.ifn != IFN_LAST) + for (i = 0; i < vec_num; ++i) { - if (costing_p) + if (!costing_p) { - unsigned int cnunits = vect_nunits_for_cost (vectype); - inside_cost - += record_stmt_cost (cost_vec, cnunits, scalar_store, - stmt_info, 0, vect_body); - continue; + vec_oprnd = (*gvec_oprnds[0])[vec_num * j + i]; + if (mask) + vec_mask = vec_masks[vec_num * j + i]; + /* We should have catched mismatched types earlier. 
*/ + gcc_assert (useless_type_conversion_p (vectype, + TREE_TYPE (vec_oprnd))); + } + unsigned HOST_WIDE_INT align; + tree final_mask = NULL_TREE; + tree final_len = NULL_TREE; + tree bias = NULL_TREE; + if (!costing_p) + { + if (loop_masks) + final_mask = vect_get_loop_mask (loop_vinfo, gsi, + loop_masks, ncopies, + vectype, j); + if (vec_mask) + final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, + final_mask, vec_mask, gsi); } - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - vec_offset = vec_offsets[j]; - tree scale = size_int (gs_info.scale); - - if (gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE) + if (gs_info.ifn != IFN_LAST) { - if (loop_lens) - final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, - ncopies, vectype, j, 1); + if (costing_p) + { + unsigned int cnunits = vect_nunits_for_cost (vectype); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; + } + + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + vec_offset = vec_offsets[vec_num * j + i]; + tree scale = size_int (gs_info.scale); + + if (gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE) + { + if (loop_lens) + final_len = vect_get_loop_len (loop_vinfo, gsi, + loop_lens, ncopies, + vectype, j, 1); + else + final_len = size_int (TYPE_VECTOR_SUBPARTS (vectype)); + signed char biasval + = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); + bias = build_int_cst (intQI_type_node, biasval); + if (!final_mask) + { + mask_vectype = truth_type_for (vectype); + final_mask = build_minus_one_cst (mask_vectype); + } + } + + gcall *call; + if (final_len && final_mask) + call = gimple_build_call_internal + (IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr, + vec_offset, scale, vec_oprnd, final_mask, + final_len, bias); + else if (final_mask) + call = gimple_build_call_internal + (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, + vec_offset, scale, vec_oprnd, final_mask); else - final_len = build_int_cst (sizetype, - TYPE_VECTOR_SUBPARTS (vectype)); - signed char biasval - = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); - bias = build_int_cst (intQI_type_node, biasval); - if (!final_mask) + call = gimple_build_call_internal (IFN_SCATTER_STORE, 4, + dataref_ptr, vec_offset, + scale, vec_oprnd); + gimple_call_set_nothrow (call, true); + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); + new_stmt = call; + } + else if (gs_info.decl) + { + /* The builtin decls path for scatter is legacy, x86 only. */ + gcc_assert (nunits.is_constant () + && (!final_mask + || SCALAR_INT_MODE_P + (TYPE_MODE (TREE_TYPE (final_mask))))); + if (costing_p) { - mask_vectype = truth_type_for (vectype); - final_mask = build_minus_one_cst (mask_vectype); + unsigned int cnunits = vect_nunits_for_cost (vectype); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; } + poly_uint64 offset_nunits + = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype); + if (known_eq (nunits, offset_nunits)) + { + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, vec_offsets[vec_num * j + i], + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + else if (known_eq (nunits, offset_nunits * 2)) + { + /* We have a offset vector with half the number of + lanes but the builtins will store full vectype + data from the lower lanes. 
*/ + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, + vec_offsets[2 * vec_num * j + 2 * i], + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + int count = nunits.to_constant (); + vec_perm_builder sel (count, count, 1); + sel.quick_grow (count); + for (int i = 0; i < count; ++i) + sel[i] = i | (count / 2); + vec_perm_indices indices (sel, 2, count); + tree perm_mask + = vect_gen_perm_mask_checked (vectype, indices); + new_stmt = gimple_build_assign (NULL_TREE, VEC_PERM_EXPR, + vec_oprnd, vec_oprnd, + perm_mask); + vec_oprnd = make_ssa_name (vectype); + gimple_set_lhs (new_stmt, vec_oprnd); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + if (final_mask) + { + new_stmt = gimple_build_assign (NULL_TREE, + VEC_UNPACK_HI_EXPR, + final_mask); + final_mask = make_ssa_name + (truth_type_for (gs_info.offset_vectype)); + gimple_set_lhs (new_stmt, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, + vec_offsets[2 * vec_num * j + 2 * i + 1], + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + else if (known_eq (nunits * 2, offset_nunits)) + { + /* We have a offset vector with double the number of + lanes. Select the low/high part accordingly. */ + vec_offset = vec_offsets[(vec_num * j + i) / 2]; + if ((vec_num * j + i) & 1) + { + int count = offset_nunits.to_constant (); + vec_perm_builder sel (count, count, 1); + sel.quick_grow (count); + for (int i = 0; i < count; ++i) + sel[i] = i | (count / 2); + vec_perm_indices indices (sel, 2, count); + tree perm_mask = vect_gen_perm_mask_checked + (TREE_TYPE (vec_offset), indices); + new_stmt = gimple_build_assign (NULL_TREE, + VEC_PERM_EXPR, + vec_offset, + vec_offset, + perm_mask); + vec_offset = make_ssa_name (TREE_TYPE (vec_offset)); + gimple_set_lhs (new_stmt, vec_offset); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, vec_offset, + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + else + gcc_unreachable (); } - - gcall *call; - if (final_len && final_mask) - call = gimple_build_call_internal (IFN_MASK_LEN_SCATTER_STORE, - 7, dataref_ptr, vec_offset, - scale, vec_oprnd, final_mask, - final_len, bias); - else if (final_mask) - call - = gimple_build_call_internal (IFN_MASK_SCATTER_STORE, 5, - dataref_ptr, vec_offset, scale, - vec_oprnd, final_mask); else - call = gimple_build_call_internal (IFN_SCATTER_STORE, 4, - dataref_ptr, vec_offset, - scale, vec_oprnd); - gimple_call_set_nothrow (call, true); - vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); - new_stmt = call; - } - else - { - /* Emulated scatter. */ - gcc_assert (!final_mask); - if (costing_p) { - unsigned int cnunits = vect_nunits_for_cost (vectype); - /* For emulated scatter N offset vector element extracts - (we assume the scalar scaling and ptr + offset add is - consumed by the load). */ - inside_cost - += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, - stmt_info, 0, vect_body); - /* N scalar stores plus extracting the elements. 
*/ - inside_cost - += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, - stmt_info, 0, vect_body); - inside_cost - += record_stmt_cost (cost_vec, cnunits, scalar_store, - stmt_info, 0, vect_body); - continue; - } + /* Emulated scatter. */ + gcc_assert (!final_mask); + if (costing_p) + { + unsigned int cnunits = vect_nunits_for_cost (vectype); + /* For emulated scatter N offset vector element extracts + (we assume the scalar scaling and ptr + offset add is + consumed by the load). */ + inside_cost + += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, + stmt_info, 0, vect_body); + /* N scalar stores plus extracting the elements. */ + inside_cost + += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, + stmt_info, 0, vect_body); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; + } - unsigned HOST_WIDE_INT const_nunits = nunits.to_constant (); - unsigned HOST_WIDE_INT const_offset_nunits - = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_constant (); - vec *ctor_elts; - vec_alloc (ctor_elts, const_nunits); - gimple_seq stmts = NULL; - tree elt_type = TREE_TYPE (vectype); - unsigned HOST_WIDE_INT elt_size - = tree_to_uhwi (TYPE_SIZE (elt_type)); - /* We support offset vectors with more elements - than the data vector for now. */ - unsigned HOST_WIDE_INT factor - = const_offset_nunits / const_nunits; - vec_offset = vec_offsets[j / factor]; - unsigned elt_offset = (j % factor) * const_nunits; - tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset)); - tree scale = size_int (gs_info.scale); - align = get_object_alignment (DR_REF (first_dr_info->dr)); - tree ltype = build_aligned_type (TREE_TYPE (vectype), align); - for (unsigned k = 0; k < const_nunits; ++k) - { - /* Compute the offsetted pointer. */ - tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type), - bitsize_int (k + elt_offset)); - tree idx - = gimple_build (&stmts, BIT_FIELD_REF, idx_type, vec_offset, - TYPE_SIZE (idx_type), boff); - idx = gimple_convert (&stmts, sizetype, idx); - idx = gimple_build (&stmts, MULT_EXPR, sizetype, idx, scale); - tree ptr - = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (dataref_ptr), - dataref_ptr, idx); - ptr = gimple_convert (&stmts, ptr_type_node, ptr); - /* Extract the element to be stored. */ - tree elt - = gimple_build (&stmts, BIT_FIELD_REF, TREE_TYPE (vectype), - vec_oprnd, TYPE_SIZE (elt_type), - bitsize_int (k * elt_size)); - gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); - stmts = NULL; - tree ref - = build2 (MEM_REF, ltype, ptr, build_int_cst (ref_type, 0)); - new_stmt = gimple_build_assign (ref, elt); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + unsigned HOST_WIDE_INT const_nunits = nunits.to_constant (); + unsigned HOST_WIDE_INT const_offset_nunits + = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_constant (); + vec *ctor_elts; + vec_alloc (ctor_elts, const_nunits); + gimple_seq stmts = NULL; + tree elt_type = TREE_TYPE (vectype); + unsigned HOST_WIDE_INT elt_size + = tree_to_uhwi (TYPE_SIZE (elt_type)); + /* We support offset vectors with more elements + than the data vector for now. 
*/ + unsigned HOST_WIDE_INT factor + = const_offset_nunits / const_nunits; + vec_offset = vec_offsets[(vec_num * j + i) / factor]; + unsigned elt_offset = (j % factor) * const_nunits; + tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset)); + tree scale = size_int (gs_info.scale); + align = get_object_alignment (DR_REF (first_dr_info->dr)); + tree ltype = build_aligned_type (TREE_TYPE (vectype), align); + for (unsigned k = 0; k < const_nunits; ++k) + { + /* Compute the offsetted pointer. */ + tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type), + bitsize_int (k + elt_offset)); + tree idx + = gimple_build (&stmts, BIT_FIELD_REF, idx_type, + vec_offset, TYPE_SIZE (idx_type), boff); + idx = gimple_convert (&stmts, sizetype, idx); + idx = gimple_build (&stmts, MULT_EXPR, sizetype, + idx, scale); + tree ptr + = gimple_build (&stmts, PLUS_EXPR, + TREE_TYPE (dataref_ptr), + dataref_ptr, idx); + ptr = gimple_convert (&stmts, ptr_type_node, ptr); + /* Extract the element to be stored. */ + tree elt + = gimple_build (&stmts, BIT_FIELD_REF, + TREE_TYPE (vectype), + vec_oprnd, TYPE_SIZE (elt_type), + bitsize_int (k * elt_size)); + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); + stmts = NULL; + tree ref + = build2 (MEM_REF, ltype, ptr, + build_int_cst (ref_type, 0)); + new_stmt = gimple_build_assign (ref, elt); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + if (slp) + slp_node->push_vec_def (new_stmt); } } - if (j == 0) - *vec_stmt = new_stmt; - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + if (!slp && !costing_p) + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); } + if (!slp && !costing_p) + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; + if (costing_p && dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vect_model_store_cost: inside_cost = %d, " -- 2.35.3
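For readers less familiar with this path, a minimal source loop of the kind the
scatter-store code above vectorizes might look like the sketch below.  This is
illustrative only; the function and parameter names are made up and are not
taken from the patch or its testcases.

/* Illustrative only: a conditional indexed store.  On x86 with AVX-512
   the vectorizer can turn the masked stores to dst[idx[i]] into a vector
   scatter, either through the legacy decl-based builtins now emitted by
   vect_build_one_scatter_store_call or through the internal functions.  */
void
scatter (double *restrict dst, const int *restrict idx,
         const double *restrict src, int n)
{
  for (int i = 0; i < n; ++i)
    if (src[i] > 0.0)         /* the condition becomes the scatter mask */
      dst[idx[i]] = src[i];   /* indexed store -> vector scatter */
}

After vectorization the store is emitted either as a call to the target
scatter builtin (the decl path refactored above) or, when the target provides
the optab, as an internal-function call such as
.MASK_SCATTER_STORE (dataref_ptr, vec_offset, scale, vec_oprnd, final_mask)
as built in the hunks above.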