* Re: [PATCH 4/4] Refactor x86 decl based scatter vectorization, prepare SLP
@ 2023-11-09 12:59 Richard Biener
From: Richard Biener @ 2023-11-09 12:59 UTC
To: gcc-patches
On Wed, 8 Nov 2023, Richard Biener wrote:
> The following refactors the x86 decl-based scatter vectorization
> similarly to what I did for the gather path. This prepares scatters
> for SLP as well, mainly single-lane, since multiple bits are still
> missing to support multi-lane scatters.
>
> Tested extensively on the SLP-only branch, which has the ability
> to force SLP even for single lanes.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
The following is the final version I applied.
Richard.
From 46450124064fc1c663220d949e1b8c5942a4a0bd Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Wed, 8 Nov 2023 13:14:59 +0100
Subject: [PATCH] Refactor x86 decl based scatter vectorization, prepare SLP
To: gcc-patches@gcc.gnu.org
The following refactors the x86 decl-based scatter vectorization
similarly to what I did for the gather path. This prepares scatters
for SLP as well, mainly single-lane, since multiple bits are still
missing to support multi-lane scatters.
Tested extensively on the SLP-only branch, which has the ability
to force SLP even for single lanes.
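To make the refactoring concrete: the new
vect_build_one_scatter_store_call emits one target builtin call per
vector copy. A minimal sketch of the resulting GIMPLE follows, using
an AVX-512 style builtin name and SSA names that are purely
illustrative assumptions, not taken from this patch:

  /* Offsets view-converted to the builtin's index type if needed.  */
  idx_1 = VIEW_CONVERT_EXPR<vector(8) int>(vec_offset_4);
  /* Vector mask view-converted to the builtin's scalar mask type.  */
  mask_2 = VIEW_CONVERT_EXPR<unsigned char>(vec_mask_5);
  /* Five arguments: base pointer, mask, index, data and scale.  */
  __builtin_ia32_scattersiv8sf (ptr_3, mask_2, idx_1, data_6, 4);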
PR tree-optimization/111133
* tree-vect-stmts.cc (vect_build_scatter_store_calls):
Remove and refactor to ...
(vect_build_one_scatter_store_call): ... this new function.
(vectorizable_store): Use vect_check_scalar_mask to record
the SLP node for the mask operand. Code generate scatters
with builtin decls from the main scatter vectorization
path and prepare that for SLP.
	* tree-vect-slp.cc (vect_get_operand_map): Do not look
	at the VDEF to decide between scatter and gather since that
	doesn't work for patterns. Use the LHS being an SSA_NAME
	or not instead.
---
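A note on the tree-vect-slp.cc hunk: pattern statements do not carry
a VDEF even when they replace a store, so the operand map is now
chosen by whether the assignment's LHS is an SSA name. Schematically
(C-like pseudo statements for illustration, not literal GIMPLE):

  /* Gather: the LHS is an SSA_NAME -> off_map, only the offset
     operand participates in SLP discovery.  */
  _5 = *base_1[off_2];
  /* Scatter: the LHS is a memory reference -> off_op0_map, which
     additionally maps the stored value operand op0.  */
  *base_1[off_2] = val_3;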
gcc/tree-vect-slp.cc | 3 +-
gcc/tree-vect-stmts.cc | 689 ++++++++++++++++++++---------------------
2 files changed, 332 insertions(+), 360 deletions(-)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 176aaf270f4..3e5814c3a31 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -549,7 +549,8 @@ vect_get_operand_map (const gimple *stmt, bool gather_scatter_p = false,
&& swap)
return op1_op0_map;
if (gather_scatter_p)
- return gimple_vdef (stmt) ? off_op0_map : off_map;
+ return (TREE_CODE (gimple_assign_lhs (assign)) != SSA_NAME
+ ? off_op0_map : off_map);
}
gcc_assert (!swap);
if (auto call = dyn_cast<const gcall *> (stmt))
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 8cd02afdeab..ee89f47c468 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2703,238 +2703,87 @@ vect_build_one_gather_load_call (vec_info *vinfo, stmt_vec_info stmt_info,
}
/* Build a scatter store call while vectorizing STMT_INFO. Insert new
- instructions before GSI and add them to VEC_STMT. GS_INFO describes
- the scatter store operation. If the store is conditional, MASK is the
- unvectorized condition, otherwise MASK is null. */
+ instructions before GSI. GS_INFO describes the scatter store operation.
+ PTR is the base pointer, OFFSET the vectorized offsets and OPRND the
+ vectorized data to store.
+ If the store is conditional, MASK is the vectorized condition, otherwise
+ MASK is null. */
-static void
-vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info,
- gimple_stmt_iterator *gsi, gimple **vec_stmt,
- gather_scatter_info *gs_info, tree mask,
- stmt_vector_for_cost *cost_vec)
+static gimple *
+vect_build_one_scatter_store_call (vec_info *vinfo, stmt_vec_info stmt_info,
+ gimple_stmt_iterator *gsi,
+ gather_scatter_info *gs_info,
+ tree ptr, tree offset, tree oprnd, tree mask)
{
- loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
- tree vectype = STMT_VINFO_VECTYPE (stmt_info);
- poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
- int ncopies = vect_get_num_copies (loop_vinfo, vectype);
- enum { NARROW, NONE, WIDEN } modifier;
- poly_uint64 scatter_off_nunits
- = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
-
- /* FIXME: Keep the previous costing way in vect_model_store_cost by
- costing N scalar stores, but it should be tweaked to use target
- specific costs on related scatter store calls. */
- if (cost_vec)
- {
- tree op = vect_get_store_rhs (stmt_info);
- enum vect_def_type dt;
- gcc_assert (vect_is_simple_use (op, vinfo, &dt));
- unsigned int inside_cost, prologue_cost = 0;
- if (dt == vect_constant_def || dt == vect_external_def)
- prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
- stmt_info, 0, vect_prologue);
- unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
- inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
- scalar_store, stmt_info, 0, vect_body);
-
- if (dump_enabled_p ())
- dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_store_cost: inside_cost = %d, "
- "prologue_cost = %d .\n",
- inside_cost, prologue_cost);
- return;
- }
-
- tree perm_mask = NULL_TREE, mask_halfvectype = NULL_TREE;
- if (known_eq (nunits, scatter_off_nunits))
- modifier = NONE;
- else if (known_eq (nunits * 2, scatter_off_nunits))
- {
- modifier = WIDEN;
-
- /* Currently gathers and scatters are only supported for
- fixed-length vectors. */
- unsigned int count = scatter_off_nunits.to_constant ();
- vec_perm_builder sel (count, count, 1);
- for (unsigned i = 0; i < (unsigned int) count; ++i)
- sel.quick_push (i | (count / 2));
-
- vec_perm_indices indices (sel, 1, count);
- perm_mask = vect_gen_perm_mask_checked (gs_info->offset_vectype, indices);
- gcc_assert (perm_mask != NULL_TREE);
- }
- else if (known_eq (nunits, scatter_off_nunits * 2))
- {
- modifier = NARROW;
-
- /* Currently gathers and scatters are only supported for
- fixed-length vectors. */
- unsigned int count = nunits.to_constant ();
- vec_perm_builder sel (count, count, 1);
- for (unsigned i = 0; i < (unsigned int) count; ++i)
- sel.quick_push (i | (count / 2));
-
- vec_perm_indices indices (sel, 2, count);
- perm_mask = vect_gen_perm_mask_checked (vectype, indices);
- gcc_assert (perm_mask != NULL_TREE);
- ncopies *= 2;
-
- if (mask)
- mask_halfvectype = truth_type_for (gs_info->offset_vectype);
- }
- else
- gcc_unreachable ();
-
tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl));
tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl));
- tree ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+ /* tree ptrtype = TREE_VALUE (arglist); */ arglist = TREE_CHAIN (arglist);
tree masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
tree idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
tree scaletype = TREE_VALUE (arglist);
-
gcc_checking_assert (TREE_CODE (masktype) == INTEGER_TYPE
&& TREE_CODE (rettype) == VOID_TYPE);
- tree ptr = fold_convert (ptrtype, gs_info->base);
- if (!is_gimple_min_invariant (ptr))
+ tree mask_arg = NULL_TREE;
+ if (mask)
{
- gimple_seq seq;
- ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE);
- class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
- edge pe = loop_preheader_edge (loop);
- basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
- gcc_assert (!new_bb);
+ mask_arg = mask;
+ tree optype = TREE_TYPE (mask_arg);
+ tree utype;
+ if (TYPE_MODE (masktype) == TYPE_MODE (optype))
+ utype = masktype;
+ else
+ utype = lang_hooks.types.type_for_mode (TYPE_MODE (optype), 1);
+ tree var = vect_get_new_ssa_name (utype, vect_scalar_var);
+ mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_arg);
+ gassign *new_stmt
+ = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg);
+ vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+ mask_arg = var;
+ if (!useless_type_conversion_p (masktype, utype))
+ {
+ gcc_assert (TYPE_PRECISION (utype) <= TYPE_PRECISION (masktype));
+ tree var = vect_get_new_ssa_name (masktype, vect_scalar_var);
+ new_stmt = gimple_build_assign (var, NOP_EXPR, mask_arg);
+ vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+ mask_arg = var;
+ }
}
-
- tree mask_arg = NULL_TREE;
- if (mask == NULL_TREE)
+ else
{
mask_arg = build_int_cst (masktype, -1);
mask_arg = vect_init_vector (vinfo, stmt_info, mask_arg, masktype, NULL);
}
- tree scale = build_int_cst (scaletype, gs_info->scale);
-
- auto_vec<tree> vec_oprnds0;
- auto_vec<tree> vec_oprnds1;
- auto_vec<tree> vec_masks;
- if (mask)
+ tree src = oprnd;
+ if (!useless_type_conversion_p (srctype, TREE_TYPE (src)))
{
- tree mask_vectype = truth_type_for (vectype);
- vect_get_vec_defs_for_operand (vinfo, stmt_info,
- modifier == NARROW ? ncopies / 2 : ncopies,
- mask, &vec_masks, mask_vectype);
+ gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)),
+ TYPE_VECTOR_SUBPARTS (srctype)));
+ tree var = vect_get_new_ssa_name (srctype, vect_simple_var);
+ src = build1 (VIEW_CONVERT_EXPR, srctype, src);
+ gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src);
+ vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+ src = var;
}
- vect_get_vec_defs_for_operand (vinfo, stmt_info,
- modifier == WIDEN ? ncopies / 2 : ncopies,
- gs_info->offset, &vec_oprnds0);
- tree op = vect_get_store_rhs (stmt_info);
- vect_get_vec_defs_for_operand (vinfo, stmt_info,
- modifier == NARROW ? ncopies / 2 : ncopies, op,
- &vec_oprnds1);
- tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
- tree mask_op = NULL_TREE;
- tree src, vec_mask;
- for (int j = 0; j < ncopies; ++j)
+ tree op = offset;
+ if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
{
- if (modifier == WIDEN)
- {
- if (j & 1)
- op = permute_vec_elements (vinfo, vec_oprnd0, vec_oprnd0, perm_mask,
- stmt_info, gsi);
- else
- op = vec_oprnd0 = vec_oprnds0[j / 2];
- src = vec_oprnd1 = vec_oprnds1[j];
- if (mask)
- mask_op = vec_mask = vec_masks[j];
- }
- else if (modifier == NARROW)
- {
- if (j & 1)
- src = permute_vec_elements (vinfo, vec_oprnd1, vec_oprnd1,
- perm_mask, stmt_info, gsi);
- else
- src = vec_oprnd1 = vec_oprnds1[j / 2];
- op = vec_oprnd0 = vec_oprnds0[j];
- if (mask)
- mask_op = vec_mask = vec_masks[j / 2];
- }
- else
- {
- op = vec_oprnd0 = vec_oprnds0[j];
- src = vec_oprnd1 = vec_oprnds1[j];
- if (mask)
- mask_op = vec_mask = vec_masks[j];
- }
-
- if (!useless_type_conversion_p (srctype, TREE_TYPE (src)))
- {
- gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)),
- TYPE_VECTOR_SUBPARTS (srctype)));
- tree var = vect_get_new_ssa_name (srctype, vect_simple_var);
- src = build1 (VIEW_CONVERT_EXPR, srctype, src);
- gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- src = var;
- }
-
- if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
- {
- gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
- TYPE_VECTOR_SUBPARTS (idxtype)));
- tree var = vect_get_new_ssa_name (idxtype, vect_simple_var);
- op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
- gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- op = var;
- }
-
- if (mask)
- {
- tree utype;
- mask_arg = mask_op;
- if (modifier == NARROW)
- {
- tree var
- = vect_get_new_ssa_name (mask_halfvectype, vect_simple_var);
- gassign *new_stmt
- = gimple_build_assign (var,
- (j & 1) ? VEC_UNPACK_HI_EXPR
- : VEC_UNPACK_LO_EXPR,
- mask_op);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- mask_arg = var;
- }
- tree optype = TREE_TYPE (mask_arg);
- if (TYPE_MODE (masktype) == TYPE_MODE (optype))
- utype = masktype;
- else
- utype = lang_hooks.types.type_for_mode (TYPE_MODE (optype), 1);
- tree var = vect_get_new_ssa_name (utype, vect_scalar_var);
- mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_arg);
- gassign *new_stmt
- = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- mask_arg = var;
- if (!useless_type_conversion_p (masktype, utype))
- {
- gcc_assert (TYPE_PRECISION (utype) <= TYPE_PRECISION (masktype));
- tree var = vect_get_new_ssa_name (masktype, vect_scalar_var);
- new_stmt = gimple_build_assign (var, NOP_EXPR, mask_arg);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
- mask_arg = var;
- }
- }
-
- gcall *new_stmt
- = gimple_build_call (gs_info->decl, 5, ptr, mask_arg, op, src, scale);
+ gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
+ TYPE_VECTOR_SUBPARTS (idxtype)));
+ tree var = vect_get_new_ssa_name (idxtype, vect_simple_var);
+ op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
+ gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op);
vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
-
- STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+ op = var;
}
- *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+
+ tree scale = build_int_cst (scaletype, gs_info->scale);
+ gcall *new_stmt
+ = gimple_build_call (gs_info->decl, 5, ptr, mask_arg, op, src, scale);
+ return new_stmt;
}
/* Prepare the base and offset in GS_INFO for vectorization.
@@ -8208,6 +8057,7 @@ vectorizable_store (vec_info *vinfo,
/* Is vectorizable store? */
tree mask = NULL_TREE, mask_vectype = NULL_TREE;
+ slp_tree mask_node = NULL;
if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
{
tree scalar_dest = gimple_assign_lhs (assign);
@@ -8239,7 +8089,8 @@ vectorizable_store (vec_info *vinfo,
(call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info));
if (mask_index >= 0
&& !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask, &mask_node, &mask_dt,
+ &mask_vectype))
return false;
}
@@ -8373,8 +8224,10 @@ vectorizable_store (vec_info *vinfo,
mask);
if (slp_node
- && !vect_maybe_update_slp_op_vectype (SLP_TREE_CHILDREN (slp_node)[0],
- vectype))
+ && (!vect_maybe_update_slp_op_vectype (op_node, vectype)
+ || (mask
+ && !vect_maybe_update_slp_op_vectype (mask_node,
+ mask_vectype))))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -8408,13 +8261,7 @@ vectorizable_store (vec_info *vinfo,
ensure_base_align (dr_info);
- if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
- {
- vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info,
- mask, cost_vec);
- return true;
- }
- else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
+ if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
{
gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
gcc_assert (!slp);
@@ -9051,7 +8898,7 @@ vectorizable_store (vec_info *vinfo,
if (memory_access_type == VMAT_GATHER_SCATTER)
{
- gcc_assert (!slp && !grouped_store);
+ gcc_assert (!grouped_store);
auto_vec<tree> vec_offsets;
unsigned int inside_cost = 0, prologue_cost = 0;
for (j = 0; j < ncopies; j++)
@@ -9067,22 +8914,22 @@ vectorizable_store (vec_info *vinfo,
/* Since the store is not grouped, DR_GROUP_SIZE is 1, and
DR_CHAIN is of size 1. */
gcc_assert (group_size == 1);
- op = vect_get_store_rhs (first_stmt_info);
- vect_get_vec_defs_for_operand (vinfo, first_stmt_info,
- ncopies, op, gvec_oprnds[0]);
- vec_oprnd = (*gvec_oprnds[0])[0];
- dr_chain.quick_push (vec_oprnd);
+ if (slp_node)
+ vect_get_slp_defs (op_node, gvec_oprnds[0]);
+ else
+ vect_get_vec_defs_for_operand (vinfo, first_stmt_info,
+ ncopies, op, gvec_oprnds[0]);
if (mask)
{
- vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
- mask, &vec_masks,
- mask_vectype);
- vec_mask = vec_masks[0];
+ if (slp_node)
+ vect_get_slp_defs (mask_node, &vec_masks);
+ else
+ vect_get_vec_defs_for_operand (vinfo, stmt_info,
+ ncopies,
+ mask, &vec_masks,
+ mask_vectype);
}
- /* We should have catched mismatched types earlier. */
- gcc_assert (
- useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd)));
if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
slp_node, &gs_info,
@@ -9098,156 +8945,280 @@ vectorizable_store (vec_info *vinfo,
else if (!costing_p)
{
gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
- vec_oprnd = (*gvec_oprnds[0])[j];
- dr_chain[0] = vec_oprnd;
- if (mask)
- vec_mask = vec_masks[j];
if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
gsi, stmt_info, bump);
}
new_stmt = NULL;
- unsigned HOST_WIDE_INT align;
- tree final_mask = NULL_TREE;
- tree final_len = NULL_TREE;
- tree bias = NULL_TREE;
- if (!costing_p)
- {
- if (loop_masks)
- final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
- ncopies, vectype, j);
- if (vec_mask)
- final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
- final_mask, vec_mask, gsi);
- }
-
- if (gs_info.ifn != IFN_LAST)
+ for (i = 0; i < vec_num; ++i)
{
- if (costing_p)
+ if (!costing_p)
{
- unsigned int cnunits = vect_nunits_for_cost (vectype);
- inside_cost
- += record_stmt_cost (cost_vec, cnunits, scalar_store,
- stmt_info, 0, vect_body);
- continue;
+ vec_oprnd = (*gvec_oprnds[0])[vec_num * j + i];
+ if (mask)
+ vec_mask = vec_masks[vec_num * j + i];
+ /* We should have caught mismatched types earlier. */
+ gcc_assert (useless_type_conversion_p (vectype,
+ TREE_TYPE (vec_oprnd)));
+ }
+ unsigned HOST_WIDE_INT align;
+ tree final_mask = NULL_TREE;
+ tree final_len = NULL_TREE;
+ tree bias = NULL_TREE;
+ if (!costing_p)
+ {
+ if (loop_masks)
+ final_mask = vect_get_loop_mask (loop_vinfo, gsi,
+ loop_masks, ncopies,
+ vectype, j);
+ if (vec_mask)
+ final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
+ final_mask, vec_mask, gsi);
}
- if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
- vec_offset = vec_offsets[j];
- tree scale = size_int (gs_info.scale);
-
- if (gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE)
+ if (gs_info.ifn != IFN_LAST)
{
- if (loop_lens)
- final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens,
- ncopies, vectype, j, 1);
+ if (costing_p)
+ {
+ unsigned int cnunits = vect_nunits_for_cost (vectype);
+ inside_cost
+ += record_stmt_cost (cost_vec, cnunits, scalar_store,
+ stmt_info, 0, vect_body);
+ continue;
+ }
+
+ if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+ vec_offset = vec_offsets[vec_num * j + i];
+ tree scale = size_int (gs_info.scale);
+
+ if (gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE)
+ {
+ if (loop_lens)
+ final_len = vect_get_loop_len (loop_vinfo, gsi,
+ loop_lens, ncopies,
+ vectype, j, 1);
+ else
+ final_len = size_int (TYPE_VECTOR_SUBPARTS (vectype));
+ signed char biasval
+ = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ if (!final_mask)
+ {
+ mask_vectype = truth_type_for (vectype);
+ final_mask = build_minus_one_cst (mask_vectype);
+ }
+ }
+
+ gcall *call;
+ if (final_len && final_mask)
+ call = gimple_build_call_internal
+ (IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr,
+ vec_offset, scale, vec_oprnd, final_mask,
+ final_len, bias);
+ else if (final_mask)
+ call = gimple_build_call_internal
+ (IFN_MASK_SCATTER_STORE, 5, dataref_ptr,
+ vec_offset, scale, vec_oprnd, final_mask);
else
- final_len = build_int_cst (sizetype,
- TYPE_VECTOR_SUBPARTS (vectype));
- signed char biasval
- = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
- bias = build_int_cst (intQI_type_node, biasval);
- if (!final_mask)
+ call = gimple_build_call_internal (IFN_SCATTER_STORE, 4,
+ dataref_ptr, vec_offset,
+ scale, vec_oprnd);
+ gimple_call_set_nothrow (call, true);
+ vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
+ new_stmt = call;
+ }
+ else if (gs_info.decl)
+ {
+ /* The builtin decls path for scatter is legacy, x86 only. */
+ gcc_assert (nunits.is_constant ()
+ && (!final_mask
+ || SCALAR_INT_MODE_P
+ (TYPE_MODE (TREE_TYPE (final_mask)))));
+ if (costing_p)
{
- mask_vectype = truth_type_for (vectype);
- final_mask = build_minus_one_cst (mask_vectype);
+ unsigned int cnunits = vect_nunits_for_cost (vectype);
+ inside_cost
+ += record_stmt_cost (cost_vec, cnunits, scalar_store,
+ stmt_info, 0, vect_body);
+ continue;
}
+ poly_uint64 offset_nunits
+ = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
+ if (known_eq (nunits, offset_nunits))
+ {
+ new_stmt = vect_build_one_scatter_store_call
+ (vinfo, stmt_info, gsi, &gs_info,
+ dataref_ptr, vec_offsets[vec_num * j + i],
+ vec_oprnd, final_mask);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ }
+ else if (known_eq (nunits, offset_nunits * 2))
+ {
+ /* We have an offset vector with half the number of
+ lanes but the builtins will store full vectype
+ data from the lower lanes. */
+ new_stmt = vect_build_one_scatter_store_call
+ (vinfo, stmt_info, gsi, &gs_info,
+ dataref_ptr,
+ vec_offsets[2 * vec_num * j + 2 * i],
+ vec_oprnd, final_mask);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ int count = nunits.to_constant ();
+ vec_perm_builder sel (count, count, 1);
+ sel.quick_grow (count);
+ for (int i = 0; i < count; ++i)
+ sel[i] = i | (count / 2);
+ vec_perm_indices indices (sel, 2, count);
+ tree perm_mask
+ = vect_gen_perm_mask_checked (vectype, indices);
+ new_stmt = gimple_build_assign (NULL_TREE, VEC_PERM_EXPR,
+ vec_oprnd, vec_oprnd,
+ perm_mask);
+ vec_oprnd = make_ssa_name (vectype);
+ gimple_set_lhs (new_stmt, vec_oprnd);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ if (final_mask)
+ {
+ new_stmt = gimple_build_assign (NULL_TREE,
+ VEC_UNPACK_HI_EXPR,
+ final_mask);
+ final_mask = make_ssa_name
+ (truth_type_for (gs_info.offset_vectype));
+ gimple_set_lhs (new_stmt, final_mask);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ }
+ new_stmt = vect_build_one_scatter_store_call
+ (vinfo, stmt_info, gsi, &gs_info,
+ dataref_ptr,
+ vec_offsets[2 * vec_num * j + 2 * i + 1],
+ vec_oprnd, final_mask);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ }
+ else if (known_eq (nunits * 2, offset_nunits))
+ {
+ /* We have an offset vector with double the number of
+ lanes. Select the low/high part accordingly. */
+ vec_offset = vec_offsets[(vec_num * j + i) / 2];
+ if ((vec_num * j + i) & 1)
+ {
+ int count = offset_nunits.to_constant ();
+ vec_perm_builder sel (count, count, 1);
+ sel.quick_grow (count);
+ for (int i = 0; i < count; ++i)
+ sel[i] = i | (count / 2);
+ vec_perm_indices indices (sel, 2, count);
+ tree perm_mask = vect_gen_perm_mask_checked
+ (TREE_TYPE (vec_offset), indices);
+ new_stmt = gimple_build_assign (NULL_TREE,
+ VEC_PERM_EXPR,
+ vec_offset,
+ vec_offset,
+ perm_mask);
+ vec_offset = make_ssa_name (TREE_TYPE (vec_offset));
+ gimple_set_lhs (new_stmt, vec_offset);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ }
+ new_stmt = vect_build_one_scatter_store_call
+ (vinfo, stmt_info, gsi, &gs_info,
+ dataref_ptr, vec_offset,
+ vec_oprnd, final_mask);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ }
+ else
+ gcc_unreachable ();
}
-
- gcall *call;
- if (final_len && final_mask)
- call = gimple_build_call_internal (IFN_MASK_LEN_SCATTER_STORE,
- 7, dataref_ptr, vec_offset,
- scale, vec_oprnd, final_mask,
- final_len, bias);
- else if (final_mask)
- call
- = gimple_build_call_internal (IFN_MASK_SCATTER_STORE, 5,
- dataref_ptr, vec_offset, scale,
- vec_oprnd, final_mask);
else
- call = gimple_build_call_internal (IFN_SCATTER_STORE, 4,
- dataref_ptr, vec_offset,
- scale, vec_oprnd);
- gimple_call_set_nothrow (call, true);
- vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
- new_stmt = call;
- }
- else
- {
- /* Emulated scatter. */
- gcc_assert (!final_mask);
- if (costing_p)
{
- unsigned int cnunits = vect_nunits_for_cost (vectype);
- /* For emulated scatter N offset vector element extracts
- (we assume the scalar scaling and ptr + offset add is
- consumed by the load). */
- inside_cost
- += record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
- stmt_info, 0, vect_body);
- /* N scalar stores plus extracting the elements. */
- inside_cost
- += record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
- stmt_info, 0, vect_body);
- inside_cost
- += record_stmt_cost (cost_vec, cnunits, scalar_store,
- stmt_info, 0, vect_body);
- continue;
- }
+ /* Emulated scatter. */
+ gcc_assert (!final_mask);
+ if (costing_p)
+ {
+ unsigned int cnunits = vect_nunits_for_cost (vectype);
+ /* For emulated scatter N offset vector element extracts
+ (we assume the scalar scaling and ptr + offset add is
+ consumed by the load). */
+ inside_cost
+ += record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
+ stmt_info, 0, vect_body);
+ /* N scalar stores plus extracting the elements. */
+ inside_cost
+ += record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
+ stmt_info, 0, vect_body);
+ inside_cost
+ += record_stmt_cost (cost_vec, cnunits, scalar_store,
+ stmt_info, 0, vect_body);
+ continue;
+ }
- unsigned HOST_WIDE_INT const_nunits = nunits.to_constant ();
- unsigned HOST_WIDE_INT const_offset_nunits
- = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_constant ();
- vec<constructor_elt, va_gc> *ctor_elts;
- vec_alloc (ctor_elts, const_nunits);
- gimple_seq stmts = NULL;
- tree elt_type = TREE_TYPE (vectype);
- unsigned HOST_WIDE_INT elt_size
- = tree_to_uhwi (TYPE_SIZE (elt_type));
- /* We support offset vectors with more elements
- than the data vector for now. */
- unsigned HOST_WIDE_INT factor
- = const_offset_nunits / const_nunits;
- vec_offset = vec_offsets[j / factor];
- unsigned elt_offset = (j % factor) * const_nunits;
- tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset));
- tree scale = size_int (gs_info.scale);
- align = get_object_alignment (DR_REF (first_dr_info->dr));
- tree ltype = build_aligned_type (TREE_TYPE (vectype), align);
- for (unsigned k = 0; k < const_nunits; ++k)
- {
- /* Compute the offsetted pointer. */
- tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type),
- bitsize_int (k + elt_offset));
- tree idx
- = gimple_build (&stmts, BIT_FIELD_REF, idx_type, vec_offset,
- TYPE_SIZE (idx_type), boff);
- idx = gimple_convert (&stmts, sizetype, idx);
- idx = gimple_build (&stmts, MULT_EXPR, sizetype, idx, scale);
- tree ptr
- = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (dataref_ptr),
- dataref_ptr, idx);
- ptr = gimple_convert (&stmts, ptr_type_node, ptr);
- /* Extract the element to be stored. */
- tree elt
- = gimple_build (&stmts, BIT_FIELD_REF, TREE_TYPE (vectype),
- vec_oprnd, TYPE_SIZE (elt_type),
- bitsize_int (k * elt_size));
- gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
- stmts = NULL;
- tree ref
- = build2 (MEM_REF, ltype, ptr, build_int_cst (ref_type, 0));
- new_stmt = gimple_build_assign (ref, elt);
- vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+ unsigned HOST_WIDE_INT const_nunits = nunits.to_constant ();
+ unsigned HOST_WIDE_INT const_offset_nunits
+ = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_constant ();
+ vec<constructor_elt, va_gc> *ctor_elts;
+ vec_alloc (ctor_elts, const_nunits);
+ gimple_seq stmts = NULL;
+ tree elt_type = TREE_TYPE (vectype);
+ unsigned HOST_WIDE_INT elt_size
+ = tree_to_uhwi (TYPE_SIZE (elt_type));
+ /* We support offset vectors with more elements
+ than the data vector for now. */
+ unsigned HOST_WIDE_INT factor
+ = const_offset_nunits / const_nunits;
+ vec_offset = vec_offsets[(vec_num * j + i) / factor];
+ unsigned elt_offset = (j % factor) * const_nunits;
+ tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset));
+ tree scale = size_int (gs_info.scale);
+ align = get_object_alignment (DR_REF (first_dr_info->dr));
+ tree ltype = build_aligned_type (TREE_TYPE (vectype), align);
+ for (unsigned k = 0; k < const_nunits; ++k)
+ {
+ /* Compute the offsetted pointer. */
+ tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type),
+ bitsize_int (k + elt_offset));
+ tree idx
+ = gimple_build (&stmts, BIT_FIELD_REF, idx_type,
+ vec_offset, TYPE_SIZE (idx_type), boff);
+ idx = gimple_convert (&stmts, sizetype, idx);
+ idx = gimple_build (&stmts, MULT_EXPR, sizetype,
+ idx, scale);
+ tree ptr
+ = gimple_build (&stmts, PLUS_EXPR,
+ TREE_TYPE (dataref_ptr),
+ dataref_ptr, idx);
+ ptr = gimple_convert (&stmts, ptr_type_node, ptr);
+ /* Extract the element to be stored. */
+ tree elt
+ = gimple_build (&stmts, BIT_FIELD_REF,
+ TREE_TYPE (vectype),
+ vec_oprnd, TYPE_SIZE (elt_type),
+ bitsize_int (k * elt_size));
+ gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+ stmts = NULL;
+ tree ref
+ = build2 (MEM_REF, ltype, ptr,
+ build_int_cst (ref_type, 0));
+ new_stmt = gimple_build_assign (ref, elt);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+ new_stmt, gsi);
+ }
+ if (slp)
+ slp_node->push_vec_def (new_stmt);
}
}
- if (j == 0)
- *vec_stmt = new_stmt;
- STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+ if (!slp && !costing_p)
+ STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
}
+ if (!slp && !costing_p)
+ *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+
if (costing_p && dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"vect_model_store_cost: inside_cost = %d, "
--
2.35.3
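A closing note on the nunits == offset_nunits * 2 case above: each
builtin call consumes only as many data lanes as the offset vector
has indices, so two calls are emitted, permuting the high data lanes
down and unpacking the high mask half in between. Schematically, with
a hypothetical __scatter_builtin standing in for the real target decl:

  /* First call stores the low data lanes.  */
  __scatter_builtin (ptr, mask, off_lo, data, scale);
  /* Move the high data lanes down and select the high mask half.  */
  data_hi = VEC_PERM_EXPR <data, data, { 4, 5, 6, 7, 4, 5, 6, 7 }>;
  mask_hi = VEC_UNPACK_HI_EXPR <mask>;
  /* Second call stores what were the high data lanes.  */
  __scatter_builtin (ptr, mask_hi, off_hi, data_hi, scale);

In the converse nunits * 2 == offset_nunits case a single call per
copy suffices; odd copies first permute the high offset lanes down
before calling the builtin.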