From mboxrd@z Thu Jan 1 00:00:00 1970
References: <8c6c6b96-0b97-4eed-5b88-bda2b3dcc902@linux.ibm.com> <8a82c294-eaab-bfb2-5e2d-a08d38f3e570@linux.ibm.com>
In-Reply-To: <8a82c294-eaab-bfb2-5e2d-a08d38f3e570@linux.ibm.com>
From: Richard Biener
Date: Tue, 22 Aug 2023 15:27:43 +0200
Subject: Re: [PATCH 2/3] vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest
To: "Kewen.Lin"
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

On Tue, Aug 22, 2023 at 10:49 AM Kewen.Lin wrote:
>
> Hi,
>
> Like commit r14-3214, which moved the handling of memory access type
> VMAT_LOAD_STORE_LANES out of the final loop nest in vectorizable_load,
> this patch does the same for the function vectorizable_store.
> > Bootstrapped and regtested on x86_64-redhat-linux, > aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > > Is it ok for trunk? OK. > BR, > Kewen > ----- > > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_store): Move the handlings on > VMAT_LOAD_STORE_LANES in the final loop nest to its own loop, > and update the final nest accordingly. > --- > gcc/tree-vect-stmts.cc | 732 ++++++++++++++++++++++------------------- > 1 file changed, 387 insertions(+), 345 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index fcaa4127e52..18f5ebcc09c 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -8779,42 +8779,29 @@ vectorizable_store (vec_info *vinfo, > */ > > auto_vec dr_chain (group_size); > - auto_vec result_chain (group_size); > auto_vec vec_masks; > tree vec_mask =3D NULL; > - auto_vec vec_offsets; > auto_delete_vec> gvec_oprnds (group_size); > for (i =3D 0; i < group_size; i++) > gvec_oprnds.quick_push (new auto_vec (ncopies)); > - auto_vec vec_oprnds; > - for (j =3D 0; j < ncopies; j++) > + > + if (memory_access_type =3D=3D VMAT_LOAD_STORE_LANES) > { > - gimple *new_stmt; > - if (j =3D=3D 0) > + gcc_assert (!slp && grouped_store); > + for (j =3D 0; j < ncopies; j++) > { > - if (slp) > - { > - /* Get vectorized arguments for SLP_NODE. */ > - vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, > - op, &vec_oprnds); > - vec_oprnd =3D vec_oprnds[0]; > - } > - else > - { > - /* For interleaved stores we collect vectorized defs for al= l the > - stores in the group in DR_CHAIN. DR_CHAIN is then used a= s an > - input to vect_permute_store_chain(). > - > - If the store is not grouped, DR_GROUP_SIZE is 1, and DR_= CHAIN > - is of size 1. */ > + gimple *new_stmt; > + if (j =3D=3D 0) > + { > + /* For interleaved stores we collect vectorized defs for al= l > + the stores in the group in DR_CHAIN. DR_CHAIN is then us= ed > + as an input to vect_permute_store_chain(). */ > stmt_vec_info next_stmt_info =3D first_stmt_info; > for (i =3D 0; i < group_size; i++) > { > /* Since gaps are not supported for interleaved stores, > - DR_GROUP_SIZE is the exact number of stmts in the ch= ain. > - Therefore, NEXT_STMT_INFO can't be NULL_TREE. In ca= se > - that there is no interleaving, DR_GROUP_SIZE is 1, > - and only one iteration of the loop will be executed.= */ > + DR_GROUP_SIZE is the exact number of stmts in the > + chain. Therefore, NEXT_STMT_INFO can't be NULL_TREE.= */ > op =3D vect_get_store_rhs (next_stmt_info); > vect_get_vec_defs_for_operand (vinfo, next_stmt_info, n= copies, > op, gvec_oprnds[i]); > @@ -8825,66 +8812,37 @@ vectorizable_store (vec_info *vinfo, > if (mask) > { > vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopie= s, > - mask, &vec_masks, mask_v= ectype); > + mask, &vec_masks, > + mask_vectype); > vec_mask =3D vec_masks[0]; > } > - } > > - /* We should have catched mismatched types earlier. 
*/ > - gcc_assert (useless_type_conversion_p (vectype, > - TREE_TYPE (vec_oprnd))); > - bool simd_lane_access_p > - =3D STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) !=3D 0; > - if (simd_lane_access_p > - && !loop_masks > - && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) =3D=3D A= DDR_EXPR > - && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr)= , 0)) > - && integer_zerop (get_dr_vinfo_offset (vinfo, first_dr_info= )) > - && integer_zerop (DR_INIT (first_dr_info->dr)) > - && alias_sets_conflict_p (get_alias_set (aggr_type), > - get_alias_set (TREE_TYPE (ref_typ= e)))) > - { > - dataref_ptr =3D unshare_expr (DR_BASE_ADDRESS (first_dr_inf= o->dr)); > - dataref_offset =3D build_int_cst (ref_type, 0); > + /* We should have catched mismatched types earlier. */ > + gcc_assert ( > + useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd)= )); > + dataref_ptr > + =3D vect_create_data_ref_ptr (vinfo, first_stmt_info, agg= r_type, > + NULL, offset, &dummy, gsi, > + &ptr_incr, false, bump); > } > - else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > - vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, > - slp_node, &gs_info, &dataref_ptr= , > - &vec_offsets); > else > - dataref_ptr > - =3D vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_= type, > - simd_lane_access_p ? loop : NUL= L, > - offset, &dummy, gsi, &ptr_incr, > - simd_lane_access_p, bump); > - } > - else > - { > - gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); > - /* DR_CHAIN is then used as an input to vect_permute_store_chai= n(). > - If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAI= N is > - of size 1. */ > - for (i =3D 0; i < group_size; i++) > { > - vec_oprnd =3D (*gvec_oprnds[i])[j]; > - dr_chain[i] =3D vec_oprnd; > + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); > + /* DR_CHAIN is then used as an input to > + vect_permute_store_chain(). */ > + for (i =3D 0; i < group_size; i++) > + { > + vec_oprnd =3D (*gvec_oprnds[i])[j]; > + dr_chain[i] =3D vec_oprnd; > + } > + if (mask) > + vec_mask =3D vec_masks[j]; > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_in= cr, gsi, > + stmt_info, bump); > } > - if (mask) > - vec_mask =3D vec_masks[j]; > - if (dataref_offset) > - dataref_offset > - =3D int_const_binop (PLUS_EXPR, dataref_offset, bump); > - else if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > - dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_incr= , gsi, > - stmt_info, bump); > - } > - > - if (memory_access_type =3D=3D VMAT_LOAD_STORE_LANES) > - { > - tree vec_array; > > /* Get an array into which we can store the individual vectors.= */ > - vec_array =3D create_vector_array (vectype, vec_num); > + tree vec_array =3D create_vector_array (vectype, vec_num); > > /* Invalidate the current contents of VEC_ARRAY. 
This should > become an RTL clobber too, which prevents the vector registe= rs > @@ -8895,8 +8853,8 @@ vectorizable_store (vec_info *vinfo, > for (i =3D 0; i < vec_num; i++) > { > vec_oprnd =3D dr_chain[i]; > - write_vector_array (vinfo, stmt_info, > - gsi, vec_oprnd, vec_array, i); > + write_vector_array (vinfo, stmt_info, gsi, vec_oprnd, vec_a= rray, > + i); > } > > tree final_mask =3D NULL; > @@ -8906,8 +8864,8 @@ vectorizable_store (vec_info *vinfo, > final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_mask= s, > ncopies, vectype, j); > if (vec_mask) > - final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype, > - final_mask, vec_mask, gsi); > + final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype, fi= nal_mask, > + vec_mask, gsi); > > if (lanes_ifn =3D=3D IFN_MASK_LEN_STORE_LANES) > { > @@ -8955,8 +8913,7 @@ vectorizable_store (vec_info *vinfo, > /* Emit: > MEM_REF[...all elements...] =3D STORE_LANES (VEC_ARRAY= ). */ > data_ref =3D create_array_ref (aggr_type, dataref_ptr, ref_= type); > - call =3D gimple_build_call_internal (IFN_STORE_LANES, 1, > - vec_array); > + call =3D gimple_build_call_internal (IFN_STORE_LANES, 1, ve= c_array); > gimple_call_set_lhs (call, data_ref); > } > gimple_call_set_nothrow (call, true); > @@ -8965,301 +8922,386 @@ vectorizable_store (vec_info *vinfo, > > /* Record that VEC_ARRAY is now dead. */ > vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); > + if (j =3D=3D 0) > + *vec_stmt =3D new_stmt; > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); > } > - else > - { > - new_stmt =3D NULL; > - if (grouped_store) > - /* Permute. */ > - vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_i= nfo, > - gsi, &result_chain); > > - stmt_vec_info next_stmt_info =3D first_stmt_info; > - for (i =3D 0; i < vec_num; i++) > - { > - unsigned misalign; > - unsigned HOST_WIDE_INT align; > + return true; > + } > > - tree final_mask =3D NULL_TREE; > - tree final_len =3D NULL_TREE; > - tree bias =3D NULL_TREE; > - if (loop_masks) > - final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_= masks, > - vec_num * ncopies, > - vectype, vec_num * j + i= ); > - if (vec_mask) > - final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype= , > - final_mask, vec_mask, gsi)= ; > + auto_vec result_chain (group_size); > + auto_vec vec_offsets; > + auto_vec vec_oprnds; > + for (j =3D 0; j < ncopies; j++) > + { > + gimple *new_stmt; > + if (j =3D=3D 0) > + { > + if (slp) > + { > + /* Get vectorized arguments for SLP_NODE. */ > + vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op, > + &vec_oprnds); > + vec_oprnd =3D vec_oprnds[0]; > + } > + else > + { > + /* For interleaved stores we collect vectorized defs for al= l the > + stores in the group in DR_CHAIN. DR_CHAIN is then used a= s an > + input to vect_permute_store_chain(). > > - if (memory_access_type =3D=3D VMAT_GATHER_SCATTER > - && gs_info.ifn !=3D IFN_LAST) > + If the store is not grouped, DR_GROUP_SIZE is 1, and DR_= CHAIN > + is of size 1. 
*/ > + stmt_vec_info next_stmt_info =3D first_stmt_info; > + for (i =3D 0; i < group_size; i++) > { > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > - vec_offset =3D vec_offsets[vec_num * j + i]; > - tree scale =3D size_int (gs_info.scale); > - > - if (gs_info.ifn =3D=3D IFN_MASK_LEN_SCATTER_STORE) > - { > - if (loop_lens) > - final_len > - =3D vect_get_loop_len (loop_vinfo, gsi, loop_le= ns, > - vec_num * ncopies, vectype= , > - vec_num * j + i, 1); > - else > - final_len > - =3D build_int_cst (sizetype, > - TYPE_VECTOR_SUBPARTS (vectype)= ); > - signed char biasval > - =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinf= o); > - bias =3D build_int_cst (intQI_type_node, biasval); > - if (!final_mask) > - { > - mask_vectype =3D truth_type_for (vectype); > - final_mask =3D build_minus_one_cst (mask_vectyp= e); > - } > - } > - > - gcall *call; > - if (final_len && final_mask) > - call > - =3D gimple_build_call_internal (IFN_MASK_LEN_SCATTE= R_STORE, > - 7, dataref_ptr, vec_o= ffset, > - scale, vec_oprnd, fin= al_mask, > - final_len, bias); > - else if (final_mask) > - call =3D gimple_build_call_internal > - (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, vec_offset= , > - scale, vec_oprnd, final_mask); > - else > - call =3D gimple_build_call_internal > - (IFN_SCATTER_STORE, 4, dataref_ptr, vec_offset, > - scale, vec_oprnd); > - gimple_call_set_nothrow (call, true); > - vect_finish_stmt_generation (vinfo, stmt_info, call, gs= i); > - new_stmt =3D call; > - break; > + /* Since gaps are not supported for interleaved stores, > + DR_GROUP_SIZE is the exact number of stmts in the ch= ain. > + Therefore, NEXT_STMT_INFO can't be NULL_TREE. In ca= se > + that there is no interleaving, DR_GROUP_SIZE is 1, > + and only one iteration of the loop will be executed.= */ > + op =3D vect_get_store_rhs (next_stmt_info); > + vect_get_vec_defs_for_operand (vinfo, next_stmt_info, n= copies, > + op, gvec_oprnds[i]); > + vec_oprnd =3D (*gvec_oprnds[i])[0]; > + dr_chain.quick_push (vec_oprnd); > + next_stmt_info =3D DR_GROUP_NEXT_ELEMENT (next_stmt_inf= o); > } > - else if (memory_access_type =3D=3D VMAT_GATHER_SCATTER) > + if (mask) > { > - /* Emulated scatter. */ > - gcc_assert (!final_mask); > - unsigned HOST_WIDE_INT const_nunits =3D nunits.to_const= ant (); > - unsigned HOST_WIDE_INT const_offset_nunits > - =3D TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) > - .to_constant (); > - vec *ctor_elts; > - vec_alloc (ctor_elts, const_nunits); > - gimple_seq stmts =3D NULL; > - tree elt_type =3D TREE_TYPE (vectype); > - unsigned HOST_WIDE_INT elt_size > - =3D tree_to_uhwi (TYPE_SIZE (elt_type)); > - /* We support offset vectors with more elements > - than the data vector for now. */ > - unsigned HOST_WIDE_INT factor > - =3D const_offset_nunits / const_nunits; > - vec_offset =3D vec_offsets[j / factor]; > - unsigned elt_offset =3D (j % factor) * const_nunits; > - tree idx_type =3D TREE_TYPE (TREE_TYPE (vec_offset)); > - tree scale =3D size_int (gs_info.scale); > - align =3D get_object_alignment (DR_REF (first_dr_info->= dr)); > - tree ltype =3D build_aligned_type (TREE_TYPE (vectype),= align); > - for (unsigned k =3D 0; k < const_nunits; ++k) > - { > - /* Compute the offsetted pointer. 
*/ > - tree boff =3D size_binop (MULT_EXPR, TYPE_SIZE (idx= _type), > - bitsize_int (k + elt_offset= )); > - tree idx =3D gimple_build (&stmts, BIT_FIELD_REF, > - idx_type, vec_offset, > - TYPE_SIZE (idx_type), boff= ); > - idx =3D gimple_convert (&stmts, sizetype, idx); > - idx =3D gimple_build (&stmts, MULT_EXPR, > - sizetype, idx, scale); > - tree ptr =3D gimple_build (&stmts, PLUS_EXPR, > - TREE_TYPE (dataref_ptr), > - dataref_ptr, idx); > - ptr =3D gimple_convert (&stmts, ptr_type_node, ptr)= ; > - /* Extract the element to be stored. */ > - tree elt =3D gimple_build (&stmts, BIT_FIELD_REF, > - TREE_TYPE (vectype), vec_o= prnd, > - TYPE_SIZE (elt_type), > - bitsize_int (k * elt_size)= ); > - gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > - stmts =3D NULL; > - tree ref =3D build2 (MEM_REF, ltype, ptr, > - build_int_cst (ref_type, 0)); > - new_stmt =3D gimple_build_assign (ref, elt); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - } > - break; > + vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopie= s, > + mask, &vec_masks, > + mask_vectype); > + vec_mask =3D vec_masks[0]; > } > + } > > - if (i > 0) > - /* Bump the vector pointer. */ > - dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_= incr, > - gsi, stmt_info, bump); > + /* We should have catched mismatched types earlier. */ > + gcc_assert (useless_type_conversion_p (vectype, > + TREE_TYPE (vec_oprnd))); > + bool simd_lane_access_p > + =3D STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) !=3D 0; > + if (simd_lane_access_p > + && !loop_masks > + && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) =3D=3D A= DDR_EXPR > + && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr)= , 0)) > + && integer_zerop (get_dr_vinfo_offset (vinfo, first_dr_info= )) > + && integer_zerop (DR_INIT (first_dr_info->dr)) > + && alias_sets_conflict_p (get_alias_set (aggr_type), > + get_alias_set (TREE_TYPE (ref_typ= e)))) > + { > + dataref_ptr =3D unshare_expr (DR_BASE_ADDRESS (first_dr_inf= o->dr)); > + dataref_offset =3D build_int_cst (ref_type, 0); > + } > + else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > + vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, slp= _node, > + &gs_info, &dataref_ptr, &vec_off= sets); > + else > + dataref_ptr > + =3D vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_= type, > + simd_lane_access_p ? loop : NUL= L, > + offset, &dummy, gsi, &ptr_incr, > + simd_lane_access_p, bump); > + } > + else > + { > + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); > + /* DR_CHAIN is then used as an input to vect_permute_store_chai= n(). > + If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAI= N is > + of size 1. */ > + for (i =3D 0; i < group_size; i++) > + { > + vec_oprnd =3D (*gvec_oprnds[i])[j]; > + dr_chain[i] =3D vec_oprnd; > + } > + if (mask) > + vec_mask =3D vec_masks[j]; > + if (dataref_offset) > + dataref_offset =3D int_const_binop (PLUS_EXPR, dataref_offset= , bump); > + else if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_incr= , gsi, > + stmt_info, bump); > + } > > - if (slp) > - vec_oprnd =3D vec_oprnds[i]; > - else if (grouped_store) > - /* For grouped stores vectorized defs are interleaved in > - vect_permute_store_chain(). */ > - vec_oprnd =3D result_chain[i]; > + new_stmt =3D NULL; > + if (grouped_store) > + /* Permute. 
*/ > + vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info,= gsi, > + &result_chain); > > - align =3D known_alignment (DR_TARGET_ALIGNMENT (first_dr_in= fo)); > - if (alignment_support_scheme =3D=3D dr_aligned) > - misalign =3D 0; > - else if (misalignment =3D=3D DR_MISALIGNMENT_UNKNOWN) > - { > - align =3D dr_alignment (vect_dr_behavior (vinfo, first_= dr_info)); > - misalign =3D 0; > - } > - else > - misalign =3D misalignment; > - if (dataref_offset =3D=3D NULL_TREE > - && TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > - set_ptr_info_alignment (get_ptr_info (dataref_ptr), align= , > - misalign); > - align =3D least_bit_hwi (misalign | align); > - > - if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE) > - { > - tree perm_mask =3D perm_mask_for_reverse (vectype); > - tree perm_dest =3D vect_create_destination_var > - (vect_get_store_rhs (stmt_info), vectype); > - tree new_temp =3D make_ssa_name (perm_dest); > - > - /* Generate the permute statement. */ > - gimple *perm_stmt > - =3D gimple_build_assign (new_temp, VEC_PERM_EXPR, vec= _oprnd, > - vec_oprnd, perm_mask); > - vect_finish_stmt_generation (vinfo, stmt_info, perm_stm= t, gsi); > - > - perm_stmt =3D SSA_NAME_DEF_STMT (new_temp); > - vec_oprnd =3D new_temp; > - } > + stmt_vec_info next_stmt_info =3D first_stmt_info; > + for (i =3D 0; i < vec_num; i++) > + { > + unsigned misalign; > + unsigned HOST_WIDE_INT align; > > - /* Compute IFN when LOOP_LENS or final_mask valid. */ > - machine_mode vmode =3D TYPE_MODE (vectype); > - machine_mode new_vmode =3D vmode; > - internal_fn partial_ifn =3D IFN_LAST; > - if (loop_lens) > - { > - opt_machine_mode new_ovmode > - =3D get_len_load_store_mode (vmode, false, &partial_i= fn); > - new_vmode =3D new_ovmode.require (); > - unsigned factor > - =3D (new_ovmode =3D=3D vmode) ? 1 : GET_MODE_UNIT_SIZ= E (vmode); > - final_len =3D vect_get_loop_len (loop_vinfo, gsi, loop_= lens, > - vec_num * ncopies, vecty= pe, > - vec_num * j + i, factor)= ; > - } > - else if (final_mask) > - { > - if (!can_vec_mask_load_store_p (vmode, > - TYPE_MODE (TREE_TYPE (f= inal_mask)), > - false, &partial_ifn)) > - gcc_unreachable (); > - } > + tree final_mask =3D NULL_TREE; > + tree final_len =3D NULL_TREE; > + tree bias =3D NULL_TREE; > + if (loop_masks) > + final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_mask= s, > + vec_num * ncopies, vectype, > + vec_num * j + i); > + if (vec_mask) > + final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype, fi= nal_mask, > + vec_mask, gsi); > > - if (partial_ifn =3D=3D IFN_MASK_LEN_STORE) > + if (memory_access_type =3D=3D VMAT_GATHER_SCATTER > + && gs_info.ifn !=3D IFN_LAST) > + { > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > + vec_offset =3D vec_offsets[vec_num * j + i]; > + tree scale =3D size_int (gs_info.scale); > + > + if (gs_info.ifn =3D=3D IFN_MASK_LEN_SCATTER_STORE) > { > - if (!final_len) > - { > - /* Pass VF value to 'len' argument of > - MASK_LEN_STORE if LOOP_LENS is invalid. */ > - final_len =3D size_int (TYPE_VECTOR_SUBPARTS (vecty= pe)); > - } > + if (loop_lens) > + final_len =3D vect_get_loop_len (loop_vinfo, gsi, loo= p_lens, > + vec_num * ncopies, vec= type, > + vec_num * j + i, 1); > + else > + final_len =3D build_int_cst (sizetype, > + TYPE_VECTOR_SUBPARTS (vect= ype)); > + signed char biasval > + =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); > + bias =3D build_int_cst (intQI_type_node, biasval); > if (!final_mask) > { > - /* Pass all ones value to 'mask' argument of > - MASK_LEN_STORE if final_mask is invalid. 
*/ > mask_vectype =3D truth_type_for (vectype); > final_mask =3D build_minus_one_cst (mask_vectype); > } > } > - if (final_len) > - { > - signed char biasval > - =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); > > - bias =3D build_int_cst (intQI_type_node, biasval); > + gcall *call; > + if (final_len && final_mask) > + call =3D gimple_build_call_internal (IFN_MASK_LEN_SCATTER= _STORE, > + 7, dataref_ptr, vec_of= fset, > + scale, vec_oprnd, fina= l_mask, > + final_len, bias); > + else if (final_mask) > + call > + =3D gimple_build_call_internal (IFN_MASK_SCATTER_STORE,= 5, > + dataref_ptr, vec_offset, = scale, > + vec_oprnd, final_mask); > + else > + call =3D gimple_build_call_internal (IFN_SCATTER_STORE, 4= , > + dataref_ptr, vec_offse= t, > + scale, vec_oprnd); > + gimple_call_set_nothrow (call, true); > + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > + new_stmt =3D call; > + break; > + } > + else if (memory_access_type =3D=3D VMAT_GATHER_SCATTER) > + { > + /* Emulated scatter. */ > + gcc_assert (!final_mask); > + unsigned HOST_WIDE_INT const_nunits =3D nunits.to_constant = (); > + unsigned HOST_WIDE_INT const_offset_nunits > + =3D TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_cons= tant (); > + vec *ctor_elts; > + vec_alloc (ctor_elts, const_nunits); > + gimple_seq stmts =3D NULL; > + tree elt_type =3D TREE_TYPE (vectype); > + unsigned HOST_WIDE_INT elt_size > + =3D tree_to_uhwi (TYPE_SIZE (elt_type)); > + /* We support offset vectors with more elements > + than the data vector for now. */ > + unsigned HOST_WIDE_INT factor > + =3D const_offset_nunits / const_nunits; > + vec_offset =3D vec_offsets[j / factor]; > + unsigned elt_offset =3D (j % factor) * const_nunits; > + tree idx_type =3D TREE_TYPE (TREE_TYPE (vec_offset)); > + tree scale =3D size_int (gs_info.scale); > + align =3D get_object_alignment (DR_REF (first_dr_info->dr))= ; > + tree ltype =3D build_aligned_type (TREE_TYPE (vectype), ali= gn); > + for (unsigned k =3D 0; k < const_nunits; ++k) > + { > + /* Compute the offsetted pointer. */ > + tree boff =3D size_binop (MULT_EXPR, TYPE_SIZE (idx_typ= e), > + bitsize_int (k + elt_offset)); > + tree idx > + =3D gimple_build (&stmts, BIT_FIELD_REF, idx_type, ve= c_offset, > + TYPE_SIZE (idx_type), boff); > + idx =3D gimple_convert (&stmts, sizetype, idx); > + idx =3D gimple_build (&stmts, MULT_EXPR, sizetype, idx,= scale); > + tree ptr > + =3D gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (datar= ef_ptr), > + dataref_ptr, idx); > + ptr =3D gimple_convert (&stmts, ptr_type_node, ptr); > + /* Extract the element to be stored. */ > + tree elt > + =3D gimple_build (&stmts, BIT_FIELD_REF, TREE_TYPE (v= ectype), > + vec_oprnd, TYPE_SIZE (elt_type), > + bitsize_int (k * elt_size)); > + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > + stmts =3D NULL; > + tree ref > + =3D build2 (MEM_REF, ltype, ptr, build_int_cst (ref_t= ype, 0)); > + new_stmt =3D gimple_build_assign (ref, elt); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt= , gsi); > } > + break; > + } > > - /* Arguments are ready. Create the new vector stmt. */ > - if (final_len) > - { > - gcall *call; > - tree ptr =3D build_int_cst (ref_type, align * BITS_PER_= UNIT); > - /* Need conversion if it's wrapped with VnQI. 
*/ > - if (vmode !=3D new_vmode) > - { > - tree new_vtype > - =3D build_vector_type_for_mode (unsigned_intQI_ty= pe_node, > - new_vmode); > - tree var > - =3D vect_get_new_ssa_name (new_vtype, vect_simple= _var); > - vec_oprnd > - =3D build1 (VIEW_CONVERT_EXPR, new_vtype, vec_opr= nd); > - gassign *new_stmt > - =3D gimple_build_assign (var, VIEW_CONVERT_EXPR, > - vec_oprnd); > - vect_finish_stmt_generation (vinfo, stmt_info, new_= stmt, > - gsi); > - vec_oprnd =3D var; > - } > + if (i > 0) > + /* Bump the vector pointer. */ > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_incr= , gsi, > + stmt_info, bump); > > - if (partial_ifn =3D=3D IFN_MASK_LEN_STORE) > - call =3D gimple_build_call_internal (IFN_MASK_LEN_STO= RE, 6, > - dataref_ptr, ptr, > - final_mask, final_= len, > - bias, vec_oprnd); > - else > - call > - =3D gimple_build_call_internal (IFN_LEN_STORE, 5, > - dataref_ptr, ptr, > - final_len, bias, > - vec_oprnd); > - gimple_call_set_nothrow (call, true); > - vect_finish_stmt_generation (vinfo, stmt_info, call, gs= i); > - new_stmt =3D call; > + if (slp) > + vec_oprnd =3D vec_oprnds[i]; > + else if (grouped_store) > + /* For grouped stores vectorized defs are interleaved in > + vect_permute_store_chain(). */ > + vec_oprnd =3D result_chain[i]; > + > + align =3D known_alignment (DR_TARGET_ALIGNMENT (first_dr_info))= ; > + if (alignment_support_scheme =3D=3D dr_aligned) > + misalign =3D 0; > + else if (misalignment =3D=3D DR_MISALIGNMENT_UNKNOWN) > + { > + align =3D dr_alignment (vect_dr_behavior (vinfo, first_dr_i= nfo)); > + misalign =3D 0; > + } > + else > + misalign =3D misalignment; > + if (dataref_offset =3D=3D NULL_TREE > + && TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, > + misalign); > + align =3D least_bit_hwi (misalign | align); > + > + if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE) > + { > + tree perm_mask =3D perm_mask_for_reverse (vectype); > + tree perm_dest > + =3D vect_create_destination_var (vect_get_store_rhs (stmt= _info), > + vectype); > + tree new_temp =3D make_ssa_name (perm_dest); > + > + /* Generate the permute statement. */ > + gimple *perm_stmt > + =3D gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_opr= nd, > + vec_oprnd, perm_mask); > + vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt, g= si); > + > + perm_stmt =3D SSA_NAME_DEF_STMT (new_temp); > + vec_oprnd =3D new_temp; > + } > + > + /* Compute IFN when LOOP_LENS or final_mask valid. */ > + machine_mode vmode =3D TYPE_MODE (vectype); > + machine_mode new_vmode =3D vmode; > + internal_fn partial_ifn =3D IFN_LAST; > + if (loop_lens) > + { > + opt_machine_mode new_ovmode > + =3D get_len_load_store_mode (vmode, false, &partial_ifn); > + new_vmode =3D new_ovmode.require (); > + unsigned factor > + =3D (new_ovmode =3D=3D vmode) ? 1 : GET_MODE_UNIT_SIZE (v= mode); > + final_len =3D vect_get_loop_len (loop_vinfo, gsi, loop_lens= , > + vec_num * ncopies, vectype, > + vec_num * j + i, factor); > + } > + else if (final_mask) > + { > + if (!can_vec_mask_load_store_p ( > + vmode, TYPE_MODE (TREE_TYPE (final_mask)), false, > + &partial_ifn)) > + gcc_unreachable (); > + } > + > + if (partial_ifn =3D=3D IFN_MASK_LEN_STORE) > + { > + if (!final_len) > + { > + /* Pass VF value to 'len' argument of > + MASK_LEN_STORE if LOOP_LENS is invalid. 
*/ > + final_len =3D size_int (TYPE_VECTOR_SUBPARTS (vectype))= ; > } > - else if (final_mask) > + if (!final_mask) > { > - tree ptr =3D build_int_cst (ref_type, align * BITS_PER_= UNIT); > - gcall *call > - =3D gimple_build_call_internal (IFN_MASK_STORE, 4, > - dataref_ptr, ptr, > - final_mask, vec_oprnd); > - gimple_call_set_nothrow (call, true); > - vect_finish_stmt_generation (vinfo, stmt_info, call, gs= i); > - new_stmt =3D call; > + /* Pass all ones value to 'mask' argument of > + MASK_LEN_STORE if final_mask is invalid. */ > + mask_vectype =3D truth_type_for (vectype); > + final_mask =3D build_minus_one_cst (mask_vectype); > } > - else > + } > + if (final_len) > + { > + signed char biasval > + =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); > + > + bias =3D build_int_cst (intQI_type_node, biasval); > + } > + > + /* Arguments are ready. Create the new vector stmt. */ > + if (final_len) > + { > + gcall *call; > + tree ptr =3D build_int_cst (ref_type, align * BITS_PER_UNIT= ); > + /* Need conversion if it's wrapped with VnQI. */ > + if (vmode !=3D new_vmode) > { > - data_ref =3D fold_build2 (MEM_REF, vectype, > - dataref_ptr, > - dataref_offset > - ? dataref_offset > - : build_int_cst (ref_type, 0)); > - if (alignment_support_scheme =3D=3D dr_aligned) > - ; > - else > - TREE_TYPE (data_ref) > - =3D build_aligned_type (TREE_TYPE (data_ref), > - align * BITS_PER_UNIT); > - vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr= )); > - new_stmt =3D gimple_build_assign (data_ref, vec_oprnd); > + tree new_vtype > + =3D build_vector_type_for_mode (unsigned_intQI_type_n= ode, > + new_vmode); > + tree var =3D vect_get_new_ssa_name (new_vtype, vect_sim= ple_var); > + vec_oprnd =3D build1 (VIEW_CONVERT_EXPR, new_vtype, vec= _oprnd); > + gassign *new_stmt > + =3D gimple_build_assign (var, VIEW_CONVERT_EXPR, vec_= oprnd); > vect_finish_stmt_generation (vinfo, stmt_info, new_stmt= , gsi); > + vec_oprnd =3D var; > } > > - if (slp) > - continue; > - > - next_stmt_info =3D DR_GROUP_NEXT_ELEMENT (next_stmt_info); > - if (!next_stmt_info) > - break; > + if (partial_ifn =3D=3D IFN_MASK_LEN_STORE) > + call =3D gimple_build_call_internal (IFN_MASK_LEN_STORE, = 6, > + dataref_ptr, ptr, fina= l_mask, > + final_len, bias, vec_o= prnd); > + else > + call =3D gimple_build_call_internal (IFN_LEN_STORE, 5, > + dataref_ptr, ptr, fina= l_len, > + bias, vec_oprnd); > + gimple_call_set_nothrow (call, true); > + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > + new_stmt =3D call; > + } > + else if (final_mask) > + { > + tree ptr =3D build_int_cst (ref_type, align * BITS_PER_UNIT= ); > + gcall *call > + =3D gimple_build_call_internal (IFN_MASK_STORE, 4, datare= f_ptr, > + ptr, final_mask, vec_oprnd)= ; > + gimple_call_set_nothrow (call, true); > + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > + new_stmt =3D call; > + } > + else > + { > + data_ref > + =3D fold_build2 (MEM_REF, vectype, dataref_ptr, > + dataref_offset ? 
dataref_offset > + : build_int_cst (ref_type, = 0)); > + if (alignment_support_scheme =3D=3D dr_aligned) > + ; > + else > + TREE_TYPE (data_ref) > + =3D build_aligned_type (TREE_TYPE (data_ref), > + align * BITS_PER_UNIT); > + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); > + new_stmt =3D gimple_build_assign (data_ref, vec_oprnd); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gs= i); > } > + > + if (slp) > + continue; > + > + next_stmt_info =3D DR_GROUP_NEXT_ELEMENT (next_stmt_info); > + if (!next_stmt_info) > + break; > } > if (!slp) > { > -- > 2.31.1
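
[Editorial note: the sketch below is not GCC code. It is a minimal illustration, with made-up names (vect_access_kind, emit_lanes_copy, emit_generic_copy), of the restructuring the quoted patch applies to vectorizable_store: the VMAT_LOAD_STORE_LANES case is given its own copy loop that returns early, so the remaining final loop nest no longer dispatches on that access type in every iteration.]

    // Illustrative sketch only -- hypothetical names, not the real
    // vectorizable_store.  Only VMAT_LOAD_STORE_LANES / ncopies mirror
    // identifiers that actually appear in the quoted patch.
    #include <cstdio>

    enum class vect_access_kind { load_store_lanes, contiguous };

    static void emit_lanes_copy (int copy)
    {
      std::printf ("LOAD/STORE_LANES copy %d\n", copy);
    }

    static void emit_generic_copy (int copy)
    {
      std::printf ("generic copy %d\n", copy);
    }

    static bool
    store_sketch (vect_access_kind kind, int ncopies)
    {
      if (kind == vect_access_kind::load_store_lanes)
        {
          /* Dedicated loop: all LOAD/STORE_LANES bookkeeping stays here.  */
          for (int j = 0; j < ncopies; j++)
            emit_lanes_copy (j);
          /* Early return keeps this path out of the final loop nest.  */
          return true;
        }

      /* The final loop nest now only handles the remaining access types.  */
      for (int j = 0; j < ncopies; j++)
        emit_generic_copy (j);
      return true;
    }

    int
    main ()
    {
      store_sketch (vect_access_kind::load_store_lanes, 2);
      store_sketch (vect_access_kind::contiguous, 2);
      return 0;
    }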