From: Richard Biener
Date: Mon, 14 Aug 2023 14:04:44 +0200
Subject: Re: [PATCH] vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest
To: "Kewen.Lin"
Cc: GCC Patches, Richard Sandiford

On Mon, Aug 14, 2023 at 10:54 AM Kewen.Lin wrote:
>
> Hi,
>
> Following Richi's suggestion [1], this patch moves the handling of
> VMAT_LOAD_STORE_LANES out of the final loop nest of function
> vectorizable_load and into its own loop.  Basically it duplicates the
> final loop nest, cleans up some useless set-up code for the
> VMAT_LOAD_STORE_LANES case, and removes some unreachable code.  It also
> removes the corresponding handling from the final loop nest.
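For anyone skimming the thread, the net effect on control flow is roughly
the sketch below.  It is only an illustration: the two-value enum, the
function name and the trimmed-down signature are stand-ins rather than the
real vectorizable_load interface, but VMAT_LOAD_STORE_LANES, ncopies and
vec_num are the names used in the patch, and the shape matches the diff
(a dedicated load-lanes loop that returns early, leaving the general loop
nest free of that case).

enum vect_memory_access_type { VMAT_LOAD_STORE_LANES, VMAT_CONTIGUOUS };

static bool
load_shape_sketch (vect_memory_access_type memory_access_type,
                   unsigned ncopies, unsigned vec_num)
{
  if (memory_access_type == VMAT_LOAD_STORE_LANES)
    {
      /* New in the patch: cost or emit one IFN_[MASK_]LOAD_LANES call
         per copy, read back each vector result, then return, so the
         loop nest below never sees this access type.  */
      for (unsigned j = 0; j < ncopies; j++)
        /* build the LOAD_LANES call and extract its vectors */;
      return true;
    }

  /* The pre-existing final loop nest, now without any
     VMAT_LOAD_STORE_LANES special-casing.  */
  for (unsigned j = 0; j < ncopies; j++)
    for (unsigned i = 0; i < vec_num; i++)
      /* contiguous, gather/scatter and realigned loads handled here */;
  return true;
}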
> > Bootstrapped and regtested on x86_64-redhat-linux, > aarch64-linux-gnu and powerpc64{,le}-linux-gnu. OK (I guess the big diff is mostly because of re-indenting). Thanks, Richard. > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html > > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_load): Move the handlings on > VMAT_LOAD_STORE_LANES in the final loop nest to its own loop, > and update the final nest accordingly. > --- > gcc/tree-vect-stmts.cc | 1275 ++++++++++++++++++++-------------------- > 1 file changed, 634 insertions(+), 641 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index 4f2d088484c..c361e16cb7b 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -10332,7 +10332,129 @@ vectorizable_load (vec_info *vinfo, > vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, mask, > &vec_masks, mask_vectype); > } > + > tree vec_mask =3D NULL_TREE; > + if (memory_access_type =3D=3D VMAT_LOAD_STORE_LANES) > + { > + gcc_assert (alignment_support_scheme =3D=3D dr_aligned > + || alignment_support_scheme =3D=3D dr_unaligned_support= ed); > + gcc_assert (grouped_load && !slp); > + > + unsigned int inside_cost =3D 0, prologue_cost =3D 0; > + for (j =3D 0; j < ncopies; j++) > + { > + if (costing_p) > + { > + /* An IFN_LOAD_LANES will load all its vector results, > + regardless of which ones we actually need. Account > + for the cost of unused results. */ > + if (first_stmt_info =3D=3D stmt_info) > + { > + unsigned int gaps =3D DR_GROUP_SIZE (first_stmt_info); > + stmt_vec_info next_stmt_info =3D first_stmt_info; > + do > + { > + gaps -=3D 1; > + next_stmt_info =3D DR_GROUP_NEXT_ELEMENT (next_stmt= _info); > + } > + while (next_stmt_info); > + if (gaps) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "vect_model_load_cost: %d " > + "unused vectors.\n", > + gaps); > + vect_get_load_cost (vinfo, stmt_info, gaps, > + alignment_support_scheme, > + misalignment, false, &inside_co= st, > + &prologue_cost, cost_vec, cost_= vec, > + true); > + } > + } > + vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_= scheme, > + misalignment, false, &inside_cost, > + &prologue_cost, cost_vec, cost_vec, tru= e); > + continue; > + } > + > + /* 1. Create the vector or array pointer update chain. */ > + if (j =3D=3D 0) > + dataref_ptr > + =3D vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_= type, > + at_loop, offset, &dummy, gsi, > + &ptr_incr, false, bump); > + else > + { > + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_in= cr, gsi, > + stmt_info, bump); > + } > + if (mask) > + vec_mask =3D vec_masks[j]; > + > + tree vec_array =3D create_vector_array (vectype, vec_num); > + > + tree final_mask =3D NULL_TREE; > + if (loop_masks) > + final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_mask= s, > + ncopies, vectype, j); > + if (vec_mask) > + final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype, fi= nal_mask, > + vec_mask, gsi); > + > + gcall *call; > + if (final_mask) > + { > + /* Emit: > + VEC_ARRAY =3D MASK_LOAD_LANES (DATAREF_PTR, ALIAS_PTR, > + VEC_MASK). */ > + unsigned int align =3D TYPE_ALIGN (TREE_TYPE (vectype)); > + tree alias_ptr =3D build_int_cst (ref_type, align); > + call =3D gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3= , > + dataref_ptr, alias_ptr, > + final_mask); > + } > + else > + { > + /* Emit: > + VEC_ARRAY =3D LOAD_LANES (MEM_REF[...all elements...])= . 
*/ > + data_ref =3D create_array_ref (aggr_type, dataref_ptr, ref_= type); > + call =3D gimple_build_call_internal (IFN_LOAD_LANES, 1, dat= a_ref); > + } > + gimple_call_set_lhs (call, vec_array); > + gimple_call_set_nothrow (call, true); > + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > + > + dr_chain.create (vec_num); > + /* Extract each vector into an SSA_NAME. */ > + for (i =3D 0; i < vec_num; i++) > + { > + new_temp =3D read_vector_array (vinfo, stmt_info, gsi, scal= ar_dest, > + vec_array, i); > + dr_chain.quick_push (new_temp); > + } > + > + /* Record the mapping between SSA_NAMEs and statements. */ > + vect_record_grouped_load_vectors (vinfo, stmt_info, dr_chain); > + > + /* Record that VEC_ARRAY is now dead. */ > + vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); > + > + dr_chain.release (); > + > + *vec_stmt =3D STMT_VINFO_VEC_STMTS (stmt_info)[0]; > + } > + > + if (costing_p && dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "vect_model_load_cost: inside_cost =3D %u, " > + "prologue_cost =3D %u .\n", > + inside_cost, prologue_cost); > + > + return true; > + } > + > poly_uint64 group_elt =3D 0; > unsigned int inside_cost =3D 0, prologue_cost =3D 0; > for (j =3D 0; j < ncopies; j++) > @@ -10414,685 +10538,558 @@ vectorizable_load (vec_info *vinfo, > dr_chain.create (vec_num); > > gimple *new_stmt =3D NULL; > - if (memory_access_type =3D=3D VMAT_LOAD_STORE_LANES) > + for (i =3D 0; i < vec_num; i++) > { > - if (costing_p) > - { > - /* An IFN_LOAD_LANES will load all its vector results, > - regardless of which ones we actually need. Account > - for the cost of unused results. */ > - if (grouped_load && first_stmt_info =3D=3D stmt_info) > - { > - unsigned int gaps =3D DR_GROUP_SIZE (first_stmt_info); > - stmt_vec_info next_stmt_info =3D first_stmt_info; > - do > - { > - gaps -=3D 1; > - next_stmt_info =3D DR_GROUP_NEXT_ELEMENT (next_stmt= _info); > - } > - while (next_stmt_info); > - if (gaps) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_NOTE, vect_location, > - "vect_model_load_cost: %d " > - "unused vectors.\n", > - gaps); > - vect_get_load_cost (vinfo, stmt_info, gaps, > - alignment_support_scheme, > - misalignment, false, &inside_co= st, > - &prologue_cost, cost_vec, cost_= vec, > - true); > - } > - } > - vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_= scheme, > - misalignment, false, &inside_cost, > - &prologue_cost, cost_vec, cost_vec, tru= e); > - continue; > - } > - tree vec_array; > - > - vec_array =3D create_vector_array (vectype, vec_num); > - > tree final_mask =3D NULL_TREE; > - if (loop_masks) > - final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_mask= s, > - ncopies, vectype, j); > - if (vec_mask) > - final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype, > - final_mask, vec_mask, gsi); > - > - gcall *call; > - if (final_mask) > - { > - /* Emit: > - VEC_ARRAY =3D MASK_LOAD_LANES (DATAREF_PTR, ALIAS_PTR, > - VEC_MASK). */ > - unsigned int align =3D TYPE_ALIGN (TREE_TYPE (vectype)); > - tree alias_ptr =3D build_int_cst (ref_type, align); > - call =3D gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3= , > - dataref_ptr, alias_ptr, > - final_mask); > - } > - else > + tree final_len =3D NULL_TREE; > + tree bias =3D NULL_TREE; > + if (!costing_p) > { > - /* Emit: > - VEC_ARRAY =3D LOAD_LANES (MEM_REF[...all elements...])= . 
*/ > - data_ref =3D create_array_ref (aggr_type, dataref_ptr, ref_= type); > - call =3D gimple_build_call_internal (IFN_LOAD_LANES, 1, dat= a_ref); > - } > - gimple_call_set_lhs (call, vec_array); > - gimple_call_set_nothrow (call, true); > - vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > - new_stmt =3D call; > + if (loop_masks) > + final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_= masks, > + vec_num * ncopies, vecty= pe, > + vec_num * j + i); > + if (vec_mask) > + final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype= , > + final_mask, vec_mask, gsi)= ; > > - /* Extract each vector into an SSA_NAME. */ > - for (i =3D 0; i < vec_num; i++) > - { > - new_temp =3D read_vector_array (vinfo, stmt_info, gsi, scal= ar_dest, > - vec_array, i); > - dr_chain.quick_push (new_temp); > + if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_= incr, > + gsi, stmt_info, bump); > } > > - /* Record the mapping between SSA_NAMEs and statements. */ > - vect_record_grouped_load_vectors (vinfo, stmt_info, dr_chain); > - > - /* Record that VEC_ARRAY is now dead. */ > - vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); > - } > - else > - { > - for (i =3D 0; i < vec_num; i++) > + /* 2. Create the vector-load in the loop. */ > + switch (alignment_support_scheme) > { > - tree final_mask =3D NULL_TREE; > - tree final_len =3D NULL_TREE; > - tree bias =3D NULL_TREE; > - if (!costing_p) > - { > - if (loop_masks) > - final_mask > - =3D vect_get_loop_mask (loop_vinfo, gsi, loop_masks= , > - vec_num * ncopies, vectype, > - vec_num * j + i); > - if (vec_mask) > - final_mask =3D prepare_vec_mask (loop_vinfo, mask_vec= type, > - final_mask, vec_mask, = gsi); > - > - if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > - dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, = ptr_incr, > - gsi, stmt_info, bump); > - } > + case dr_aligned: > + case dr_unaligned_supported: > + { > + unsigned int misalign; > + unsigned HOST_WIDE_INT align; > > - /* 2. Create the vector-load in the loop. 
*/ > - switch (alignment_support_scheme) > - { > - case dr_aligned: > - case dr_unaligned_supported: > + if (memory_access_type =3D=3D VMAT_GATHER_SCATTER > + && gs_info.ifn !=3D IFN_LAST) > { > - unsigned int misalign; > - unsigned HOST_WIDE_INT align; > - > - if (memory_access_type =3D=3D VMAT_GATHER_SCATTER > - && gs_info.ifn !=3D IFN_LAST) > + if (costing_p) > { > - if (costing_p) > - { > - unsigned int cnunits > - =3D vect_nunits_for_cost (vectype); > - inside_cost > - =3D record_stmt_cost (cost_vec, cnunits, > - scalar_load, stmt_info,= 0, > - vect_body); > - break; > - } > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > - vec_offset =3D vec_offsets[vec_num * j + i]; > - tree zero =3D build_zero_cst (vectype); > - tree scale =3D size_int (gs_info.scale); > - > - if (gs_info.ifn =3D=3D IFN_MASK_LEN_GATHER_LOAD) > - { > - if (loop_lens) > - final_len > - =3D vect_get_loop_len (loop_vinfo, gsi, l= oop_lens, > - vec_num * ncopies, v= ectype, > - vec_num * j + i, 1); > - else > - final_len =3D build_int_cst (sizetype, > - TYPE_VECTOR_SUBP= ARTS ( > - vectype)); > - signed char biasval > - =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loo= p_vinfo); > - bias =3D build_int_cst (intQI_type_node, bias= val); > - if (!final_mask) > - { > - mask_vectype =3D truth_type_for (vectype)= ; > - final_mask =3D build_minus_one_cst (mask_= vectype); > - } > - } > - > - gcall *call; > - if (final_len && final_mask) > - call =3D gimple_build_call_internal ( > - IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, > - vec_offset, scale, zero, final_mask, final_le= n, > - bias); > - else if (final_mask) > - call =3D gimple_build_call_internal > - (IFN_MASK_GATHER_LOAD, 5, dataref_ptr, > - vec_offset, scale, zero, final_mask); > - else > - call =3D gimple_build_call_internal > - (IFN_GATHER_LOAD, 4, dataref_ptr, > - vec_offset, scale, zero); > - gimple_call_set_nothrow (call, true); > - new_stmt =3D call; > - data_ref =3D NULL_TREE; > + unsigned int cnunits =3D vect_nunits_for_cost (ve= ctype); > + inside_cost > + =3D record_stmt_cost (cost_vec, cnunits, scalar= _load, > + stmt_info, 0, vect_body); > break; > } > - else if (memory_access_type =3D=3D VMAT_GATHER_SCATTE= R) > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > + vec_offset =3D vec_offsets[vec_num * j + i]; > + tree zero =3D build_zero_cst (vectype); > + tree scale =3D size_int (gs_info.scale); > + > + if (gs_info.ifn =3D=3D IFN_MASK_LEN_GATHER_LOAD) > { > - /* Emulated gather-scatter. */ > - gcc_assert (!final_mask); > - unsigned HOST_WIDE_INT const_nunits > - =3D nunits.to_constant (); > - if (costing_p) > - { > - /* For emulated gathers N offset vector eleme= nt > - offset add is consumed by the load). */ > - inside_cost > - =3D record_stmt_cost (cost_vec, const_nunit= s, > - vec_to_scalar, stmt_inf= o, 0, > - vect_body); > - /* N scalar loads plus gathering them into a > - vector. */ > - inside_cost > - =3D record_stmt_cost (cost_vec, const_nunit= s, > - scalar_load, stmt_info,= 0, > - vect_body); > - inside_cost > - =3D record_stmt_cost (cost_vec, 1, vec_cons= truct, > - stmt_info, 0, vect_body= ); > - break; > - } > - unsigned HOST_WIDE_INT const_offset_nunits > - =3D TYPE_VECTOR_SUBPARTS (gs_info.offset_vectyp= e) > - .to_constant (); > - vec *ctor_elts; > - vec_alloc (ctor_elts, const_nunits); > - gimple_seq stmts =3D NULL; > - /* We support offset vectors with more elements > - than the data vector for now. 
*/ > - unsigned HOST_WIDE_INT factor > - =3D const_offset_nunits / const_nunits; > - vec_offset =3D vec_offsets[j / factor]; > - unsigned elt_offset =3D (j % factor) * const_nuni= ts; > - tree idx_type =3D TREE_TYPE (TREE_TYPE (vec_offse= t)); > - tree scale =3D size_int (gs_info.scale); > - align > - =3D get_object_alignment (DR_REF (first_dr_info= ->dr)); > - tree ltype =3D build_aligned_type (TREE_TYPE (vec= type), > - align); > - for (unsigned k =3D 0; k < const_nunits; ++k) > + if (loop_lens) > + final_len > + =3D vect_get_loop_len (loop_vinfo, gsi, loop_= lens, > + vec_num * ncopies, vecty= pe, > + vec_num * j + i, 1); > + else > + final_len > + =3D build_int_cst (sizetype, > + TYPE_VECTOR_SUBPARTS (vectyp= e)); > + signed char biasval > + =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vi= nfo); > + bias =3D build_int_cst (intQI_type_node, biasval)= ; > + if (!final_mask) > { > - tree boff =3D size_binop (MULT_EXPR, > - TYPE_SIZE (idx_type), > - bitsize_int > - (k + elt_offset)); > - tree idx =3D gimple_build (&stmts, BIT_FIELD_= REF, > - idx_type, vec_offset= , > - TYPE_SIZE (idx_type)= , > - boff); > - idx =3D gimple_convert (&stmts, sizetype, idx= ); > - idx =3D gimple_build (&stmts, MULT_EXPR, > - sizetype, idx, scale); > - tree ptr =3D gimple_build (&stmts, PLUS_EXPR, > - TREE_TYPE (dataref_p= tr), > - dataref_ptr, idx); > - ptr =3D gimple_convert (&stmts, ptr_type_node= , ptr); > - tree elt =3D make_ssa_name (TREE_TYPE (vectyp= e)); > - tree ref =3D build2 (MEM_REF, ltype, ptr, > - build_int_cst (ref_type, 0= )); > - new_stmt =3D gimple_build_assign (elt, ref); > - gimple_set_vuse (new_stmt, > - gimple_vuse (gsi_stmt (*gsi)= )); > - gimple_seq_add_stmt (&stmts, new_stmt); > - CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE,= elt); > + mask_vectype =3D truth_type_for (vectype); > + final_mask =3D build_minus_one_cst (mask_vect= ype); > } > - gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT)= ; > - new_stmt =3D gimple_build_assign (NULL_TREE, > - build_constructor > - (vectype, ctor_= elts)); > - data_ref =3D NULL_TREE; > - break; > } > > - if (costing_p) > - break; > - > - align =3D > - known_alignment (DR_TARGET_ALIGNMENT (first_dr_info= )); > - if (alignment_support_scheme =3D=3D dr_aligned) > - misalign =3D 0; > - else if (misalignment =3D=3D DR_MISALIGNMENT_UNKNOWN) > - { > - align =3D dr_alignment > - (vect_dr_behavior (vinfo, first_dr_info)); > - misalign =3D 0; > - } > + gcall *call; > + if (final_len && final_mask) > + call =3D gimple_build_call_internal ( > + IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, vec_off= set, > + scale, zero, final_mask, final_len, bias); > + else if (final_mask) > + call > + =3D gimple_build_call_internal (IFN_MASK_GATHER_L= OAD, 5, > + dataref_ptr, vec_of= fset, > + scale, zero, final_= mask); > else > - misalign =3D misalignment; > - if (dataref_offset =3D=3D NULL_TREE > - && TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > - set_ptr_info_alignment (get_ptr_info (dataref_ptr), > - align, misalign); > - align =3D least_bit_hwi (misalign | align); > - > - /* Compute IFN when LOOP_LENS or final_mask valid. 
*= / > - machine_mode vmode =3D TYPE_MODE (vectype); > - machine_mode new_vmode =3D vmode; > - internal_fn partial_ifn =3D IFN_LAST; > - if (loop_lens) > + call > + =3D gimple_build_call_internal (IFN_GATHER_LOAD, = 4, > + dataref_ptr, vec_of= fset, > + scale, zero); > + gimple_call_set_nothrow (call, true); > + new_stmt =3D call; > + data_ref =3D NULL_TREE; > + break; > + } > + else if (memory_access_type =3D=3D VMAT_GATHER_SCATTER) > + { > + /* Emulated gather-scatter. */ > + gcc_assert (!final_mask); > + unsigned HOST_WIDE_INT const_nunits =3D nunits.to_con= stant (); > + if (costing_p) > { > - opt_machine_mode new_ovmode > - =3D get_len_load_store_mode (vmode, true, > - &partial_ifn); > - new_vmode =3D new_ovmode.require (); > - unsigned factor =3D (new_ovmode =3D=3D vmode) > - ? 1 > - : GET_MODE_UNIT_SIZE (vmode); > - final_len > - =3D vect_get_loop_len (loop_vinfo, gsi, loop_le= ns, > - vec_num * ncopies, vectype= , > - vec_num * j + i, factor); > + /* For emulated gathers N offset vector element > + offset add is consumed by the load). */ > + inside_cost > + =3D record_stmt_cost (cost_vec, const_nunits, > + vec_to_scalar, stmt_info, 0= , > + vect_body); > + /* N scalar loads plus gathering them into a > + vector. */ > + inside_cost =3D record_stmt_cost (cost_vec, const= _nunits, > + scalar_load, stmt= _info, > + 0, vect_body); > + inside_cost > + =3D record_stmt_cost (cost_vec, 1, vec_construc= t, > + stmt_info, 0, vect_body); > + break; > } > - else if (final_mask) > + unsigned HOST_WIDE_INT const_offset_nunits > + =3D TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) > + .to_constant (); > + vec *ctor_elts; > + vec_alloc (ctor_elts, const_nunits); > + gimple_seq stmts =3D NULL; > + /* We support offset vectors with more elements > + than the data vector for now. 
*/ > + unsigned HOST_WIDE_INT factor > + =3D const_offset_nunits / const_nunits; > + vec_offset =3D vec_offsets[j / factor]; > + unsigned elt_offset =3D (j % factor) * const_nunits; > + tree idx_type =3D TREE_TYPE (TREE_TYPE (vec_offset)); > + tree scale =3D size_int (gs_info.scale); > + align =3D get_object_alignment (DR_REF (first_dr_info= ->dr)); > + tree ltype > + =3D build_aligned_type (TREE_TYPE (vectype), align)= ; > + for (unsigned k =3D 0; k < const_nunits; ++k) > { > - if (!can_vec_mask_load_store_p ( > - vmode, TYPE_MODE (TREE_TYPE (final_mask)), = true, > - &partial_ifn)) > - gcc_unreachable (); > + tree boff =3D size_binop (MULT_EXPR, TYPE_SIZE (i= dx_type), > + bitsize_int (k + elt_offs= et)); > + tree idx =3D gimple_build (&stmts, BIT_FIELD_REF, > + idx_type, vec_offset, > + TYPE_SIZE (idx_type), bo= ff); > + idx =3D gimple_convert (&stmts, sizetype, idx); > + idx =3D gimple_build (&stmts, MULT_EXPR, sizetype= , idx, > + scale); > + tree ptr =3D gimple_build (&stmts, PLUS_EXPR, > + TREE_TYPE (dataref_ptr), > + dataref_ptr, idx); > + ptr =3D gimple_convert (&stmts, ptr_type_node, pt= r); > + tree elt =3D make_ssa_name (TREE_TYPE (vectype)); > + tree ref =3D build2 (MEM_REF, ltype, ptr, > + build_int_cst (ref_type, 0)); > + new_stmt =3D gimple_build_assign (elt, ref); > + gimple_set_vuse (new_stmt, > + gimple_vuse (gsi_stmt (*gsi))); > + gimple_seq_add_stmt (&stmts, new_stmt); > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, elt= ); > } > + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > + new_stmt =3D gimple_build_assign ( > + NULL_TREE, build_constructor (vectype, ctor_elts)); > + data_ref =3D NULL_TREE; > + break; > + } > > - if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > + if (costing_p) > + break; > + > + align =3D known_alignment (DR_TARGET_ALIGNMENT (first_dr_= info)); > + if (alignment_support_scheme =3D=3D dr_aligned) > + misalign =3D 0; > + else if (misalignment =3D=3D DR_MISALIGNMENT_UNKNOWN) > + { > + align > + =3D dr_alignment (vect_dr_behavior (vinfo, first_dr= _info)); > + misalign =3D 0; > + } > + else > + misalign =3D misalignment; > + if (dataref_offset =3D=3D NULL_TREE > + && TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > + set_ptr_info_alignment (get_ptr_info (dataref_ptr), ali= gn, > + misalign); > + align =3D least_bit_hwi (misalign | align); > + > + /* Compute IFN when LOOP_LENS or final_mask valid. */ > + machine_mode vmode =3D TYPE_MODE (vectype); > + machine_mode new_vmode =3D vmode; > + internal_fn partial_ifn =3D IFN_LAST; > + if (loop_lens) > + { > + opt_machine_mode new_ovmode > + =3D get_len_load_store_mode (vmode, true, &partial_= ifn); > + new_vmode =3D new_ovmode.require (); > + unsigned factor > + =3D (new_ovmode =3D=3D vmode) ? 1 : GET_MODE_UNIT_S= IZE (vmode); > + final_len =3D vect_get_loop_len (loop_vinfo, gsi, loo= p_lens, > + vec_num * ncopies, vec= type, > + vec_num * j + i, facto= r); > + } > + else if (final_mask) > + { > + if (!can_vec_mask_load_store_p ( > + vmode, TYPE_MODE (TREE_TYPE (final_mask)), true= , > + &partial_ifn)) > + gcc_unreachable (); > + } > + > + if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > + { > + if (!final_len) > { > - if (!final_len) > - { > - /* Pass VF value to 'len' argument of > - MASK_LEN_LOAD if LOOP_LENS is invalid. */ > - final_len > - =3D size_int (TYPE_VECTOR_SUBPARTS (vectype= )); > - } > - if (!final_mask) > - { > - /* Pass all ones value to 'mask' argument of > - MASK_LEN_LOAD if final_mask is invalid. 
*= / > - mask_vectype =3D truth_type_for (vectype); > - final_mask =3D build_minus_one_cst (mask_vect= ype); > - } > + /* Pass VF value to 'len' argument of > + MASK_LEN_LOAD if LOOP_LENS is invalid. */ > + final_len =3D size_int (TYPE_VECTOR_SUBPARTS (vec= type)); > } > - if (final_len) > + if (!final_mask) > { > - signed char biasval > - =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vi= nfo); > - > - bias =3D build_int_cst (intQI_type_node, biasval)= ; > + /* Pass all ones value to 'mask' argument of > + MASK_LEN_LOAD if final_mask is invalid. */ > + mask_vectype =3D truth_type_for (vectype); > + final_mask =3D build_minus_one_cst (mask_vectype)= ; > } > + } > + if (final_len) > + { > + signed char biasval > + =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo)= ; > > - if (final_len) > + bias =3D build_int_cst (intQI_type_node, biasval); > + } > + > + if (final_len) > + { > + tree ptr =3D build_int_cst (ref_type, align * BITS_PE= R_UNIT); > + gcall *call; > + if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > + call =3D gimple_build_call_internal (IFN_MASK_LEN_L= OAD, 5, > + dataref_ptr, ptr= , > + final_mask, fina= l_len, > + bias); > + else > + call =3D gimple_build_call_internal (IFN_LEN_LOAD, = 4, > + dataref_ptr, ptr= , > + final_len, bias)= ; > + gimple_call_set_nothrow (call, true); > + new_stmt =3D call; > + data_ref =3D NULL_TREE; > + > + /* Need conversion if it's wrapped with VnQI. */ > + if (vmode !=3D new_vmode) > { > - tree ptr > - =3D build_int_cst (ref_type, align * BITS_PER_U= NIT); > - gcall *call; > - if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > - call =3D gimple_build_call_internal (IFN_MASK_L= EN_LOAD, > - 5, dataref_p= tr, > - ptr, final_m= ask, > - final_len, b= ias); > - else > - call =3D gimple_build_call_internal (IFN_LEN_LO= AD, 4, > - dataref_ptr,= ptr, > - final_len, b= ias); > - gimple_call_set_nothrow (call, true); > - new_stmt =3D call; > - data_ref =3D NULL_TREE; > - > - /* Need conversion if it's wrapped with VnQI. 
*/ > - if (vmode !=3D new_vmode) > - { > - tree new_vtype =3D build_vector_type_for_mode= ( > - unsigned_intQI_type_node, new_vmode); > - tree var =3D vect_get_new_ssa_name (new_vtype= , > - vect_simple= _var); > - gimple_set_lhs (call, var); > - vect_finish_stmt_generation (vinfo, stmt_info= , call, > - gsi); > - tree op =3D build1 (VIEW_CONVERT_EXPR, vectyp= e, var); > - new_stmt > - =3D gimple_build_assign (vec_dest, > - VIEW_CONVERT_EXPR, o= p); > - } > + tree new_vtype =3D build_vector_type_for_mode ( > + unsigned_intQI_type_node, new_vmode); > + tree var > + =3D vect_get_new_ssa_name (new_vtype, vect_simp= le_var); > + gimple_set_lhs (call, var); > + vect_finish_stmt_generation (vinfo, stmt_info, ca= ll, > + gsi); > + tree op =3D build1 (VIEW_CONVERT_EXPR, vectype, v= ar); > + new_stmt =3D gimple_build_assign (vec_dest, > + VIEW_CONVERT_EXPR= , op); > } > - else if (final_mask) > + } > + else if (final_mask) > + { > + tree ptr =3D build_int_cst (ref_type, align * BITS_PE= R_UNIT); > + gcall *call =3D gimple_build_call_internal (IFN_MASK_= LOAD, 3, > + dataref_ptr= , ptr, > + final_mask)= ; > + gimple_call_set_nothrow (call, true); > + new_stmt =3D call; > + data_ref =3D NULL_TREE; > + } > + else > + { > + tree ltype =3D vectype; > + tree new_vtype =3D NULL_TREE; > + unsigned HOST_WIDE_INT gap =3D DR_GROUP_GAP (first_st= mt_info); > + unsigned int vect_align > + =3D vect_known_alignment_in_bytes (first_dr_info, v= ectype); > + unsigned int scalar_dr_size > + =3D vect_get_scalar_dr_size (first_dr_info); > + /* If there's no peeling for gaps but we have a gap > + with slp loads then load the lower half of the > + vector only. See get_group_load_store_type for > + when we apply this optimization. */ > + if (slp > + && loop_vinfo > + && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && g= ap !=3D 0 > + && known_eq (nunits, (group_size - gap) * 2) > + && known_eq (nunits, group_size) > + && gap >=3D (vect_align / scalar_dr_size)) > { > - tree ptr =3D build_int_cst (ref_type, > - align * BITS_PER_UNIT); > - gcall *call > - =3D gimple_build_call_internal (IFN_MASK_LOAD, = 3, > - dataref_ptr, ptr, > - final_mask); > - gimple_call_set_nothrow (call, true); > - new_stmt =3D call; > - data_ref =3D NULL_TREE; > + tree half_vtype; > + new_vtype > + =3D vector_vector_composition_type (vectype, 2, > + &half_vtype); > + if (new_vtype !=3D NULL_TREE) > + ltype =3D half_vtype; > } > + tree offset > + =3D (dataref_offset ? dataref_offset > + : build_int_cst (ref_type, 0)); > + if (ltype !=3D vectype > + && memory_access_type =3D=3D VMAT_CONTIGUOUS_REVE= RSE) > + { > + unsigned HOST_WIDE_INT gap_offset > + =3D gap * tree_to_uhwi (TYPE_SIZE_UNIT (elem_ty= pe)); > + tree gapcst =3D build_int_cst (ref_type, gap_offs= et); > + offset =3D size_binop (PLUS_EXPR, offset, gapcst)= ; > + } > + data_ref > + =3D fold_build2 (MEM_REF, ltype, dataref_ptr, offse= t); > + if (alignment_support_scheme =3D=3D dr_aligned) > + ; > else > + TREE_TYPE (data_ref) > + =3D build_aligned_type (TREE_TYPE (data_ref), > + align * BITS_PER_UNIT); > + if (ltype !=3D vectype) > { > - tree ltype =3D vectype; > - tree new_vtype =3D NULL_TREE; > - unsigned HOST_WIDE_INT gap > - =3D DR_GROUP_GAP (first_stmt_info); > - unsigned int vect_align > - =3D vect_known_alignment_in_bytes (first_dr_inf= o, > - vectype); > - unsigned int scalar_dr_size > - =3D vect_get_scalar_dr_size (first_dr_info); > - /* If there's no peeling for gaps but we have a g= ap > - with slp loads then load the lower half of the > - vector only. 
See get_group_load_store_type fo= r > - when we apply this optimization. */ > - if (slp > - && loop_vinfo > - && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) > - && gap !=3D 0 > - && known_eq (nunits, (group_size - gap) * 2) > - && known_eq (nunits, group_size) > - && gap >=3D (vect_align / scalar_dr_size)) > + vect_copy_ref_info (data_ref, > + DR_REF (first_dr_info->dr)); > + tree tem =3D make_ssa_name (ltype); > + new_stmt =3D gimple_build_assign (tem, data_ref); > + vect_finish_stmt_generation (vinfo, stmt_info, ne= w_stmt, > + gsi); > + data_ref =3D NULL; > + vec *v; > + vec_alloc (v, 2); > + if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REV= ERSE) > { > - tree half_vtype; > - new_vtype > - =3D vector_vector_composition_type (vectype= , 2, > - &half_vty= pe); > - if (new_vtype !=3D NULL_TREE) > - ltype =3D half_vtype; > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > + build_zero_cst (ltype= )); > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); > } > - tree offset > - =3D (dataref_offset ? dataref_offset > - : build_int_cst (ref_type, 0)= ); > - if (ltype !=3D vectype > - && memory_access_type =3D=3D VMAT_CONTIGUOUS_= REVERSE) > + else > { > - unsigned HOST_WIDE_INT gap_offset > - =3D gap * tree_to_uhwi (TYPE_SIZE_UNIT (ele= m_type)); > - tree gapcst =3D build_int_cst (ref_type, gap_= offset); > - offset =3D size_binop (PLUS_EXPR, offset, gap= cst); > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > + build_zero_cst (ltype= )); > } > - data_ref > - =3D fold_build2 (MEM_REF, ltype, dataref_ptr, o= ffset); > - if (alignment_support_scheme =3D=3D dr_aligned) > - ; > + gcc_assert (new_vtype !=3D NULL_TREE); > + if (new_vtype =3D=3D vectype) > + new_stmt =3D gimple_build_assign ( > + vec_dest, build_constructor (vectype, v)); > else > - TREE_TYPE (data_ref) > - =3D build_aligned_type (TREE_TYPE (data_ref), > - align * BITS_PER_UNIT); > - if (ltype !=3D vectype) > { > - vect_copy_ref_info (data_ref, > - DR_REF (first_dr_info->dr= )); > - tree tem =3D make_ssa_name (ltype); > - new_stmt =3D gimple_build_assign (tem, data_r= ef); > + tree new_vname =3D make_ssa_name (new_vtype); > + new_stmt =3D gimple_build_assign ( > + new_vname, build_constructor (new_vtype, v)= ); > vect_finish_stmt_generation (vinfo, stmt_info= , > new_stmt, gsi); > - data_ref =3D NULL; > - vec *v; > - vec_alloc (v, 2); > - if (memory_access_type =3D=3D VMAT_CONTIGUOUS= _REVERSE) > - { > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > - build_zero_cst (l= type)); > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem= ); > - } > - else > - { > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem= ); > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > - build_zero_cst (l= type)); > - } > - gcc_assert (new_vtype !=3D NULL_TREE); > - if (new_vtype =3D=3D vectype) > - new_stmt =3D gimple_build_assign ( > - vec_dest, build_constructor (vectype, v))= ; > - else > - { > - tree new_vname =3D make_ssa_name (new_vty= pe); > - new_stmt =3D gimple_build_assign ( > - new_vname, build_constructor (new_vtype= , v)); > - vect_finish_stmt_generation (vinfo, stmt_= info, > - new_stmt, gs= i); > - new_stmt =3D gimple_build_assign ( > - vec_dest, build1 (VIEW_CONVERT_EXPR, ve= ctype, > - new_vname)); > - } > + new_stmt =3D gimple_build_assign ( > + vec_dest, > + build1 (VIEW_CONVERT_EXPR, vectype, new_vna= me)); > } > } > - break; > } > - case dr_explicit_realign: > - { > - if (costing_p) > - break; > - tree ptr, bump; > - > - tree vs =3D size_int (TYPE_VECTOR_SUBPARTS (vectype))= ; > + break; > + } > + case dr_explicit_realign: > + { > + if 
(costing_p) > + break; > + tree ptr, bump; > > - if (compute_in_loop) > - msq =3D vect_setup_realignment (vinfo, first_stmt_i= nfo, gsi, > - &realignment_token, > - dr_explicit_realign, > - dataref_ptr, NULL); > + tree vs =3D size_int (TYPE_VECTOR_SUBPARTS (vectype)); > > - if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > - ptr =3D copy_ssa_name (dataref_ptr); > - else > - ptr =3D make_ssa_name (TREE_TYPE (dataref_ptr)); > - // For explicit realign the target alignment should b= e > - // known at compile time. > - unsigned HOST_WIDE_INT align =3D > - DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > - new_stmt =3D gimple_build_assign > - (ptr, BIT_AND_EXPR, dataref_ptr, > - build_int_cst > - (TREE_TYPE (dataref_ptr), > - -(HOST_WIDE_INT) align)); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - data_ref > - =3D build2 (MEM_REF, vectype, ptr, > - build_int_cst (ref_type, 0)); > - vect_copy_ref_info (data_ref, DR_REF (first_dr_info->= dr)); > - vec_dest =3D vect_create_destination_var (scalar_dest= , > - vectype); > - new_stmt =3D gimple_build_assign (vec_dest, data_ref)= ; > - new_temp =3D make_ssa_name (vec_dest, new_stmt); > - gimple_assign_set_lhs (new_stmt, new_temp); > - gimple_move_vops (new_stmt, stmt_info->stmt); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - msq =3D new_temp; > - > - bump =3D size_binop (MULT_EXPR, vs, > - TYPE_SIZE_UNIT (elem_type)); > - bump =3D size_binop (MINUS_EXPR, bump, size_one_node)= ; > - ptr =3D bump_vector_ptr (vinfo, dataref_ptr, NULL, gs= i, > - stmt_info, bump); > - new_stmt =3D gimple_build_assign > - (NULL_TREE, BIT_AND_EXPR, ptr, > - build_int_cst > - (TREE_TYPE (ptr), -(HOST_WIDE_INT) alig= n)); > - if (TREE_CODE (ptr) =3D=3D SSA_NAME) > - ptr =3D copy_ssa_name (ptr, new_stmt); > - else > - ptr =3D make_ssa_name (TREE_TYPE (ptr), new_stmt); > - gimple_assign_set_lhs (new_stmt, ptr); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - data_ref > - =3D build2 (MEM_REF, vectype, ptr, > - build_int_cst (ref_type, 0)); > - break; > - } > - case dr_explicit_realign_optimized: > - { > - if (costing_p) > - break; > - if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > - new_temp =3D copy_ssa_name (dataref_ptr); > - else > - new_temp =3D make_ssa_name (TREE_TYPE (dataref_ptr)= ); > - // We should only be doing this if we know the target > - // alignment at compile time. > - unsigned HOST_WIDE_INT align =3D > - DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > - new_stmt =3D gimple_build_assign > - (new_temp, BIT_AND_EXPR, dataref_ptr, > - build_int_cst (TREE_TYPE (dataref_ptr), > - -(HOST_WIDE_INT) align)); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - data_ref > - =3D build2 (MEM_REF, vectype, new_temp, > - build_int_cst (ref_type, 0)); > - break; > - } > - default: > - gcc_unreachable (); > - } > + if (compute_in_loop) > + msq =3D vect_setup_realignment (vinfo, first_stmt_info,= gsi, > + &realignment_token, > + dr_explicit_realign, > + dataref_ptr, NULL); > + > + if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > + ptr =3D copy_ssa_name (dataref_ptr); > + else > + ptr =3D make_ssa_name (TREE_TYPE (dataref_ptr)); > + // For explicit realign the target alignment should be > + // known at compile time. 
> + unsigned HOST_WIDE_INT align > + =3D DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > + new_stmt =3D gimple_build_assign ( > + ptr, BIT_AND_EXPR, dataref_ptr, > + build_int_cst (TREE_TYPE (dataref_ptr), > + -(HOST_WIDE_INT) align)); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + data_ref > + =3D build2 (MEM_REF, vectype, ptr, build_int_cst (ref_t= ype, 0)); > + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr))= ; > + vec_dest =3D vect_create_destination_var (scalar_dest, ve= ctype); > + new_stmt =3D gimple_build_assign (vec_dest, data_ref); > + new_temp =3D make_ssa_name (vec_dest, new_stmt); > + gimple_assign_set_lhs (new_stmt, new_temp); > + gimple_move_vops (new_stmt, stmt_info->stmt); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + msq =3D new_temp; > + > + bump =3D size_binop (MULT_EXPR, vs, TYPE_SIZE_UNIT (elem_= type)); > + bump =3D size_binop (MINUS_EXPR, bump, size_one_node); > + ptr =3D bump_vector_ptr (vinfo, dataref_ptr, NULL, gsi, s= tmt_info, > + bump); > + new_stmt =3D gimple_build_assign ( > + NULL_TREE, BIT_AND_EXPR, ptr, > + build_int_cst (TREE_TYPE (ptr), -(HOST_WIDE_INT) align)= ); > + if (TREE_CODE (ptr) =3D=3D SSA_NAME) > + ptr =3D copy_ssa_name (ptr, new_stmt); > + else > + ptr =3D make_ssa_name (TREE_TYPE (ptr), new_stmt); > + gimple_assign_set_lhs (new_stmt, ptr); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + data_ref > + =3D build2 (MEM_REF, vectype, ptr, build_int_cst (ref_t= ype, 0)); > + break; > + } > + case dr_explicit_realign_optimized: > + { > + if (costing_p) > + break; > + if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > + new_temp =3D copy_ssa_name (dataref_ptr); > + else > + new_temp =3D make_ssa_name (TREE_TYPE (dataref_ptr)); > + // We should only be doing this if we know the target > + // alignment at compile time. > + unsigned HOST_WIDE_INT align > + =3D DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > + new_stmt =3D gimple_build_assign ( > + new_temp, BIT_AND_EXPR, dataref_ptr, > + build_int_cst (TREE_TYPE (dataref_ptr), > + -(HOST_WIDE_INT) align)); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + data_ref =3D build2 (MEM_REF, vectype, new_temp, > + build_int_cst (ref_type, 0)); > + break; > + } > + default: > + gcc_unreachable (); > + } > > - /* One common place to cost the above vect load for differe= nt > - alignment support schemes. */ > - if (costing_p) > - { > - /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we > - only need to take care of the first stmt, whose > - stmt_info is first_stmt_info, vec_num iterating on i= t > - will cover the cost for the remaining, it's consiste= nt > - with transforming. For the prologue cost for realig= n, > - we only need to count it once for the whole group. = */ > - bool first_stmt_info_p =3D first_stmt_info =3D=3D stmt_= info; > - bool add_realign_cost =3D first_stmt_info_p && i =3D=3D= 0; > - if (memory_access_type =3D=3D VMAT_CONTIGUOUS > - || memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERS= E > - || (memory_access_type =3D=3D VMAT_CONTIGUOUS_PERMU= TE > - && (!grouped_load || first_stmt_info_p))) > - vect_get_load_cost (vinfo, stmt_info, 1, > - alignment_support_scheme, misalig= nment, > - add_realign_cost, &inside_cost, > - &prologue_cost, cost_vec, cost_ve= c, > - true); > - } > - else > + /* One common place to cost the above vect load for different > + alignment support schemes. 
*/ > + if (costing_p) > + { > + /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we > + only need to take care of the first stmt, whose > + stmt_info is first_stmt_info, vec_num iterating on it > + will cover the cost for the remaining, it's consistent > + with transforming. For the prologue cost for realign, > + we only need to count it once for the whole group. */ > + bool first_stmt_info_p =3D first_stmt_info =3D=3D stmt_info= ; > + bool add_realign_cost =3D first_stmt_info_p && i =3D=3D 0; > + if (memory_access_type =3D=3D VMAT_CONTIGUOUS > + || memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE > + || (memory_access_type =3D=3D VMAT_CONTIGUOUS_PERMUTE > + && (!grouped_load || first_stmt_info_p))) > + vect_get_load_cost (vinfo, stmt_info, 1, > + alignment_support_scheme, misalignmen= t, > + add_realign_cost, &inside_cost, > + &prologue_cost, cost_vec, cost_vec, t= rue); > + } > + else > + { > + vec_dest =3D vect_create_destination_var (scalar_dest, vect= ype); > + /* DATA_REF is null if we've already built the statement. = */ > + if (data_ref) > { > - vec_dest =3D vect_create_destination_var (scalar_dest, = vectype); > - /* DATA_REF is null if we've already built the statemen= t. */ > - if (data_ref) > - { > - vect_copy_ref_info (data_ref, DR_REF (first_dr_info= ->dr)); > - new_stmt =3D gimple_build_assign (vec_dest, data_re= f); > - } > - new_temp =3D make_ssa_name (vec_dest, new_stmt); > - gimple_set_lhs (new_stmt, new_temp); > - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt= , gsi); > + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr= )); > + new_stmt =3D gimple_build_assign (vec_dest, data_ref); > } > + new_temp =3D make_ssa_name (vec_dest, new_stmt); > + gimple_set_lhs (new_stmt, new_temp); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gs= i); > + } > > - /* 3. Handle explicit realignment if necessary/supported. > - Create in loop: > - vec_dest =3D realign_load (msq, lsq, realignment_token= ) */ > - if (!costing_p > - && (alignment_support_scheme =3D=3D dr_explicit_realign= _optimized > - || alignment_support_scheme =3D=3D dr_explicit_real= ign)) > - { > - lsq =3D gimple_assign_lhs (new_stmt); > - if (!realignment_token) > - realignment_token =3D dataref_ptr; > - vec_dest =3D vect_create_destination_var (scalar_dest, = vectype); > - new_stmt =3D gimple_build_assign (vec_dest, REALIGN_LOA= D_EXPR, > - msq, lsq, realignment_t= oken); > - new_temp =3D make_ssa_name (vec_dest, new_stmt); > - gimple_assign_set_lhs (new_stmt, new_temp); > - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt= , gsi); > + /* 3. Handle explicit realignment if necessary/supported. 
> + Create in loop: > + vec_dest =3D realign_load (msq, lsq, realignment_token) *= / > + if (!costing_p > + && (alignment_support_scheme =3D=3D dr_explicit_realign_opt= imized > + || alignment_support_scheme =3D=3D dr_explicit_realign)= ) > + { > + lsq =3D gimple_assign_lhs (new_stmt); > + if (!realignment_token) > + realignment_token =3D dataref_ptr; > + vec_dest =3D vect_create_destination_var (scalar_dest, vect= ype); > + new_stmt =3D gimple_build_assign (vec_dest, REALIGN_LOAD_EX= PR, msq, > + lsq, realignment_token); > + new_temp =3D make_ssa_name (vec_dest, new_stmt); > + gimple_assign_set_lhs (new_stmt, new_temp); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gs= i); > > - if (alignment_support_scheme =3D=3D dr_explicit_realign= _optimized) > - { > - gcc_assert (phi); > - if (i =3D=3D vec_num - 1 && j =3D=3D ncopies - 1) > - add_phi_arg (phi, lsq, > - loop_latch_edge (containing_loop), > - UNKNOWN_LOCATION); > - msq =3D lsq; > - } > + if (alignment_support_scheme =3D=3D dr_explicit_realign_opt= imized) > + { > + gcc_assert (phi); > + if (i =3D=3D vec_num - 1 && j =3D=3D ncopies - 1) > + add_phi_arg (phi, lsq, loop_latch_edge (containing_lo= op), > + UNKNOWN_LOCATION); > + msq =3D lsq; > } > + } > > - if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE) > + if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE) > + { > + if (costing_p) > + inside_cost =3D record_stmt_cost (cost_vec, 1, vec_perm, > + stmt_info, 0, vect_body); > + else > { > - if (costing_p) > - inside_cost =3D record_stmt_cost (cost_vec, 1, vec_pe= rm, > - stmt_info, 0, vect_bo= dy); > - else > - { > - tree perm_mask =3D perm_mask_for_reverse (vectype); > - new_temp > - =3D permute_vec_elements (vinfo, new_temp, new_te= mp, > - perm_mask, stmt_info, gsi= ); > - new_stmt =3D SSA_NAME_DEF_STMT (new_temp); > - } > + tree perm_mask =3D perm_mask_for_reverse (vectype); > + new_temp =3D permute_vec_elements (vinfo, new_temp, new= _temp, > + perm_mask, stmt_info, = gsi); > + new_stmt =3D SSA_NAME_DEF_STMT (new_temp); > } > + } > > - /* Collect vector loads and later create their permutation = in > - vect_transform_grouped_load (). */ > - if (!costing_p && (grouped_load || slp_perm)) > - dr_chain.quick_push (new_temp); > + /* Collect vector loads and later create their permutation in > + vect_transform_grouped_load (). */ > + if (!costing_p && (grouped_load || slp_perm)) > + dr_chain.quick_push (new_temp); > > - /* Store vector loads in the corresponding SLP_NODE. */ > - if (!costing_p && slp && !slp_perm) > - slp_node->push_vec_def (new_stmt); > + /* Store vector loads in the corresponding SLP_NODE. */ > + if (!costing_p && slp && !slp_perm) > + slp_node->push_vec_def (new_stmt); > > - /* With SLP permutation we load the gaps as well, without > - we need to skip the gaps after we manage to fully load > - all elements. group_gap_adj is DR_GROUP_SIZE here. 
*/ > - group_elt +=3D nunits; > - if (!costing_p > - && maybe_ne (group_gap_adj, 0U) > - && !slp_perm > - && known_eq (group_elt, group_size - group_gap_adj)) > - { > - poly_wide_int bump_val > - =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) > - * group_gap_adj); > - if (tree_int_cst_sgn > - (vect_dr_behavior (vinfo, dr_info)->step) =3D=3D = -1) > - bump_val =3D -bump_val; > - tree bump =3D wide_int_to_tree (sizetype, bump_val); > - dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, pt= r_incr, > - gsi, stmt_info, bump); > - group_elt =3D 0; > - } > - } > - /* Bump the vector pointer to account for a gap or for excess > - elements loaded for a permuted SLP load. */ > + /* With SLP permutation we load the gaps as well, without > + we need to skip the gaps after we manage to fully load > + all elements. group_gap_adj is DR_GROUP_SIZE here. */ > + group_elt +=3D nunits; > if (!costing_p > && maybe_ne (group_gap_adj, 0U) > - && slp_perm) > + && !slp_perm > + && known_eq (group_elt, group_size - group_gap_adj)) > { > poly_wide_int bump_val > - =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) > - * group_gap_adj); > - if (tree_int_cst_sgn > - (vect_dr_behavior (vinfo, dr_info)->step) =3D=3D -1) > + =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) * group_gap= _adj); > + if (tree_int_cst_sgn (vect_dr_behavior (vinfo, dr_info)->st= ep) > + =3D=3D -1) > bump_val =3D -bump_val; > tree bump =3D wide_int_to_tree (sizetype, bump_val); > dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_in= cr, gsi, > stmt_info, bump); > + group_elt =3D 0; > } > } > + /* Bump the vector pointer to account for a gap or for excess > + elements loaded for a permuted SLP load. */ > + if (!costing_p > + && maybe_ne (group_gap_adj, 0U) > + && slp_perm) > + { > + poly_wide_int bump_val > + =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) * group_gap_adj= ); > + if (tree_int_cst_sgn (vect_dr_behavior (vinfo, dr_info)->step) = =3D=3D -1) > + bump_val =3D -bump_val; > + tree bump =3D wide_int_to_tree (sizetype, bump_val); > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, = gsi, > + stmt_info, bump); > + } > > if (slp && !slp_perm) > continue; > @@ -11120,39 +11117,36 @@ vectorizable_load (vec_info *vinfo, > } > } > else > - { > - if (grouped_load) > - { > - if (memory_access_type !=3D VMAT_LOAD_STORE_LANES) > + { > + if (grouped_load) > + { > + gcc_assert (memory_access_type =3D=3D VMAT_CONTIGUOUS_PERMU= TE); > + /* We assume that the cost of a single load-lanes instructi= on > + is equivalent to the cost of DR_GROUP_SIZE separate load= s. > + If a grouped access is instead being provided by a > + load-and-permute operation, include the cost of the > + permutes. */ > + if (costing_p && first_stmt_info =3D=3D stmt_info) > { > - gcc_assert (memory_access_type =3D=3D VMAT_CONTIGUOUS_P= ERMUTE); > - /* We assume that the cost of a single load-lanes instr= uction > - is equivalent to the cost of DR_GROUP_SIZE separate = loads. > - If a grouped access is instead being provided by a > - load-and-permute operation, include the cost of the > - permutes. */ > - if (costing_p && first_stmt_info =3D=3D stmt_info) > - { > - /* Uses an even and odd extract operations or shuff= le > - operations for each needed permute. 
*/ > - int group_size =3D DR_GROUP_SIZE (first_stmt_info); > - int nstmts =3D ceil_log2 (group_size) * group_size; > - inside_cost > - +=3D record_stmt_cost (cost_vec, nstmts, vec_perm= , > - stmt_info, 0, vect_body); > + /* Uses an even and odd extract operations or shuffle > + operations for each needed permute. */ > + int group_size =3D DR_GROUP_SIZE (first_stmt_info); > + int nstmts =3D ceil_log2 (group_size) * group_size; > + inside_cost +=3D record_stmt_cost (cost_vec, nstmts, ve= c_perm, > + stmt_info, 0, vect_bod= y); > > - if (dump_enabled_p ()) > - dump_printf_loc ( > - MSG_NOTE, vect_location, > - "vect_model_load_cost: strided group_size =3D %= d .\n", > - group_size); > - } > - else if (!costing_p) > - vect_transform_grouped_load (vinfo, stmt_info, dr_cha= in, > - group_size, gsi); > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "vect_model_load_cost:" > + "strided group_size =3D %d .\n", > + group_size); > + } > + else if (!costing_p) > + { > + vect_transform_grouped_load (vinfo, stmt_info, dr_chain= , > + group_size, gsi); > + *vec_stmt =3D STMT_VINFO_VEC_STMTS (stmt_info)[0]; > } > - if (!costing_p) > - *vec_stmt =3D STMT_VINFO_VEC_STMTS (stmt_info)[0]; > } > else if (!costing_p) > STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); > @@ -11166,7 +11160,8 @@ vectorizable_load (vec_info *vinfo, > { > gcc_assert (memory_access_type !=3D VMAT_INVARIANT > && memory_access_type !=3D VMAT_ELEMENTWISE > - && memory_access_type !=3D VMAT_STRIDED_SLP); > + && memory_access_type !=3D VMAT_STRIDED_SLP > + && memory_access_type !=3D VMAT_LOAD_STORE_LANES); > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "vect_model_load_cost: inside_cost =3D %u, " > -- > 2.31.1