From: Richard Biener
Date: Mon, 14 Aug 2023 14:04:44 +0200
Subject: Re: [PATCH] vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest
To: "Kewen.Lin"
Cc: GCC Patches, Richard Sandiford

On Mon, Aug 14, 2023 at 10:54 AM Kewen.Lin wrote:
>
> Hi,
>
> Following Richi's suggestion [1], this patch moves the handling of
> VMAT_LOAD_STORE_LANES out of the final loop nest of function
> vectorizable_load and into its own loop.  Basically it duplicates the
> final loop nest, cleans up some useless set-up code for the
> VMAT_LOAD_STORE_LANES case, and removes some unreachable code.  It also
> removes the corresponding handling from the final loop nest.
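For anyone skimming the thread, the net effect on control flow is roughly
the sketch below.  It is only an illustration: the two-value enum, the
function name and the trimmed-down signature are stand-ins rather than the
real vectorizable_load interface, but VMAT_LOAD_STORE_LANES, ncopies and
vec_num are the names used in the patch, and the shape matches the diff
(a dedicated load-lanes loop that returns early, leaving the general loop
nest free of that case).

enum vect_memory_access_type { VMAT_LOAD_STORE_LANES, VMAT_CONTIGUOUS };

static bool
load_shape_sketch (vect_memory_access_type memory_access_type,
                   unsigned ncopies, unsigned vec_num)
{
  if (memory_access_type == VMAT_LOAD_STORE_LANES)
    {
      /* New in the patch: cost or emit one IFN_[MASK_]LOAD_LANES call
         per copy, read back each vector result, then return, so the
         loop nest below never sees this access type.  */
      for (unsigned j = 0; j < ncopies; j++)
        /* build the LOAD_LANES call and extract its vectors */;
      return true;
    }

  /* The pre-existing final loop nest, now without any
     VMAT_LOAD_STORE_LANES special-casing.  */
  for (unsigned j = 0; j < ncopies; j++)
    for (unsigned i = 0; i < vec_num; i++)
      /* contiguous, gather/scatter and realigned loads handled here */;
  return true;
}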
> > Bootstrapped and regtested on x86_64-redhat-linux, > aarch64-linux-gnu and powerpc64{,le}-linux-gnu. OK (I guess the big diff is mostly because of re-indenting). Thanks, Richard. > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html > > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_load): Move the handlings on > VMAT_LOAD_STORE_LANES in the final loop nest to its own loop, > and update the final nest accordingly. > --- > gcc/tree-vect-stmts.cc | 1275 ++++++++++++++++++++-------------------- > 1 file changed, 634 insertions(+), 641 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index 4f2d088484c..c361e16cb7b 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -10332,7 +10332,129 @@ vectorizable_load (vec_info *vinfo, > vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, mask, > &vec_masks, mask_vectype); > } > + > tree vec_mask =3D NULL_TREE; > + if (memory_access_type =3D=3D VMAT_LOAD_STORE_LANES) > + { > + gcc_assert (alignment_support_scheme =3D=3D dr_aligned > + || alignment_support_scheme =3D=3D dr_unaligned_support= ed); > + gcc_assert (grouped_load && !slp); > + > + unsigned int inside_cost =3D 0, prologue_cost =3D 0; > + for (j =3D 0; j < ncopies; j++) > + { > + if (costing_p) > + { > + /* An IFN_LOAD_LANES will load all its vector results, > + regardless of which ones we actually need. Account > + for the cost of unused results. */ > + if (first_stmt_info =3D=3D stmt_info) > + { > + unsigned int gaps =3D DR_GROUP_SIZE (first_stmt_info); > + stmt_vec_info next_stmt_info =3D first_stmt_info; > + do > + { > + gaps -=3D 1; > + next_stmt_info =3D DR_GROUP_NEXT_ELEMENT (next_stmt= _info); > + } > + while (next_stmt_info); > + if (gaps) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "vect_model_load_cost: %d " > + "unused vectors.\n", > + gaps); > + vect_get_load_cost (vinfo, stmt_info, gaps, > + alignment_support_scheme, > + misalignment, false, &inside_co= st, > + &prologue_cost, cost_vec, cost_= vec, > + true); > + } > + } > + vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_= scheme, > + misalignment, false, &inside_cost, > + &prologue_cost, cost_vec, cost_vec, tru= e); > + continue; > + } > + > + /* 1. Create the vector or array pointer update chain. */ > + if (j =3D=3D 0) > + dataref_ptr > + =3D vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_= type, > + at_loop, offset, &dummy, gsi, > + &ptr_incr, false, bump); > + else > + { > + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_in= cr, gsi, > + stmt_info, bump); > + } > + if (mask) > + vec_mask =3D vec_masks[j]; > + > + tree vec_array =3D create_vector_array (vectype, vec_num); > + > + tree final_mask =3D NULL_TREE; > + if (loop_masks) > + final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_mask= s, > + ncopies, vectype, j); > + if (vec_mask) > + final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype, fi= nal_mask, > + vec_mask, gsi); > + > + gcall *call; > + if (final_mask) > + { > + /* Emit: > + VEC_ARRAY =3D MASK_LOAD_LANES (DATAREF_PTR, ALIAS_PTR, > + VEC_MASK). */ > + unsigned int align =3D TYPE_ALIGN (TREE_TYPE (vectype)); > + tree alias_ptr =3D build_int_cst (ref_type, align); > + call =3D gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3= , > + dataref_ptr, alias_ptr, > + final_mask); > + } > + else > + { > + /* Emit: > + VEC_ARRAY =3D LOAD_LANES (MEM_REF[...all elements...])= . 
*/ > + data_ref =3D create_array_ref (aggr_type, dataref_ptr, ref_= type); > + call =3D gimple_build_call_internal (IFN_LOAD_LANES, 1, dat= a_ref); > + } > + gimple_call_set_lhs (call, vec_array); > + gimple_call_set_nothrow (call, true); > + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > + > + dr_chain.create (vec_num); > + /* Extract each vector into an SSA_NAME. */ > + for (i =3D 0; i < vec_num; i++) > + { > + new_temp =3D read_vector_array (vinfo, stmt_info, gsi, scal= ar_dest, > + vec_array, i); > + dr_chain.quick_push (new_temp); > + } > + > + /* Record the mapping between SSA_NAMEs and statements. */ > + vect_record_grouped_load_vectors (vinfo, stmt_info, dr_chain); > + > + /* Record that VEC_ARRAY is now dead. */ > + vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); > + > + dr_chain.release (); > + > + *vec_stmt =3D STMT_VINFO_VEC_STMTS (stmt_info)[0]; > + } > + > + if (costing_p && dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "vect_model_load_cost: inside_cost =3D %u, " > + "prologue_cost =3D %u .\n", > + inside_cost, prologue_cost); > + > + return true; > + } > + > poly_uint64 group_elt =3D 0; > unsigned int inside_cost =3D 0, prologue_cost =3D 0; > for (j =3D 0; j < ncopies; j++) > @@ -10414,685 +10538,558 @@ vectorizable_load (vec_info *vinfo, > dr_chain.create (vec_num); > > gimple *new_stmt =3D NULL; > - if (memory_access_type =3D=3D VMAT_LOAD_STORE_LANES) > + for (i =3D 0; i < vec_num; i++) > { > - if (costing_p) > - { > - /* An IFN_LOAD_LANES will load all its vector results, > - regardless of which ones we actually need. Account > - for the cost of unused results. */ > - if (grouped_load && first_stmt_info =3D=3D stmt_info) > - { > - unsigned int gaps =3D DR_GROUP_SIZE (first_stmt_info); > - stmt_vec_info next_stmt_info =3D first_stmt_info; > - do > - { > - gaps -=3D 1; > - next_stmt_info =3D DR_GROUP_NEXT_ELEMENT (next_stmt= _info); > - } > - while (next_stmt_info); > - if (gaps) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_NOTE, vect_location, > - "vect_model_load_cost: %d " > - "unused vectors.\n", > - gaps); > - vect_get_load_cost (vinfo, stmt_info, gaps, > - alignment_support_scheme, > - misalignment, false, &inside_co= st, > - &prologue_cost, cost_vec, cost_= vec, > - true); > - } > - } > - vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_= scheme, > - misalignment, false, &inside_cost, > - &prologue_cost, cost_vec, cost_vec, tru= e); > - continue; > - } > - tree vec_array; > - > - vec_array =3D create_vector_array (vectype, vec_num); > - > tree final_mask =3D NULL_TREE; > - if (loop_masks) > - final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_mask= s, > - ncopies, vectype, j); > - if (vec_mask) > - final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype, > - final_mask, vec_mask, gsi); > - > - gcall *call; > - if (final_mask) > - { > - /* Emit: > - VEC_ARRAY =3D MASK_LOAD_LANES (DATAREF_PTR, ALIAS_PTR, > - VEC_MASK). */ > - unsigned int align =3D TYPE_ALIGN (TREE_TYPE (vectype)); > - tree alias_ptr =3D build_int_cst (ref_type, align); > - call =3D gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3= , > - dataref_ptr, alias_ptr, > - final_mask); > - } > - else > + tree final_len =3D NULL_TREE; > + tree bias =3D NULL_TREE; > + if (!costing_p) > { > - /* Emit: > - VEC_ARRAY =3D LOAD_LANES (MEM_REF[...all elements...])= . 
*/ > - data_ref =3D create_array_ref (aggr_type, dataref_ptr, ref_= type); > - call =3D gimple_build_call_internal (IFN_LOAD_LANES, 1, dat= a_ref); > - } > - gimple_call_set_lhs (call, vec_array); > - gimple_call_set_nothrow (call, true); > - vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > - new_stmt =3D call; > + if (loop_masks) > + final_mask =3D vect_get_loop_mask (loop_vinfo, gsi, loop_= masks, > + vec_num * ncopies, vecty= pe, > + vec_num * j + i); > + if (vec_mask) > + final_mask =3D prepare_vec_mask (loop_vinfo, mask_vectype= , > + final_mask, vec_mask, gsi)= ; > > - /* Extract each vector into an SSA_NAME. */ > - for (i =3D 0; i < vec_num; i++) > - { > - new_temp =3D read_vector_array (vinfo, stmt_info, gsi, scal= ar_dest, > - vec_array, i); > - dr_chain.quick_push (new_temp); > + if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_= incr, > + gsi, stmt_info, bump); > } > > - /* Record the mapping between SSA_NAMEs and statements. */ > - vect_record_grouped_load_vectors (vinfo, stmt_info, dr_chain); > - > - /* Record that VEC_ARRAY is now dead. */ > - vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); > - } > - else > - { > - for (i =3D 0; i < vec_num; i++) > + /* 2. Create the vector-load in the loop. */ > + switch (alignment_support_scheme) > { > - tree final_mask =3D NULL_TREE; > - tree final_len =3D NULL_TREE; > - tree bias =3D NULL_TREE; > - if (!costing_p) > - { > - if (loop_masks) > - final_mask > - =3D vect_get_loop_mask (loop_vinfo, gsi, loop_masks= , > - vec_num * ncopies, vectype, > - vec_num * j + i); > - if (vec_mask) > - final_mask =3D prepare_vec_mask (loop_vinfo, mask_vec= type, > - final_mask, vec_mask, = gsi); > - > - if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > - dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, = ptr_incr, > - gsi, stmt_info, bump); > - } > + case dr_aligned: > + case dr_unaligned_supported: > + { > + unsigned int misalign; > + unsigned HOST_WIDE_INT align; > > - /* 2. Create the vector-load in the loop. 
*/ > - switch (alignment_support_scheme) > - { > - case dr_aligned: > - case dr_unaligned_supported: > + if (memory_access_type =3D=3D VMAT_GATHER_SCATTER > + && gs_info.ifn !=3D IFN_LAST) > { > - unsigned int misalign; > - unsigned HOST_WIDE_INT align; > - > - if (memory_access_type =3D=3D VMAT_GATHER_SCATTER > - && gs_info.ifn !=3D IFN_LAST) > + if (costing_p) > { > - if (costing_p) > - { > - unsigned int cnunits > - =3D vect_nunits_for_cost (vectype); > - inside_cost > - =3D record_stmt_cost (cost_vec, cnunits, > - scalar_load, stmt_info,= 0, > - vect_body); > - break; > - } > - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > - vec_offset =3D vec_offsets[vec_num * j + i]; > - tree zero =3D build_zero_cst (vectype); > - tree scale =3D size_int (gs_info.scale); > - > - if (gs_info.ifn =3D=3D IFN_MASK_LEN_GATHER_LOAD) > - { > - if (loop_lens) > - final_len > - =3D vect_get_loop_len (loop_vinfo, gsi, l= oop_lens, > - vec_num * ncopies, v= ectype, > - vec_num * j + i, 1); > - else > - final_len =3D build_int_cst (sizetype, > - TYPE_VECTOR_SUBP= ARTS ( > - vectype)); > - signed char biasval > - =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loo= p_vinfo); > - bias =3D build_int_cst (intQI_type_node, bias= val); > - if (!final_mask) > - { > - mask_vectype =3D truth_type_for (vectype)= ; > - final_mask =3D build_minus_one_cst (mask_= vectype); > - } > - } > - > - gcall *call; > - if (final_len && final_mask) > - call =3D gimple_build_call_internal ( > - IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, > - vec_offset, scale, zero, final_mask, final_le= n, > - bias); > - else if (final_mask) > - call =3D gimple_build_call_internal > - (IFN_MASK_GATHER_LOAD, 5, dataref_ptr, > - vec_offset, scale, zero, final_mask); > - else > - call =3D gimple_build_call_internal > - (IFN_GATHER_LOAD, 4, dataref_ptr, > - vec_offset, scale, zero); > - gimple_call_set_nothrow (call, true); > - new_stmt =3D call; > - data_ref =3D NULL_TREE; > + unsigned int cnunits =3D vect_nunits_for_cost (ve= ctype); > + inside_cost > + =3D record_stmt_cost (cost_vec, cnunits, scalar= _load, > + stmt_info, 0, vect_body); > break; > } > - else if (memory_access_type =3D=3D VMAT_GATHER_SCATTE= R) > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) > + vec_offset =3D vec_offsets[vec_num * j + i]; > + tree zero =3D build_zero_cst (vectype); > + tree scale =3D size_int (gs_info.scale); > + > + if (gs_info.ifn =3D=3D IFN_MASK_LEN_GATHER_LOAD) > { > - /* Emulated gather-scatter. */ > - gcc_assert (!final_mask); > - unsigned HOST_WIDE_INT const_nunits > - =3D nunits.to_constant (); > - if (costing_p) > - { > - /* For emulated gathers N offset vector eleme= nt > - offset add is consumed by the load). */ > - inside_cost > - =3D record_stmt_cost (cost_vec, const_nunit= s, > - vec_to_scalar, stmt_inf= o, 0, > - vect_body); > - /* N scalar loads plus gathering them into a > - vector. */ > - inside_cost > - =3D record_stmt_cost (cost_vec, const_nunit= s, > - scalar_load, stmt_info,= 0, > - vect_body); > - inside_cost > - =3D record_stmt_cost (cost_vec, 1, vec_cons= truct, > - stmt_info, 0, vect_body= ); > - break; > - } > - unsigned HOST_WIDE_INT const_offset_nunits > - =3D TYPE_VECTOR_SUBPARTS (gs_info.offset_vectyp= e) > - .to_constant (); > - vec *ctor_elts; > - vec_alloc (ctor_elts, const_nunits); > - gimple_seq stmts =3D NULL; > - /* We support offset vectors with more elements > - than the data vector for now. 
*/ > - unsigned HOST_WIDE_INT factor > - =3D const_offset_nunits / const_nunits; > - vec_offset =3D vec_offsets[j / factor]; > - unsigned elt_offset =3D (j % factor) * const_nuni= ts; > - tree idx_type =3D TREE_TYPE (TREE_TYPE (vec_offse= t)); > - tree scale =3D size_int (gs_info.scale); > - align > - =3D get_object_alignment (DR_REF (first_dr_info= ->dr)); > - tree ltype =3D build_aligned_type (TREE_TYPE (vec= type), > - align); > - for (unsigned k =3D 0; k < const_nunits; ++k) > + if (loop_lens) > + final_len > + =3D vect_get_loop_len (loop_vinfo, gsi, loop_= lens, > + vec_num * ncopies, vecty= pe, > + vec_num * j + i, 1); > + else > + final_len > + =3D build_int_cst (sizetype, > + TYPE_VECTOR_SUBPARTS (vectyp= e)); > + signed char biasval > + =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vi= nfo); > + bias =3D build_int_cst (intQI_type_node, biasval)= ; > + if (!final_mask) > { > - tree boff =3D size_binop (MULT_EXPR, > - TYPE_SIZE (idx_type), > - bitsize_int > - (k + elt_offset)); > - tree idx =3D gimple_build (&stmts, BIT_FIELD_= REF, > - idx_type, vec_offset= , > - TYPE_SIZE (idx_type)= , > - boff); > - idx =3D gimple_convert (&stmts, sizetype, idx= ); > - idx =3D gimple_build (&stmts, MULT_EXPR, > - sizetype, idx, scale); > - tree ptr =3D gimple_build (&stmts, PLUS_EXPR, > - TREE_TYPE (dataref_p= tr), > - dataref_ptr, idx); > - ptr =3D gimple_convert (&stmts, ptr_type_node= , ptr); > - tree elt =3D make_ssa_name (TREE_TYPE (vectyp= e)); > - tree ref =3D build2 (MEM_REF, ltype, ptr, > - build_int_cst (ref_type, 0= )); > - new_stmt =3D gimple_build_assign (elt, ref); > - gimple_set_vuse (new_stmt, > - gimple_vuse (gsi_stmt (*gsi)= )); > - gimple_seq_add_stmt (&stmts, new_stmt); > - CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE,= elt); > + mask_vectype =3D truth_type_for (vectype); > + final_mask =3D build_minus_one_cst (mask_vect= ype); > } > - gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT)= ; > - new_stmt =3D gimple_build_assign (NULL_TREE, > - build_constructor > - (vectype, ctor_= elts)); > - data_ref =3D NULL_TREE; > - break; > } > > - if (costing_p) > - break; > - > - align =3D > - known_alignment (DR_TARGET_ALIGNMENT (first_dr_info= )); > - if (alignment_support_scheme =3D=3D dr_aligned) > - misalign =3D 0; > - else if (misalignment =3D=3D DR_MISALIGNMENT_UNKNOWN) > - { > - align =3D dr_alignment > - (vect_dr_behavior (vinfo, first_dr_info)); > - misalign =3D 0; > - } > + gcall *call; > + if (final_len && final_mask) > + call =3D gimple_build_call_internal ( > + IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, vec_off= set, > + scale, zero, final_mask, final_len, bias); > + else if (final_mask) > + call > + =3D gimple_build_call_internal (IFN_MASK_GATHER_L= OAD, 5, > + dataref_ptr, vec_of= fset, > + scale, zero, final_= mask); > else > - misalign =3D misalignment; > - if (dataref_offset =3D=3D NULL_TREE > - && TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > - set_ptr_info_alignment (get_ptr_info (dataref_ptr), > - align, misalign); > - align =3D least_bit_hwi (misalign | align); > - > - /* Compute IFN when LOOP_LENS or final_mask valid. 
*= / > - machine_mode vmode =3D TYPE_MODE (vectype); > - machine_mode new_vmode =3D vmode; > - internal_fn partial_ifn =3D IFN_LAST; > - if (loop_lens) > + call > + =3D gimple_build_call_internal (IFN_GATHER_LOAD, = 4, > + dataref_ptr, vec_of= fset, > + scale, zero); > + gimple_call_set_nothrow (call, true); > + new_stmt =3D call; > + data_ref =3D NULL_TREE; > + break; > + } > + else if (memory_access_type =3D=3D VMAT_GATHER_SCATTER) > + { > + /* Emulated gather-scatter. */ > + gcc_assert (!final_mask); > + unsigned HOST_WIDE_INT const_nunits =3D nunits.to_con= stant (); > + if (costing_p) > { > - opt_machine_mode new_ovmode > - =3D get_len_load_store_mode (vmode, true, > - &partial_ifn); > - new_vmode =3D new_ovmode.require (); > - unsigned factor =3D (new_ovmode =3D=3D vmode) > - ? 1 > - : GET_MODE_UNIT_SIZE (vmode); > - final_len > - =3D vect_get_loop_len (loop_vinfo, gsi, loop_le= ns, > - vec_num * ncopies, vectype= , > - vec_num * j + i, factor); > + /* For emulated gathers N offset vector element > + offset add is consumed by the load). */ > + inside_cost > + =3D record_stmt_cost (cost_vec, const_nunits, > + vec_to_scalar, stmt_info, 0= , > + vect_body); > + /* N scalar loads plus gathering them into a > + vector. */ > + inside_cost =3D record_stmt_cost (cost_vec, const= _nunits, > + scalar_load, stmt= _info, > + 0, vect_body); > + inside_cost > + =3D record_stmt_cost (cost_vec, 1, vec_construc= t, > + stmt_info, 0, vect_body); > + break; > } > - else if (final_mask) > + unsigned HOST_WIDE_INT const_offset_nunits > + =3D TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) > + .to_constant (); > + vec *ctor_elts; > + vec_alloc (ctor_elts, const_nunits); > + gimple_seq stmts =3D NULL; > + /* We support offset vectors with more elements > + than the data vector for now. 
*/ > + unsigned HOST_WIDE_INT factor > + =3D const_offset_nunits / const_nunits; > + vec_offset =3D vec_offsets[j / factor]; > + unsigned elt_offset =3D (j % factor) * const_nunits; > + tree idx_type =3D TREE_TYPE (TREE_TYPE (vec_offset)); > + tree scale =3D size_int (gs_info.scale); > + align =3D get_object_alignment (DR_REF (first_dr_info= ->dr)); > + tree ltype > + =3D build_aligned_type (TREE_TYPE (vectype), align)= ; > + for (unsigned k =3D 0; k < const_nunits; ++k) > { > - if (!can_vec_mask_load_store_p ( > - vmode, TYPE_MODE (TREE_TYPE (final_mask)), = true, > - &partial_ifn)) > - gcc_unreachable (); > + tree boff =3D size_binop (MULT_EXPR, TYPE_SIZE (i= dx_type), > + bitsize_int (k + elt_offs= et)); > + tree idx =3D gimple_build (&stmts, BIT_FIELD_REF, > + idx_type, vec_offset, > + TYPE_SIZE (idx_type), bo= ff); > + idx =3D gimple_convert (&stmts, sizetype, idx); > + idx =3D gimple_build (&stmts, MULT_EXPR, sizetype= , idx, > + scale); > + tree ptr =3D gimple_build (&stmts, PLUS_EXPR, > + TREE_TYPE (dataref_ptr), > + dataref_ptr, idx); > + ptr =3D gimple_convert (&stmts, ptr_type_node, pt= r); > + tree elt =3D make_ssa_name (TREE_TYPE (vectype)); > + tree ref =3D build2 (MEM_REF, ltype, ptr, > + build_int_cst (ref_type, 0)); > + new_stmt =3D gimple_build_assign (elt, ref); > + gimple_set_vuse (new_stmt, > + gimple_vuse (gsi_stmt (*gsi))); > + gimple_seq_add_stmt (&stmts, new_stmt); > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, elt= ); > } > + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > + new_stmt =3D gimple_build_assign ( > + NULL_TREE, build_constructor (vectype, ctor_elts)); > + data_ref =3D NULL_TREE; > + break; > + } > > - if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > + if (costing_p) > + break; > + > + align =3D known_alignment (DR_TARGET_ALIGNMENT (first_dr_= info)); > + if (alignment_support_scheme =3D=3D dr_aligned) > + misalign =3D 0; > + else if (misalignment =3D=3D DR_MISALIGNMENT_UNKNOWN) > + { > + align > + =3D dr_alignment (vect_dr_behavior (vinfo, first_dr= _info)); > + misalign =3D 0; > + } > + else > + misalign =3D misalignment; > + if (dataref_offset =3D=3D NULL_TREE > + && TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > + set_ptr_info_alignment (get_ptr_info (dataref_ptr), ali= gn, > + misalign); > + align =3D least_bit_hwi (misalign | align); > + > + /* Compute IFN when LOOP_LENS or final_mask valid. */ > + machine_mode vmode =3D TYPE_MODE (vectype); > + machine_mode new_vmode =3D vmode; > + internal_fn partial_ifn =3D IFN_LAST; > + if (loop_lens) > + { > + opt_machine_mode new_ovmode > + =3D get_len_load_store_mode (vmode, true, &partial_= ifn); > + new_vmode =3D new_ovmode.require (); > + unsigned factor > + =3D (new_ovmode =3D=3D vmode) ? 1 : GET_MODE_UNIT_S= IZE (vmode); > + final_len =3D vect_get_loop_len (loop_vinfo, gsi, loo= p_lens, > + vec_num * ncopies, vec= type, > + vec_num * j + i, facto= r); > + } > + else if (final_mask) > + { > + if (!can_vec_mask_load_store_p ( > + vmode, TYPE_MODE (TREE_TYPE (final_mask)), true= , > + &partial_ifn)) > + gcc_unreachable (); > + } > + > + if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > + { > + if (!final_len) > { > - if (!final_len) > - { > - /* Pass VF value to 'len' argument of > - MASK_LEN_LOAD if LOOP_LENS is invalid. */ > - final_len > - =3D size_int (TYPE_VECTOR_SUBPARTS (vectype= )); > - } > - if (!final_mask) > - { > - /* Pass all ones value to 'mask' argument of > - MASK_LEN_LOAD if final_mask is invalid. 
*= / > - mask_vectype =3D truth_type_for (vectype); > - final_mask =3D build_minus_one_cst (mask_vect= ype); > - } > + /* Pass VF value to 'len' argument of > + MASK_LEN_LOAD if LOOP_LENS is invalid. */ > + final_len =3D size_int (TYPE_VECTOR_SUBPARTS (vec= type)); > } > - if (final_len) > + if (!final_mask) > { > - signed char biasval > - =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vi= nfo); > - > - bias =3D build_int_cst (intQI_type_node, biasval)= ; > + /* Pass all ones value to 'mask' argument of > + MASK_LEN_LOAD if final_mask is invalid. */ > + mask_vectype =3D truth_type_for (vectype); > + final_mask =3D build_minus_one_cst (mask_vectype)= ; > } > + } > + if (final_len) > + { > + signed char biasval > + =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo)= ; > > - if (final_len) > + bias =3D build_int_cst (intQI_type_node, biasval); > + } > + > + if (final_len) > + { > + tree ptr =3D build_int_cst (ref_type, align * BITS_PE= R_UNIT); > + gcall *call; > + if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > + call =3D gimple_build_call_internal (IFN_MASK_LEN_L= OAD, 5, > + dataref_ptr, ptr= , > + final_mask, fina= l_len, > + bias); > + else > + call =3D gimple_build_call_internal (IFN_LEN_LOAD, = 4, > + dataref_ptr, ptr= , > + final_len, bias)= ; > + gimple_call_set_nothrow (call, true); > + new_stmt =3D call; > + data_ref =3D NULL_TREE; > + > + /* Need conversion if it's wrapped with VnQI. */ > + if (vmode !=3D new_vmode) > { > - tree ptr > - =3D build_int_cst (ref_type, align * BITS_PER_U= NIT); > - gcall *call; > - if (partial_ifn =3D=3D IFN_MASK_LEN_LOAD) > - call =3D gimple_build_call_internal (IFN_MASK_L= EN_LOAD, > - 5, dataref_p= tr, > - ptr, final_m= ask, > - final_len, b= ias); > - else > - call =3D gimple_build_call_internal (IFN_LEN_LO= AD, 4, > - dataref_ptr,= ptr, > - final_len, b= ias); > - gimple_call_set_nothrow (call, true); > - new_stmt =3D call; > - data_ref =3D NULL_TREE; > - > - /* Need conversion if it's wrapped with VnQI. 
*/ > - if (vmode !=3D new_vmode) > - { > - tree new_vtype =3D build_vector_type_for_mode= ( > - unsigned_intQI_type_node, new_vmode); > - tree var =3D vect_get_new_ssa_name (new_vtype= , > - vect_simple= _var); > - gimple_set_lhs (call, var); > - vect_finish_stmt_generation (vinfo, stmt_info= , call, > - gsi); > - tree op =3D build1 (VIEW_CONVERT_EXPR, vectyp= e, var); > - new_stmt > - =3D gimple_build_assign (vec_dest, > - VIEW_CONVERT_EXPR, o= p); > - } > + tree new_vtype =3D build_vector_type_for_mode ( > + unsigned_intQI_type_node, new_vmode); > + tree var > + =3D vect_get_new_ssa_name (new_vtype, vect_simp= le_var); > + gimple_set_lhs (call, var); > + vect_finish_stmt_generation (vinfo, stmt_info, ca= ll, > + gsi); > + tree op =3D build1 (VIEW_CONVERT_EXPR, vectype, v= ar); > + new_stmt =3D gimple_build_assign (vec_dest, > + VIEW_CONVERT_EXPR= , op); > } > - else if (final_mask) > + } > + else if (final_mask) > + { > + tree ptr =3D build_int_cst (ref_type, align * BITS_PE= R_UNIT); > + gcall *call =3D gimple_build_call_internal (IFN_MASK_= LOAD, 3, > + dataref_ptr= , ptr, > + final_mask)= ; > + gimple_call_set_nothrow (call, true); > + new_stmt =3D call; > + data_ref =3D NULL_TREE; > + } > + else > + { > + tree ltype =3D vectype; > + tree new_vtype =3D NULL_TREE; > + unsigned HOST_WIDE_INT gap =3D DR_GROUP_GAP (first_st= mt_info); > + unsigned int vect_align > + =3D vect_known_alignment_in_bytes (first_dr_info, v= ectype); > + unsigned int scalar_dr_size > + =3D vect_get_scalar_dr_size (first_dr_info); > + /* If there's no peeling for gaps but we have a gap > + with slp loads then load the lower half of the > + vector only. See get_group_load_store_type for > + when we apply this optimization. */ > + if (slp > + && loop_vinfo > + && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && g= ap !=3D 0 > + && known_eq (nunits, (group_size - gap) * 2) > + && known_eq (nunits, group_size) > + && gap >=3D (vect_align / scalar_dr_size)) > { > - tree ptr =3D build_int_cst (ref_type, > - align * BITS_PER_UNIT); > - gcall *call > - =3D gimple_build_call_internal (IFN_MASK_LOAD, = 3, > - dataref_ptr, ptr, > - final_mask); > - gimple_call_set_nothrow (call, true); > - new_stmt =3D call; > - data_ref =3D NULL_TREE; > + tree half_vtype; > + new_vtype > + =3D vector_vector_composition_type (vectype, 2, > + &half_vtype); > + if (new_vtype !=3D NULL_TREE) > + ltype =3D half_vtype; > } > + tree offset > + =3D (dataref_offset ? dataref_offset > + : build_int_cst (ref_type, 0)); > + if (ltype !=3D vectype > + && memory_access_type =3D=3D VMAT_CONTIGUOUS_REVE= RSE) > + { > + unsigned HOST_WIDE_INT gap_offset > + =3D gap * tree_to_uhwi (TYPE_SIZE_UNIT (elem_ty= pe)); > + tree gapcst =3D build_int_cst (ref_type, gap_offs= et); > + offset =3D size_binop (PLUS_EXPR, offset, gapcst)= ; > + } > + data_ref > + =3D fold_build2 (MEM_REF, ltype, dataref_ptr, offse= t); > + if (alignment_support_scheme =3D=3D dr_aligned) > + ; > else > + TREE_TYPE (data_ref) > + =3D build_aligned_type (TREE_TYPE (data_ref), > + align * BITS_PER_UNIT); > + if (ltype !=3D vectype) > { > - tree ltype =3D vectype; > - tree new_vtype =3D NULL_TREE; > - unsigned HOST_WIDE_INT gap > - =3D DR_GROUP_GAP (first_stmt_info); > - unsigned int vect_align > - =3D vect_known_alignment_in_bytes (first_dr_inf= o, > - vectype); > - unsigned int scalar_dr_size > - =3D vect_get_scalar_dr_size (first_dr_info); > - /* If there's no peeling for gaps but we have a g= ap > - with slp loads then load the lower half of the > - vector only. 
See get_group_load_store_type fo= r > - when we apply this optimization. */ > - if (slp > - && loop_vinfo > - && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) > - && gap !=3D 0 > - && known_eq (nunits, (group_size - gap) * 2) > - && known_eq (nunits, group_size) > - && gap >=3D (vect_align / scalar_dr_size)) > + vect_copy_ref_info (data_ref, > + DR_REF (first_dr_info->dr)); > + tree tem =3D make_ssa_name (ltype); > + new_stmt =3D gimple_build_assign (tem, data_ref); > + vect_finish_stmt_generation (vinfo, stmt_info, ne= w_stmt, > + gsi); > + data_ref =3D NULL; > + vec *v; > + vec_alloc (v, 2); > + if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REV= ERSE) > { > - tree half_vtype; > - new_vtype > - =3D vector_vector_composition_type (vectype= , 2, > - &half_vty= pe); > - if (new_vtype !=3D NULL_TREE) > - ltype =3D half_vtype; > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > + build_zero_cst (ltype= )); > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); > } > - tree offset > - =3D (dataref_offset ? dataref_offset > - : build_int_cst (ref_type, 0)= ); > - if (ltype !=3D vectype > - && memory_access_type =3D=3D VMAT_CONTIGUOUS_= REVERSE) > + else > { > - unsigned HOST_WIDE_INT gap_offset > - =3D gap * tree_to_uhwi (TYPE_SIZE_UNIT (ele= m_type)); > - tree gapcst =3D build_int_cst (ref_type, gap_= offset); > - offset =3D size_binop (PLUS_EXPR, offset, gap= cst); > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); > + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > + build_zero_cst (ltype= )); > } > - data_ref > - =3D fold_build2 (MEM_REF, ltype, dataref_ptr, o= ffset); > - if (alignment_support_scheme =3D=3D dr_aligned) > - ; > + gcc_assert (new_vtype !=3D NULL_TREE); > + if (new_vtype =3D=3D vectype) > + new_stmt =3D gimple_build_assign ( > + vec_dest, build_constructor (vectype, v)); > else > - TREE_TYPE (data_ref) > - =3D build_aligned_type (TREE_TYPE (data_ref), > - align * BITS_PER_UNIT); > - if (ltype !=3D vectype) > { > - vect_copy_ref_info (data_ref, > - DR_REF (first_dr_info->dr= )); > - tree tem =3D make_ssa_name (ltype); > - new_stmt =3D gimple_build_assign (tem, data_r= ef); > + tree new_vname =3D make_ssa_name (new_vtype); > + new_stmt =3D gimple_build_assign ( > + new_vname, build_constructor (new_vtype, v)= ); > vect_finish_stmt_generation (vinfo, stmt_info= , > new_stmt, gsi); > - data_ref =3D NULL; > - vec *v; > - vec_alloc (v, 2); > - if (memory_access_type =3D=3D VMAT_CONTIGUOUS= _REVERSE) > - { > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > - build_zero_cst (l= type)); > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem= ); > - } > - else > - { > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem= ); > - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, > - build_zero_cst (l= type)); > - } > - gcc_assert (new_vtype !=3D NULL_TREE); > - if (new_vtype =3D=3D vectype) > - new_stmt =3D gimple_build_assign ( > - vec_dest, build_constructor (vectype, v))= ; > - else > - { > - tree new_vname =3D make_ssa_name (new_vty= pe); > - new_stmt =3D gimple_build_assign ( > - new_vname, build_constructor (new_vtype= , v)); > - vect_finish_stmt_generation (vinfo, stmt_= info, > - new_stmt, gs= i); > - new_stmt =3D gimple_build_assign ( > - vec_dest, build1 (VIEW_CONVERT_EXPR, ve= ctype, > - new_vname)); > - } > + new_stmt =3D gimple_build_assign ( > + vec_dest, > + build1 (VIEW_CONVERT_EXPR, vectype, new_vna= me)); > } > } > - break; > } > - case dr_explicit_realign: > - { > - if (costing_p) > - break; > - tree ptr, bump; > - > - tree vs =3D size_int (TYPE_VECTOR_SUBPARTS (vectype))= ; > + break; > + } > + case dr_explicit_realign: > + { > + if 
(costing_p) > + break; > + tree ptr, bump; > > - if (compute_in_loop) > - msq =3D vect_setup_realignment (vinfo, first_stmt_i= nfo, gsi, > - &realignment_token, > - dr_explicit_realign, > - dataref_ptr, NULL); > + tree vs =3D size_int (TYPE_VECTOR_SUBPARTS (vectype)); > > - if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > - ptr =3D copy_ssa_name (dataref_ptr); > - else > - ptr =3D make_ssa_name (TREE_TYPE (dataref_ptr)); > - // For explicit realign the target alignment should b= e > - // known at compile time. > - unsigned HOST_WIDE_INT align =3D > - DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > - new_stmt =3D gimple_build_assign > - (ptr, BIT_AND_EXPR, dataref_ptr, > - build_int_cst > - (TREE_TYPE (dataref_ptr), > - -(HOST_WIDE_INT) align)); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - data_ref > - =3D build2 (MEM_REF, vectype, ptr, > - build_int_cst (ref_type, 0)); > - vect_copy_ref_info (data_ref, DR_REF (first_dr_info->= dr)); > - vec_dest =3D vect_create_destination_var (scalar_dest= , > - vectype); > - new_stmt =3D gimple_build_assign (vec_dest, data_ref)= ; > - new_temp =3D make_ssa_name (vec_dest, new_stmt); > - gimple_assign_set_lhs (new_stmt, new_temp); > - gimple_move_vops (new_stmt, stmt_info->stmt); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - msq =3D new_temp; > - > - bump =3D size_binop (MULT_EXPR, vs, > - TYPE_SIZE_UNIT (elem_type)); > - bump =3D size_binop (MINUS_EXPR, bump, size_one_node)= ; > - ptr =3D bump_vector_ptr (vinfo, dataref_ptr, NULL, gs= i, > - stmt_info, bump); > - new_stmt =3D gimple_build_assign > - (NULL_TREE, BIT_AND_EXPR, ptr, > - build_int_cst > - (TREE_TYPE (ptr), -(HOST_WIDE_INT) alig= n)); > - if (TREE_CODE (ptr) =3D=3D SSA_NAME) > - ptr =3D copy_ssa_name (ptr, new_stmt); > - else > - ptr =3D make_ssa_name (TREE_TYPE (ptr), new_stmt); > - gimple_assign_set_lhs (new_stmt, ptr); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - data_ref > - =3D build2 (MEM_REF, vectype, ptr, > - build_int_cst (ref_type, 0)); > - break; > - } > - case dr_explicit_realign_optimized: > - { > - if (costing_p) > - break; > - if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > - new_temp =3D copy_ssa_name (dataref_ptr); > - else > - new_temp =3D make_ssa_name (TREE_TYPE (dataref_ptr)= ); > - // We should only be doing this if we know the target > - // alignment at compile time. > - unsigned HOST_WIDE_INT align =3D > - DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > - new_stmt =3D gimple_build_assign > - (new_temp, BIT_AND_EXPR, dataref_ptr, > - build_int_cst (TREE_TYPE (dataref_ptr), > - -(HOST_WIDE_INT) align)); > - vect_finish_stmt_generation (vinfo, stmt_info, > - new_stmt, gsi); > - data_ref > - =3D build2 (MEM_REF, vectype, new_temp, > - build_int_cst (ref_type, 0)); > - break; > - } > - default: > - gcc_unreachable (); > - } > + if (compute_in_loop) > + msq =3D vect_setup_realignment (vinfo, first_stmt_info,= gsi, > + &realignment_token, > + dr_explicit_realign, > + dataref_ptr, NULL); > + > + if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > + ptr =3D copy_ssa_name (dataref_ptr); > + else > + ptr =3D make_ssa_name (TREE_TYPE (dataref_ptr)); > + // For explicit realign the target alignment should be > + // known at compile time. 
> + unsigned HOST_WIDE_INT align > + =3D DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > + new_stmt =3D gimple_build_assign ( > + ptr, BIT_AND_EXPR, dataref_ptr, > + build_int_cst (TREE_TYPE (dataref_ptr), > + -(HOST_WIDE_INT) align)); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + data_ref > + =3D build2 (MEM_REF, vectype, ptr, build_int_cst (ref_t= ype, 0)); > + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr))= ; > + vec_dest =3D vect_create_destination_var (scalar_dest, ve= ctype); > + new_stmt =3D gimple_build_assign (vec_dest, data_ref); > + new_temp =3D make_ssa_name (vec_dest, new_stmt); > + gimple_assign_set_lhs (new_stmt, new_temp); > + gimple_move_vops (new_stmt, stmt_info->stmt); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + msq =3D new_temp; > + > + bump =3D size_binop (MULT_EXPR, vs, TYPE_SIZE_UNIT (elem_= type)); > + bump =3D size_binop (MINUS_EXPR, bump, size_one_node); > + ptr =3D bump_vector_ptr (vinfo, dataref_ptr, NULL, gsi, s= tmt_info, > + bump); > + new_stmt =3D gimple_build_assign ( > + NULL_TREE, BIT_AND_EXPR, ptr, > + build_int_cst (TREE_TYPE (ptr), -(HOST_WIDE_INT) align)= ); > + if (TREE_CODE (ptr) =3D=3D SSA_NAME) > + ptr =3D copy_ssa_name (ptr, new_stmt); > + else > + ptr =3D make_ssa_name (TREE_TYPE (ptr), new_stmt); > + gimple_assign_set_lhs (new_stmt, ptr); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + data_ref > + =3D build2 (MEM_REF, vectype, ptr, build_int_cst (ref_t= ype, 0)); > + break; > + } > + case dr_explicit_realign_optimized: > + { > + if (costing_p) > + break; > + if (TREE_CODE (dataref_ptr) =3D=3D SSA_NAME) > + new_temp =3D copy_ssa_name (dataref_ptr); > + else > + new_temp =3D make_ssa_name (TREE_TYPE (dataref_ptr)); > + // We should only be doing this if we know the target > + // alignment at compile time. > + unsigned HOST_WIDE_INT align > + =3D DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); > + new_stmt =3D gimple_build_assign ( > + new_temp, BIT_AND_EXPR, dataref_ptr, > + build_int_cst (TREE_TYPE (dataref_ptr), > + -(HOST_WIDE_INT) align)); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, = gsi); > + data_ref =3D build2 (MEM_REF, vectype, new_temp, > + build_int_cst (ref_type, 0)); > + break; > + } > + default: > + gcc_unreachable (); > + } > > - /* One common place to cost the above vect load for differe= nt > - alignment support schemes. */ > - if (costing_p) > - { > - /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we > - only need to take care of the first stmt, whose > - stmt_info is first_stmt_info, vec_num iterating on i= t > - will cover the cost for the remaining, it's consiste= nt > - with transforming. For the prologue cost for realig= n, > - we only need to count it once for the whole group. = */ > - bool first_stmt_info_p =3D first_stmt_info =3D=3D stmt_= info; > - bool add_realign_cost =3D first_stmt_info_p && i =3D=3D= 0; > - if (memory_access_type =3D=3D VMAT_CONTIGUOUS > - || memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERS= E > - || (memory_access_type =3D=3D VMAT_CONTIGUOUS_PERMU= TE > - && (!grouped_load || first_stmt_info_p))) > - vect_get_load_cost (vinfo, stmt_info, 1, > - alignment_support_scheme, misalig= nment, > - add_realign_cost, &inside_cost, > - &prologue_cost, cost_vec, cost_ve= c, > - true); > - } > - else > + /* One common place to cost the above vect load for different > + alignment support schemes. 
*/ > + if (costing_p) > + { > + /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we > + only need to take care of the first stmt, whose > + stmt_info is first_stmt_info, vec_num iterating on it > + will cover the cost for the remaining, it's consistent > + with transforming. For the prologue cost for realign, > + we only need to count it once for the whole group. */ > + bool first_stmt_info_p =3D first_stmt_info =3D=3D stmt_info= ; > + bool add_realign_cost =3D first_stmt_info_p && i =3D=3D 0; > + if (memory_access_type =3D=3D VMAT_CONTIGUOUS > + || memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE > + || (memory_access_type =3D=3D VMAT_CONTIGUOUS_PERMUTE > + && (!grouped_load || first_stmt_info_p))) > + vect_get_load_cost (vinfo, stmt_info, 1, > + alignment_support_scheme, misalignmen= t, > + add_realign_cost, &inside_cost, > + &prologue_cost, cost_vec, cost_vec, t= rue); > + } > + else > + { > + vec_dest =3D vect_create_destination_var (scalar_dest, vect= ype); > + /* DATA_REF is null if we've already built the statement. = */ > + if (data_ref) > { > - vec_dest =3D vect_create_destination_var (scalar_dest, = vectype); > - /* DATA_REF is null if we've already built the statemen= t. */ > - if (data_ref) > - { > - vect_copy_ref_info (data_ref, DR_REF (first_dr_info= ->dr)); > - new_stmt =3D gimple_build_assign (vec_dest, data_re= f); > - } > - new_temp =3D make_ssa_name (vec_dest, new_stmt); > - gimple_set_lhs (new_stmt, new_temp); > - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt= , gsi); > + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr= )); > + new_stmt =3D gimple_build_assign (vec_dest, data_ref); > } > + new_temp =3D make_ssa_name (vec_dest, new_stmt); > + gimple_set_lhs (new_stmt, new_temp); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gs= i); > + } > > - /* 3. Handle explicit realignment if necessary/supported. > - Create in loop: > - vec_dest =3D realign_load (msq, lsq, realignment_token= ) */ > - if (!costing_p > - && (alignment_support_scheme =3D=3D dr_explicit_realign= _optimized > - || alignment_support_scheme =3D=3D dr_explicit_real= ign)) > - { > - lsq =3D gimple_assign_lhs (new_stmt); > - if (!realignment_token) > - realignment_token =3D dataref_ptr; > - vec_dest =3D vect_create_destination_var (scalar_dest, = vectype); > - new_stmt =3D gimple_build_assign (vec_dest, REALIGN_LOA= D_EXPR, > - msq, lsq, realignment_t= oken); > - new_temp =3D make_ssa_name (vec_dest, new_stmt); > - gimple_assign_set_lhs (new_stmt, new_temp); > - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt= , gsi); > + /* 3. Handle explicit realignment if necessary/supported. 
> + Create in loop: > + vec_dest =3D realign_load (msq, lsq, realignment_token) *= / > + if (!costing_p > + && (alignment_support_scheme =3D=3D dr_explicit_realign_opt= imized > + || alignment_support_scheme =3D=3D dr_explicit_realign)= ) > + { > + lsq =3D gimple_assign_lhs (new_stmt); > + if (!realignment_token) > + realignment_token =3D dataref_ptr; > + vec_dest =3D vect_create_destination_var (scalar_dest, vect= ype); > + new_stmt =3D gimple_build_assign (vec_dest, REALIGN_LOAD_EX= PR, msq, > + lsq, realignment_token); > + new_temp =3D make_ssa_name (vec_dest, new_stmt); > + gimple_assign_set_lhs (new_stmt, new_temp); > + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gs= i); > > - if (alignment_support_scheme =3D=3D dr_explicit_realign= _optimized) > - { > - gcc_assert (phi); > - if (i =3D=3D vec_num - 1 && j =3D=3D ncopies - 1) > - add_phi_arg (phi, lsq, > - loop_latch_edge (containing_loop), > - UNKNOWN_LOCATION); > - msq =3D lsq; > - } > + if (alignment_support_scheme =3D=3D dr_explicit_realign_opt= imized) > + { > + gcc_assert (phi); > + if (i =3D=3D vec_num - 1 && j =3D=3D ncopies - 1) > + add_phi_arg (phi, lsq, loop_latch_edge (containing_lo= op), > + UNKNOWN_LOCATION); > + msq =3D lsq; > } > + } > > - if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE) > + if (memory_access_type =3D=3D VMAT_CONTIGUOUS_REVERSE) > + { > + if (costing_p) > + inside_cost =3D record_stmt_cost (cost_vec, 1, vec_perm, > + stmt_info, 0, vect_body); > + else > { > - if (costing_p) > - inside_cost =3D record_stmt_cost (cost_vec, 1, vec_pe= rm, > - stmt_info, 0, vect_bo= dy); > - else > - { > - tree perm_mask =3D perm_mask_for_reverse (vectype); > - new_temp > - =3D permute_vec_elements (vinfo, new_temp, new_te= mp, > - perm_mask, stmt_info, gsi= ); > - new_stmt =3D SSA_NAME_DEF_STMT (new_temp); > - } > + tree perm_mask =3D perm_mask_for_reverse (vectype); > + new_temp =3D permute_vec_elements (vinfo, new_temp, new= _temp, > + perm_mask, stmt_info, = gsi); > + new_stmt =3D SSA_NAME_DEF_STMT (new_temp); > } > + } > > - /* Collect vector loads and later create their permutation = in > - vect_transform_grouped_load (). */ > - if (!costing_p && (grouped_load || slp_perm)) > - dr_chain.quick_push (new_temp); > + /* Collect vector loads and later create their permutation in > + vect_transform_grouped_load (). */ > + if (!costing_p && (grouped_load || slp_perm)) > + dr_chain.quick_push (new_temp); > > - /* Store vector loads in the corresponding SLP_NODE. */ > - if (!costing_p && slp && !slp_perm) > - slp_node->push_vec_def (new_stmt); > + /* Store vector loads in the corresponding SLP_NODE. */ > + if (!costing_p && slp && !slp_perm) > + slp_node->push_vec_def (new_stmt); > > - /* With SLP permutation we load the gaps as well, without > - we need to skip the gaps after we manage to fully load > - all elements. group_gap_adj is DR_GROUP_SIZE here. 
*/ > - group_elt +=3D nunits; > - if (!costing_p > - && maybe_ne (group_gap_adj, 0U) > - && !slp_perm > - && known_eq (group_elt, group_size - group_gap_adj)) > - { > - poly_wide_int bump_val > - =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) > - * group_gap_adj); > - if (tree_int_cst_sgn > - (vect_dr_behavior (vinfo, dr_info)->step) =3D=3D = -1) > - bump_val =3D -bump_val; > - tree bump =3D wide_int_to_tree (sizetype, bump_val); > - dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, pt= r_incr, > - gsi, stmt_info, bump); > - group_elt =3D 0; > - } > - } > - /* Bump the vector pointer to account for a gap or for excess > - elements loaded for a permuted SLP load. */ > + /* With SLP permutation we load the gaps as well, without > + we need to skip the gaps after we manage to fully load > + all elements. group_gap_adj is DR_GROUP_SIZE here. */ > + group_elt +=3D nunits; > if (!costing_p > && maybe_ne (group_gap_adj, 0U) > - && slp_perm) > + && !slp_perm > + && known_eq (group_elt, group_size - group_gap_adj)) > { > poly_wide_int bump_val > - =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) > - * group_gap_adj); > - if (tree_int_cst_sgn > - (vect_dr_behavior (vinfo, dr_info)->step) =3D=3D -1) > + =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) * group_gap= _adj); > + if (tree_int_cst_sgn (vect_dr_behavior (vinfo, dr_info)->st= ep) > + =3D=3D -1) > bump_val =3D -bump_val; > tree bump =3D wide_int_to_tree (sizetype, bump_val); > dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_in= cr, gsi, > stmt_info, bump); > + group_elt =3D 0; > } > } > + /* Bump the vector pointer to account for a gap or for excess > + elements loaded for a permuted SLP load. */ > + if (!costing_p > + && maybe_ne (group_gap_adj, 0U) > + && slp_perm) > + { > + poly_wide_int bump_val > + =3D (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) * group_gap_adj= ); > + if (tree_int_cst_sgn (vect_dr_behavior (vinfo, dr_info)->step) = =3D=3D -1) > + bump_val =3D -bump_val; > + tree bump =3D wide_int_to_tree (sizetype, bump_val); > + dataref_ptr =3D bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, = gsi, > + stmt_info, bump); > + } > > if (slp && !slp_perm) > continue; > @@ -11120,39 +11117,36 @@ vectorizable_load (vec_info *vinfo, > } > } > else > - { > - if (grouped_load) > - { > - if (memory_access_type !=3D VMAT_LOAD_STORE_LANES) > + { > + if (grouped_load) > + { > + gcc_assert (memory_access_type =3D=3D VMAT_CONTIGUOUS_PERMU= TE); > + /* We assume that the cost of a single load-lanes instructi= on > + is equivalent to the cost of DR_GROUP_SIZE separate load= s. > + If a grouped access is instead being provided by a > + load-and-permute operation, include the cost of the > + permutes. */ > + if (costing_p && first_stmt_info =3D=3D stmt_info) > { > - gcc_assert (memory_access_type =3D=3D VMAT_CONTIGUOUS_P= ERMUTE); > - /* We assume that the cost of a single load-lanes instr= uction > - is equivalent to the cost of DR_GROUP_SIZE separate = loads. > - If a grouped access is instead being provided by a > - load-and-permute operation, include the cost of the > - permutes. */ > - if (costing_p && first_stmt_info =3D=3D stmt_info) > - { > - /* Uses an even and odd extract operations or shuff= le > - operations for each needed permute. 
*/ > - int group_size =3D DR_GROUP_SIZE (first_stmt_info); > - int nstmts =3D ceil_log2 (group_size) * group_size; > - inside_cost > - +=3D record_stmt_cost (cost_vec, nstmts, vec_perm= , > - stmt_info, 0, vect_body); > + /* Uses an even and odd extract operations or shuffle > + operations for each needed permute. */ > + int group_size =3D DR_GROUP_SIZE (first_stmt_info); > + int nstmts =3D ceil_log2 (group_size) * group_size; > + inside_cost +=3D record_stmt_cost (cost_vec, nstmts, ve= c_perm, > + stmt_info, 0, vect_bod= y); > > - if (dump_enabled_p ()) > - dump_printf_loc ( > - MSG_NOTE, vect_location, > - "vect_model_load_cost: strided group_size =3D %= d .\n", > - group_size); > - } > - else if (!costing_p) > - vect_transform_grouped_load (vinfo, stmt_info, dr_cha= in, > - group_size, gsi); > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "vect_model_load_cost:" > + "strided group_size =3D %d .\n", > + group_size); > + } > + else if (!costing_p) > + { > + vect_transform_grouped_load (vinfo, stmt_info, dr_chain= , > + group_size, gsi); > + *vec_stmt =3D STMT_VINFO_VEC_STMTS (stmt_info)[0]; > } > - if (!costing_p) > - *vec_stmt =3D STMT_VINFO_VEC_STMTS (stmt_info)[0]; > } > else if (!costing_p) > STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); > @@ -11166,7 +11160,8 @@ vectorizable_load (vec_info *vinfo, > { > gcc_assert (memory_access_type !=3D VMAT_INVARIANT > && memory_access_type !=3D VMAT_ELEMENTWISE > - && memory_access_type !=3D VMAT_STRIDED_SLP); > + && memory_access_type !=3D VMAT_STRIDED_SLP > + && memory_access_type !=3D VMAT_LOAD_STORE_LANES); > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "vect_model_load_cost: inside_cost =3D %u, " > -- > 2.31.1