From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=jxh/=CW=suse.de=rguenther@sourceware.org>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c])
	by sourceware.org (Postfix) with ESMTPS id C16F03858D35
	for <gcc-patches@gcc.gnu.org>; Tue,  4 Jul 2023 11:18:00 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C16F03858D35
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id F0772227A3;
	Tue,  4 Jul 2023 11:17:59 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1688469479; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Ion2RPLS8h4CPVPV+/ywvjSV6uBm3Xj4DF8QcwqxnwQ=;
	b=D5qTy8bCaCapn9ZeltRoQgSlkyj/OWnUGKIibAohd74y3h/a+VYCpj6cwr0i8QZ/3gaIvb
	fJzgdTBJ3AU34IFuPH4ScmtZP1Atuew0pJFMteZTEDIDFhyM1H4hwERDLxIOBTFaB5bCcP
	bByqp46tqkMa/DX32BuBG81TLHMmxqs=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1688469479;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Ion2RPLS8h4CPVPV+/ywvjSV6uBm3Xj4DF8QcwqxnwQ=;
	b=+uejT+QZbcnLD5UQxetz/e0H7GoXdSlMV316z6FQBoas8Es5mPo+Wi7/gdGpO5/Ub/t/Ql
	74ZsPxuL71zzbqAg==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id DBD212C141;
	Tue,  4 Jul 2023 11:17:59 +0000 (UTC)
Date: Tue, 4 Jul 2023 11:17:59 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
cc: gcc-patches <gcc-patches@gcc.gnu.org>, 
    "richard.sandiford" <richard.sandiford@arm.com>
Subject: Re: Re: [PATCH V3] VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE
 into vectorizer
In-Reply-To: <ABE1FE4526703BD9+2023070419130418362656@rivai.ai>
Message-ID: <nycvar.YFH.7.77.849.2307041116100.4723@jbgna.fhfr.qr>
References: <20230704095406.3518145-1-juzhe.zhong@rivai.ai>,  <nycvar.YFH.7.77.849.2307041100090.4723@jbgna.fhfr.qr> <ABE1FE4526703BD9+2023070419130418362656@rivai.ai>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, 4 Jul 2023, juzhe.zhong@rivai.ai wrote:

> Hi, Richi.
> 
> >> Eh, it's obvious that we should have the "vectorized" form
> >> also in the 'scalar' variant.  If you think there's no reasonable
> >> way to add a value for len or bias then instead re-order the
> >> arguments so 'mask' comes first and the len/bias pair last.
> 
> I found this patch is buggy when I am testing LEN_MAS_GATHER_LOAD.
> And reoder 'mask' comes first can not help.
> 
> Here, we have 'GATHER_LOAD' and 'MASK_GATHER_LOAD',
> For GATHER_LOAD ((sizetype) src_18(D), _6, 1, 0);
> I change it into:
> 
> LEN_MASK_GATHER_LOAD ((sizetype) src_18(D), _6, 1, 0);
> In this situation,  internal_fn_mask_index 
> should return -1.
> 
> Wheras, MASK_GATHER_LOAD ((sizetype) _56, _8, 1, 0, _33);
> I change it into LEN_MASK_GATHER_LOAD ((sizetype) _56, _8, 1, 0, _33);
> it should return the index = 4.
> 
> I can't differentiate them by only using LEN_MASK_GATHER_LOAD.
> Could I revise internal_fn_mask_index 
> as follows ?

No, please adjust the gather pattern recognition to produce either
appropriate LEN_ variant IFNs or simply keep only the unconditional
and conditional mask variants from patterns but code generate
the len_ variants.  I don't really see what the problem is.
Maybe you fail to specify the appropriate ifn when you inspect
the scalar internal fn call?

> int
> internal_fn_mask_index (internal_fn fn, int nargs)
> {
>   switch (fn)
>     {
>     case IFN_MASK_LOAD:
>     case IFN_MASK_LOAD_LANES:
>     case IFN_MASK_STORE:
>     case IFN_MASK_STORE_LANES:
>       return 2;
> 
>     case IFN_MASK_GATHER_LOAD:
>     case IFN_MASK_SCATTER_STORE:
>     case IFN_LEN_MASK_LOAD:
>     case IFN_LEN_MASK_STORE:
>       return 4;
> 
>     /* LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE are different
>        on arguments before and after vectorized.
> 
>          Before vectorized:
>            LEN_MASK_GATHER_LOAD (ptr, align, offset, mask);
> 
>          After vectorized:
>            LEN_MASK_GATHER_LOAD (ptr, align, offset, len, bias, mask);
>     */
>     case IFN_LEN_MASK_GATHER_LOAD:
>     case IFN_LEN_MASK_SCATTER_STORE:
>       return nargs == 4 ? -1 : nargs == 5 ? 4 : 6;
> 
>     default:
>       return (conditional_internal_fn_code (fn) != ERROR_MARK
>               || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
>     }
> }
> 
> 
> Thanks.
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-04 19:05
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH V3] VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer
> On Tue, 4 Jul 2023, juzhe.zhong@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > 
> > Hi, Richard and Richi.
> > 
> > Address comments from Richard.
> > 
> > Make gs_info.ifn = LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE.
> > 
> > Since:
> >     /* LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE are different
> >        on arguments before and after vectorized.
> > 
> > Before vectorized:
> >    LEN_MASK_GATHER_LOAD (ptr, align, offset, mask);
> > 
> > After vectorized:
> >    LEN_MASK_GATHER_LOAD (ptr, align, offset, len, bias, mask);
> >     */
> > 
> > I add "vectorized_p" default argument into internal_fn_mask_index.
> > So that we could simplify the codes.
>  
> Eh, it's obvious that we should have the "vectorized" form
> also in the 'scalar' variant.  If you think there's no reasonable
> way to add a value for len or bias then instead re-order the
> arguments so 'mask' comes first and the len/bias pair last.
>  
> But IMHO "any" len/bias value should do here.
>  
> The rest looks OK now.
>  
> Thanks,
> Richard.
>  
> > The len_mask_gather_load/len_mask_scatter_store patterns have been added.
> > Now, this patch applies them into vectorizer.
> > 
> > Here is the example:
> > 
> > void
> > f (int *restrict a,
> >    int *restrict b, int n,
> >    int base, int step,
> >    int *restrict cond)
> > {
> >   for (int i = 0; i < n; ++i)
> >     {
> >       if (cond[i])
> >         a[i * 4] = b[i];
> >     }
> > }
> > 
> > Gimple IR:
> > 
> >   <bb 3> [local count: 105119324]:
> >   _58 = (unsigned long) n_13(D);
> > 
> >   <bb 4> [local count: 630715945]:
> >   # vectp_cond.7_45 = PHI <vectp_cond.7_46(4), cond_14(D)(3)>
> >   # vectp_b.11_51 = PHI <vectp_b.11_52(4), b_15(D)(3)>
> >   # vectp_a.14_55 = PHI <vectp_a.14_56(4), a_16(D)(3)>
> >   # ivtmp_59 = PHI <ivtmp_60(4), _58(3)>
> >   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [2, 2]);
> >   ivtmp_44 = _61 * 4;
> >   vect__4.9_47 = .LEN_MASK_LOAD (vectp_cond.7_45, 32B, _61, 0, { -1, ... });
> >   mask__24.10_49 = vect__4.9_47 != { 0, ... };
> >   vect__8.13_53 = .LEN_MASK_LOAD (vectp_b.11_51, 32B, _61, 0, mask__24.10_49);
> >   ivtmp_54 = _61 * 16;
> >   .LEN_MASK_SCATTER_STORE (vectp_a.14_55, { 0, 16, 32, ... }, 1, vect__8.13_53, _61, 0, mask__24.10_49);
> >   vectp_cond.7_46 = vectp_cond.7_45 + ivtmp_44;
> >   vectp_b.11_52 = vectp_b.11_51 + ivtmp_44;
> >   vectp_a.14_56 = vectp_a.14_55 + ivtmp_54;
> >   ivtmp_60 = ivtmp_59 - _61;
> >   if (ivtmp_60 != 0)
> >     goto <bb 4>; [83.33%]
> >   else
> >     goto <bb 5>; [16.67%]
> > 
> > gcc/ChangeLog:
> > 
> >         * internal-fn.cc (internal_fn_mask_index): Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer.
> >         * internal-fn.h (internal_fn_mask_index): Ditto.
> >         * optabs-query.cc (supports_vec_gather_load_p): Ditto.
> >         (supports_vec_scatter_store_p): Ditto.
> >         * tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
> >         * tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): Ditto.
> >         (check_load_store_for_partial_vectors): Ditto.
> >         (vect_get_strided_load_store_ops): Ditto.
> >         (vectorizable_store): Ditto.
> >         (vectorizable_load): Ditto.
> > 
> > ---
> >  gcc/internal-fn.cc         |  16 ++++-
> >  gcc/internal-fn.h          |   2 +-
> >  gcc/optabs-query.cc        |   2 +
> >  gcc/tree-vect-data-refs.cc |  18 ++++-
> >  gcc/tree-vect-stmts.cc     | 135 +++++++++++++++++++++++++++++++------
> >  5 files changed, 150 insertions(+), 23 deletions(-)
> > 
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 303df102d81..2c78c870de8 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4483,7 +4483,7 @@ internal_fn_len_index (internal_fn fn)
> >     otherwise return -1.  */
> >  
> >  int
> > -internal_fn_mask_index (internal_fn fn)
> > +internal_fn_mask_index (internal_fn fn, bool vectoried_p)
> >  {
> >    switch (fn)
> >      {
> > @@ -4499,9 +4499,21 @@ internal_fn_mask_index (internal_fn fn)
> >      case IFN_LEN_MASK_STORE:
> >        return 4;
> >  
> > +    /* LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE are different
> > +       on arguments before and after vectorized.
> > +
> > + Before vectorized:
> > +    LEN_MASK_GATHER_LOAD (ptr, align, offset, mask);
> > +
> > + After vectorized:
> > +    LEN_MASK_GATHER_LOAD (ptr, align, offset, len, bias, mask);
> > +    */
> >      case IFN_LEN_MASK_GATHER_LOAD:
> >      case IFN_LEN_MASK_SCATTER_STORE:
> > -      return 6;
> > +      if (vectoried_p)
> > + return 6;
> > +      else
> > + return 4;
> >  
> >      default:
> >        return (conditional_internal_fn_code (fn) != ERROR_MARK
> > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> > index 4234bbfed87..e9168c16297 100644
> > --- a/gcc/internal-fn.h
> > +++ b/gcc/internal-fn.h
> > @@ -233,7 +233,7 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
> >  extern bool internal_load_fn_p (internal_fn);
> >  extern bool internal_store_fn_p (internal_fn);
> >  extern bool internal_gather_scatter_fn_p (internal_fn);
> > -extern int internal_fn_mask_index (internal_fn);
> > +extern int internal_fn_mask_index (internal_fn, bool = true);
> >  extern int internal_fn_len_index (internal_fn);
> >  extern int internal_fn_stored_value_index (internal_fn);
> >  extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
> > diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
> > index 2fdd0d34354..bf1f484e874 100644
> > --- a/gcc/optabs-query.cc
> > +++ b/gcc/optabs-query.cc
> > @@ -676,6 +676,7 @@ supports_vec_gather_load_p (machine_mode mode)
> >      this_fn_optabs->supports_vec_gather_load[mode]
> >        = (supports_vec_convert_optab_p (gather_load_optab, mode)
> >  || supports_vec_convert_optab_p (mask_gather_load_optab, mode)
> > + || supports_vec_convert_optab_p (len_mask_gather_load_optab, mode)
> >  ? 1 : -1);
> >  
> >    return this_fn_optabs->supports_vec_gather_load[mode] > 0;
> > @@ -692,6 +693,7 @@ supports_vec_scatter_store_p (machine_mode mode)
> >      this_fn_optabs->supports_vec_scatter_store[mode]
> >        = (supports_vec_convert_optab_p (scatter_store_optab, mode)
> >  || supports_vec_convert_optab_p (mask_scatter_store_optab, mode)
> > + || supports_vec_convert_optab_p (len_mask_scatter_store_optab, mode)
> >  ? 1 : -1);
> >  
> >    return this_fn_optabs->supports_vec_scatter_store[mode] > 0;
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index ebe93832b1e..ab2af103cb4 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -3873,16 +3873,24 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
> >      return false;
> >  
> >    /* Work out which function we need.  */
> > -  internal_fn ifn, alt_ifn;
> > +  internal_fn ifn, alt_ifn, alt_ifn2;
> >    if (read_p)
> >      {
> >        ifn = masked_p ? IFN_MASK_GATHER_LOAD : IFN_GATHER_LOAD;
> >        alt_ifn = IFN_MASK_GATHER_LOAD;
> > +      /* When target supports LEN_MASK_GATHER_LOAD, we always
> > + use LEN_MASK_GATHER_LOAD regardless whether len and
> > + mask are valid or not.  */
> > +      alt_ifn2 = IFN_LEN_MASK_GATHER_LOAD;
> >      }
> >    else
> >      {
> >        ifn = masked_p ? IFN_MASK_SCATTER_STORE : IFN_SCATTER_STORE;
> >        alt_ifn = IFN_MASK_SCATTER_STORE;
> > +      /* When target supports LEN_MASK_SCATTER_STORE, we always
> > + use LEN_MASK_SCATTER_STORE regardless whether len and
> > + mask are valid or not.  */
> > +      alt_ifn2 = IFN_LEN_MASK_SCATTER_STORE;
> >      }
> >  
> >    for (;;)
> > @@ -3909,6 +3917,14 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
> >    *offset_vectype_out = offset_vectype;
> >    return true;
> >  }
> > +      else if (internal_gather_scatter_fn_supported_p (alt_ifn2, vectype,
> > +        memory_type,
> > +        offset_vectype, scale))
> > + {
> > +   *ifn_out = alt_ifn2;
> > +   *offset_vectype_out = offset_vectype;
> > +   return true;
> > + }
> >  
> >        if (TYPE_PRECISION (offset_type) >= POINTER_SIZE
> >    && TYPE_PRECISION (offset_type) >= element_bits)
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index a0c39268bf0..33ec33f8b8d 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -433,7 +433,7 @@ exist_non_indexing_operands_for_use_p (tree use, stmt_vec_info stmt_info)
> >        if (call && gimple_call_internal_p (call))
> >  {
> >    internal_fn ifn = gimple_call_internal_fn (call);
> > -   int mask_index = internal_fn_mask_index (ifn);
> > +   int mask_index = internal_fn_mask_index (ifn, false);
> >    if (mask_index >= 0
> >        && use == gimple_call_arg (call, mask_index))
> >      return true;
> > @@ -1771,6 +1771,18 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
> >     gs_info->offset_vectype,
> >     gs_info->scale))
> >  {
> > +   ifn = (is_load
> > + ? IFN_LEN_MASK_GATHER_LOAD
> > + : IFN_LEN_MASK_SCATTER_STORE);
> > +   if (internal_gather_scatter_fn_supported_p (ifn, vectype,
> > +       gs_info->memory_type,
> > +       gs_info->offset_vectype,
> > +       gs_info->scale))
> > +     {
> > +       vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> > +       vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> > +       return;
> > +     }
> >    if (dump_enabled_p ())
> >      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >       "can't operate on partial vectors because"
> > @@ -3129,16 +3141,39 @@ vect_get_gather_scatter_ops (loop_vec_info loop_vinfo,
> >  static void
> >  vect_get_strided_load_store_ops (stmt_vec_info stmt_info,
> >  loop_vec_info loop_vinfo,
> > + gimple_stmt_iterator *gsi,
> >  gather_scatter_info *gs_info,
> > - tree *dataref_bump, tree *vec_offset)
> > + tree *dataref_bump, tree *vec_offset,
> > + vec_loop_lens *loop_lens)
> >  {
> >    struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> >    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> >  
> > -  tree bump = size_binop (MULT_EXPR,
> > -   fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
> > -   size_int (TYPE_VECTOR_SUBPARTS (vectype)));
> > -  *dataref_bump = cse_and_gimplify_to_preheader (loop_vinfo, bump);
> > +  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> > +    {
> > +      /* _31 = .SELECT_VL (ivtmp_29, POLY_INT_CST [4, 4]);
> > + ivtmp_8 = _31 * 16 (step in bytes);
> > + .LEN_MASK_SCATTER_STORE (vectp_a.9_7, ... );
> > + vectp_a.9_26 = vectp_a.9_7 + ivtmp_8;  */
> > +      tree loop_len
> > + = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
> > +      tree tmp
> > + = fold_build2 (MULT_EXPR, sizetype,
> > +        fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
> > +        loop_len);
> > +      tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
> > +      gassign *assign = gimple_build_assign (bump, tmp);
> > +      gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> > +      *dataref_bump = bump;
> > +    }
> > +  else
> > +    {
> > +      tree bump
> > + = size_binop (MULT_EXPR,
> > +       fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
> > +       size_int (TYPE_VECTOR_SUBPARTS (vectype)));
> > +      *dataref_bump = cse_and_gimplify_to_preheader (loop_vinfo, bump);
> > +    }
> >  
> >    /* The offset given in GS_INFO can have pointer type, so use the element
> >       type of the vector instead.  */
> > @@ -8013,7 +8048,7 @@ vectorizable_store (vec_info *vinfo,
> >    return false;
> >  }
> >  
> > -      int mask_index = internal_fn_mask_index (ifn);
> > +      int mask_index = internal_fn_mask_index (ifn, false);
> >        if (mask_index >= 0
> >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> >        &mask, NULL, &mask_dt, &mask_vectype))
> > @@ -8685,8 +8720,8 @@ vectorizable_store (vec_info *vinfo,
> >    else if (memory_access_type == VMAT_GATHER_SCATTER)
> >      {
> >        aggr_type = elem_type;
> > -      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, &gs_info,
> > -        &bump, &vec_offset);
> > +      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
> > +        &bump, &vec_offset, loop_lens);
> >      }
> >    else
> >      {
> > @@ -8915,6 +8950,8 @@ vectorizable_store (vec_info *vinfo,
> >        unsigned HOST_WIDE_INT align;
> >  
> >        tree final_mask = NULL_TREE;
> > +       tree final_len = NULL_TREE;
> > +       tree bias = NULL_TREE;
> >        if (loop_masks)
> >  final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> >  vec_num * ncopies,
> > @@ -8929,8 +8966,41 @@ vectorizable_store (vec_info *vinfo,
> >    if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >      vec_offset = vec_offsets[vec_num * j + i];
> >    tree scale = size_int (gs_info.scale);
> > +
> > +   if (gs_info.ifn == IFN_LEN_MASK_SCATTER_STORE)
> > +     {
> > +       if (loop_lens)
> > + {
> > +   final_len
> > +     = vect_get_loop_len (loop_vinfo, gsi, loop_lens,
> > + vec_num * ncopies, vectype,
> > + vec_num * j + i, 1);
> > + }
> > +       else
> > + {
> > +   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> > +   final_len
> > +     = build_int_cst (iv_type,
> > +      TYPE_VECTOR_SUBPARTS (vectype));
> > + }
> > +       signed char biasval
> > + = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> > +       bias = build_int_cst (intQI_type_node, biasval);
> > +       if (!final_mask)
> > + {
> > +   mask_vectype = truth_type_for (vectype);
> > +   final_mask = build_minus_one_cst (mask_vectype);
> > + }
> > +     }
> > +
> >    gcall *call;
> > -   if (final_mask)
> > +   if (final_len && final_len)
> > +     call
> > +       = gimple_build_call_internal (IFN_LEN_MASK_SCATTER_STORE,
> > +     7, dataref_ptr, vec_offset,
> > +     scale, vec_oprnd, final_len,
> > +     bias, final_mask);
> > +   else if (final_mask)
> >      call = gimple_build_call_internal
> >        (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, vec_offset,
> >         scale, vec_oprnd, final_mask);
> > @@ -9047,9 +9117,6 @@ vectorizable_store (vec_info *vinfo,
> >        machine_mode vmode = TYPE_MODE (vectype);
> >        machine_mode new_vmode = vmode;
> >        internal_fn partial_ifn = IFN_LAST;
> > -       /* Produce 'len' and 'bias' argument.  */
> > -       tree final_len = NULL_TREE;
> > -       tree bias = NULL_TREE;
> >        if (loop_lens)
> >  {
> >    opt_machine_mode new_ovmode
> > @@ -10177,8 +10244,8 @@ vectorizable_load (vec_info *vinfo,
> >    else if (memory_access_type == VMAT_GATHER_SCATTER)
> >      {
> >        aggr_type = elem_type;
> > -      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, &gs_info,
> > -        &bump, &vec_offset);
> > +      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
> > +        &bump, &vec_offset, loop_lens);
> >      }
> >    else
> >      {
> > @@ -10339,6 +10406,8 @@ vectorizable_load (vec_info *vinfo,
> >    for (i = 0; i < vec_num; i++)
> >      {
> >        tree final_mask = NULL_TREE;
> > +       tree final_len = NULL_TREE;
> > +       tree bias = NULL_TREE;
> >        if (loop_masks
> >    && memory_access_type != VMAT_INVARIANT)
> >  final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> > @@ -10368,8 +10437,39 @@ vectorizable_load (vec_info *vinfo,
> >    vec_offset = vec_offsets[vec_num * j + i];
> >  tree zero = build_zero_cst (vectype);
> >  tree scale = size_int (gs_info.scale);
> > +
> > + if (gs_info.ifn == IFN_LEN_MASK_GATHER_LOAD)
> > +   {
> > +     if (loop_lens)
> > +       {
> > + final_len = vect_get_loop_len (
> > +   loop_vinfo, gsi, loop_lens, vec_num * ncopies,
> > +   vectype, vec_num * j + i, 1);
> > +       }
> > +     else
> > +       {
> > + tree iv_type
> > +   = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> > + final_len = build_int_cst (
> > +   iv_type, TYPE_VECTOR_SUBPARTS (vectype));
> > +       }
> > +     signed char biasval
> > +       = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> > +     bias = build_int_cst (intQI_type_node, biasval);
> > +     if (!final_mask)
> > +       {
> > + mask_vectype = truth_type_for (vectype);
> > + final_mask = build_minus_one_cst (mask_vectype);
> > +       }
> > +   }
> > +
> >  gcall *call;
> > - if (final_mask)
> > + if (final_len && final_mask)
> > +   call = gimple_build_call_internal (
> > +     IFN_LEN_MASK_GATHER_LOAD, 7, dataref_ptr,
> > +     vec_offset, scale, zero, final_len, bias,
> > +     final_mask);
> > + else if (final_mask)
> >    call = gimple_build_call_internal
> >      (IFN_MASK_GATHER_LOAD, 5, dataref_ptr,
> >       vec_offset, scale, zero, final_mask);
> > @@ -10462,9 +10562,6 @@ vectorizable_load (vec_info *vinfo,
> >      machine_mode vmode = TYPE_MODE (vectype);
> >      machine_mode new_vmode = vmode;
> >      internal_fn partial_ifn = IFN_LAST;
> > -     /* Produce 'len' and 'bias' argument.  */
> > -     tree final_len = NULL_TREE;
> > -     tree bias = NULL_TREE;
> >      if (loop_lens)
> >        {
> >  opt_machine_mode new_ovmode
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)