Date: Thu, 2 Nov 2023 14:27:01 +0000 (UTC)
From: Richard Biener
To: Juzhe Zhong
Cc: gcc-patches, Jeff Law, richard.sandiford, rdapp.gcc
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN
References: <20231031095920.3210489-1-juzhe.zhong@rivai.ai>

On Thu, 2 Nov 2023, Juzhe Zhong wrote:

> Hi, Richi.
>
> >> Do we really need to have two modes for the optab though or could we
> >> simply require the target to support arbitrary offset modes (given it
> >> is implicitly constrained to ptr_mode for the base already)?  Or
> >> properly extend/truncate the offset at expansion time, say to ptr_mode
> >> or to the mode of sizetype.
>
> For RVV, it's OK to use ptr_mode/sizetype as the stride type by default.
> Is it OK if I define strided load/store as a single-mode optab and
> default the stride operand to Pmode?
> How about the scale and signed/unsigned operands?
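As context for the stride-type question, the per-element address computation the proposed patterns document can be sketched as a scalar C reference model.  This is a minimal illustration only: the fixed VF, the int32_t element type and the helper name are assumptions for the sketch, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference semantics of mask_len_strided_load, sketched under
   illustrative assumptions (fixed VF of 4, int32_t elements, hypothetical
   helper name).  Each active element i is loaded from
   base + i * (stride * scale), i.e. the address a gather would compute
   from the vec_series offset {0, stride, 2*stride, ...}.  */
enum { VF = 4 };

static void
strided_load_ref (int32_t dest[VF], const char *base, int64_t stride,
                  int scale, const unsigned char mask[VF], unsigned len)
{
  for (unsigned i = 0; i < VF; i++)
    /* Elements beyond the length or with a clear mask bit are left
       undefined (here: simply untouched).  */
    if (i < len && mask[i])
      dest[i] = *(const int32_t *) (base + (ptrdiff_t) i * stride * scale);
}
```

With the stride held in sizetype and DR_STEP folded into it, the separate scale multiply would disappear and the address computation reduces to base + i * stride.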
> It seems the scale operand can be removed, since we can pass DR_STEP
> directly to the stride argument.
> But I think we can't remove the signed/unsigned operand, since for
> stride mode = SImode the unsigned maximum stride is 2^31 whereas the
> signed one is 2^30.

On the GIMPLE side I think we want to have a sizetype operand and indeed
drop 'scale'; the sizetype operand should be readily available.

>
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-11-02 21:52
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
> Subject: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
>
> > As Richard previously suggested, we should support strided load/store
> > in the loop vectorizer instead of hacking the RISC-V backend.
> >
> > This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> >
> > The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store
> > but with the vector offset changed into a scalar stride.
>
> I see that it follows gather/scatter.  I'll note that when introducing
> those we failed to add a specifier for TBAA and alignment info for the
> data access.  That means we have to use alias-set zero for the accesses
> (I see existing targets use UNSPECs with some not-elaborate MEM anyway,
> but TBAA info might have been the "easy" and obvious property to
> preserve).  For alignment we either have to assume unaligned or reject
> vectorization of accesses that do not have their original scalar
> accesses naturally aligned (aligned according to their mode).  We don't
> seem to check that though.
>
> It might be fine to go forward with this since gather/scatter are broken
> in a similar way.
>
> Do we really need to have two modes for the optab though or could we
> simply require the target to support arbitrary offset modes (given it
> is implicitly constrained to ptr_mode for the base already)?
> Or
> properly extend/truncate the offset at expansion time, say to ptr_mode
> or to the mode of sizetype.
>
> Thanks,
> Richard.
>
> > We don't have strided_load/strided_store and
> > mask_strided_load/mask_strided_store since it's unlikely RVV will have
> > such optabs and we can't add patterns that we can't test.
> >
> >
> > gcc/ChangeLog:
> >
> > 	* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> > 	* internal-fn.cc (internal_load_fn_p): Ditto.
> > 	(internal_strided_fn_p): Ditto.
> > 	(internal_fn_len_index): Ditto.
> > 	(internal_fn_mask_index): Ditto.
> > 	(internal_fn_stored_value_index): Ditto.
> > 	(internal_strided_fn_supported_p): Ditto.
> > 	* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> > 	(MASK_LEN_STRIDED_STORE): Ditto.
> > 	* internal-fn.h (internal_strided_fn_p): Ditto.
> > 	(internal_strided_fn_supported_p): Ditto.
> > 	* optabs.def (OPTAB_CD): Ditto.
> >
> > ---
> >  gcc/doc/md.texi     | 51 +++++++++++++++++++++++++++++++++++++++++++++
> >  gcc/internal-fn.cc  | 44 ++++++++++++++++++++++++++++++++++++++
> >  gcc/internal-fn.def |  4 ++++
> >  gcc/internal-fn.h   |  2 ++
> >  gcc/optabs.def      |  2 ++
> >  5 files changed, 103 insertions(+)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..5bac713a0dd 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
> >  be loaded from memory and clear if element @var{i} of the result should be undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> >
> > +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}@var{n}}
> > +Load several separate memory locations into a destination vector of mode @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of mode @var{n}.
> > +The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
> > +For each element index i:
> > +
> > +@itemize @bullet
> > +@item
> > +extend the stride to address width, using zero
> > +extension if operand 3 is 1 and sign extension if operand 3 is zero;
> > +@item
> > +multiply the extended stride by operand 4;
> > +@item
> > +add the result to the base; and
> > +@item
> > +load the value at that address (operand 1 + @var{i} * multiplied and extended stride) into element @var{i} of operand 0.
> > +@end itemize
> > +
> > +Similar to mask_len_load, the instruction loads at most (operand 6 + operand 7) elements from memory.
> > +Bit @var{i} of the mask is set if element @var{i} of the result should
> > +be loaded from memory and clear if element @var{i} of the result should be undefined.
> > +Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > +
> >  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
> >  @item @samp{scatter_store@var{m}@var{n}}
> >  Store a vector of mode @var{m} into several distinct memory locations.
> > @@ -5131,6 +5157,31 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
> >  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> >
> > +@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
> > +@item @samp{mask_len_strided_store@var{m}@var{n}}
> > +Store a vector of mode @var{m} into several distinct memory locations.
> > +Operand 0 is a scalar base address and operand 1 is a scalar stride of mode @var{n}.
> > +Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
> > +The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
> > +For each element index i:
> > +
> > +@itemize @bullet
> > +@item
> > +extend the stride to address width, using zero
> > +extension if operand 2 is 1 and sign extension if operand 2 is zero;
> > +@item
> > +multiply the extended stride by operand 3;
> > +@item
> > +add the result to the base; and
> > +@item
> > +store element @var{i} of operand 4 to that address (operand 1 + @var{i} * multiplied and extended stride).
> > +@end itemize
> > +
> > +Similar to mask_len_store, the instruction stores at most (operand 6 + operand 7) elements of (operand 4) to memory.
> > +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> > +Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > +
> >  @cindex @code{vec_set@var{m}} instruction pattern
> >  @item @samp{vec_set@var{m}}
> >  Set given field in the vector value.  Operand 0 is the vector to modify,
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index e7451b96353..f7f85aa7dde 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4596,6 +4596,7 @@ internal_load_fn_p (internal_fn fn)
> >      case IFN_GATHER_LOAD:
> >      case IFN_MASK_GATHER_LOAD:
> >      case IFN_MASK_LEN_GATHER_LOAD:
> > +    case IFN_MASK_LEN_STRIDED_LOAD:
> >      case IFN_LEN_LOAD:
> >      case IFN_MASK_LEN_LOAD:
> >        return true;
> > @@ -4648,6 +4649,22 @@ internal_gather_scatter_fn_p (internal_fn fn)
> >      }
> >  }
> >
> > +/* Return true if IFN is some form of strided load or strided store.  */
> > +
> > +bool
> > +internal_strided_fn_p (internal_fn fn)
> > +{
> > +  switch (fn)
> > +    {
> > +    case IFN_MASK_LEN_STRIDED_LOAD:
> > +    case IFN_MASK_LEN_STRIDED_STORE:
> > +      return true;
> > +
> > +    default:
> > +      return false;
> > +    }
> > +}
> > +
> >  /* If FN takes a vector len argument, return the index of that argument,
> >     otherwise return -1.  */
> >
> > @@ -4662,6 +4679,8 @@ internal_fn_len_index (internal_fn fn)
> >
> >      case IFN_MASK_LEN_GATHER_LOAD:
> >      case IFN_MASK_LEN_SCATTER_STORE:
> > +    case IFN_MASK_LEN_STRIDED_LOAD:
> > +    case IFN_MASK_LEN_STRIDED_STORE:
> >      case IFN_COND_LEN_FMA:
> >      case IFN_COND_LEN_FMS:
> >      case IFN_COND_LEN_FNMA:
> > @@ -4719,6 +4738,8 @@ internal_fn_mask_index (internal_fn fn)
> >      case IFN_MASK_SCATTER_STORE:
> >      case IFN_MASK_LEN_GATHER_LOAD:
> >      case IFN_MASK_LEN_SCATTER_STORE:
> > +    case IFN_MASK_LEN_STRIDED_LOAD:
> > +    case IFN_MASK_LEN_STRIDED_STORE:
> >        return 4;
> >
> >      default:
> > @@ -4740,6 +4761,7 @@ internal_fn_stored_value_index (internal_fn fn)
> >      case IFN_SCATTER_STORE:
> >      case IFN_MASK_SCATTER_STORE:
> >      case IFN_MASK_LEN_SCATTER_STORE:
> > +    case IFN_MASK_LEN_STRIDED_STORE:
> >        return 3;
> >
> >      case IFN_LEN_STORE:
> > @@ -4784,6 +4806,28 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
> >  	  && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
> >  }
> >
> > +/* Return true if the target supports strided load or strided store function
> > +   IFN.  For loads, VECTOR_TYPE is the vector type of the load result,
> > +   while for stores it is the vector type of the stored data argument.
> > +   STRIDE_TYPE is the type that holds the stride from the previous element
> > +   memory address of each loaded or stored element.
> > +   SCALE is the amount by which these strides should be multiplied
> > +   *after* they have been extended to address width.  */
> > +
> > +bool
> > +internal_strided_fn_supported_p (internal_fn ifn, tree vector_type,
> > +				 tree offset_type, int scale)
> > +{
> > +  optab optab = direct_internal_fn_optab (ifn);
> > +  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vector_type),
> > +					   TYPE_MODE (offset_type));
> > +  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
> > +  bool unsigned_p = TYPE_UNSIGNED (offset_type);
> > +  return (icode != CODE_FOR_nothing
> > +	  && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p))
> > +	  && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
> > +}
> > +
> >  /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
> >     for pointers of type TYPE when the accesses have LENGTH bytes and their
> >     common byte alignment is ALIGN.  */
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index a2023ab9c3d..0fa532e8f6b 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -199,6 +199,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
> >  		       mask_gather_load, gather_load)
> >  DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
> >  		       mask_len_gather_load, gather_load)
> > +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
> > +		       mask_len_strided_load, gather_load)
> >
> >  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
> >  DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
> > @@ -208,6 +210,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
> >  		       mask_scatter_store, scatter_store)
> >  DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
> >  		       mask_len_scatter_store, scatter_store)
> > +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
> > +		       mask_len_strided_store, scatter_store)
> >
> >  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
> >  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> > index 99de13a0199..8379c61dff7 100644
> > --- a/gcc/internal-fn.h
> > +++ b/gcc/internal-fn.h
> > @@ -235,11 +235,13 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
> >  extern bool internal_load_fn_p (internal_fn);
> >  extern bool internal_store_fn_p (internal_fn);
> >  extern bool internal_gather_scatter_fn_p (internal_fn);
> > +extern bool internal_strided_fn_p (internal_fn);
> >  extern int internal_fn_mask_index (internal_fn);
> >  extern int internal_fn_len_index (internal_fn);
> >  extern int internal_fn_stored_value_index (internal_fn);
> >  extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
> >  						    tree, tree, int);
> > +extern bool internal_strided_fn_supported_p (internal_fn, tree, tree, int);
> >  extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
> >  						poly_uint64, unsigned int);
> >  #define VECT_PARTIAL_BIAS_UNSUPPORTED 127
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 2ccbe4197b7..3d85ac5f678 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -98,9 +98,11 @@ OPTAB_CD(mask_len_store_optab, "mask_len_store$a$b")
> >  OPTAB_CD(gather_load_optab, "gather_load$a$b")
> >  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
> >  OPTAB_CD(mask_len_gather_load_optab, "mask_len_gather_load$a$b")
> > +OPTAB_CD(mask_len_strided_load_optab, "mask_len_strided_load$a$b")
> >  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
> >  OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
> >  OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
> > +OPTAB_CD(mask_len_strided_store_optab, "mask_len_strided_store$a$b")
> >  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
> >  OPTAB_CD(vec_init_optab, "vec_init$a$b")
> >

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)