From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg1-x52f.google.com (mail-pg1-x52f.google.com [IPv6:2607:f8b0:4864:20::52f]) by sourceware.org (Postfix) with ESMTPS id 35FAD3857C4F for ; Sat, 17 Jun 2023 14:19:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 35FAD3857C4F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pg1-x52f.google.com with SMTP id 41be03b00d2f7-517ab9a4a13so1362145a12.1 for ; Sat, 17 Jun 2023 07:19:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1687011588; x=1689603588; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=ksQOlxUsimVLMrCVSrttwY8NmXWJtuMD2PpvPOO1HSU=; b=W0XsdHn9KlT2j47Mbe9q0iGygSdX8FZhCLCsWxLKa3kMRWPnLcx5s8GMnB03mVqWou DgAsAi81lBKNB6gjI0Ny6AEj7/1VjwzP8zl17i0btnMr7LjoimXecy0NrJutjN7mwTdk mwIcXttHpEAyRIvKor305/fjfWalDoPT14v6gR1WdGIMq1rRvJp/1YyCp0GCdVpvSLpX s5DyvY+9cjHJqxBYYfC6fUtGeKh0UyV+5op67EMYwNCEDKmr8sQxoMd2iOM2QufOm3Xi 0ui7TX/I9WIedda3C8zYpTIjg9wHT1+SXIJ45CTg/RU2Bm+B0YNJr6GWOy6uhRnrrkUc BDTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687011588; x=1689603588; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ksQOlxUsimVLMrCVSrttwY8NmXWJtuMD2PpvPOO1HSU=; b=N5pVGwLdu5gkAK/AlbsutUEDL1z7G+8xEmsdTwaPZmkNgd7Dir3V0ED6c6SLmOi5n3 jxyC/Wq0dE6FuUfOkLcZnqaPQ5Tx3KXruHgcGGMIxp9OyKYaY5d3LWJ7OPFlQjKLZiK5 JW1X2etfa6YGUyh6nyj7WMgBVD064JMO2JuaINzoRBPZ1KJXOGte6s32uXNoe0yGyA+6 htoq9N7xNUGQHxAWKpqUJuHSzwwAw0ZUgkkU2dujmCrPVVf//N8fps0fbNB8b6pK+cHC mSV9F3hvbNYxjPsULAiRtR4sqHPTJfvk4yBfNnd3kDAM0QRxj8L5oFr0gRcYw5+Tvo6f CKYg== X-Gm-Message-State: AC+VfDwwjs0vLBNUp6pn41ORM5zdv2FLrNJ0rfjodcqmi4/M87FzK04a KZNK6SZfIChp6rJXUaYvGWY= X-Google-Smtp-Source: ACHHUZ41rk42dhbijT5Rg+PwvBSWYr6nYS4C0fsM7/Um4BdMfYL2APR7Rs8M1PjQNR7WB7gYKFYTxw== X-Received: by 2002:a17:90a:19d:b0:25e:95a6:898f with SMTP id 29-20020a17090a019d00b0025e95a6898fmr5106515pjc.11.1687011587868; Sat, 17 Jun 2023 07:19:47 -0700 (PDT) Received: from [172.31.0.109] ([136.36.130.248]) by smtp.gmail.com with ESMTPSA id t15-20020a17090ad50f00b0024e05b7ba8bsm3013370pju.25.2023.06.17.07.19.46 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 17 Jun 2023 07:19:47 -0700 (PDT) Message-ID: <43e33266-e449-4ddf-4ea4-938dcd458665@gmail.com> Date: Sat, 17 Jun 2023 08:19:45 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [PATCH V6] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs Content-Language: en-US To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org Cc: richard.sandiford@arm.com, rguenther@suse.de, rdapp.gcc@gmail.com References: <20230616102912.262207-1-juzhe.zhong@rivai.ai> From: Jeff Law In-Reply-To: <20230616102912.262207-1-juzhe.zhong@rivai.ai> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 6/16/23 04:29, juzhe.zhong@rivai.ai wrote: > From: Ju-Zhe Zhong > > This patch bootstrap pass on X86, ok for trunk ? > > Accoding to comments from Richi, split the first patch to add ifn && optabs > of LEN_MASK_{LOAD,STORE} only, we don't apply them into vectorizer in this > patch. And also add BIAS argument for possible s390's future use. > > The description of the patterns in doc are coming Robin. > > After this patch is approved, will send the second patch to apply len_mask_* > patterns into vectorizer. > > Target like ARM SVE in GCC has an elegant way to handle both loop control > and flow control simultaneously: > > loop_control_mask = WHILE_ULT > flow_control_mask = comparison > control_mask = loop_control_mask & flow_control_mask; > MASK_LOAD (control_mask) > MASK_STORE (control_mask) > > However, targets like RVV (RISC-V Vector) can not use this approach in > auto-vectorization since RVV use length in loop control. > > This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets > like RISC-V that uses length in loop control. > Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length > or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR. > Mask is the outcome of comparison. > > LEN_MASK_ LOAD/STORE format is defined as follows: > 1). LEN_MASK_LOAD (ptr, align, length, mask). > 2). LEN_MASK_STORE (ptr, align, length, mask, vec). > > Consider these 4 following cases: > > VLA: Variable-length auto-vectorization > VLS: Specific-length auto-vectorization > > Case 1 (VLS): -mrvv-vector-bits=128 IR (Does not use LEN_MASK_*): > Code: v1 = MEM (...) > for (int i = 0; i < 4; i++) v2 = MEM (...) > a[i] = b[i] + c[i]; v3 = v1 + v2 > MEM[...] = v3 > > Case 2 (VLS): -mrvv-vector-bits=128 IR (LEN_MASK_* with length = VF, mask = comparison): > Code: mask = comparison > for (int i = 0; i < 4; i++) v1 = LEN_MASK_LOAD (length = VF, mask) > if (cond[i]) v2 = LEN_MASK_LOAD (length = VF, mask) > a[i] = b[i] + c[i]; v3 = v1 + v2 > LEN_MASK_STORE (length = VF, mask, v3) > > Case 3 (VLA): > Code: loop_len = SELECT_VL or MIN > for (int i = 0; i < n; i++) v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...}) > a[i] = b[i] + c[i]; v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...}) > v3 = v1 + v2 > LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3) > > Case 4 (VLA): > Code: loop_len = SELECT_VL or MIN > for (int i = 0; i < n; i++) mask = comparison > if (cond[i]) v1 = LEN_MASK_LOAD (length = loop_len, mask) > a[i] = b[i] + c[i]; v2 = LEN_MASK_LOAD (length = loop_len, mask) > v3 = v1 + v2 > LEN_MASK_STORE (length = loop_len, mask, v3) > > Co-authored-by: Robin Dapp > > gcc/ChangeLog: > > * doc/md.texi: Add len_mask{load,store}. > * genopinit.cc (main): Ditto. > (CMP_NAME): Ditto. > * internal-fn.cc (len_maskload_direct): Ditto. > (len_maskstore_direct): Ditto. > (expand_call_mem_ref): Ditto. > (expand_partial_load_optab_fn): Ditto. > (expand_len_maskload_optab_fn): Ditto. > (expand_partial_store_optab_fn): Ditto. > (expand_len_maskstore_optab_fn): Ditto. > (direct_len_maskload_optab_supported_p): Ditto. > (direct_len_maskstore_optab_supported_p): Ditto. > * internal-fn.def (LEN_MASK_LOAD): Ditto. > (LEN_MASK_STORE): Ditto. > * optabs.def (OPTAB_CD): Ditto. You should probably mention the change to len_load in the ChangeLog entry for md.texi. > @@ -5136,6 +5136,57 @@ of @code{QI} elements. > > This pattern is not allowed to @code{FAIL}. > > +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern > +@item @samp{len_maskload@var{m}@var{n}} > +Perform a masked load of (operand 2 + operand 4) elements from vector memory > +operand 1 into vector register operand 0, setting the other elements of > +operand 0 to undefined values. This is a combination of len_load and maskload. > +Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 2 > +has whichever integer mode the target prefers. A mask is specified in > +operand 3 which must be of type @var{n}. The mask has lower precedence than > +the length and is itself subject to length masking, > +i.e. only mask indices < (operand 2 + operand 4) are used. > +Operand 4 conceptually has mode @code{QI}.s/vector memory/memory/ That first sentence is a bit difficult to parse. Perhaps break it into two sentences. Something like this? Perform a masked load from the memory location pointed to be operand 1 into register operand 0. operand 2 + operand 4 elements are loaded from memory and other elements in operand 0 are set to undefined values. Presumably the element length is encoded by the modes? Is this worth mentioning? > + > +This pattern is not allowed to @code{FAIL}. If the pattern is not allowed to fail, then what code enforces the bias argument's restrictions? I don't see it in the generic expander code. > + > +@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern > +@item @samp{len_maskstore@var{m}@var{n}} > +Perform a masked store of (operand 2 + operand 4) vector elements from vector register > +operand 1 into memory operand 0, leaving the other elements of operand 0 unchanged. > +This is a combination of len_store and maskstore. I'd think this could be adjusted in roughly the same manner as I suggested for loads. > diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc > index 0c1b6859ca0..6bd8858a1d9 100644 > --- a/gcc/genopinit.cc > +++ b/gcc/genopinit.cc > @@ -386,7 +387,8 @@ main (int argc, const char **argv) > { > #define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N))) > if (CMP_NAME("while_ult") || CMP_NAME ("len_load") > - || CMP_NAME ("len_store")) > + || CMP_NAME ("len_store")|| CMP_NAME ("len_maskload") Whitespace nit. Insert a space before the "||". Looks reasonable to me. I think we're going to need a V2, but I don't see major conceptual issues, just minor details to work through. jeff