From mboxrd@z Thu Jan 1 00:00:00 1970
From: Prathamesh Kulkarni
Date: Tue, 7 Feb 2023 13:37:05 +0530
Subject: Re: [DOC PATCH] Document the VEC_PERM_EXPR tree code (and minor clean-ups).
To: Roger Sayle
Cc: Richard Sandiford, Richard Biener, GCC Patches
References: <000f01d938d8$00cdf7d0$0269e770$@nextmovesoftware.com> <05dc01d93a39$8b8c12a0$a2a437e0$@nextmovesoftware.com>
In-Reply-To: <05dc01d93a39$8b8c12a0$a2a437e0$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Mon, 6 Feb 2023 at 20:14, Roger Sayle wrote:
>
> Perhaps I'm missing something (I'm not too familiar with SVE semantics),
> but is there a reason that the solution for PR96463 uses a VEC_PERM_EXPR
> and not just a VEC_DUPLICATE_EXPR?  The folding of svld1rq (svptrue_..., ...)
> doesn't seem to require either the blending or the permutation
> functionality of a VEC_PERM_EXPR.  Instead, it seems to be misusing (the
> modified) VEC_PERM_EXPR as a form of VIEW_CONVERT_EXPR that allows us to
> convert/mismatch the type of the operands to the type of the result.

Hi,
I am not sure if we could use VEC_DUPLICATE_EXPR for the PR96463 case as-is.
Perhaps we could extend VEC_DUPLICATE_EXPR to take N operands, so that the
resulting vector has npatterns = N, nelts_per_pattern = 1?
AFAIU, extending VEC_PERM_EXPR to handle vectors with different lengths
would allow for more optimization opportunities besides PR96463.
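
To make the npatterns = N, nelts_per_pattern = 1 idea concrete, here is a
rough scalar model in plain C.  It is only a sketch, not GCC code: the
function dup_n_model and its parameters are names made up for illustration.
With one element per pattern, the N operands are interleaved and repeated,
so element i of the result is operand i % N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a hypothetical N-operand VEC_DUPLICATE_EXPR whose
   result is encoded with npatterns = n_ops and nelts_per_pattern = 1:
   each pattern is one operand repeated, and the patterns are
   interleaved, so result element i is ops[i % n_ops].
   (Illustrative names only; this is not an existing GCC interface.)  */
static void
dup_n_model (const int32_t *ops, size_t n_ops,
             int32_t *result, size_t n_result)
{
  assert (n_ops > 0);
  for (size_t i = 0; i < n_result; i++)
    result[i] = ops[i % n_ops];
}
```

With n_ops = 4 this models replicating one quadword of four 32-bit elements
across a longer vector, which is the shape the svld1rq fold needs.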
>
> Conceptually, (as in Richard's original motivation for the PR),
> svint32_t foo (int32x4_t x) { return svld1rq (svptrue_b8 (), &x[0]); }
> can be optimized to (something like)
> svint32_t foo (int32x4_t x) { return svdup_32 (x[0]); } // or dup z0.q, z0.q[0] equivalent

I guess that should be equivalent to svdupq_s32 (x[0], x[1], x[2], x[3])?

Thanks,
Prathamesh

> hence it makes sense for fold to transform the gimple form of the first,
> into the gimple form of the second(?)
>
> Just curious.
> Roger
> --
>
> > -----Original Message-----
> > From: Richard Sandiford
> > Sent: 06 February 2023 12:22
> > To: Richard Biener
> > Cc: Roger Sayle; GCC Patches
> > Subject: Re: [DOC PATCH] Document the VEC_PERM_EXPR tree code (and minor
> > clean-ups).
> >
> > Richard Biener writes:
> > > On Sat, Feb 4, 2023 at 9:35 PM Roger Sayle wrote:
> > >>
> > >> This patch (primarily) documents the VEC_PERM_EXPR tree code in
> > >> generic.texi.  For ease of review, it is provided below as a pair of
> > >> diffs.  The first contains just the new text added to describe
> > >> VEC_PERM_EXPR, the second tidies up this part of the documentation by
> > >> sorting the tree codes into alphabetical order, and providing
> > >> consistent section naming/capitalization, so changing this section
> > >> from "Vectors" to "Vector Expressions" (matching the nearby "Unary
> > >> and Binary Expressions").
> > >>
> > >> Tested with make pdf and make html on x86_64-pc-linux-gnu.
> > >> The reviewer(s) can decide whether to approve just the new content,
> > >> or the content+clean-up.  Ok for mainline?
> > >
> > > +@item VEC_PERM_EXPR
> > > +This node represents a vector permute/blend operation.  The three
> > > +operands must be vectors of the same number of elements.  The first
> > > +and second operands must be vectors of the same type as the entire
> > > +expression,
> > >
> > > this was recently relaxed for the case of constant permutes in which
> > > case the first and second operands only have to have the same element
> > > type as the result.  See tree-cfg.cc:verify_gimple_assign_ternary.
> > >
> > > The following description will become a bit more awkward here and for
> > > rhs1/rhs2 with different number of elements the modulo interpretation
> > > doesn't hold - I believe we require in-bounds elements for constant
> > > permutes.  Richard can probably clarify things here.
> >
> > I thought that the modulo behaviour still applies when the node has a
> > constant selector, it's just that the in-range form is the canonical one.
> >
> > With variable-length vectors, I think it's in principle possible to have
> > a stepped constant selector whose start elements are in-range but whose
> > final elements aren't (and instead wrap around when applied).
> > E.g. the selector could zip the last quarter of the inputs followed by
> > the first quarter.
> >
> > Thanks,
> > Richard
>
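
For reference, the modulo interpretation discussed above can be sketched
with a scalar model in plain C.  This is an illustration only, assuming both
inputs have the same number of elements (the relaxed different-length case
is not modelled); vec_perm_model and its parameter names are made up for
the sketch and are not GCC functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of VEC_PERM_EXPR <v0, v1, sel> for inputs of equal
   length nelts: elements of v0 are numbered 0 .. nelts-1, elements of
   v1 are numbered nelts .. 2*nelts-1, and each selector value is
   interpreted modulo 2*nelts.
   (Illustrative names only; this is not an existing GCC interface.)  */
static void
vec_perm_model (const int32_t *v0, const int32_t *v1, const uint32_t *sel,
                int32_t *result, size_t nelts)
{
  assert (nelts > 0);
  for (size_t i = 0; i < nelts; i++)
    {
      size_t idx = sel[i] % (2 * nelts);
      result[i] = idx < nelts ? v0[idx] : v1[idx - nelts];
    }
}
```

For example, with nelts = 4 the stepped selector { 6, 7, 8, 9 } starts
in range and then wraps: it picks v1[2], v1[3], v0[0], v0[1].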