From: Prathamesh Kulkarni
Date: Tue, 25 Jul 2023 14:56:56 +0530
Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors
To: gcc Patches, Richard Sandiford

On Mon, 17 Jul 2023 at 17:44, Prathamesh Kulkarni wrote:
>
> Hi Richard,
> This is a reworking of the patch to extend fold_vec_perm to handle VLA vectors.
> The attached patch unifies handling of VLS and VLA vector_csts, while
> using fallback code for ctors.
>
> For VLS vectors, the patch ignores the underlying encoding, and
> uses npatterns = nelts, and nelts_per_pattern = 1.
>
> For VLA patterns, if sel has a stepped sequence, then it
> only chooses elements from a particular pattern of a particular
> input vector.
>
> To make things simpler, the patch imposes the following constraints:
> (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
> (b) The step size for a stepped sequence is a power of 2, and
> a multiple of npatterns of the chosen input vector.
> (c) Runtime vector length of sel is a multiple of sel_npatterns.
> So, we don't handle sel.length = 2 + 2x and npatterns = 4.
>
> Eg:
> op0, op1: npatterns = 2, nelts_per_pattern = 3
> op0_len = op1_len = 16 + 16x.
> sel = { 0, 0, 2, 0, 4, 0, ... }
> npatterns = 2, nelts_per_pattern = 3.
>
> For pattern {0, 2, 4, ...}
> Let,
> a1 = 2
> S = step size = 2
>
> Let Esel denote the number of elements per pattern in sel at runtime.
> Esel = (16 + 16x) / npatterns_sel
>      = (16 + 16x) / 2
>      = (8 + 8x)
>
> So, last element of pattern:
> ae = a1 + (Esel - 2) * S
>    = 2 + (8 + 8x - 2) * 2
>    = 14 + 16x
>
> a1 /trunc op0_len = 2 / (16 + 16x) = 0
> ae /trunc op0_len = (14 + 16x) / (16 + 16x) = 0
> Since both are equal with quotient = 0, we select elements from op0.
>
> Since step size (S) is a multiple of npatterns(op0), we select
> all elements from the same pattern of op0.
>
> res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
>               = max (2, max (2, 2))
>               = 2
>
> res_nelts_per_pattern = max (op0_nelts_per_pattern,
>                              max (op1_nelts_per_pattern,
>                                   sel_nelts_per_pattern))
>                       = max (3, max (3, 3))
>                       = 3
>
> So res has encoding with npatterns = 2, nelts_per_pattern = 3.
> res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }
>
> Unfortunately, this results in an issue for poly_int_cst index:
> For example,
> op0, op1: npatterns = 1, nelts_per_pattern = 3
> op0_len = op1_len = 4 + 4x
>
> sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
>
> In this case,
> a1 = 5 + 4x
> S = (6 + 4x) - (5 + 4x) = 1
> Esel = 4 + 4x
>
> ae = a1 + (Esel - 2) * S
>    = (5 + 4x) + (4 + 4x - 2) * 1
>    = 7 + 8x
>
> IIUC, 7 + 8x will always be the index for the last element of op1 ?
> if x = 0, len = 4, 7 + 8x = 7
> if x = 1, len = 8, 7 + 8x = 15, etc.
> So the stepped sequence will always choose elements
> from op1 regardless of vector length for the above case ?
>
> However,
> ae /trunc op0_len
> = (7 + 8x) / (4 + 4x)
> which is not defined because 7/4 != 8/4
> and we return NULL_TREE, but I suppose the expected result would be:
> res: { op1[0], op1[1], op1[2], ... } ?
>
> The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
> and on x86_64-unknown-linux-gnu.
> I would be grateful for suggestions on how to proceed.
Hi Richard,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624675.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
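
For reference, the arithmetic in the two quoted examples can be checked numerically. The sketch below is plain Python and not part of the patch; the helper names (last_index, example1, example2, quotients) are hypothetical, and x is assumed to be a non-negative integer standing for the runtime vector-length multiplier. It evaluates a1 /trunc op0_len and ae /trunc op0_len over a sample of x values. Sampling is not a proof, but for the second example the bound 4 + 4x <= 7 + 8x < 8 + 8x holds for every x >= 0, so the quotient there is 1 at every runtime length.

```python
# Illustrative check only: these helpers are hypothetical and are not GCC code.

def last_index(a1, step, esel):
    """ae = a1 + (Esel - 2) * S, as in the quoted derivation."""
    return a1 + (esel - 2) * step


def example1(x):
    """First example: op0_len = 16 + 16x, sel npatterns = 2, pattern {0, 2, 4, ...}."""
    op0_len = 16 + 16 * x
    esel = op0_len // 2          # elements per pattern in sel at runtime
    a1 = 2
    ae = last_index(a1, step=2, esel=esel)
    assert ae == 14 + 16 * x     # matches the closed form in the mail
    return (a1 // op0_len, ae // op0_len)


def example2(x):
    """Second example: op0_len = 4 + 4x, sel = {4 + 4x, 5 + 4x, 6 + 4x, ...}."""
    op0_len = 4 + 4 * x
    esel = op0_len               # sel npatterns = 1
    a1 = 5 + 4 * x
    ae = last_index(a1, step=1, esel=esel)
    assert ae == 7 + 8 * x       # matches the closed form in the mail
    return (a1 // op0_len, ae // op0_len)


def quotients(example, xs=range(0, 64)):
    """Set of (a1 quotient, ae quotient) pairs seen over the sampled x values."""
    return {example(x) for x in xs}


if __name__ == "__main__":
    print("example 1:", quotients(example1))
    print("example 2:", quotients(example2))
```

In both cases the sampled set has a single entry: quotient 0 (op0) for the first example and quotient 1 (op1) for the second. That is consistent with the expectation stated above that the second sel should fold to { op1[0], op1[1], op1[2], ... }, even though dividing the coefficients separately gives 7/4 != 8/4.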