From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
  by sourceware.org (Postfix) with ESMTP id 0CC733858D32
  for ; Thu, 10 Aug 2023 15:57:33 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0CC733858D32
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
  by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1FBECD75;
  Thu, 10 Aug 2023 08:58:15 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
  by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3DAE83F6C4;
  Thu, 10 Aug 2023 08:57:32 -0700 (PDT)
From: Richard Sandiford
To: Prathamesh Kulkarni
Mail-Followup-To: Prathamesh Kulkarni ,gcc Patches , richard.sandiford@arm.com
Cc: gcc Patches
Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors
References:
Date: Thu, 10 Aug 2023 16:57:31 +0100
In-Reply-To: (Prathamesh Kulkarni's message of "Thu, 10 Aug 2023 20:03:19 +0530")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-19.9 required=5.0
  tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP
  autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

Prathamesh Kulkarni writes:
>> static bool
>> is_simple_vla_size (poly_uint64 size)
>> {
>>   if (size.is_constant ())
>>     return false;
>>   for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
>>     if (size[i] != (i <= 1 ? size[0] : 0))
> Just wondering is this should be (i == 1 ? size[0] : 0) since i is
> initialized to 1 ?

Both work.  I prefer <= 1 because it doesn't depend on the micro
optimisation to start at coefficient 1.
In a theoretical 3-indeterminate poly_int, we want the first 2
coefficients to be nonzero and the rest to be zero.

> IIUC, is_simple_vla_size should return true for polynomials of first
> degree and having same coeff like 4 + 4x ?

FWIW, poly_int only supports first-degree polynomials at the moment.
coeffs>2 means there is more than one indeterminate, rather than a
higher power.

>>       return false;
>>   return true;
>> }
>>
>>
>> FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
>>   {
>>     auto nunits = GET_MODE_NUNITS (mode);
>>     if (!is_simple_vla_size (nunits))
>>       continue;
>>     if (nunits[0] ...)
>>       test_... (mode);
>>     ...
>>
>>   }
>>
>> test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the
>> loop structure above, I think we can apply the test_vnx4si and
>> test_vnx16qi to more cases.  So the classification isn't the
>> exact number of elements, but instead a limit.
>>
>> I think the nunits[0] conditions for test_vnx4si are as follows
>> (inspection only, so could be wrong):
>>
>> > +/* Test cases where result and input vectors are VNx4SI */
>> > +
>> > +static void
>> > +test_vnx4si (machine_mode vmode)
>> > +{
>> > +  /* Case 1: mask = {0, ...} */
>> > +  {
>> > +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > +
>> > +    vec_perm_builder builder (len, 1, 1);
>> > +    builder.quick_push (0);
>> > +    vec_perm_indices sel (builder, 2, len);
>> > +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> > +
>> > +    tree expected_res[] = { vector_cst_elt (res, 0) };
> This should be { vector_cst_elt (arg0, 0) }; will fix in next patch.
>> > +    validate_res (1, 1, res, expected_res);
>> > +  }
>>
>> nunits[0] >= 2 (could be all nunits if the inputs had nelts_per_pattern==1,
>> which I think would be better)
> IIUC, the vectors that can be used for a particular test should have
> nunits[0] >= res_npatterns,
> where res_npatterns is as computed in fold_vec_perm_cst without the
> canonicalization ?
> For above test -- res_npatterns = max(2, max (2, 1)) == 2, so we
> require nunits[0] >= 2 ?
> Which implies we can use above test for vectors with length 2 + 2x, 4 + 4x, etc.

Right, that's what I meant.  With the inputs as they stand it has to be
nunits[0] >= 2.  We need that to form the inputs correctly.  But if the
inputs instead had nelts_per_pattern == 1, the test would work for all
nunits.

> Sorry if this sounds like a silly question -- Won't nunits[0] >= 2
> cover all nunits,
> since a vector, at a minimum, will contain 2 elements ?

Not necessarily.  VNx1TI makes conceptual sense.  We just don't use it
currently (although that'll change with SME).  And we do have
single-element VLS vectors like V1DI and V1DF.

Thanks,
Richard
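
For anyone who wants to experiment with the is_simple_vla_size check
discussed above outside of GCC, here is a minimal standalone sketch.
It assumes a hypothetical toy_poly struct standing in for poly_uint64
(just an array of coefficients c0 + c1*x1 + c2*x2 + ...), so toy_poly
and its template parameter are illustrative names only, not GCC's API:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for GCC's poly_uint64 (an assumption for this
// sketch): the value is coeffs[0] + coeffs[1]*x1 + coeffs[2]*x2 + ...
template <std::size_t N>
struct toy_poly
{
  std::array<std::uint64_t, N> coeffs;

  // A size is constant when every coefficient after the first is zero.
  bool is_constant () const
  {
    for (std::size_t i = 1; i < N; ++i)
      if (coeffs[i] != 0)
        return false;
    return true;
  }
};

// Return true for sizes of the form c + c*x1 with all later coefficients
// zero.  The "i <= 1" form stays correct even if the loop were changed to
// start at coefficient 0 rather than 1.
template <std::size_t N>
bool is_simple_vla_size (const toy_poly<N> &size)
{
  if (size.is_constant ())
    return false;
  for (std::size_t i = 1; i < N; ++i)
    if (size.coeffs[i] != (i <= 1 ? size.coeffs[0] : 0))
      return false;
  return true;
}
```

With two coefficients this accepts sizes such as 4 + 4x and 1 + 1x (the
single-element VNx1TI-like shape) and rejects constants such as 4.  With
three coefficients it additionally requires the third one to be zero.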