From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12c.google.com (mail-lf1-x12c.google.com [IPv6:2a00:1450:4864:20::12c]) by sourceware.org (Postfix) with ESMTPS id 6F97B385C677 for ; Tue, 13 Dec 2022 06:06:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6F97B385C677 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-lf1-x12c.google.com with SMTP id z26so3242674lfu.8 for ; Mon, 12 Dec 2022 22:06:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=2rgMbfhoYCtvHl18Xh7JnUKNx6tAgpjF5dsYZoUK+ok=; b=StRHQkf+KR74aCY4bCt4cKrTwYq+HnfCqfePGvFHkKDYAdkh8Nylv6vMhCJZslf0Na +6phqSDownoWkSqUyTa7hSoOX1DWQCYgmgunzqppujf2WTbYwdHp15/CzuH5AXGDMK7p sRh6TY01Zfr236U9wvGc3qL5A0VWUCE7DbKYBTEBN/IdXi2TTbkY1WqQTVT7eFr3hZde AeZj4vASsAVzUz/EAJOAtWKKbBhWXrXhCoK454j8CYysGvlnbScDfS4GbUHhQImlUF1A PheX4qdFSh6wDWE6aiFTIxy0Zt5r07UK0m/XXyGJdQgURWsCdFB8QGxtVz4War9eJTuR CZmw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=2rgMbfhoYCtvHl18Xh7JnUKNx6tAgpjF5dsYZoUK+ok=; b=kP7rJoWeVVB9ppXk8G/qict+XqciCgsdd9XvhOmR+SpKHvQ10wD0O5UI/kPZQ92HIs SIugF2U7QEqdnScRjvR4kSchBjjnzOknvyA8dZpL8mFLnN8VQ4BTHejvyIXHAJLtmbbn Twa3sf7A4UVcy0SERLtevr0GAiXVDJsQXToYQQPp8hmcIiOnAe0xbNe/0Y9P/d8RFeJS 6TSaRY4X3Jd4W7o5tWpbcumWT8ZEh8YwXRBBRipEFQLIpBNrTUoSgp33Xce5w6kTb7e3 QnLmDoyFRl3fEIf0Z0HBrDjp3C29XjgIAncEpTpFF1sOKlgwHFeImTaTAVsqNuCzKbgg I1fA== X-Gm-Message-State: ANoB5pm82TKso/04KqADe4Ga6bajGHnXLQCsEqbzIovLnoxojpcUp/Q+ fUK8jXy5CKTFQjTbiJ8WSDinGXQuPoyRhZaJncjppz+Bw5ejaw== X-Google-Smtp-Source: AA0mqf4Cb+UF0+iXZA+JTcyKTqvygimcQcFTK9K9y84tafYRcL7BbXQjgAGreMSJyMfIma1ep7V7+LJF53+xtkNt4rU= X-Received: by 2002:a05:6512:3ca0:b0:4a2:2c4b:8138 with SMTP id h32-20020a0565123ca000b004a22c4b8138mr26524607lfv.14.1670911591125; Mon, 12 Dec 2022 22:06:31 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Tue, 13 Dec 2022 11:35:54 +0530 Message-ID: Subject: Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner To: Prathamesh Kulkarni via Gcc-patches , Prathamesh Kulkarni , Richard Biener , richard.sandiford@arm.com Content-Type: multipart/mixed; boundary="000000000000d75b0f05efaf6b33" X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SCC_10_SHORT_WORD_LINES,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --000000000000d75b0f05efaf6b33 Content-Type: text/plain; charset="UTF-8" On Tue, 6 Dec 2022 at 21:00, Richard Sandiford wrote: > > Prathamesh Kulkarni via Gcc-patches writes: > > On Fri, 4 Nov 2022 at 14:00, Prathamesh Kulkarni > > wrote: > >> > >> On Mon, 31 Oct 2022 at 15:27, Richard Sandiford > >> wrote: > >> > > >> > Prathamesh Kulkarni writes: > >> > > On Wed, 26 Oct 2022 at 21:07, Richard Sandiford > >> > > wrote: > >> > >> > >> > >> Sorry for the slow response. I wanted to find some time to think > >> > >> about this a bit more. > >> > >> > >> > >> Prathamesh Kulkarni writes: > >> > >> > On Fri, 30 Sept 2022 at 21:38, Richard Sandiford > >> > >> > wrote: > >> > >> >> > >> > >> >> Richard Sandiford via Gcc-patches writes: > >> > >> >> > Prathamesh Kulkarni writes: > >> > >> >> >> Sorry to ask a silly question but in which case shall we select 2nd vector ? > >> > >> >> >> For num_poly_int_coeffs == 2, > >> > >> >> >> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x) > >> > >> >> >> If a1/trunc n1 succeeds, > >> > >> >> >> 0 / n1.coeffs[1] == a1/n1.coeffs[0] == 0. > >> > >> >> >> So, a1 has to be < n1.coeffs[0] ? > >> > >> >> > > >> > >> >> > Remember that a1 is itself a poly_int. It's not necessarily a constant. > >> > >> >> > > >> > >> >> > E.g. the TRN1 .D instruction maps to a VEC_PERM_EXPR with the selector: > >> > >> >> > > >> > >> >> > { 0, 2 + 2x, 1, 4 + 2x, 2, 6 + 2x, ... } > >> > >> >> > >> > >> >> Sorry, should have been: > >> > >> >> > >> > >> >> { 0, 2 + 2x, 2, 4 + 2x, 4, 6 + 2x, ... } > >> > >> > Hi Richard, > >> > >> > Thanks for the clarifications, and sorry for late reply. > >> > >> > I have attached POC patch that tries to implement the above approach. > >> > >> > Passes bootstrap+test on x86_64-linux-gnu and aarch64-linux-gnu for VLS vectors. > >> > >> > > >> > >> > For VLA vectors, I have only done limited testing so far. > >> > >> > It seems to pass couple of tests written in the patch for > >> > >> > nelts_per_pattern == 3, > >> > >> > and folds the following svld1rq test: > >> > >> > int32x4_t v = {1, 2, 3, 4}; > >> > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]) > >> > >> > into: > >> > >> > return {1, 2, 3, 4, ...}; > >> > >> > I will try to bootstrap+test it on SVE machine to test further for VLA folding. > >> > >> > > >> > >> > I have a couple of questions: > >> > >> > 1] When mask selects elements from same vector but from different patterns: > >> > >> > For eg: > >> > >> > arg0 = {1, 11, 2, 12, 3, 13, ...}, > >> > >> > arg1 = {21, 31, 22, 32, 23, 33, ...}, > >> > >> > mask = {0, 0, 0, 1, 0, 2, ... }, > >> > >> > All have npatterns = 2, nelts_per_pattern = 3. > >> > >> > > >> > >> > With above mask, > >> > >> > Pattern {0, ...} selects arg0[0], ie {1, ...} > >> > >> > Pattern {0, 1, 2, ...} selects arg0[0], arg0[1], arg0[2], ie {1, 11, 2, ...} > >> > >> > While arg0[0] and arg0[2] belong to same pattern, arg0[1] belongs to different > >> > >> > pattern in arg0. > >> > >> > The result is: > >> > >> > res = {1, 1, 1, 11, 1, 2, ...} > >> > >> > In this case, res's 2nd pattern {1, 11, 2, ...} is encoded with: > >> > >> > with a0 = 1, a1 = 11, S = -9. > >> > >> > Is that expected tho ? It seems to create a new encoding which > >> > >> > wasn't present in the input vector. For instance, the next elem in > >> > >> > sequence would be -7, > >> > >> > which is not present originally in arg0. > >> > >> > >> > >> Yeah, you're right, sorry. Going back to: > >> > >> > >> > >> (2) The explicit encoding can be used to produce a sequence of N*Ex*Px > >> > >> elements for any integer N. This extended sequence can be reencoded > >> > >> as having N*Px patterns, with Ex staying the same. > >> > >> > >> > >> I guess we need to pick an N for the selector such that each new > >> > >> selector pattern (each one out of the N*Px patterns) selects from > >> > >> the *same pattern* of the same data input. > >> > >> > >> > >> So if a particular pattern in the selector has a step S, and the data > >> > >> input it selects from has Pi patterns, N*S must be a multiple of Pi. > >> > >> N must be a multiple of least_common_multiple(S,Pi)/S. > >> > >> > >> > >> I think that means that the total number of patterns in the result > >> > >> (Pr from previous messages) can safely be: > >> > >> > >> > >> Ps * least_common_multiple( > >> > >> least_common_multiple(S[1], P[input(1)]) / S[1], > >> > >> ... > >> > >> least_common_multiple(S[Ps], P[input(Ps)]) / S[Ps] > >> > >> ) > >> > >> > >> > >> where: > >> > >> > >> > >> Ps = the number of patterns in the selector > >> > >> S[I] = the step for selector pattern I (I being 1-based) > >> > >> input(I) = the data input selected by selector pattern I (I being 1-based) > >> > >> P[I] = the number of patterns in data input I > >> > >> > >> > >> That's getting quite complicated :-) If we allow arbitrary P[...] > >> > >> and S[...] then it could also get large. Perhaps we should finally > >> > >> give up on the general case and limit this to power-of-2 patterns and > >> > >> power-of-2 steps, so that least_common_multiple becomes MAX. Maybe that > >> > >> simplifies other things as well. > >> > >> > >> > >> What do you think? > >> > > Hi Richard, > >> > > Thanks for the suggestions. Yeah I suppose we can initially add support for > >> > > power-of-2 patterns and power-of-2 steps and try to generalize it in > >> > > follow up patches if possible. > >> > > > >> > > Sorry if this sounds like a silly ques -- if we are going to have > >> > > pattern in selector, select *same pattern from same input vector*, > >> > > instead of re-encoding the selector to have N * Ps patterns, would it > >> > > make sense for elements in selector to denote pattern number itself > >> > > instead of element index > >> > > if input vectors are VLA ? > >> > > > >> > > For eg: > >> > > op0 = {1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 3, 6, ...} > >> > > op1 = {...} > >> > > with npatterns == 4, nelts_per_pattern == 3, > >> > > sel = {0, 3} should pick pattern 0 and pattern 3 from op0, > >> > > so, res = {1, 4, 1, 5, 1, 6, ...} > >> > > Not sure if this is correct tho. > >> > > >> > This wouldn't allow us to represent things like a "duplicate one > >> > element", or "copy the leading N elements from the first input and > >> > the other elements from elements N+ of the second input", which we > >> > can with the current scheme. > >> > > >> > The restriction about each (unwound) selector pattern selecting from the > >> > same input pattern only applies to case where the selector pattern is > >> > stepped (and only applies to the stepped part of the pattern, not the > >> > leading element). The restriction is also local to this code; it > >> > doesn't make other VEC_PERM_EXPRs invalid. > >> Hi Richard, > >> Thanks for the clarifications. > >> Just to clarify your approach with an eg: > >> Let selected input vector be: > >> arg0: {a0, b0, c0, d0, > >> a0 + S, b0 + S, c0 + S, d0 + S, > >> a0 + 2S, b0 + 2S, c0 + 2S, dd + 2S, ...} > >> where arg0 has npatterns = 4, and nelts_per_pattern = 3. > >> > >> Let sel = {0, 0, 1, 2, 2, 4, ...} > >> where sel_npatterns = 2 and sel_nelts_per_pattern = 3 > >> > >> So, the first pattern in sel: > >> p1: {0, 1, 2, ...} which will select {a0, b0, c0, ...} > >> which would be incorrect, since they belong to different patterns in arg0. > >> So to select elements from same pattern in arg0, we need to divide p1 > >> into at least N1 = P_arg0 / S0 = 4 distinct patterns. > >> > >> Similarly for second pattern in sel: > >> p2: {0, 2, 4, ...}, we need to divide it into > >> at least N2 = P_arg0 / S1 = 2 distinct patterns. > >> > >> Select N = max(N1, N2) = 4 > >> So, the selector will be extended to N * Ps * Es = 4 * 2 * 3 == 24 elements, > >> and will be re-encoded with N*Ps = 8 patterns: > >> > >> re-encoded sel: > >> {a0, b0, c0, d0, a0 + S, b0 + S, c0 + S, d0 + S, > >> a0 + 2S, b0 + 2S, c0 + 2S, d0 + 2S, a0 + 3S, b0 + 3S, c0 + 3S, d0 + 3S, > >> a0 + 4S, b0 + 4S, c0 + 4s, d0 + 4S, a0 + 5S, b0 + 5S, c0 + 5S, d0 + 5S, > >> ...} > >> > >> with 8 patterns, > >> p1: {a0, a0 + 2S, a0 + 4S, ...} > >> p2: {b0, b0 + 2S, b0 + 4S, ...} > >> ... > >> which select elements from same pattern from same input vector. > >> Does this look correct ? > >> > >> For feasibility, we can check initially that sel_npatterns, arg0_npatterns, > >> arg1_npatterns are powers of 2 and for each stepped pattern, > >> it's stepped size S is a power of 2. I suppose this will be sufficient > >> to ensure that sel can be re-encoded with N*Ps npatterns > >> such that each new pattern selects elements from same pattern > >> of the input vector ? > >> > >> Then compute N: > >> N = 1; > >> for (every pattern p in sel) > >> { > >> op = corresponding input vector for pattern; > >> S = step_size (p); > >> N_pattern = max (S, npatterns (op)) / S; > >> N = max(N, N_pattern) > >> } > >> > >> and re-encode selector with N*Ps patterns. > >> I guess rest of the patch will mostly stay the same. > > Hi, > > I have attached a POC patch based on the above approach. > > For the above eg: > > arg0 = {1, 11, 2, 12, 3, 13, ...} // npatterns = 2, nelts_per_pattern = 3, > > and > > sel = {0, 0, 0, 1, 0, 2, ...} > > with sel_npatterns == 2 and sel_nelts_per_pattern == 3. > > > > For pattern, {0, 1, 2, ...} it will select elements from different > > patterns from arg0, which is incorrect. > > So we choose N = P1/S = 2/1 = 2, where P1 is number of elements in arg0. > > So re-encoded sel = { 0, 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, ...} > > with following patterns: > > p1 = { 0, ... } > > p2 = { 0, 2, 4, ... } > > p3 = { 0, ... } > > p4 = { 1, 3, 5, ... } > > which should be correct since each element from the respective > > patterns in sel chooses > > elements from same pattern from arg0. > > So, res = { 1, 1, 1, 11, 1, 2, 1, 12, 1, 3, 1, 13, ... } > > Does this look correct ? > > Yeah. But like I said above: > > The restriction about each (unwound) selector pattern selecting from the > same input pattern only applies to case where the selector pattern is > stepped (and only applies to the stepped part of the pattern, not the > leading element). > > If the selector nelts-per-pattern is 1 or 2 then we can support all > power-of-2 cases, with the final npatterns being the maximum of the > source nelts-per-patterns. > > Also, going back to an earlier part of the discussion, I think we > should use this technique for both VLA and VLS, and only fall back > to the VLS-specific approach if the VLA approach fails. > > So I suggest we put the VLA code in its own function and have > the VLS-only path kick in when the VLA code fails. If the code is > having to pass a lot of state around, it might make sense to define > a local class, store the state in member variables, and use member > functions for the various subroutines. I don't know if that will > work out neater though. Hi Richard, Thanks for the suggestions. I have attached an updated POC patch, that does the following: (a) Uses VLA approach by default, and falls back to VLS specific folding if VLA approach fails for VLS vectors. (b) Separates cases for sel_nelts_per_pattern < 3 and sel_nelts_per_pattern == 3. (c) Allows, a0 to select different vector from a1 .. ae. I have written a few unit tests in the patch for testing the same. Does the patch look in the right direction ? The patch has an issue for the following case marked as "case 9" in test_vec_perm_vla_folding: arg0 = { 1, 11, 2, 12, 3, 13, ... } arg1 = { 21, 31, 22, 32, 23, 33, ... } arg0 and arg1 have npatterns = 2, nelts_per_pattern = 3. mask = { 4 + 4x, 5 + 4x, 6 + 4x, ... } where 4 + 4x is runtime vector length. npatterns = 1, nelts_per_pattern = 3. a1 = 5 + 4x ae = a1 + (esel - 2) * S = (5 + 4x) + (4 + 4x - 2) * 1 = 7 + 8x Since (7 + 8x) /trunc (4 + 4x) returns false, fold_vec_perm returns NULL_TREE. Is that expected for the above mask ? I intended it to select the second vector similar to, sel = { 0, 1, 2 .. }, which would select the first vector by re-encoding sel as { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... } with two patterns: {0, 2, 4, ...} and {1, 3, 5, ...} The first would select elements from first pattern from arg0, while the second pattern would select elements from second pattern from arg0. with result effectively having same encoding as arg0. Shouldn't sel = { 4 + 4x, 5 + 4x, 6 + 4x, ... } similarly select arg1 ? PS: I will be on vacation next week. Thanks, Prathamesh > > > @@ -10494,38 +10497,55 @@ fold_mult_zconjz (location_t loc, tree type, tree expr) > > build_zero_cst (itype)); > > } > > > > +/* Check if PATTERN in SEL selects either ARG0 or ARG1, > > + and return the selected arg, otherwise return NULL_TREE. */ > > > > -/* Helper function for fold_vec_perm. Store elements of VECTOR_CST or > > - CONSTRUCTOR ARG into array ELTS, which has NELTS elements, and return > > - true if successful. */ > > - > > -static bool > > -vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts) > > +static tree > > +get_vector_for_pattern (tree arg0, tree arg1, > > + const vec_perm_indices &sel, unsigned pattern, > > + unsigned sel_npatterns, int &S) > > { > > - unsigned HOST_WIDE_INT i, nunits; > > + unsigned sel_nelts_per_pattern = sel.encoding ().nelts_per_pattern (); > > > > - if (TREE_CODE (arg) == VECTOR_CST > > - && VECTOR_CST_NELTS (arg).is_constant (&nunits)) > > + poly_uint64 n1 = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + poly_uint64 nsel = sel.length (); > > + poly_uint64 esel; > > + > > + if (!multiple_p (nsel, sel_npatterns, &esel)) > > + return NULL_TREE; > > + > > + poly_uint64 a1 = sel[pattern + sel_npatterns]; > > + S = 0; > > + if (sel_nelts_per_pattern == 3) > > { > > - for (i = 0; i < nunits; ++i) > > - elts[i] = VECTOR_CST_ELT (arg, i); > > + poly_uint64 a2 = sel[pattern + 2 * sel_npatterns]; > > + S = (a2 - a1).to_constant (); > > The code hasn't proven that this to_constant is safe. > > > + if (S != 0 && !pow2p_hwi (S)) > > + return NULL_TREE; > > } > > - else if (TREE_CODE (arg) == CONSTRUCTOR) > > + > > + poly_uint64 ae = a1 + (esel - 2) * S; > > + uint64_t q1, qe; > > + poly_uint64 r1, re; > > + > > + if (!can_div_trunc_p (a1, n1, &q1, &r1) > > + || !can_div_trunc_p (ae, n1, &qe, &re) > > + || (q1 != qe)) > > + return NULL_TREE; > > Going back to the above: this check doesn't make sense for > sel_nelts_per_pattern != 3. > > Thanks, > Richard > > > + > > + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; > > + > > + if (S < 0) > > { > > - constructor_elt *elt; > > + poly_uint64 a0 = sel[pattern]; > > + if (!known_eq (S, a1 - a0)) > > + return NULL_TREE; > > > > - FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (arg), i, elt) > > - if (i >= nelts || TREE_CODE (TREE_TYPE (elt->value)) == VECTOR_TYPE) > > - return false; > > - else > > - elts[i] = elt->value; > > + if (!known_gt (re, VECTOR_CST_NPATTERNS (arg))) > > + return NULL_TREE; > > } > > - else > > - return false; > > - for (; i < nelts; i++) > > - elts[i] > > - = fold_convert (TREE_TYPE (TREE_TYPE (arg)), integer_zero_node); > > - return true; > > + > > + return arg; > > } > > > > /* Attempt to fold vector permutation of ARG0 and ARG1 vectors using SEL > > @@ -10539,41 +10559,135 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) > > unsigned HOST_WIDE_INT nelts; > > bool need_ctor = false; > > > > - if (!sel.length ().is_constant (&nelts)) > > - return NULL_TREE; > > - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > - && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > - && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), sel.length ()) > > + && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))); > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > return NULL_TREE; > > > > - tree *in_elts = XALLOCAVEC (tree, nelts * 2); > > - if (!vec_cst_ctor_to_array (arg0, nelts, in_elts) > > - || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts)) > > + unsigned res_npatterns = 0; > > + unsigned res_nelts_per_pattern = 0; > > + unsigned sel_npatterns = 0; > > + tree *vector_for_pattern = NULL; > > + > > + if (TREE_CODE (arg0) == VECTOR_CST > > + && TREE_CODE (arg1) == VECTOR_CST > > + && !sel.length ().is_constant ()) > > + { > > + unsigned arg0_npatterns = VECTOR_CST_NPATTERNS (arg0); > > + unsigned arg1_npatterns = VECTOR_CST_NPATTERNS (arg1); > > + sel_npatterns = sel.encoding ().npatterns (); > > + > > + if (!pow2p_hwi (arg0_npatterns) > > + || !pow2p_hwi (arg1_npatterns) > > + || !pow2p_hwi (sel_npatterns)) > > + return NULL_TREE; > > + > > + unsigned N = 1; > > + vector_for_pattern = XALLOCAVEC (tree, sel_npatterns); > > + for (unsigned i = 0; i < sel_npatterns; i++) > > + { > > + int S = 0; > > + tree op = get_vector_for_pattern (arg0, arg1, sel, i, sel_npatterns, S); > > + if (!op) > > + return NULL_TREE; > > + vector_for_pattern[i] = op; > > + unsigned N_pattern = > > + (S > 0) ? std::max(S, VECTOR_CST_NPATTERNS (op)) / S : 1; > > + N = std::max (N, N_pattern); > > + } > > + > > + res_npatterns > > + = std::max (sel_npatterns * N, std::max (arg0_npatterns, arg1_npatterns)); > > + > > + res_nelts_per_pattern > > + = std::max(sel.encoding ().nelts_per_pattern (), > > + std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0), > > + VECTOR_CST_NELTS_PER_PATTERN (arg1))); > > + } > > + else if (sel.length ().is_constant (&nelts) > > + && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)).is_constant () > > + && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)).to_constant () == nelts) > > + { > > + /* For VLS vectors, treat all vectors with > > + npatterns = nelts, nelts_per_pattern = 1. */ > > + res_npatterns = sel_npatterns = nelts; > > + res_nelts_per_pattern = 1; > > + vector_for_pattern = XALLOCAVEC (tree, nelts); > > + for (unsigned i = 0; i < nelts; i++) > > + { > > + HOST_WIDE_INT index; > > + if (!sel[i].is_constant (&index)) > > + return NULL_TREE; > > + vector_for_pattern[i] = (index < nelts) ? arg0 : arg1; > > + } > > + } > > + else > > return NULL_TREE; > > > > - tree_vector_builder out_elts (type, nelts, 1); > > - for (i = 0; i < nelts; i++) > > + tree_vector_builder out_elts (type, res_npatterns, > > + res_nelts_per_pattern); > > + unsigned res_nelts = res_npatterns * res_nelts_per_pattern; > > + for (unsigned i = 0; i < res_nelts; i++) > > { > > - HOST_WIDE_INT index; > > - if (!sel[i].is_constant (&index)) > > - return NULL_TREE; > > - if (!CONSTANT_CLASS_P (in_elts[index])) > > - need_ctor = true; > > - out_elts.quick_push (unshare_expr (in_elts[index])); > > + /* For VLA vectors, i % sel_npatterns would give the original > > + pattern the element belongs to, which is sufficient to get the arg. > > + Even if sel_npatterns has been multiplied by N, > > + they will always come from the same input vector. > > + For VLS vectors, sel_npatterns == res_nelts == nelts, > > + so i % sel_npatterns == i since i < nelts */ > > + > > + tree arg = vector_for_pattern[i % sel_npatterns]; > > + unsigned HOST_WIDE_INT index; > > + > > + if (arg == arg0) > > + { > > + if (!sel[i].is_constant ()) > > + return NULL_TREE; > > + index = sel[i].to_constant (); > > + } > > + else > > + { > > + gcc_assert (arg == arg1); > > + poly_uint64 n1 = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + uint64_t q; > > + poly_uint64 r; > > + > > + /* Divide sel[i] by input vector length, to obtain remainder, > > + which would be the index for either input vector. */ > > + if (!can_div_trunc_p (sel[i], n1, &q, &r)) > > + return NULL_TREE; > > + > > + if (!r.is_constant (&index)) > > + return NULL_TREE; > > + } > > + > > + tree elem; > > + if (TREE_CODE (arg) == CONSTRUCTOR) > > + { > > + gcc_assert (index < nelts); > > + if (index >= vec_safe_length (CONSTRUCTOR_ELTS (arg))) > > + return NULL_TREE; > > + elem = CONSTRUCTOR_ELT (arg, index)->value; > > + if (VECTOR_TYPE_P (TREE_TYPE (elem))) > > + return NULL_TREE; > > + need_ctor = true; > > + } > > + else > > + elem = vector_cst_elt (arg, index); > > + out_elts.quick_push (elem); > > } > > > > if (need_ctor) > > { > > vec *v; > > - vec_alloc (v, nelts); > > - for (i = 0; i < nelts; i++) > > + vec_alloc (v, res_nelts); > > + for (i = 0; i < res_nelts; i++) > > CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, out_elts[i]); > > return build_constructor (type, v); > > } > > - else > > - return out_elts.build (); > > + return out_elts.build (); > > } > > > > /* Try to fold a pointer difference of type TYPE two address expressions of > > @@ -16910,6 +17024,97 @@ test_vec_duplicate_folding () > > ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0)); > > } > > > > +static tree > > +build_vec_int_cst (unsigned npatterns, unsigned nelts_per_pattern, > > + int *encoded_elems) > > +{ > > + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (integer_type_node); > > + machine_mode vmode = targetm.vectorize.preferred_simd_mode (int_mode); > > + //machine_mode vmode = VNx4SImode; > > + poly_uint64 nunits = GET_MODE_NUNITS (vmode); > > + tree vectype = build_vector_type (integer_type_node, nunits); > > + > > + tree_vector_builder builder (vectype, npatterns, nelts_per_pattern); > > + for (unsigned i = 0; i < npatterns * nelts_per_pattern; i++) > > + builder.quick_push (build_int_cst (integer_type_node, encoded_elems[i])); > > + return builder.build (); > > +} > > + > > +static void > > +test_vec_perm_vla_folding () > > +{ > > + int arg0_elems[] = { 1, 11, 2, 12, 3, 13 }; > > + tree arg0 = build_vec_int_cst (2, 3, arg0_elems); > > + > > + int arg1_elems[] = { 21, 31, 22, 32, 23, 33 }; > > + tree arg1 = build_vec_int_cst (2, 3, arg1_elems); > > + > > + if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)).is_constant () > > + || TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)).is_constant ()) > > + return; > > + > > + /* Case 1: For mask: {0, 1, 2, ...}, npatterns == 1, nelts_per_pattern == 3, > > + should select arg0. */ > > + { > > + int mask_elems[] = {0, 1, 2}; > > + tree mask = build_vec_int_cst (1, 3, mask_elems); > > + tree res = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (arg0), arg0, arg1, mask); > > + ASSERT_TRUE (res != NULL_TREE); > > + ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) == 2); > > + ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) == 3); > > + > > + unsigned res_nelts = vector_cst_encoded_nelts (res); > > + for (unsigned i = 0; i < res_nelts; i++) > > + ASSERT_TRUE (operand_equal_p (VECTOR_CST_ELT (res, i), > > + VECTOR_CST_ELT (arg0, i), 0)); > > + } > > + > > + /* Case 2: For mask: {4, 5, 6, ...}, npatterns == 1, nelts_per_pattern == 3, > > + should return NULL because for len = 4 + 4x, > > + if x == 0, we select from arg1 > > + if x > 0, we select from arg0 > > + and thus cannot determine result at compile time. */ > > + { > > + int mask_elems[] = {4, 5, 6}; > > + tree mask = build_vec_int_cst (1, 3, mask_elems); > > + tree res = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (arg0), arg0, arg1, mask); > > + gcc_assert (res == NULL_TREE); > > + } > > + > > + /* Case 3: > > + mask: {0, 0, 0, 1, 0, 2, ...} > > + npatterns == 2, nelts_per_pattern == 3 > > + Pattern {0, ...} should select arg0[0], ie, 1. > > + Pattern {0, 1, 2, ...} should select arg0: {1, 11, 2, ...}, > > + so res = {1, 1, 1, 11, 1, 2, ...}. */ > > + { > > + int mask_elems[] = {0, 0, 0, 1, 0, 2}; > > + tree mask = build_vec_int_cst (2, 3, mask_elems); > > + tree res = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (arg0), arg0, arg1, mask); > > + ASSERT_TRUE (VECTOR_CST_NPATTERNS (res) == 4); > > + ASSERT_TRUE (VECTOR_CST_NELTS_PER_PATTERN (res) == 3); > > + > > + /* Check encoding: {1, 1, 1, 11, 1, 2, 1, 12, 1, 3, 1, 13, ...} */ > > + int res_encoded_elems[] = {1, 1, 1, 11, 1, 2, 1, 12, 1, 3, 1, 13}; > > + for (unsigned i = 0; i < vector_cst_encoded_nelts (res); i++) > > + ASSERT_TRUE (wi::to_wide(VECTOR_CST_ELT (res, i)) == res_encoded_elems[i]); > > + } > > + > > + /* Case 4: > > + mask: {0, 4 + 4x, 0, 5 + 4x, 0, 6 + 4x, ...} > > + npatterns == 2, nelts_per_pattern == 3 > > + Pattern {0, ...} should select arg0[1] > > + Pattern {4 + 4x, 5 + 4x, 6 + 4x, ...} should select from arg1, since: > > + a1 = 5 + 4x > > + ae = (5 + 4x) + ((4 + 4x) / 2 - 2) * 1 > > + = 5 + 6x > > + Since a1/4+4x == ae/4+4x == 1, we select arg1[0], arg1[1], arg1[2], ... > > + res: {1, 21, 1, 31, 1, 22, ... } > > + FIXME: How to build vector with poly_int elems ? */ > > + > > + /* Case 5: S < 0. */ > > +} > > + > > /* Run all of the selftests within this file. */ > > > > void > > @@ -16918,6 +17123,7 @@ fold_const_cc_tests () > > test_arithmetic_folding (); > > test_vector_folding (); > > test_vec_duplicate_folding (); > > + test_vec_perm_vla_folding (); > > } > > > > } // namespace selftest --000000000000d75b0f05efaf6b33 Content-Type: text/plain; charset="US-ASCII"; name="gnu-790-7.txt" Content-Disposition: attachment; filename="gnu-790-7.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lbltat8x0 ZGlmZiAtLWdpdCBhL2djYy9mb2xkLWNvbnN0LmNjIGIvZ2NjL2ZvbGQtY29uc3QuY2MKaW5kZXgg ZTgwYmU4MDQ5ZTEuLmIxZWQ5MGU2MjliIDEwMDY0NAotLS0gYS9nY2MvZm9sZC1jb25zdC5jYwor KysgYi9nY2MvZm9sZC1jb25zdC5jYwpAQCAtODUsNiArODUsOSBAQCBhbG9uZyB3aXRoIEdDQzsg c2VlIHRoZSBmaWxlIENPUFlJTkczLiAgSWYgbm90IHNlZQogI2luY2x1ZGUgInZlYy1wZXJtLWlu ZGljZXMuaCIKICNpbmNsdWRlICJhc2FuLmgiCiAjaW5jbHVkZSAiZ2ltcGxlLXJhbmdlLmgiCisj aW5jbHVkZSA8YWxnb3JpdGhtPgorI2luY2x1ZGUgImdpbXBsZS1wcmV0dHktcHJpbnQuaCIKKyNp bmNsdWRlICJ0cmVlLXByZXR0eS1wcmludC5oIgogCiAvKiBOb256ZXJvIGlmIHdlIGFyZSBmb2xk aW5nIGNvbnN0YW50cyBpbnNpZGUgYW4gaW5pdGlhbGl6ZXIgb3IgYSBDKysKICAgIG1hbmlmZXN0 bHktY29uc3RhbnQtZXZhbHVhdGVkIGNvbnRleHQ7IHplcm8gb3RoZXJ3aXNlLgpAQCAtMTA1NDQs NiArMTA1NDcsMTI0IEBAIHZlY19jc3RfY3Rvcl90b19hcnJheSAodHJlZSBhcmcsIHVuc2lnbmVk IGludCBuZWx0cywgdHJlZSAqZWx0cykKICAgcmV0dXJuIHRydWU7CiB9CiAKK3N0YXRpYyBzdGQ6 OnBhaXI8dHJlZSwgaW50PgorZ2V0X3ZlY3Rvcl9mb3JfcGF0dGVybiAodHJlZSBhcmcwLCB0cmVl IGFyZzEsIGNvbnN0IHZlY19wZXJtX2luZGljZXMgJnNlbCwKKwkJCXVuc2lnbmVkIHBhdHRlcm4p Cit7CisgIHVuc2lnbmVkIHNlbF9uZWx0c19wZXJfcGF0dGVybiA9IHNlbC5lbmNvZGluZyAoKS5u ZWx0c19wZXJfcGF0dGVybiAoKTsKKyAgZ2NjX2Fzc2VydCAoc2VsX25lbHRzX3Blcl9wYXR0ZXJu ID09IDMpOworCisgIHVuc2lnbmVkIHNlbF9ucGF0dGVybnMgPSBzZWwuZW5jb2RpbmcgKCkubnBh dHRlcm5zICgpOworICBwb2x5X3VpbnQ2NCBsZW4gPSBzZWwubGVuZ3RoICgpOworICBwb2x5X3Vp bnQ2NCBlc2VsOworCisgIGlmICghbXVsdGlwbGVfcCAobGVuLCBzZWxfbnBhdHRlcm5zLCAmZXNl bCkpCisgICAgcmV0dXJuIHN0ZDo6bWFrZV9wYWlyIChOVUxMX1RSRUUsIDApOworCisgIHBvbHlf dWludDY0IGExID0gc2VsW3BhdHRlcm4gKyBzZWxfbnBhdHRlcm5zXTsKKyAgcG9seV91aW50NjQg YTIgPSBzZWxbcGF0dGVybiArIDIgKiBzZWxfbnBhdHRlcm5zXTsKKyAgcG9seV91aW50NjQgZGlm ZiA9IGEyIC0gYTE7CisgIGlmICghZGlmZi5pc19jb25zdGFudCAoKSkKKyAgICByZXR1cm4gc3Rk OjptYWtlX3BhaXIgKE5VTExfVFJFRSwgMCk7CisgIGludCBTID0gZGlmZi50b19jb25zdGFudCAo KTsKKyAgaWYgKCFwb3cycF9od2kgKFMpKQorICAgIHJldHVybiBzdGQ6Om1ha2VfcGFpciAoTlVM TF9UUkVFLCAwKTsKKworICBwb2x5X3VpbnQ2NCBhZSA9IGExICsgKGVzZWwgLSAyKSAqIFM7Cisg IHVpbnQ2NF90IHExLCBxZTsKKyAgcG9seV91aW50NjQgcjEsIHJlOworICBpZiAoIWNhbl9kaXZf dHJ1bmNfcCAoYTEsIGxlbiwgJnExLCAmcjEpCisgICAgICB8fCAhY2FuX2Rpdl90cnVuY19wIChh ZSwgbGVuLCAmcWUsICZyZSkKKyAgICAgIHx8IChxMSAhPSBxZSkpCisgICAgcmV0dXJuIHN0ZDo6 bWFrZV9wYWlyIChOVUxMX1RSRUUsIDApOworCisgIHRyZWUgYXJnID0gKChxMSAmIDEpID09IDAp ID8gYXJnMCA6IGFyZzE7CisKKyAgaWYgKFMgPCAwCisgICAgICAmJiAha25vd25fZXEgKFMsIGEx IC0gc2VsW3BhdHRlcm5dKQorICAgICAgJiYgIWtub3duX2d0IChyZSwgVkVDVE9SX0NTVF9OUEFU VEVSTlMgKGFyZykpKQorICAgIHJldHVybiBzdGQ6Om1ha2VfcGFpciAoTlVMTF9UUkVFLCAwKTsK KworICByZXR1cm4gc3RkOjptYWtlX3BhaXIgKGFyZywgUyk7Cit9CisKK3N0YXRpYyB0cmVlCitm b2xkX3ZlY19wZXJtX3ZsYSAodHJlZSB0eXBlLCB0cmVlIGFyZzAsIHRyZWUgYXJnMSwgY29uc3Qg dmVjX3Blcm1faW5kaWNlcyAmc2VsKQoreworICBpZiAoVFJFRV9DT0RFIChhcmcwKSAhPSBWRUNU T1JfQ1NUCisgICAgICB8fCBUUkVFX0NPREUgKGFyZzEpICE9IFZFQ1RPUl9DU1QpCisgICAgcmV0 dXJuIE5VTExfVFJFRTsKKworICB1bnNpZ25lZCBhcmcwX25wYXR0ZXJucyA9IFZFQ1RPUl9DU1Rf TlBBVFRFUk5TIChhcmcwKTsKKyAgdW5zaWduZWQgYXJnMV9ucGF0dGVybnMgPSBWRUNUT1JfQ1NU X05QQVRURVJOUyAoYXJnMSk7CisgIHVuc2lnbmVkIHNlbF9ucGF0dGVybnMgPSBzZWwuZW5jb2Rp bmcgKCkubnBhdHRlcm5zICgpOworICB1bnNpZ25lZCBzZWxfbmVsdHNfcGVyX3BhdHRlcm4gPSBz ZWwuZW5jb2RpbmcgKCkubmVsdHNfcGVyX3BhdHRlcm4gKCk7CisgIHBvbHlfdWludDY0IG5lbHRz ID0gc2VsLmxlbmd0aCAoKTsKKworICBpZiAoIXBvdzJwX2h3aSAoYXJnMF9ucGF0dGVybnMpCisg ICAgICB8fCAhcG93MnBfaHdpIChhcmcxX25wYXR0ZXJucykKKyAgICAgIHx8ICFwb3cycF9od2kg KHNlbF9ucGF0dGVybnMpKQorICAgIHJldHVybiBOVUxMX1RSRUU7CisKKyAgdW5zaWduZWQgTiA9 IDE7CisgIGlmIChzZWxfbmVsdHNfcGVyX3BhdHRlcm4gPT0gMykKKyAgICBmb3IgKHVuc2lnbmVk IGkgPSAwOyBpIDwgc2VsX25wYXR0ZXJuczsgaSsrKQorICAgICAgeworCXN0ZDo6cGFpcjx0cmVl LCBpbnQ+IHJldCA9IGdldF92ZWN0b3JfZm9yX3BhdHRlcm4gKGFyZzAsIGFyZzEsIHNlbCwgaSk7 CisJdHJlZSBhcmcgPSByZXQuZmlyc3Q7CisJaWYgKCFhcmcpCisJICByZXR1cm4gTlVMTF9UUkVF OworCisJaW50IFMgPSByZXQuc2Vjb25kOworCS8qIFMgY2FuIGJlIDAgaWYgb25lIG9mIHRoZSBw YXR0ZXJucyBpcyBhIGR1cCBidXQKKwkgICBvdGhlciBpcyBhIHN0ZXBwZWQgc2VxdWVuY2UuIEZv ciBlZzogezAsIDAsIDAsIDEsIDAsIDIsIC4uLn0gKi8KKwl1bnNpZ25lZCBOX3BhdHRlcm4KKwkg ID0gKFMgPiAwKSA/IHN0ZDo6bWF4PGludD4gKFMsIFZFQ1RPUl9DU1RfTlBBVFRFUk5TIChhcmcp KSAvIFMgOiAxOworCU4gPSBzdGQ6Om1heCAoTiwgTl9wYXR0ZXJuKTsKKyAgICAgIH0KKworICB1 bnNpZ25lZCByZXNfbnBhdHRlcm5zCisgICAgPSBzdGQ6Om1heCAoc2VsX25wYXR0ZXJucyAqIE4s IHN0ZDo6bWF4IChhcmcwX25wYXR0ZXJucywgYXJnMV9ucGF0dGVybnMpKTsKKworICB1bnNpZ25l ZCByZXNfbmVsdHNfcGVyX3BhdHRlcm4KKyAgICA9IHN0ZDo6bWF4IChzZWxfbmVsdHNfcGVyX3Bh dHRlcm4sIHN0ZDo6bWF4IChWRUNUT1JfQ1NUX05FTFRTX1BFUl9QQVRURVJOIChhcmcwKSwKKwkJ CQkJCSBWRUNUT1JfQ1NUX05FTFRTX1BFUl9QQVRURVJOIChhcmcxKSkpOworCisgIHRyZWVfdmVj dG9yX2J1aWxkZXIgb3V0X2VsdHMgKHR5cGUsIHJlc19ucGF0dGVybnMsIHJlc19uZWx0c19wZXJf cGF0dGVybik7CisgIGZvciAodW5zaWduZWQgaSA9IDA7IGkgPCByZXNfbnBhdHRlcm5zICogcmVz X25lbHRzX3Blcl9wYXR0ZXJuOyBpKyspCisgICAgeworICAgICAgLyogR2V0IHBhdHRlcm4gY29y cmVzcG9uZGluZyB0byBzZWxbaV0gYW5kIHVzZSB0aGF0IHRvIGZpZ3VyZSBvdXQKKwkgdGhlIGlu cHV0IHZlY3Rvci4KKwkgRm9yIGEgc3RlcHBlZCBzZXF1ZW5jZSwgYTAgbWF5IGNob29zZSBkaWZm ZXJlbnQgdmVjdG9yLAorCSBob3dldmVyIGExIC4uLiBhZSBtdXN0IHNlbGVjdCBmcm9tIHNhbWUg cGF0dGVybiBmcm9tIHNhbWUgdmVjdG9yLgorCSBTbyBpZiBpIDwgc2VsX25wYXR0ZXJucywgc2V0 IHBhdHRlcm5faW5kZXggdG8gaW5kZXggb2YgYTAsCisJIGFuZCBpZiBpID49IHNlbF9ucGF0dGVy bnMsIHNldCBwYXR0ZXJuX2luZGV4IHRvIGluZGV4IG9mIGExLiAgKi8KKworICAgICAgdW5zaWdu ZWQgcGF0dGVybl9pbmRleCA9IGkgJSBzZWxfbnBhdHRlcm5zOworICAgICAgaWYgKGkgPj0gc2Vs X25wYXR0ZXJucykKKwlwYXR0ZXJuX2luZGV4ICs9IHNlbF9ucGF0dGVybnM7CisKKyAgICAgIHVp bnQ2NF90IHE7CisgICAgICBwb2x5X3VpbnQ2NCByOworICAgICAgaWYgKCFjYW5fZGl2X3RydW5j X3AgKHNlbFtwYXR0ZXJuX2luZGV4XSwgbmVsdHMsICZxLCAmcikpCisJcmV0dXJuIE5VTExfVFJF RTsKKyAgICAgIHRyZWUgYXJnID0gKChxICYgMSkgPT0gMCkgPyBhcmcwIDogYXJnMTsKKworICAg ICAgdW5zaWduZWQgSE9TVF9XSURFX0lOVCBpbmRleDsKKyAgICAgIGlmIChzZWxbaV0uaXNfY29u c3RhbnQgKCkpCisJaW5kZXggPSBzZWxbaV0udG9fY29uc3RhbnQgKCk7CisgICAgICBlbHNlCisJ eworCSAgcG9seV91aW50NjQgZGlmZiA9IHNlbFtpXSAtIG5lbHRzOworCSAgaWYgKCFkaWZmLmlz X2NvbnN0YW50ICgmaW5kZXgpKQorCSAgICByZXR1cm4gTlVMTF9UUkVFOworCX0KKyAgICAgIHRy ZWUgZWxlbSA9IHZlY3Rvcl9jc3RfZWx0IChhcmcsIGluZGV4KTsKKyAgICAgIG91dF9lbHRzLnF1 aWNrX3B1c2ggKGVsZW0pOworICAgIH0KKyAgcmV0dXJuIG91dF9lbHRzLmJ1aWxkICgpOworfQor CiAvKiBBdHRlbXB0IHRvIGZvbGQgdmVjdG9yIHBlcm11dGF0aW9uIG9mIEFSRzAgYW5kIEFSRzEg dmVjdG9ycyB1c2luZyBTRUwKICAgIHNlbGVjdG9yLiAgUmV0dXJuIHRoZSBmb2xkZWQgVkVDVE9S X0NTVCBvciBDT05TVFJVQ1RPUiBpZiBzdWNjZXNzZnVsLAogICAgTlVMTF9UUkVFIG90aGVyd2lz ZS4gICovCkBAIC0xMDU1NSwxNSArMTA2NzYsMjAgQEAgZm9sZF92ZWNfcGVybSAodHJlZSB0eXBl LCB0cmVlIGFyZzAsIHRyZWUgYXJnMSwgY29uc3QgdmVjX3Blcm1faW5kaWNlcyAmc2VsKQogICB1 bnNpZ25lZCBIT1NUX1dJREVfSU5UIG5lbHRzOwogICBib29sIG5lZWRfY3RvciA9IGZhbHNlOwog Ci0gIGlmICghc2VsLmxlbmd0aCAoKS5pc19jb25zdGFudCAoJm5lbHRzKSkKLSAgICByZXR1cm4g TlVMTF9UUkVFOwotICBnY2NfYXNzZXJ0IChrbm93bl9lcSAoVFlQRV9WRUNUT1JfU1VCUEFSVFMg KHR5cGUpLCBuZWx0cykKLQkgICAgICAmJiBrbm93bl9lcSAoVFlQRV9WRUNUT1JfU1VCUEFSVFMg KFRSRUVfVFlQRSAoYXJnMCkpLCBuZWx0cykKLQkgICAgICAmJiBrbm93bl9lcSAoVFlQRV9WRUNU T1JfU1VCUEFSVFMgKFRSRUVfVFlQRSAoYXJnMSkpLCBuZWx0cykpOworICBnY2NfYXNzZXJ0IChr bm93bl9lcSAoVFlQRV9WRUNUT1JfU1VCUEFSVFMgKHR5cGUpLCBzZWwubGVuZ3RoICgpKQorCSAg ICAgICYmIGtub3duX2VxIChUWVBFX1ZFQ1RPUl9TVUJQQVJUUyAoVFJFRV9UWVBFIChhcmcwKSks CisJCQkgICBUWVBFX1ZFQ1RPUl9TVUJQQVJUUyAoVFJFRV9UWVBFIChhcmcxKSkpKTsKKwogICBp ZiAoVFJFRV9UWVBFIChUUkVFX1RZUEUgKGFyZzApKSAhPSBUUkVFX1RZUEUgKHR5cGUpCiAgICAg ICB8fCBUUkVFX1RZUEUgKFRSRUVfVFlQRSAoYXJnMSkpICE9IFRSRUVfVFlQRSAodHlwZSkpCiAg ICAgcmV0dXJuIE5VTExfVFJFRTsKIAorICBpZiAodHJlZSByZXMgPSBmb2xkX3ZlY19wZXJtX3Zs YSAodHlwZSwgYXJnMCwgYXJnMSwgc2VsKSkKKyAgICByZXR1cm4gcmVzOworCisgIGlmICghc2Vs Lmxlbmd0aCgpLmlzX2NvbnN0YW50ICgmbmVsdHMpKQorICAgIHJldHVybiBOVUxMX1RSRUU7CisK ICAgdHJlZSAqaW5fZWx0cyA9IFhBTExPQ0FWRUMgKHRyZWUsIG5lbHRzICogMik7CiAgIGlmICgh dmVjX2NzdF9jdG9yX3RvX2FycmF5IChhcmcwLCBuZWx0cywgaW5fZWx0cykKICAgICAgIHx8ICF2 ZWNfY3N0X2N0b3JfdG9fYXJyYXkgKGFyZzEsIG5lbHRzLCBpbl9lbHRzICsgbmVsdHMpKQpAQCAt MTY5NTgsNiArMTcwODQsMTY4IEBAIHRlc3RfdmVjX2R1cGxpY2F0ZV9mb2xkaW5nICgpCiAgIEFT U0VSVF9UUlVFIChvcGVyYW5kX2VxdWFsX3AgKGR1cDVfZXhwciwgZHVwNV9jc3QsIDApKTsKIH0K IAorc3RhdGljIHRyZWUKK2J1aWxkX3ZlY19pbnRfY3N0ICh1bnNpZ25lZCBucGF0dGVybnMsIHVu c2lnbmVkIG5lbHRzX3Blcl9wYXR0ZXJuLAorCQkgICBpbnQgKmVuY29kZWRfZWxlbXMpCit7Cisg IHNjYWxhcl9pbnRfbW9kZSBpbnRfbW9kZSA9IFNDQUxBUl9JTlRfVFlQRV9NT0RFIChpbnRlZ2Vy X3R5cGVfbm9kZSk7CisgIC8vbWFjaGluZV9tb2RlIHZtb2RlID0gdGFyZ2V0bS52ZWN0b3JpemUu cHJlZmVycmVkX3NpbWRfbW9kZSAoaW50X21vZGUpOworICBtYWNoaW5lX21vZGUgdm1vZGUgPSBW Tng0U0ltb2RlOworICBwb2x5X3VpbnQ2NCBudW5pdHMgPSBHRVRfTU9ERV9OVU5JVFMgKHZtb2Rl KTsKKyAgdHJlZSB2ZWN0eXBlID0gYnVpbGRfdmVjdG9yX3R5cGUgKGludGVnZXJfdHlwZV9ub2Rl LCBudW5pdHMpOworCisgIHRyZWVfdmVjdG9yX2J1aWxkZXIgYnVpbGRlciAodmVjdHlwZSwgbnBh dHRlcm5zLCBuZWx0c19wZXJfcGF0dGVybik7CisgIGZvciAodW5zaWduZWQgaSA9IDA7IGkgPCBu cGF0dGVybnMgKiBuZWx0c19wZXJfcGF0dGVybjsgaSsrKQorICAgIGJ1aWxkZXIucXVpY2tfcHVz aCAoYnVpbGRfaW50X2NzdCAoaW50ZWdlcl90eXBlX25vZGUsIGVuY29kZWRfZWxlbXNbaV0pKTsK KyAgcmV0dXJuIGJ1aWxkZXIuYnVpbGQgKCk7Cit9CisKK3N0YXRpYyB0cmVlCitmb2xkX3ZlY19w ZXJtX3ZsYV9tYXNrICh0cmVlIHR5cGUsIHRyZWUgYXJnMCwgdHJlZSBhcmcxLAorCQkJdW5zaWdu ZWQgbWFza19ucGF0dGVybnMsCisJCQl1bnNpZ25lZCBtYXNrX25lbHRzX3Blcl9wYXR0ZXJuLAor CQkJcG9seV91aW50NjQgKm1hc2tfZWxlbXMpCit7CisgIHBvbHlfdWludDY0IGxlbiA9IFRZUEVf VkVDVE9SX1NVQlBBUlRTICh0eXBlKTsKKyAgdmVjX3Blcm1fYnVpbGRlciBidWlsZGVyIChsZW4s IG1hc2tfbnBhdHRlcm5zLCBtYXNrX25lbHRzX3Blcl9wYXR0ZXJuKTsKKyAgZm9yICh1bnNpZ25l ZCBpID0gMDsgaSA8IG1hc2tfbnBhdHRlcm5zICogbWFza19uZWx0c19wZXJfcGF0dGVybjsgaSsr KQorICAgIGJ1aWxkZXIucXVpY2tfcHVzaCAobWFza19lbGVtc1tpXSk7CisgIHZlY19wZXJtX2lu ZGljZXMgc2VsIChidWlsZGVyLCAoYXJnMCA9PSBhcmcxKSA/IDEgOiAyLCBsZW4pOworICByZXR1 cm4gZm9sZF92ZWNfcGVybV92bGEgKHR5cGUsIGFyZzAsIGFyZzEsIHNlbCk7Cit9CisKK3N0YXRp YyB2b2lkCitjaGVja192ZWNfcGVybV92bGFfcmVzdWx0KHRyZWUgcmVzLCBpbnQgKnJlc19lbGVt cywKKwkJCSAgdW5zaWduZWQgbnBhdHRlcm5zLCB1bnNpZ25lZCBuZWx0c19wZXJfcGF0dGVybikK K3sKKyAgQVNTRVJUX1RSVUUgKFRSRUVfQ09ERSAocmVzKSA9PSBWRUNUT1JfQ1NUKTsKKyAgQVNT RVJUX1RSVUUgKFZFQ1RPUl9DU1RfTlBBVFRFUk5TIChyZXMpID09IG5wYXR0ZXJucyk7CisgIEFT U0VSVF9UUlVFIChWRUNUT1JfQ1NUX05FTFRTX1BFUl9QQVRURVJOIChyZXMpID09IG5lbHRzX3Bl cl9wYXR0ZXJuKTsKKworICBmb3IgKHVuc2lnbmVkIGkgPSAwOyBpIDwgdmVjdG9yX2NzdF9lbmNv ZGVkX25lbHRzIChyZXMpOyBpKyspCisgICAgQVNTRVJUX1RSVUUgKHdpOjp0b193aWRlIChWRUNU T1JfQ1NUX0VMVCAocmVzLCBpKSkgPT0gcmVzX2VsZW1zW2ldKTsKK30KKworc3RhdGljIHZvaWQK K3Rlc3RfdmVjX3Blcm1fdmxhX2ZvbGRpbmcgKCkKK3sKKyAgaW50IGFyZzBfZWxlbXNbXSA9IHsg MSwgMTEsIDIsIDEyLCAzLCAxMyB9OworICB0cmVlIGFyZzAgPSBidWlsZF92ZWNfaW50X2NzdCAo MiwgMywgYXJnMF9lbGVtcyk7CisKKyAgaW50IGFyZzFfZWxlbXNbXSA9IHsgMjEsIDMxLCAyMiwg MzIsIDIzLCAzMyB9OworICB0cmVlIGFyZzEgPSBidWlsZF92ZWNfaW50X2NzdCAoMiwgMywgYXJn MV9lbGVtcyk7CisKKyAgaWYgKFRZUEVfVkVDVE9SX1NVQlBBUlRTIChUUkVFX1RZUEUgKGFyZzAp KS5pc19jb25zdGFudCAoKQorICAgICAgfHwgVFlQRV9WRUNUT1JfU1VCUEFSVFMgKFRSRUVfVFlQ RSAoYXJnMSkpLmlzX2NvbnN0YW50ICgpKQorICAgIHJldHVybjsKKworICAvKiBDYXNlIDE6IG1h c2sgaXMgezAsIDEsIDIsIDMsIC4uLiB9LgorICAgICBucGF0dGVybnMgPSA0LCBuZWx0c19wZXJf cGF0dGVybiA9IDEuCisgICAgIEFsbCBlbGVtZW50cyBpbiBtYXNrIDwgNCArIDR4LiAgKi8KKyAg eworICAgIHBvbHlfdWludDY0IG1hc2tfZWxlbXNbXSA9IHswLCAxLCAyLCAzfTsKKyAgICB0cmVl IHJlcyA9IGZvbGRfdmVjX3Blcm1fdmxhX21hc2sgKFRSRUVfVFlQRSAoYXJnMCksIGFyZzAsIGFy ZzEsIDQsIDEsIG1hc2tfZWxlbXMpOworICAgIGludCByZXNfZWxlbXNbXSA9IHsgYXJnMF9lbGVt c1swXSwgYXJnMF9lbGVtc1sxXSwgYXJnMF9lbGVtc1syXSwgYXJnMF9lbGVtc1szXSB9OworICAg IGNoZWNrX3ZlY19wZXJtX3ZsYV9yZXN1bHQgKHJlcywgcmVzX2VsZW1zLCA0LCAxKTsKKyAgfQor CisgIC8qIENhc2UgMjogbWFzayA9IHswLCA0LCAxLCA1LCAuLi59CisgICAgIG5wYXR0ZXJucyA9 IDQsIG5lbHRzX3Blcl9wYXR0ZXJuID0gMS4KKyAgICAgU2hvdWxkIHJldHVybiBOVUxMLCBiZWNh dXNlIHJlc3VsdCBjYW5ub3QgYmUgZGV0ZXJtaW5lZCBhdCBjb21waWxlIHRpbWUsCisgICAgIHNp bmNlIGxlbiA9IDQgKyA0eCBhbmQgdGh1cyB7NCwgNX0gY2FuIHNlbGVjdCBlaXRoZXIgZnJvbSBh cmcwIG9yIGFyZzEKKyAgICAgZGVwZW5kaW5nIG9uIHJ1bnRpbWUgbGVuZ3RoIG9mIHRoZSB2ZWN0 b3IuICAqLworICB7CisgICAgcG9seV91aW50NjQgbWFza19lbGVtc1tdID0gezAsIDQsIDEsIDV9 OworICAgIHRyZWUgcmVzID0gZm9sZF92ZWNfcGVybV92bGFfbWFzayAoVFJFRV9UWVBFIChhcmcw KSwgYXJnMCwgYXJnMSwgNCwgMSwgbWFza19lbGVtcyk7CisgICAgQVNTRVJUX1RSVUUgKHJlcyA9 PSBOVUxMX1RSRUUpOworICB9CisKKyAgLyogQ2FzZSAzOiBtYXNrID0geyA0ICsgNHgsIDUgKyA0 eCwgNiArIDR4LCA3ICsgNHgsIC4uLiB9CisgICAgIG5wYXR0ZXJucyA9IDQsIG5lbHRzX3Blcl9w YXR0ZXJuID0gMQorICAgICBBbGwgZWxlbWVudHMgaW4gbWFzayA+PSA0ICsgNHguICAqLworICB7 CisgICAgcG9seV91aW50NjQgbGVuID0gVFlQRV9WRUNUT1JfU1VCUEFSVFMgKFRSRUVfVFlQRSAo YXJnMCkpOworICAgIHBvbHlfdWludDY0IG1hc2tfZWxlbXNbXSA9IHsgbGVuLCBsZW4gKyAxLCBs ZW4gKyAyLCBsZW4gKyAzIH07CisgICAgdHJlZSByZXMgPSBmb2xkX3ZlY19wZXJtX3ZsYV9tYXNr IChUUkVFX1RZUEUgKGFyZzApLCBhcmcwLCBhcmcxLCA0LCAxLCBtYXNrX2VsZW1zKTsKKyAgICBp bnQgcmVzX2VsZW1zW10gPSB7IGFyZzFfZWxlbXNbMF0sIGFyZzFfZWxlbXNbMV0sIGFyZzFfZWxl bXNbMl0sIGFyZzFfZWxlbXNbM10gfTsKKyAgICBjaGVja192ZWNfcGVybV92bGFfcmVzdWx0IChy ZXMsIHJlc19lbGVtcywgNCwgMSk7CisgIH0KKworICAvKiBDYXNlIDQ6IG1hc2sgPSB7MCwgMSwg NCArIDR4LCA1ICsgNHgsIC4uLiB9CisgICAgIG5wYXR0ZXJucyA9IDQsIG5lbHRzX3Blcl9wYXR0 ZXJuID0gMQorICAgICByZXMgPSB7IGFyZzBbMF0sIGFyZzBbMV0sIGFyZzFbMF0sIGFyZzFbMV0s IC4uLiB9ICAqLworICB7CisgICAgcG9seV91aW50NjQgbGVuID0gVFlQRV9WRUNUT1JfU1VCUEFS VFMgKFRSRUVfVFlQRSAoYXJnMCkpOworICAgIHBvbHlfdWludDY0IG1hc2tfZWxlbXNbXSA9IHsg MCwgMSwgbGVuLCBsZW4gKyAxIH07CisgICAgdHJlZSByZXMgPSBmb2xkX3ZlY19wZXJtX3ZsYV9t YXNrIChUUkVFX1RZUEUgKGFyZzApLCBhcmcwLCBhcmcxLCA0LCAxLCBtYXNrX2VsZW1zKTsKKyAg ICBpbnQgcmVzX2VsZW1zW10gPSB7IGFyZzBfZWxlbXNbMF0sIGFyZzBfZWxlbXNbMV0sIGFyZzFf ZWxlbXNbMF0sIGFyZzFfZWxlbXNbMV0gfTsKKyAgICBjaGVja192ZWNfcGVybV92bGFfcmVzdWx0 IChyZXMsIHJlc19lbGVtcywgNCwgMSk7CisgIH0KKworICAvKiBDYXNlIDU6IG1hc2sgPSB7MCwg MSwgMiwgMywgNCArIDR4LCA1ICsgNHgsIDYgKyA0eCwgNyArIDR4LCAuLi4gfQorICAgICBucGF0 dGVybnMgPSA0LCBuZWx0c19wZXJfcGF0dGVybiA9IDIuCisgICAgIHJlcyA9IHsgYXJnMFswXSwg YXJnMFsxXSwgYXJnMFsyXSwgYXJnMFszXSwgYXJnMVswXSwgYXJnMVsxXSwgYXJnMVsyXSwgYXJn MVszXSwgLi4uIH0gICovCisgIHsKKyAgICBwb2x5X3VpbnQ2NCBsZW4gPSBUWVBFX1ZFQ1RPUl9T VUJQQVJUUyAoVFJFRV9UWVBFIChhcmcwKSk7CisgICAgcG9seV91aW50NjQgbWFza19lbGVtc1td ID0geyAwLCAxLCAyLCAzLCBsZW4sIGxlbiArIDEsIGxlbiArIDIsIGxlbiArIDMgfTsKKyAgICB0 cmVlIHJlcyA9IGZvbGRfdmVjX3Blcm1fdmxhX21hc2sgKFRSRUVfVFlQRSAoYXJnMCksIGFyZzAs IGFyZzEsIDQsIDIsIG1hc2tfZWxlbXMpOworICAgIGludCByZXNfZWxlbXNbXSA9IHsgYXJnMF9l bGVtc1swXSwgYXJnMF9lbGVtc1sxXSwgYXJnMF9lbGVtc1syXSwgYXJnMF9lbGVtc1szXSwKKwkJ CWFyZzFfZWxlbXNbMF0sIGFyZzFfZWxlbXNbMV0sIGFyZzFfZWxlbXNbMl0sIGFyZzFfZWxlbXNb M10gfTsKKyAgICBjaGVja192ZWNfcGVybV92bGFfcmVzdWx0IChyZXMsIHJlc19lbGVtcywgNCwg Mik7CisgIH0KKworICAvKiBDYXNlIDY6IG1hc2sgPSB7MCwgLi4uIH0uCisgICAgIG5wYXR0ZXJu cyA9IDEsIG5lbHRzX3Blcl9wYXR0ZXJuID0gMS4KKyAgICAgVGVzdCBmb3IgbnBhdHRlcm5zKG1h c2spIDwgbnBhdHRlcm5zKGFyZzApICAqLworICB7CisgICAgcG9seV91aW50NjQgbWFza19lbGVt c1tdID0gezB9OworICAgIHRyZWUgcmVzID0gZm9sZF92ZWNfcGVybV92bGFfbWFzayAoVFJFRV9U WVBFIChhcmcwKSwgYXJnMCwgYXJnMSwgMSwgMSwgbWFza19lbGVtcyk7CisgICAgaW50IHJlc19l bGVtc1tdID0geyBhcmcwX2VsZW1zWzBdIH07CisgICAgY2hlY2tfdmVjX3Blcm1fdmxhX3Jlc3Vs dCAocmVzLCByZXNfZWxlbXMsIDEsIDEpOworICB9CisKKyAgLyogQ2FzZSA3OiBtYXNrID0geyAw LCAxLCAyLCAuLi4gfS4KKyAgICAgbnBhdHRlcm5zID0gMSwgbmVsdHNfcGVyX3BhdHRlcm4gPSAz LgorICAgICBTaW5jZSB7MCwgMSwgMn0gd2lsbCBzZWxlY3QgezEsIDExLCAyfSBpdCB3aWxsIGJl IGluY29ycmVjdC4KKyAgICAgUmUtZW5jb2RlIHNlbCBzdWNoIHRoYXQgZWFjaCBwYXR0ZXJuIG9m IHNlbCBzZWxlY3RzIGVsZW1lbnRzIGZyb20KKyAgICAgc2FtZSBwYXR0ZXJuIGZyb20gYXJnMC4K KyAgICAgVGh1cyB0aGUgcGF0dGVybiBtdXN0IGJlIGRpdmlkZWQgaW50bworICAgICBucGF0dGVy bnMoYXJnMCkgLyBTID0gMiAvIDEgPSAyIGRpc3RpbmN0IHBhdHRlcm5zLgorICAgICBSZS1lbmNv ZGVkIHNlbDogezAsIDEsIDIsIDMsIDQsIDUsIC4uLn0KKyAgICAgd2l0aCBwYXR0ZXJuczogezAs IDIsIDQsIC4uLn0gYW5kIHsxLCAzLCA1LCAuLi59CisgICAgIE5vdyBlYWNoIHBhdHRlcm4gc2Vs ZWN0cyBlbGVtZW50cyBvbmx5IGZyb20gc2FtZSBwYXR0ZXJuCisgICAgIG9mIGFyZzAuCisgICAg IEV4cGVjdGVkIHJlczogezEsIDExLCAyLCAxMiwgMywgMTMsIC4uLn0gICovCisgIHsKKyAgICBw b2x5X3VpbnQ2NCBtYXNrX2VsZW1zW10gPSB7IDAsIDEsIDIgfTsKKyAgICB0cmVlIHJlcyA9IGZv bGRfdmVjX3Blcm1fdmxhX21hc2sgKFRSRUVfVFlQRSAoYXJnMCksIGFyZzAsIGFyZzEsIDEsIDMs IG1hc2tfZWxlbXMpOworICAgIGludCByZXNfZWxlbXNbXSA9IHsgYXJnMF9lbGVtc1swXSwgYXJn MF9lbGVtc1sxXSwgYXJnMF9lbGVtc1syXSwgYXJnMF9lbGVtc1szXSwKKwkJCWFyZzBfZWxlbXNb NF0sIGFyZzBfZWxlbXNbNV0gfTsKKyAgICBjaGVja192ZWNfcGVybV92bGFfcmVzdWx0IChyZXMs IHJlc19lbGVtcywgMiwgMyk7CisgIH0KKworICAvKiBDYXNlIDg6IG1hc2sgPSB7bGVuLCAwLCAx LCAuLi4gfQorICAgICBucGF0dGVybnMgPSAxLCBuZWx0c19wZXJfcGF0dGVybiA9IDMuCisgICAg IFRlc3QgZm9yIGNhc2Ugd2hlbiBhMCBzZWxlY3RzIGEgZGlmZmVyZW50IHZlY3RvciBmcm9tIGEx IC4uLiBhZS4gICovCisgIHsKKyAgICBwb2x5X3VpbnQ2NCBsZW4gPSBUWVBFX1ZFQ1RPUl9TVUJQ QVJUUyAoVFJFRV9UWVBFIChhcmcwKSk7CisgICAgcG9seV91aW50NjQgbWFza19lbGVtc1tdID0g e2xlbiwgMCwgMX07CisgICAgaW50IHJlc19lbGVtc1tdID0geyBhcmcxX2VsZW1zWzBdLCBhcmcw X2VsZW1zWzBdLCBhcmcwX2VsZW1zWzFdLCBhcmcwX2VsZW1zWzJdLAorCQkJYXJnMF9lbGVtc1sz XSwgYXJnMF9lbGVtc1s0XSB9OworICAgIHRyZWUgcmVzID0gZm9sZF92ZWNfcGVybV92bGFfbWFz ayAoVFJFRV9UWVBFIChhcmcwKSwgYXJnMCwgYXJnMSwgMSwgMywgbWFza19lbGVtcyk7CisgICAg Y2hlY2tfdmVjX3Blcm1fdmxhX3Jlc3VsdCAocmVzLCByZXNfZWxlbXMsIDIsIDMpOworICB9CisK KyAgLyogQ2FzZSA5OiBtYXNrID0ge2xlbiwgbGVuICsgMSwgbGVuICsgMiwgLi4ufQorICAgICBu cGF0dGVybnMgPSAxLCBuZWx0c19wZXJfcGF0dGVybiA9IDMuICAqLworICB7CisgICAgcG9seV91 aW50NjQgbGVuID0gVFlQRV9WRUNUT1JfU1VCUEFSVFMgKFRSRUVfVFlQRSAoYXJnMCkpOworICAg IHBvbHlfdWludDY0IG1hc2tfZWxlbXNbXSA9IHsgbGVuLCBsZW4gKyAxLCBsZW4gKyAyIH07Cisg ICAgdHJlZSByZXMgPSBmb2xkX3ZlY19wZXJtX3ZsYV9tYXNrIChUUkVFX1RZUEUgKGFyZzApLCBh cmcwLCBhcmcxLCAxLCAzLCBtYXNrX2VsZW1zKTsKKyAgfQorCit9CisKIC8qIFJ1biBhbGwgb2Yg dGhlIHNlbGZ0ZXN0cyB3aXRoaW4gdGhpcyBmaWxlLiAgKi8KIAogdm9pZApAQCAtMTY5NjYsNiAr MTcyNTQsNyBAQCBmb2xkX2NvbnN0X2NjX3Rlc3RzICgpCiAgIHRlc3RfYXJpdGhtZXRpY19mb2xk aW5nICgpOwogICB0ZXN0X3ZlY3Rvcl9mb2xkaW5nICgpOwogICB0ZXN0X3ZlY19kdXBsaWNhdGVf Zm9sZGluZyAoKTsKKyAgdGVzdF92ZWNfcGVybV92bGFfZm9sZGluZyAoKTsKIH0KIAogfSAvLyBu YW1lc3BhY2Ugc2VsZnRlc3QK --000000000000d75b0f05efaf6b33--