From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x636.google.com (mail-ej1-x636.google.com [IPv6:2a00:1450:4864:20::636]) by sourceware.org (Postfix) with ESMTPS id AA4C13858C54 for ; Thu, 18 Aug 2022 12:50:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AA4C13858C54 Received: by mail-ej1-x636.google.com with SMTP id j8so3026735ejx.9 for ; Thu, 18 Aug 2022 05:50:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=DKd1GRAih7xuaZ7fxn8QNuyOOJE+VsGAbVPBBZy0ts4=; b=HI4oUzVdGeH1T1L3Qu9dAreHiEvBhgpzCg8pqEheEnZVfEZjuM18q2cWsatsA6TaSA BPBob0CdI+1DyhViNZyFM//Agj6OZA9jX1xzwN4SIV4nxe/oUqexOR913kLinfTxiEtI eOCVP5D2LicKyHm/5f4j/2gry0kPlFFeYJIbGOSnPaFaxsdQiZ6UGh34EQ08WSMortVI koCwLhUeYCmbcg6g02dFHIBVOKhmlno2h7K0PQeiXTvXENBMdW9a9Fd10RBROIOnGuOV 5xayq63aMiWH889yIOTnMPJ8Rp97lvLQ9cnSvf7KRUPbUs+O9WHAWd9qW+uxcKdEAlO9 1YoQ== X-Gm-Message-State: ACgBeo0J6PQDTu1R6ibM4wbtJwbxK0pu77pA0KQyiVpSd21GlpSTrYuo frp2OIG5k6RMz3N1tBTmgmM+jXXF94+vOd+meK48wg== X-Google-Smtp-Source: AA6agR5FRyfiiSh4RcavXT3GxG2O2myf/dGN5uSeQmvoDIUKJNCpThqkeA9b7lFrG944Gv46Vvyo5rxKvJhb3aQHTco= X-Received: by 2002:a17:907:2896:b0:730:983c:4621 with SMTP id em22-20020a170907289600b00730983c4621mr1837963ejc.502.1660827040262; Thu, 18 Aug 2022 05:50:40 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Thu, 18 Aug 2022 18:20:03 +0530 Message-ID: Subject: Re: ICE after folding svld1rq to vec_perm_expr duing forwprop To: Richard Biener Cc: gcc Patches , Richard Sandiford Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2022 12:50:43 -0000 On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni wrote: > > On Wed, 17 Aug 2022 at 17:01, Richard Biener wrote: > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > wrote: > > > > > > Prathamesh Kulkarni writes: > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote: > > > >> > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > >> wrote: > > > >> > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener w>> > > > > > >> > > > > > >> > > /* If result vector has greater length than input vector, > > > >> > > + then allow permuting two vectors as long as: > > > >> > > + a) sel.nelts_per_pattern == 1 > > > >> > > + b) sel.npatterns == len of input vector. > > > >> > > + The intent is to permute input vectors, and > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > >> > > + > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > >> > > + { > > > >> > > + nelts = sel.encoding ().npatterns (); > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > >> > > + return NULL_TREE; > > > >> > > + } > > > >> > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > >> > > explicitely only the case of a period that's same as the > > > >> > > element count in the input vectors. > > > >> > > > > > >> > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > >> > > node, int spc, dump_flags_t flags, > > > >> > > pp_space (pp); > > > >> > > } > > > >> > > } > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > >> > > + pp_string (pp, ", ... "); > > > >> > > pp_right_brace (pp); > > > >> > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > >> > Well, it got created for the following case after folding: > > > >> > svint32_t f2(int a, int b, int c, int d) > > > >> > { > > > >> > int32x4_t v = {a, b, c, d}; > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > >> > } > > > >> > > > > >> > The svld1rq_s32 call gets folded to: > > > >> > v = {a, b, c, d} > > > >> > lhs = VEC_PERM_EXPR > > > >> > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > >> > need_ctor is thus true: > > > >> > lhs = {a, b, c, d, ...} > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > >> > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > >> but do we actually make sure to do this duplication? > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > constructor looks pretty bad. > > > > Should we avoid folding VLA constructors for now ? > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > sequence at the start of an otherwise zero vector. I'm not sure > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > I guess these are 2 different issues: > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > For (a), I think the issue with using: > > > > res_type = gimple_assign_lhs (stmt) > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > if we go thru > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > in fold_vec_perm. > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR, we now have the > > > > following semantics: > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > Yeah. > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > Does that look reasonable ? > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > width vectors > > > > with len(lhs_type) being a multiple of len(res_type). > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > know the length of the VLA vector then the sizes could be different > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > then I think it will bail out because op2_units will not be a compile > > > > time constant, > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > ones in PR106360. > > > > Does it look OK ? > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > I'm not sure it's worth applying independently of the follow-on patch > > > though, if that patch is in the offing anyway. > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > instead? > > > > I think the point was they are not necssarily the same when we > > looked through a VIEW_CONVERT? A comment might be in order > > here. > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > VIEW_CONVERT_EXPR. For instance in following case: > > _78 = (void *) ivtmp.21_73; > _92 = MEM [(uint8_t *)_78]; > _91 = {_92, 0}; > vect__1.6_90 = VIEW_CONVERT_EXPR(_91); > _88 = MEM [(uint8_t *)_78 + 16B]; > _87 = {_88, 0}; > vect__1.7_86 = VIEW_CONVERT_EXPR(_87); > vect__1.8_85 = VEC_PERM_EXPR 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > simplify_permutation looks thru V_C_E, and tries to fold: > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > In this case, lhs type (V16QI) differs from res type (V2DI), and we > hit assert in fold_vec_perm. Oops sorry, just to clarify -- this hit assert in fold_vec_perm only when we used res_type = lhs_type. It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. The issue with using res_type = TREE_TYPE (arg0), comes for case when arg0 is fixed length, and op2 is VLA. Thanks, Prathamesh > > > > Btw, please don't add new asserts that trivially hold in critical paths. > Well, it didn't hold true for following case when op2 is VLA: > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > because we're passing res_type == V4SI, and op2_type == VNx4SI from > simplify_permutation. > > The vec_perm pattern hits: > (if (sel.series_p (0, 1, 0, 1)) > { op0; } > > and folds it to: > lhs = {1, 2, 3, 4} > > which results in error during verify_gimple since lhs and rhs types > differ (VNx4SI, V4SI). > > I suppose we want to do the transforms: > sel.series_p (0, 1, 0, 1) -> op0 and, > sel.series_p (0, 1, nelts, 1) -> op1 > only if input vectors and sel have same length ? > > Thanks, > Prathamesh > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 330c1db0c8e..aa20cc713c5 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -7845,6 +7845,12 @@ and, > > (with > > { > > tree op0 = @0, op1 = @1, op2 = @2; > > + > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > + > > machine_mode result_mode = TYPE_MODE (type); > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > Thanks, > > > Richard