From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id A23993858425 for ; Tue, 16 Aug 2022 16:30:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A23993858425 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F024E106F; Tue, 16 Aug 2022 09:30:23 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8BFA33F67D; Tue, 16 Aug 2022 09:30:22 -0700 (PDT) From: Richard Sandiford To: Prathamesh Kulkarni Mail-Followup-To: Prathamesh Kulkarni , Richard Biener , gcc Patches , richard.sandiford@arm.com Cc: Richard Biener , gcc Patches Subject: Re: ICE after folding svld1rq to vec_perm_expr duing forwprop References: Date: Tue, 16 Aug 2022 17:30:21 +0100 In-Reply-To: (Prathamesh Kulkarni's message of "Thu, 11 Aug 2022 18:53:11 +0530") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-45.5 required=5.0 tests=BAYES_00, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2022 16:30:25 -0000 Prathamesh Kulkarni writes: > On Tue, 9 Aug 2022 at 18:42, Richard Biener wrote: >> >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni >> wrote: >> > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener w>> > > >> > > >> > > /* If result vector has greater length than input vector, >> > > + then allow permuting two vectors as long as: >> > > + a) sel.nelts_per_pattern == 1 >> > > + b) sel.npatterns == len of input vector. >> > > + The intent is to permute input vectors, and >> > > + dup the elements in resulting vector to target vector length. */ >> > > + >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) >> > > + { >> > > + nelts = sel.encoding ().npatterns (); >> > > + if (sel.encoding ().nelts_per_pattern () != 1 >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) >> > > + return NULL_TREE; >> > > + } >> > > >> > > so the only case you add is non-VLA to VLA and there >> > > explicitely only the case of a period that's same as the >> > > element count in the input vectors. >> > > >> > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree >> > > node, int spc, dump_flags_t flags, >> > > pp_space (pp); >> > > } >> > > } >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) >> > > + pp_string (pp, ", ... "); >> > > pp_right_brace (pp); >> > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? >> > Well, it got created for the following case after folding: >> > svint32_t f2(int a, int b, int c, int d) >> > { >> > int32x4_t v = {a, b, c, d}; >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); >> > } >> > >> > The svld1rq_s32 call gets folded to: >> > v = {a, b, c, d} >> > lhs = VEC_PERM_EXPR >> > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to >> > VLA constructor, since elements in v (in_elts) are not constant, and >> > need_ctor is thus true: >> > lhs = {a, b, c, d, ...} >> > I added "..." to make it more explicit that it's a VLA constructor. >> >> But I doubt we do anything reasonable with such a beast? Do we? >> I suppose it's like a vec_duplicate if you view it as V1TImode >> but do we actually make sure to do this duplication? > I am not sure. As mentioned above, the current code-gen for VLA > constructor looks pretty bad. > Should we avoid folding VLA constructors for now ? VLA constructors aren't really a thing. At least, the only VLA vector you could represent with current CONSTRUCTOR nodes is a fixed-length sequence at the start of an otherwise zero vector. I'm not sure we even use that though (perhaps we do and I've forgotten). > I guess these are 2 different issues: > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > For (a), I think the issue with using: > res_type = gimple_assign_lhs (stmt) > in previous patch, was that op2's type will change to match tgt_units, > if we go thru > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > and may thus not be same as len(lhs_type) anymore, and hit the assert > in fold_vec_perm. > > IIUC, for lhs = VEC_PERM_EXPR, we now have the > following semantics: > (1) Element types for lhs, rhs1 and rhs2 should be the same. > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). Yeah. > The attached patch changes res_type from TREE_TYPE (arg0) to following: > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > TYPE_VECTOR_SUBPARTS (op2)) > so it has same element type as arg0 (and arg1) and len of op2. > Does that look reasonable ? > > If we need a cast from res_type to lhs_type, then both would be fixed > width vectors > with len(lhs_type) being a multiple of len(res_type). > IIUC, we don't support casting from VLA vector to/from fixed width vector, Yes, that's not supported as a cast. If the compiler knows the length of the "VLA" vector then it's not VLA. If it doesn't know the length of the VLA vector then the sizes could be different (preventing VIEW_CONVERT_EXPR) and the number of elements could be different (preventing pointwise CONVERT_EXPRs). > or from VLA vector of one type to VLA vector of other type ? That's supported though. They work just like VLS vectors: if the sizes are the same then we can use VIEW_CONVERT_EXPR, if the number of elements are the same then we can do pointwise conversions (e.g. element-by-element extensions, truncations, conversions to float, conversions to integer, etc). > Currently, if op2 is VLA, and we enter the branch: > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > then I think it will bail out because op2_units will not be a compile > time constant, > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > The patch resolves ICE's for aarch64 cases, without regressing the > ones in PR106360. > Does it look OK ? Richi should have the final say, but it looks OK in principle to me. I'm not sure it's worth applying independently of the follow-on patch though, if that patch is in the offing anyway. I guess my only question is whether tree-ssa-forwprop.cc really needs to build a new type. Couldn't it just use the TREE_TYPE of the lhs instead? Thanks, Richard